Student mental health intelligence

A clean, decision-grade dataset on social media, sleep, and student wellbeing.

Built from 7,500+ records and 18+ features, including engineered indicators for sleep quality, screen intensity, sentiment, and productivity behavior.

Problem

Most available datasets are broad but shallow on behavior.

They often miss modern usage patterns, compounded risk effects, and engineered features needed for meaningful mental-health analysis.

Typical datasets

Single-variable framing, weak behavioral context, and limited value for predictive or intervention-focused work.

This dataset

Built to model interactions across sleep, screen time, and sentiment, with engineered features that improve both interpretability and model performance.

Solution / Approach

Simple pipeline, high-quality output.

Each stage removes noise and adds analytical meaning.

1. Selenium collection with validation checks
2. BeautifulSoup parsing and schema cleanup
3. Data normalization and quality controls
4. Feature engineering for behavior signals
5. Insight synthesis for real decisions
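Here is a rough sketch of stages 2 and 3. It parses an HTML results table into records, normalizes the column names, and drops rows that fail type or range checks. It uses Python's standard-library html.parser in place of BeautifulSoup so it runs without extra dependencies. The SCHEMA columns and ranges are hypothetical, not the dataset's published schema. Selenium collection (stage 1) is left out because it needs a live browser.

```python
from html.parser import HTMLParser

# Hypothetical schema for illustration: column -> (type, min, max).
SCHEMA = {
    "sleep_hours": (float, 0.0, 24.0),
    "screen_hours": (float, 0.0, 24.0),
    "age": (int, 16, 60),
}


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def normalize(html):
    """Parse an HTML table, clean the headers, and keep only rows that pass SCHEMA checks."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    # Schema cleanup: "Sleep Hours" -> "sleep_hours"
    keys = [h.strip().lower().replace(" ", "_") for h in header]
    clean, rejected = [], 0
    for cells in body:
        record = dict(zip(keys, cells))
        try:
            out = {}
            for col, (typ, lo, hi) in SCHEMA.items():
                value = typ(record[col])  # type check: raises ValueError on bad text
                if not lo <= value <= hi:  # range check
                    raise ValueError(f"{col} out of range")
                out[col] = value
            clean.append(out)
        except (KeyError, ValueError):  # missing column, bad type, or out of range
            rejected += 1
    return clean, rejected
```

Returning a rejected-row count next to the clean records gives a cheap quality-control metric that can be logged for each scraped page.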

Dataset Showcase

Focused, model-ready, and research-friendly.

A compact view of what makes this dataset practically useful.

Dimension | MindPulse | Typical sets
Rows | 7,500+ | Fewer than 2,000 in many public sets
Features | 18+, including engineered features | Mostly raw survey fields
Sleep | Hours plus pattern quality | Single self-reported field
Screen time | Estimated intensity | Often missing

7,500+ validated records

18+ raw and engineered features

Selenium + BS4 collection and parsing stack

Feature Engineering

Engineered signals that improve relevance.

Concise feature set designed for interpretability and practical model performance.

Sleep quality signals

Sleep debt and schedule-regularity measures that go beyond average sleep hours.
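Below is a minimal sketch of the two sleep signals. It assumes per-night sleep durations and bedtimes are available for each student. The 8-hour target and the hours-after-noon bedtime encoding are illustrative assumptions, not documented choices of this dataset.

```python
from statistics import pstdev

TARGET_HOURS = 8.0  # assumed nightly sleep need, chosen for illustration


def sleep_debt(hours):
    """Cumulative shortfall against TARGET_HOURS; surplus nights do not repay debt."""
    return sum(max(0.0, TARGET_HOURS - h) for h in hours)


def bedtime_regularity(bedtimes):
    """Population std. dev. of bedtimes in hours; lower means a more regular schedule.

    Bedtimes are encoded as hours after noon (23:30 -> 11.5, 01:00 -> 13.0) so
    that times on either side of midnight stay on one continuous scale.
    """
    return pstdev(bedtimes)
```

Consider two students who both average 8 hours. One sleeps [8, 8, 8, 8] and the other sleeps [10, 6, 10, 6]. Their sleep debt comes out to 0 and 4 hours respectively, a difference that the average alone hides.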

Screen intensity score

An estimate of behavioral exposure, rather than raw usage time alone.
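The page does not publish the scoring formula, so what follows is only a hypothetical sketch of one way to turn raw usage into an intensity score. It scales daily screen hours, phone pickups, and the share of use after midnight to [0, 1], then takes a weighted sum. The caps, weights, and input names are all assumptions.

```python
# Hypothetical caps and weights, chosen only for illustration.
CAPS = {"hours": 8.0, "pickups": 150.0}
WEIGHTS = {"hours": 0.5, "pickups": 0.3, "night_share": 0.2}


def clamp01(x):
    """Limit a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def screen_intensity(hours, pickups, night_share):
    """Score in [0, 1] blending time on screen, pickup frequency, and late-night use."""
    parts = {
        "hours": clamp01(hours / CAPS["hours"]),
        "pickups": clamp01(pickups / CAPS["pickups"]),
        "night_share": clamp01(night_share),  # fraction of usage after midnight
    }
    return sum(WEIGHTS[k] * v for k, v in parts.items())
```

Each part is clamped before weighting, so the score stays in [0, 1]. Two students with the same screen hours can still differ on intensity.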

Sentiment-derived indicators

Mood trend features extracted from text and routine signals.
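Here is one simple way to derive mood-trend features from text, offered only as a sketch. Each journal-style entry gets a score from a small word lexicon, and the least-squares slope of the daily scores serves as the trend. The lexicon and function names are hypothetical. A real pipeline would more likely use a validated sentiment model, and the routine-signal side is not covered here.

```python
import re

# Toy lexicon for illustration only.
POSITIVE = {"calm", "good", "happy", "rested", "focused"}
NEGATIVE = {"anxious", "tired", "sad", "stressed", "overwhelmed"}


def entry_score(text):
    """(positive hits - negative hits) / token count, so the result lies in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def mood_trend(scores):
    """Ordinary least-squares slope of daily scores against day index (change per day)."""
    n = len(scores)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den
```

A negative slope flags a declining mood trend even when the latest single entry looks neutral.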

Productivity index

Composite measure linking behavior to academic focus patterns.
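A composite index like this is often built by standardizing each component across the cohort and averaging the signed z-scores. The sketch below does that for three hypothetical components: study hours, assignment completion rate, and task switches per hour (which counts against focus). The page does not say which inputs the real index uses.

```python
from statistics import mean, pstdev

# Hypothetical components and directions (+1 supports focus, -1 works against it).
COMPONENTS = {"study_hours": 1, "completion_rate": 1, "task_switches": -1}


def zscores(values):
    """Standardize values to mean 0 and population std. dev. 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def productivity_index(students):
    """Mean of signed component z-scores per student; 0 is the cohort average."""
    columns = {c: zscores([s[c] for s in students]) for c in COMPONENTS}
    return [
        mean(sign * columns[c][i] for c, sign in COMPONENTS.items())
        for i in range(len(students))
    ]
```

Standardizing first keeps one component with a large numeric range, such as task switches, from dominating the composite.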

Insights

Clear findings with direct strategic value.

Condensed signals to support intervention planning and model development.

2.3x higher distress signal when low sleep and high screen intensity combine

67% predictive lift after adding engineered behavior features

41% improvement in mood stability in moderate social-use cohorts

3.4 hours of social-media exposure per day, on average, across the sample
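The page does not state the comparison groups or the metric behind the 2.3x and 67% figures. The sketch below shows one common way such numbers are defined. The first is a ratio of distress rates between the combined-risk group and a no-risk group. The second is the relative change in a model score. The field names, the 6-hour and 0.7 cutoffs, and the example scores are assumptions, and the sketch does not reproduce the reported values.

```python
# Hypothetical record fields: sleep_hours, screen_intensity (0-1), distressed (0/1).


def distress_rate(records, predicate):
    """Share of distressed records among those matching predicate."""
    group = [r for r in records if predicate(r)]
    if not group:
        raise ValueError("empty comparison group")
    return sum(r["distressed"] for r in group) / len(group)


def combined_risk_ratio(records, sleep_cut=6.0, intensity_cut=0.7):
    """Distress rate with both risks present, divided by the rate with neither present."""
    both = distress_rate(
        records,
        lambda r: r["sleep_hours"] < sleep_cut and r["screen_intensity"] >= intensity_cut,
    )
    neither = distress_rate(
        records,
        lambda r: r["sleep_hours"] >= sleep_cut and r["screen_intensity"] < intensity_cut,
    )
    return both / neither


def relative_lift(baseline_score, new_score):
    """Relative improvement of a model metric, e.g. 0.30 -> 0.45 is a 50% lift."""
    return (new_score - baseline_score) / baseline_score
```

Records with only one risk factor fall into neither group, so the ratio isolates the compounded effect rather than either factor alone.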

ML Use Cases

Built for real deployment contexts.

Supports risk prediction, intervention timing, policy simulation, and wellbeing product research.

Impact

Useful to universities, researchers, and policymakers.

A practical asset for better decisions in student mental-health strategy.

University wellness teams can target support with higher precision.

Researchers gain a stronger baseline for reproducible behavioral modeling.

Policy teams get evidence grounded in current digital usage patterns.

Download the dataset, code notes, and report.

Clean, open, and ready for serious research workflows.