Student mental health intelligence
A clean, decision-grade dataset on social media, sleep, and student wellbeing.
Built from 7,500+ records and 18+ features, including engineered indicators for sleep quality, screen intensity, sentiment, and productivity behavior.
Problem
Most available datasets are broad, but not behavior-deep.
They often miss modern usage patterns, compounded risk effects, and engineered features needed for meaningful mental-health analysis.
Typical datasets
Single-variable framing, weak behavioral context, and limited value for predictive or intervention-focused work.
This dataset
Built to model interactions across sleep, screen time, and sentiment with features that increase interpretability and performance.
Solution / Approach
Simple pipeline, high-quality output.
Each stage removes noise and adds analytical meaning.
Selenium collection with validation checks
BeautifulSoup parsing and schema cleanup
Data normalization and quality controls
Feature engineering for behavior signals
Insight synthesis for real decisions
Dataset Showcase
Focused, model-ready, and research-friendly.
A compact view of what makes this dataset practically useful.
| Dimension | MindPulse | Typical sets |
|---|---|---|
| Rows | 7,500+ | <2,000 in many public sets |
| Features | 18+ with engineering | Mostly raw survey fields |
| Sleep | Hours + pattern quality | Single self-reported field |
| Screen time | Estimated intensity | Often missing |
7,500+
Validated records
18+
Raw + engineered features
Selenium + BS4
Collection and parsing stack
Feature Engineering
Engineered signals that improve relevance.
Concise feature set designed for interpretability and practical model performance.
Sleep quality signals
Debt and regularity measures beyond average sleep hours.
Screen intensity score
Estimated behavioral exposure instead of raw usage time only.
Sentiment-derived indicators
Mood trend features extracted from text and routine signals.
Productivity index
Composite measure linking behavior to academic focus patterns.
Insights
Clear findings with direct strategic value.
Condensed signals to support intervention planning and model development.
2.3x
Higher distress signal when low sleep and high screen intensity combine
67%
Predictive lift after adding engineered behavior features
41%
Improved mood stability in moderate social-use cohorts
3.4h
Average daily social-media exposure across the sample
ML Use Cases
Built for real deployment contexts.
Supports risk prediction, intervention timing, policy simulation, and wellbeing product research.
Impact
Useful to universities, researchers, and policymakers.
A practical asset for better decisions in student mental-health strategy.
University wellness teams can target support with higher precision.
Researchers gain a stronger baseline for reproducible behavioral modeling.
Policy teams get evidence grounded in current digital usage patterns.
Download dataset, code notes, and report.
Clean, open, and ready for serious research workflows.