Teen_Mental_Health_Analysis

Data Cleaning Report

Objective

The goal of the cleaning process was to prepare the teen mental health dataset for exploratory data analysis and visualization.

Input And Output Files

Input:

data/Teen_Mental_Health_Dataset.csv

Output:

data/Teen_Mental_Health_cleaned.csv

Initial Dataset Profile

Metric Value
Rows 1,200
Columns 13
Missing values 0
Duplicate rows 0

Cleaning Steps

1. Column Name Standardization

Column names were standardized to a consistent snake_case style.

Examples:

2. Duplicate Review

Duplicate rows were checked with:

df.duplicated().sum()

Result:

0 duplicate rows

No duplicate rows needed to be removed.

3. Missing Value Review

Missing values were checked with:

df.isnull().sum()

Result:

0 missing values across all columns

No imputation was required for the teen dataset.

4. Categorical Standardization

Text categories were standardized by stripping whitespace and converting values to lowercase.

Columns cleaned:

Example:

df["gender"] = df["gender"].str.strip().str.lower()

5. Numeric Range Validation

The following checks were performed:

df[df["age"] <= 0]
df[df["daily_social_media_hours"] <= 0]
df[df["sleep_hours"] <= 0]
df[~df["depression_label"].isin([0, 1])]

No invalid values were found based on these checks.

Final Dataset Profile

Metric Value
Rows 1,200
Columns 13
Missing values 0
Duplicate rows 0

Export Command

The cleaned dataset was exported with:

df.to_csv("data/Teen_Mental_Health_cleaned.csv", index=False)

Using index=False prevents pandas from writing the DataFrame index as an extra CSV column.

Cleaning Conclusion

The dataset was already mostly clean. The main cleaning work involved standardizing text categories and validating that the data had no missing values, duplicate rows, or invalid numeric ranges.