Darksteel047/Parkinsons_EDA


📖 Vocal-Feature-Analysis-for-Parkinsons


Overview

This project performs a comprehensive exploratory data analysis (EDA) on a Parkinson’s disease dataset. The goal is to uncover patterns, relationships, and distinguishing features between Healthy individuals and Parkinson’s patients.

The dataset contains voice measurement features (Jitter, Shimmer, HNR, etc.) collected from subjects. This EDA helps identify key features for further predictive modeling.


📂 Dataset


Analysis Steps and Key Insights:

1️⃣ Data Loading and Cleaning

  • Checked the shape, column info, and data types.
  • Checked for missing and duplicate values.
  • Checked the distribution of the target variable (status).
  • Removed unnecessary identifier columns (e.g., name).
  • Ensured all features are numeric.
  • Stored a clean copy for analysis.
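The cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual code: it assumes the identifier column is called `name` and the target is a numeric `status` column, and the function name `load_and_clean` and the file path in the usage comment are hypothetical placeholders.

```python
import pandas as pd


def load_and_clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Run basic checks and return a clean, numeric-only copy.

    Assumes a 'name' identifier column and a numeric 'status' target column.
    """
    print("Shape:", raw.shape)
    raw.info()  # column names, dtypes, non-null counts
    print("Missing values per column:\n", raw.isnull().sum())
    print("Duplicate rows:", raw.duplicated().sum())
    print("Target distribution:\n", raw["status"].value_counts())

    # Drop the identifier column; errors="ignore" makes this a no-op if it is absent.
    clean = raw.drop(columns=["name"], errors="ignore")
    # Keep only numeric columns (voice features plus the status label).
    clean = clean.select_dtypes(include="number")
    # Return an independent copy so later steps never modify the raw frame.
    return clean.copy()


# Usage ("parkinsons.data" is a hypothetical placeholder path):
# raw = pd.read_csv("parkinsons.data")
# df = load_and_clean(raw)
```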

2️⃣ Descriptive Statistics

  • Computed summary statistics for each numeric column using the .describe() method.
  • Identified features with high variance for further analysis.
  • Compared the mean of each feature between the Healthy and Parkinson’s groups.
  • Mean MDVP:Fo(Hz) and HNR are lower in Parkinson’s patients.
  • Mean spread1, PPE, and MDVP:Shimmer are higher in Parkinson’s patients.
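One way to produce these summaries is sketched below. It assumes the cleaned, all-numeric frame from the previous step and a `status` column coded 0 = Healthy, 1 = Parkinson’s (the 0 coding for Healthy is an assumption). `describe_by_status` is a hypothetical helper name.

```python
import pandas as pd


def describe_by_status(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Per-class means of selected features; assumes status 0 = Healthy, 1 = Parkinson's."""
    # count, mean, std, min, quartiles, max for every column
    print(df.describe().T)

    # Raw variance depends on each feature's scale, so large-unit columns
    # (e.g. the Hz features) tend to top this ranking.
    print(df.drop(columns="status").var().sort_values(ascending=False).head(10))

    # Rows: features; columns: one per class.
    means = df.groupby("status")[features].mean().T
    means = means.rename(columns={0: "Healthy", 1: "Parkinson's"})
    # Positive difference means the feature is higher in the Parkinson's group.
    means["difference"] = means["Parkinson's"] - means["Healthy"]
    return means
```

A negative `difference` for MDVP:Fo(Hz) and HNR and a positive one for spread1, PPE, and MDVP:Shimmer would correspond to the observations listed above.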

3️⃣ Univariate Analysis


  • Plotted distributions of key acoustic features (MDVP:Fo(Hz), HNR, spread1, RPDE, PPE, etc.) by status to visualize how voice characteristics differ between healthy and Parkinson’s individuals.

Insights:

  • 📉 MDVP:Fo(Hz) and HNR values are typically lower for Parkinson’s patients.
  • 📈 spread1, RPDE, and PPE are higher for Parkinson’s patients, indicating greater voice signal irregularity.
  • The clear shifts in distributions suggest that these features are strong indicators for distinguishing between the two classes.
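A minimal sketch of the per-class distribution plots, using plain matplotlib so it only needs pandas and matplotlib. It assumes the same 0/1 `status` coding as above; `plot_feature_distributions` is a hypothetical name, and the bin count of 20 is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_feature_distributions(df: pd.DataFrame, features: list[str]):
    """Overlay per-class histograms for each feature (0 = Healthy, 1 = Parkinson's)."""
    fig, axes = plt.subplots(1, len(features), figsize=(4 * len(features), 3.5), squeeze=False)
    for ax, feature in zip(axes[0], features):
        for status, label in [(0, "Healthy"), (1, "Parkinson's")]:
            values = df.loc[df["status"] == status, feature]
            # density=True keeps the classes comparable despite unequal group sizes
            ax.hist(values, bins=20, alpha=0.5, density=True, label=label)
        ax.set_title(feature)
        ax.legend()
    fig.tight_layout()
    return fig


# Usage: plot_feature_distributions(df, ["MDVP:Fo(Hz)", "HNR", "spread1", "RPDE", "PPE"])
```

With seaborn, `sns.histplot(data=df, x=feature, hue="status", stat="density", common_norm=False)` gives a similar per-class overlay.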

4️⃣ Bivariate Analysis


  • Compared medians, spread, and outliers between the two groups using boxplots.

Insights:

  • Identified the top distinguishing features: MDVP:Fo(Hz) and HNR (lower in Parkinson’s), and RPDE, spread1, and PPE (higher in Parkinson’s).
  • Individuals with Parkinson's (Status 1) show noticeably lower vocal frequencies (MDVP:Fo(Hz), MDVP:Flo(Hz)) and Harmonics-to-Noise Ratio (HNR). Conversely, measures of vocal irregularity and complexity such as PPE, RPDE, and spread1 are markedly higher in the Parkinson's group.
  • Some features (e.g., MDVP:Fhi(Hz)) show moderate differences, but heavy overlap between the classes makes them the least effective differentiators.
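The boxplot comparison can be sketched with matplotlib alone, as below. This assumes 0/1 `status` coding; `boxplots_by_status` is a hypothetical helper, and seaborn's `sns.boxplot(data=df, x="status", y=feature)` would produce an equivalent chart per feature.

```python
import matplotlib.pyplot as plt
import pandas as pd


def boxplots_by_status(df: pd.DataFrame, features: list[str]):
    """One panel per feature with Healthy (0) and Parkinson's (1) boxes side by side."""
    fig, axes = plt.subplots(1, len(features), figsize=(3.5 * len(features), 4), squeeze=False)
    for ax, feature in zip(axes[0], features):
        groups = [df.loc[df["status"] == s, feature].to_numpy() for s in (0, 1)]
        # Box = interquartile range, line = median; whiskers reach the last point
        # within 1.5 * IQR and points beyond them are drawn as outliers.
        ax.boxplot(groups)
        ax.set_xticks([1, 2])
        ax.set_xticklabels(["Healthy", "Parkinson's"])
        ax.set_title(feature)
    fig.tight_layout()
    return fig


# Usage: boxplots_by_status(df, ["MDVP:Fo(Hz)", "HNR", "RPDE", "spread1", "PPE", "MDVP:Fhi(Hz)"])
```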

5️⃣ Correlation Analysis


  • Computed correlation matrix of all numeric features.
  • Visualized using a heatmap with a color gradient.
  • Identified highly correlated features, especially among Jitter and Shimmer measures.
  • Checked correlation with status to determine top predictive features.

Insights:

  • spread1, PPE, MDVP:Fo(Hz), MDVP:Flo(Hz), MDVP:Shimmer, and HNR show the strongest correlation (in absolute value) with Parkinson’s status.
  • High correlation among Jitter/Shimmer features indicates potential multicollinearity.
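Both correlation checks can be expressed in a few lines of pandas. This is a sketch that assumes an all-numeric frame containing `status`; the helper name `correlation_summary` and the 0.9 cut-off for "highly correlated" are illustrative assumptions, not values from this project.

```python
import pandas as pd


def correlation_summary(df: pd.DataFrame, threshold: float = 0.9):
    """Rank features by |Pearson r| with status and list strongly inter-correlated pairs."""
    corr = df.corr()  # Pearson by default; df must be all-numeric

    # Signed correlation with the target, ordered by absolute strength.
    with_status = corr["status"].drop("status")
    ranking = with_status.reindex(with_status.abs().sort_values(ascending=False).index)

    # Scan the upper triangle only, so each pair is reported once.
    features = [c for c in corr.columns if c != "status"]
    pairs = [
        (a, b, corr.loc[a, b])
        for i, a in enumerate(features)
        for b in features[i + 1:]
        if abs(corr.loc[a, b]) >= threshold
    ]
    return ranking, pairs
```

Keeping the sign in `ranking` shows direction as well as strength: MDVP:Fo(Hz) and HNR should appear with negative values, spread1 and PPE with positive ones. The heatmap itself can be drawn with `sns.heatmap(df.corr(), cmap="coolwarm", vmin=-1, vmax=1)`.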

6️⃣ Multivariate Analysis


  • Explored feature interactions using pairplots.
  • Observed clustering and separation trends between Healthy and Parkinson’s groups.

Insights:

  • Identified feature pairs that highlight class distinction, e.g., MDVP:Fo(Hz) vs HNR.
  • The diagonal density plots confirm the findings from the boxplots: spread1, PPE, and MDVP:Shimmer are higher for Parkinson's patients, whereas HNR and MDVP:Fo(Hz) are lower.
  • The strong correlation between features such as spread1 and PPE indicates multicollinearity.
  • The scatter plots show clear class separation for several feature pairs.
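A single panel of the pairplot, such as MDVP:Fo(Hz) vs HNR, can be reproduced with the sketch below. It assumes 0/1 `status` coding, and `scatter_pair` is a hypothetical helper; the full grid with per-class diagonal density plots comes from `sns.pairplot(df[features + ["status"]], hue="status")`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_pair(df: pd.DataFrame, x: str, y: str):
    """Scatter two features coloured by class to inspect separation by eye."""
    fig, ax = plt.subplots(figsize=(5, 4))
    for status, label, color in [(0, "Healthy", "tab:blue"), (1, "Parkinson's", "tab:red")]:
        subset = df[df["status"] == status]
        ax.scatter(subset[x], subset[y], s=15, alpha=0.6, color=color, label=label)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return fig


# Usage: scatter_pair(df, "MDVP:Fo(Hz)", "HNR")
```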

7️⃣ Key Takeaways

  • Features with high mean difference and low overlap are most discriminative.
  • Some features are highly correlated with each other; careful feature selection is needed for modeling.
  • Visual exploration confirms that voice features can distinguish Healthy individuals from Parkinson’s patients.
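As one possible starting point for the feature selection mentioned above, the sketch below applies a simple greedy correlation filter. This is a swapped-in technique, not a method used in this project: features are visited from most to least correlated with `status`, and each is kept only if its absolute correlation with every already-kept feature stays below a threshold (0.9 here, an arbitrary assumption). `drop_redundant_features` is a hypothetical name.

```python
import pandas as pd


def drop_redundant_features(df: pd.DataFrame, threshold: float = 0.9) -> list[str]:
    """Greedy filter: from each highly correlated group, keep the feature most correlated with status."""
    corr = df.corr().abs()
    # Candidate features ordered from most to least correlated with the target.
    candidates = corr["status"].drop("status").sort_values(ascending=False).index
    kept: list[str] = []
    for feature in candidates:
        # Keep a feature only if it is not too similar to any feature already kept.
        if all(corr.loc[feature, k] < threshold for k in kept):
            kept.append(feature)
    return kept
```

On this dataset such a filter would be expected to thin out the Jitter/Shimmer families, but the threshold should be checked against model performance rather than taken as given.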

🛠️ Tools & Technologies

| Category | Tools Used |
|---|---|
| Data Handling | pandas, numpy |
| Visualization | matplotlib, seaborn |
| Environment | Jupyter Notebook |
| Insights & Observations | ChatGPT, Gemini |

📊 Visualizations

  • Boxplots for key features by status
  • Correlation heatmap to check feature relationships
  • Pairplots for feature interactions

📄 License

This project is licensed under the MIT License.
