1️⃣ What is Correlation?
Correlation measures the strength and direction of a relationship between two numerical variables.
📌 It answers questions like:
When X increases, does Y increase or decrease?
How strongly are X and Y related?
📌 Correlation does NOT mean causation.
Example:
Ice cream sales ↑ and temperature ↑ → correlated
Ice cream sales ↑ does NOT cause temperature ↑
2️⃣ Why Correlation is Important in Data Science
Correlation is used in:
✅ Exploratory Data Analysis (EDA)
✅ Feature selection
✅ Detecting multicollinearity
✅ Understanding data patterns
✅ Model simplification
✅ Business insights
Example:
If two features are highly correlated, one may be removed.
3️⃣ Direction of Correlation
➕ Positive Correlation
Both variables increase together
Example: Height & Weight
📈 Graph: Upward slope
➖ Negative Correlation
One increases, the other decreases
Example: Speed & Travel Time
📉 Graph: Downward slope
⚪ Zero Correlation
No relationship
Example: Shoe size & IQ
📊 Graph: Random scatter
4️⃣ Correlation Coefficient (r)
The correlation coefficient measures correlation numerically.
Range:
−1 ≤ r ≤ +1

| Value of r | Meaning |
| --- | --- |
| +1 | Perfect positive |
| −1 | Perfect negative |
| 0 | No correlation |
| ±0.7 to ±1 | Strong |
| ±0.3 to ±0.7 | Moderate |
| ±0.0 to ±0.3 | Weak |
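The strength bands in the table above can be captured in a small helper (a sketch; the 0.3 and 0.7 cut-offs are the conventional ones from the table, not a universal standard):

```python
def strength(r):
    """Classify a correlation coefficient by |r| using the bands above."""
    a = abs(r)
    if a >= 0.7:
        return "strong"
    if a >= 0.3:
        return "moderate"
    return "weak"

strength(0.85)   # "strong"
strength(-0.45)  # "moderate"
```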
5️⃣ Pearson Correlation (Most Common)
📌 Used for:
Linear relationships
Continuous numerical data
Formula:
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²)
Assumptions:
✅ Linear relationship
✅ No extreme outliers
✅ Normal distribution (optional but preferred)
Example:
Study hours & exam marks
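A minimal sketch of the study-hours example with NumPy (the numbers are made up for illustration):

```python
import numpy as np

hours = np.array([1, 2, 3, 4, 5], dtype=float)       # study hours (made-up)
marks = np.array([52, 60, 65, 71, 80], dtype=float)  # exam marks (made-up)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(hours, marks)[0, 1]
print(round(r, 3))  # 0.995 -> a strong positive linear relationship
```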
6️⃣ Spearman Rank Correlation
📌 Used for:
Monotonic relationships (linear or non-linear)
Ranked or ordinal data
Key Idea:
Convert values into ranks
Apply Pearson on ranks
Example:
Customer satisfaction rank vs loyalty rank
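The two steps above (convert to ranks, then apply Pearson) can be sketched directly; this simplified version assumes no tied values:

```python
import numpy as np

def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))  # position of each value when sorted
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

# Monotonic but non-linear relationship: Spearman still gives a perfect +1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3
print(spearman(x, y))  # 1.0
```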
7️⃣ Kendall's Tau Correlation
📌 Used for:
Small datasets
Ordinal data
Robust to ties
Concept:
Counts concordant & discordant pairs
Example:
Ranking similarity between two judges
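The concordant/discordant counting can be sketched as follows (this is tau-a, which ignores ties; real libraries such as scipy.stats.kendalltau handle ties more carefully):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # the pair is ordered the same way in x and y
        elif s < 0:
            discordant += 1   # the pair is ordered oppositely
    pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / pairs

# Two judges ranking the same five items; only one pair is swapped
judge_a = [1, 2, 3, 4, 5]
judge_b = [1, 3, 2, 4, 5]
print(kendall_tau(judge_a, judge_b))  # 0.8
```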
8️⃣ Correlation vs Covariance

| Covariance | Correlation |
| --- | --- |
| Measures joint variability | Measures strength & direction |
| Units depend on data | Unit-free |
| Hard to interpret | Easy to interpret |
| Range: −∞ to +∞ | Range: −1 to +1 |

📌 Correlation = Normalized covariance
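That relationship, r = cov(X, Y) / (σx · σy), can be checked numerically (made-up data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 9.0, 10.0])

cov_xy = np.cov(x, y)[0, 1]                           # units: units(x) * units(y)
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))  # unit-free, in [-1, 1]

# Matches NumPy's own correlation coefficient
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True
```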
9️⃣ Correlation Matrix
A correlation matrix shows correlations between multiple variables.
Example:

|   | A | B | C |
| --- | --- | --- | --- |
| A | 1 | 0.8 | −0.2 |
| B | 0.8 | 1 | −0.4 |
| C | −0.2 | −0.4 | 1 |

📌 Used in:
Feature selection
Heatmaps
Multivariate EDA
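With pandas, the whole matrix comes from a single call (illustrative data; the column names A/B/C mirror the example above):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],  # exactly 2 * A, so r(A, B) = 1
    "C": [5, 3, 4, 1, 2],
})

# Pearson by default; pass method="spearman" or method="kendall" to switch
corr = df.corr()
print(round(corr.loc["A", "B"], 6))  # 1.0
```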
🔥 🔟 Multicollinearity
What is it?
When independent variables are highly correlated
Problems:
❌ Unstable coefficients
❌ Reduced model interpretability
❌ Inflated variance
Detection:
Correlation Matrix
VIF (Variance Inflation Factor)
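A minimal VIF sketch with plain NumPy (statsmodels provides variance_inflation_factor for real use; the idea is VIF_i = 1 / (1 − R²), where R² comes from regressing feature i on the other features):

```python
import numpy as np

def vif(X, i):
    """VIF of column i: 1 / (1 - R^2), with R^2 from regressing
    column i on the remaining columns plus an intercept."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(X)), others])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    r2 = 1 - residuals.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)               # independent of x1 -> VIF near 1
x3 = x1 + 0.05 * rng.normal(size=200)   # nearly a copy of x1 -> huge VIF
X = np.column_stack([x1, x2, x3])
```

Here vif(X, 0) and vif(X, 2) blow up because x1 and x3 are nearly collinear, while vif(X, 1) stays close to 1; a common rule of thumb flags features with VIF above 5 or 10.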
1️⃣1️⃣ Correlation ≠ Causation (Very Important)
Correlation does NOT mean one variable causes the other.
Example:
Crime rate & Ice cream sales are correlated
Both depend on temperature
📌 Hidden variable = Confounding factor
1️⃣2️⃣ Limitations of Correlation
❌ Only measures linear relationships (Pearson)
❌ Sensitive to outliers
❌ Cannot capture cause-effect
❌ Misses complex patterns
1️⃣3️⃣ Correlation in Machine Learning
Used in:
Feature elimination
Dimensionality reduction
Data cleaning
Model diagnostics
Example:
Remove one of two features with r > 0.9
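One common pandas recipe for this (a sketch; the 0.9 threshold and the column names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # roughly 2 * x1: redundant
    "x3": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair is examined once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
reduced = df.drop(columns=to_drop)
print(to_drop)  # ['x2']
```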
1️⃣4️⃣ Real-World Example (Data Science)
📊 Dataset: House Prices

| Feature | Correlation with Price |
| --- | --- |
| Area | +0.85 |
| Distance to city | −0.62 |
| Age of house | −0.40 |
| Bedrooms | +0.70 |
Interpretation:
Larger area is strongly associated with higher price
Greater distance to the city is associated with lower price
1️⃣5️⃣ Visualizing Correlation
✅ Scatter plots
✅ Heatmaps
✅ Pair plots
1️⃣6️⃣ Summary (Key Takeaways)
✅ Correlation measures relationship, not causation
✅ Range is from −1 to +1
✅ Pearson → Linear
✅ Spearman → Rank / Non-linear
✅ Used heavily in EDA & ML
✅ Helps detect redundancy in features