This week, I applied Principal Component Analysis (PCA) and K-means clustering to explore patterns of protests, riots, and violence across U.S. states.
PCA reduced the dataset to two main dimensions (PC1 and PC2), capturing over 96% of the total variation. This made it easier to compare states based on their normalized event rates per million people.
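The PCA step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact pipeline used here. It assumes three event types per state (protests, riots, violence) and uses invented toy counts and populations for five hypothetical states "A" to "E", so it is not the real dataset. It converts counts to rates per million and z-scores each column, which is an assumption because the post does not say whether rates were also standardized. It then runs PCA through NumPy's SVD and prints the share of variance captured by PC1 and PC2.

```python
import numpy as np

# Hypothetical toy data (NOT the real dataset). Columns: protests, riots, violence.
states = ["A", "B", "C", "D", "E"]
counts = np.array([
    [120, 4, 10],
    [300, 30, 25],
    [80, 2, 6],
    [50, 1, 3],
    [900, 5, 40],
], dtype=float)
population = np.array([2.0e6, 4.0e6, 1.5e6, 1.0e6, 0.7e6])

# Normalize counts to events per million residents.
rates = counts / population[:, None] * 1e6

# Assumed step: standardize each event type so no single column dominates.
z = (rates - rates.mean(axis=0)) / rates.std(axis=0)

# PCA via SVD of the centered, scaled matrix.
u, s, vt = np.linalg.svd(z, full_matrices=False)
scores = z @ vt.T                  # principal-component scores per state
explained = s**2 / np.sum(s**2)    # fraction of variance per component

pcs = scores[:, :2]                # keep PC1 and PC2 for plotting/clustering
print("explained variance (PC1, PC2):", explained[:2].round(3))
print("cumulative:", explained[:2].sum().round(3))
# If every PC1 loading has the same sign, PC1 reads as "overall event intensity".
print("PC1 loadings:", vt[0].round(3))
```

SVD component signs are arbitrary, so another library may report PC1 or PC2 with the opposite sign. The ranking of states along each axis is unaffected apart from a possible flip.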
Using the PC1 and PC2 scores, I grouped states into three clusters with K-means. Most states clustered tightly together, while two outliers each formed a cluster of their own.
Notably:
- Cluster 0 (Oregon) and Cluster 1 (District of Columbia) were positioned far outside the main group in the PCA plot.
- District of Columbia (Cluster 1) had the highest protest rate by far, making it a clear outlier.
- Oregon (Cluster 0) stood out with high levels of both protests and riots, reflected in its extreme PC2 value.
- The rest of the states (Cluster 2) shared more similar patterns and grouped near the origin.
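The clustering step can be sketched as plain K-means (Lloyd's algorithm) on PC scores. This is a hedged illustration, not the exact code behind these results. As toy input it uses only the five PC1/PC2 pairs from the top-5 table in this post, whereas the real analysis clustered every state. It also swaps in a deterministic farthest-point seeding, a simplification of k-means++, so the result is reproducible. The helper names `farthest_point_init` and `kmeans` are hypothetical.

```python
import numpy as np

# PC1/PC2 scores of the five states in the top-5 table (toy input only;
# the full analysis clustered all states, not just these five).
states = ["District of Columbia", "Oregon", "Vermont", "New Mexico", "New York"]
X = np.array([
    [10.76, -0.69],
    [2.45, 3.51],
    [1.16, -0.82],
    [0.77, -0.77],
    [0.64, 0.44],
])

def farthest_point_init(X, k):
    """Hypothetical seeding: start at row 0, then repeatedly add the point
    farthest from its nearest existing center (simplified k-means++)."""
    centers = [X[0]]
    for _ in range(1, k):
        c = np.array(centers)
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[np.argmax(d2)])
    return np.array(centers)

def kmeans(X, k, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest center,
    recompute centers as cluster means, stop when labels stop changing."""
    centers = farthest_point_init(X, k)
    labels = None
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = np.argmin(d2, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        # Keep the old center if a cluster ever ends up empty.
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers

labels, centers = kmeans(X, k=3)
for name, lab in zip(states, labels):
    print(f"{name}: cluster {lab}")
```

On this toy input, District of Columbia and Oregon each end up alone, and the other three states share a cluster. K-means cluster IDs are arbitrary, so the numbers printed here need not match the 0/1/2 labels used in this post.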
The top 5 states with the highest PC1 values (indicating the highest overall event intensity) were:
| State | PC1 | PC2 | Cluster |
|---|---:|---:|---:|
| District of Columbia | 10.76 | -0.69 | 1 |
| Oregon | 2.45 | 3.51 | 0 |
| Vermont | 1.16 | -0.82 | 2 |
| New Mexico | 0.77 | -0.77 | 2 |
| New York | 0.64 | 0.44 | 2 |