Project 2 – Week 5

This week, I applied Principal Component Analysis (PCA) and K-means clustering to explore patterns of protests, riots, and violence across U.S. states.

PCA reduced the dataset to two principal components (PC1 and PC2), which together capture over 96% of the total variance. This made it easier to compare states on their event rates, normalized per million people.
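For readers who want to see the mechanics, here is a minimal sketch of a two-component PCA using NumPy's SVD. The `pca_2d` helper and the random `rates` matrix are hypothetical stand-ins, not the project's data or code, and standardizing each column before the decomposition is an assumption about the preprocessing.

```python
import numpy as np

def pca_2d(rates):
    """Project the rows of `rates` onto their first two principal components."""
    X = np.asarray(rates, dtype=float)
    # Standardize each column (assumed step) so no event type dominates by scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized matrix; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:2].T                    # PC1 and PC2 coordinates per row
    explained = (S ** 2) / np.sum(S ** 2)    # share of variance per component
    return scores, explained[:2]

# Hypothetical per-million rates (protests, riots, violence), for illustration only.
rng = np.random.default_rng(0)
rates = rng.gamma(shape=2.0, scale=5.0, size=(51, 3))

scores, explained = pca_2d(rates)
print(scores.shape)        # one (PC1, PC2) pair per row
print(explained.sum())     # fraction of variance kept by the two components
```

The sum of the two explained-variance shares is the kind of figure quoted above; on real data it depends entirely on how correlated the event rates are.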

Using the PCA scores, I grouped the states into three clusters with K-means. Most states clustered tightly together, while two outliers each formed a cluster of their own.

Notably:

  • Cluster 0 (Oregon) and Cluster 1 (District of Columbia) were positioned far outside the main group in the PCA plot.
  • District of Columbia (Cluster 1) had the highest protest rate by far, making it a clear outlier.
  • Oregon (Cluster 0) stood out with high levels of both protests and riots, reflected in its extreme PC2 value.
  • The rest of the states (Cluster 2) shared more similar patterns and grouped near the origin.
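The clustering step can be sketched as plain K-means (Lloyd's algorithm) on the PC1/PC2 coordinates. This is only an illustration: it uses just the five rows published in the table in this post, not the full set of states, and the `kmeans` helper with its farthest-point initialization is a hypothetical choice made so the result is deterministic. It is not necessarily the implementation used for the project.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    X = np.asarray(points, dtype=float)
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d2)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster ends up empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# (PC1, PC2) pairs copied from the table in this post; only five states.
states = ["District of Columbia", "Oregon", "Vermont", "New Mexico", "New York"]
pcs = [(10.76, -0.69), (2.45, 3.51), (1.16, -0.82), (0.77, -0.77), (0.64, 0.44)]

labels, centroids = kmeans(pcs, k=3)
for state, label in zip(states, labels):
    print(state, label)
```

On these five rows, the District of Columbia and Oregon each end up alone while Vermont, New Mexico, and New York share a cluster. The label numbers themselves are arbitrary, so they need not match the Cluster 0/1/2 numbering used above.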

The top 5 states with the highest PC1 values (indicating highest overall event intensity) were:

State                    PC1     PC2    Cluster
District of Columbia    10.76   -0.69   1
Oregon                   2.45    3.51   0
Vermont                  1.16   -0.82   2
New Mexico               0.77   -0.77   2
New York                 0.64    0.44   2