Monday, August 26, 2013

SFPD Project

SFPD has publicly available data on all reported crime, including descriptions, locations, times, etc. I've spent the last few weeks parsing it and trying to package it in ways that would be useful to an incoming police officer. Here are a few plots.




Monday, August 12, 2013

American Rhetoric Wordcloud

I found a list of 100 great speeches in American history and made a wordcloud using the R package of the same name and some Python to parse the texts with a code snippet I already had. After taking out some of the expected words like "the", "if", "when", ..., this is the result.
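For anyone curious about the counting step, here is a minimal sketch of how it could look in Python. This is not the snippet I actually used: the function name `word_counts`, the tiny stop-word set, and the sample sentences are all made up for illustration, and a real stop-word list would be much longer.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the post only names "the", "if", "when" as examples.
STOPWORDS = {"the", "if", "when", "a", "an", "and", "of", "to", "in"}

def word_counts(texts, stopwords=STOPWORDS):
    """Count lowercase words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        # Lowercase the text, then pull out runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts

# Made-up sample sentences standing in for the speech texts.
sample = [
    "The people, and the progress of the people.",
    "When in war, seek peace.",
]
counts = word_counts(sample)
print(counts.most_common(3))
```

The resulting word/count pairs could then be written out to a file and handed to R for plotting.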


For context, "people" and "progress" were mentioned 554 and 52 times respectively. "Political" shows up 5 or 6 times. That could be a quirk of the wordcloud package or, more likely, something wrong with how I put the data together in R.
Takeaways?

  • There's a focus on who is involved or affected rather than on what affects them: "people", "human", "man", "children", "nations", "public", "country", "american", "world", "men" are all big players. "What" words like "political", "economic", "security", "policy", "independence", "forces", "freedom", "progress" are all listed, but with less frequency. An alternative explanation is that more synonyms exist for the "what" words than for the "who" words.
  • Interestingly enough, "war" and "peace" occupy similar areas. "War" appears 310 times, with "peace" behind at around 240.
  • A few of the speeches belong to Martin Luther King and so "black" made the list of 50+ occurrences, at 62. Unfortunately, two words were left out of the plot due to sizing issues. "Black" was one of them. I thought we were past this, R.
  • I wouldn't have imagined temporal diction would feature so heavily. "Years", "today", "before", "time", "now", "present", "history" are some examples. A call to action is common advice for public speaking, but I imagine that putting historical weight behind claims is just as important in these speeches. "For years, we blah blah blah" makes a journey out of social change or societal innovation, while "We blah blah blah" does not sound as legitimate or important.


I want to compare American rhetoric against historically evil contenders, the Hitlers and Stalins of the world, just to see what overlaps and how much, but that is enjoyable enough to warrant its own post.