Thursday, September 26, 2013

Predictive Modeling of S.F.P.D. Crime: Daily and Long-Term Tailoring

My last post established that a linear model of crime volume from 2003 to 2013 has a slope of -25 (incorrectly labeled 23 in the image from the last post), and that crime volume varies with the day of the week. This post details how I intend to incorporate both trends into the model.

My approach is to shift the means of the normal distributions that I am pulling from by applying a number of modifiers, which will incorporate the specific trends I have mentioned.

As a demonstration, assume that the daily decrease in crime is ~1%, Tuesday has 5% more crime than a normal day, and Wednesday has 5% less crime than a normal day. Tailoring the model to include these tendencies requires shifting the means of these normal distributions. As a concrete example, let's consider thefts. The mean is ~89 per day.




Through the week, the mean of the distribution selected will be changed like this:
Monday, Mean = 89
Tuesday, Mean = 89[(1-0.01)^1 + (0.05)]
Wednesday, Mean = 89[(1-0.01)^2 + (-0.05)]

The term inside the brackets is the percentage modifier. The first term accounts for the 1% decrease, compounding daily, hence the geometric term. The second term accounts for the day of the week. I'll explain the origin of each later.
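The bracketed modifier can be sketched in a few lines of Python. This is only an illustration using the demonstration numbers above; the function name `shifted_mean` is my own, and leaving Monday unshifted is an assumption taken from the Monday entry showing a plain mean of 89.

```python
BASE_MEAN = 89.0        # approximate mean thefts per day, from the example
DAILY_DECREASE = 0.01   # demonstration value only, not the fitted rate

# Demonstration weekday shifts. Monday = 0.0 is an assumption.
DAY_SHIFT = {"Monday": 0.0, "Tuesday": 0.05, "Wednesday": -0.05}


def shifted_mean(base_mean, days_elapsed, day_shift, daily_decrease=DAILY_DECREASE):
    """Return base_mean * [(1 - daily_decrease)^days_elapsed + day_shift]."""
    modifier = (1.0 - daily_decrease) ** days_elapsed + day_shift
    return base_mean * modifier


for days_elapsed, day in enumerate(["Monday", "Tuesday", "Wednesday"]):
    mean = shifted_mean(BASE_MEAN, days_elapsed, DAY_SHIFT[day])
    print(day, round(mean, 2))
```

With these demonstration values the Tuesday mean moves up to about 92.56 and the Wednesday mean drops to about 82.78.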


Long-Term Decrease:
Sequentially taking monthly percentage changes in crime volume from 2003 to 2013 produces a series of 120 "returns". The geometric mean of these returns comes to -0.212%. Bringing this to our example of an ~89 mean, day one will have 89 crimes. Day two will have 89*(1 + GeoMean) = 89*(1 - 0.00212) = 88.81. Day three starts from day two's mean: 88.81*(1 - 0.00212) = 88.62. This simplifies to 89*(1 + GeoMean)^(Day), where "Day" is the number of days elapsed since the start, so day one has Day = 0. Over a one-year period, Day runs up to roughly 365.
The long-term decrease will be accounted for by a term of (0.997877)^(Day of Forecast).
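As a rough sketch of the calculation, the geometric mean of a series of percentage changes is the n-th root of the compounded growth, minus one. The helper names below (`geometric_mean_return`, `decayed_mean`) and the three-value volume series are hypothetical, chosen only so the arithmetic is easy to check; they are not my actual data.

```python
import math


def geometric_mean_return(volumes):
    """Geometric mean of the period-over-period percentage changes."""
    returns = [later / earlier - 1.0 for earlier, later in zip(volumes, volumes[1:])]
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (1.0 / len(returns)) - 1.0


def decayed_mean(base_mean, geo_mean, day):
    """Mean after `day` steps of compounding: base_mean * (1 + geo_mean)^day."""
    return base_mean * (1.0 + geo_mean) ** day


# Hypothetical volumes that fall 10% per period, so the geometric mean is -10%.
print(round(geometric_mean_return([100.0, 90.0, 81.0]), 4))

# The worked example from the post: an ~89 mean shrinking by 0.212% per step.
for day in range(3):
    print(day, round(decayed_mean(89.0, -0.00212, day), 2))
```

The loop reproduces the 89, 88.81, 88.62 sequence worked out above.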

Day of Week Modifier:
This is a lesson in documenting code. I don't remember how I got to these results, but here's what I have noted. Listed in order from Monday to Sunday: -3.3%, 0.4%, 3.5%, 0.3%, 6.6%, 0.4%, -7.8%. The numbers work, in the sense that they sum pretty close to 0, so they will shift the distributions without affecting the totals. This is important because historical volume informs my model, and I want to stay true to that while tailoring the forecasts to account for the day of the week.
Edit:
Alright, I went back. Here's how I got those numbers.
Each year, take the number of crimes occurring on each day of the week and find the mean. For example, suppose Monday through Friday have 100 crimes each day and Saturday and Sunday have 200 each. The mean is 900/7, or about 129. Daily volume can then be represented as a percentage above or below the mean. Saturday and Sunday are about 56% above it, while Monday through Friday are about 22% below it. Summing these percentages gives zero, meaning that the predicted means over an annual period should be consistent with historical annual means.
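The same calculation as a short Python sketch, using the made-up 100/200 counts from the example rather than real S.F.P.D. data. The function name `weekday_modifiers` is hypothetical, and this version handles a single year of totals.

```python
def weekday_modifiers(counts):
    """Express each weekday's count as a fraction above or below the weekly mean."""
    mean = sum(counts.values()) / len(counts)
    return {day: count / mean - 1.0 for day, count in counts.items()}


# Made-up counts from the example: 100 on weekdays, 200 on weekend days.
example_counts = {
    "Monday": 100, "Tuesday": 100, "Wednesday": 100, "Thursday": 100,
    "Friday": 100, "Saturday": 200, "Sunday": 200,
}

modifiers = weekday_modifiers(example_counts)
for day, modifier in modifiers.items():
    print(day, f"{modifier:+.1%}")

# The modifiers sum to zero (up to floating-point error), so totals are preserved.
print(abs(sum(modifiers.values())) < 1e-9)
```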
Daily Modifier Shift:
Monday, -0.034
Tuesday, 0.002
Wednesday, 0.033
Thursday, 0.001
Friday, 0.067
Saturday, 0.006
Sunday, -0.074
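Putting the two terms together, here is a minimal sketch of how a week of forecasts might be drawn. The shifts and the 0.997877 decay term are the values listed in this post. The standard deviation, the Monday start, the fixed random seed, and the name `forecast_mean` are all assumptions made for illustration, since none of them are specified here.

```python
import random

BASE_MEAN = 89.0      # approximate mean thefts per day, from the post
DECAY = 0.997877      # long-term decrease term, from the post
ASSUMED_STD = 10.0    # hypothetical spread; not a value from the post

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
SHIFT = {
    "Monday": -0.034, "Tuesday": 0.002, "Wednesday": 0.033, "Thursday": 0.001,
    "Friday": 0.067, "Saturday": 0.006, "Sunday": -0.074,
}


def forecast_mean(day_of_forecast, start_weekday=0):
    """Shifted mean for a forecast day; start_weekday=0 assumes a Monday start."""
    weekday = DAYS[(start_weekday + day_of_forecast) % 7]
    modifier = DECAY ** day_of_forecast + SHIFT[weekday]
    return BASE_MEAN * modifier


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for day in range(7):
    mean = forecast_mean(day)
    draw = rng.gauss(mean, ASSUMED_STD)  # one sample from the shifted normal
    print(DAYS[day % 7], round(mean, 2), round(draw, 1))
```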






