Sunday, September 29, 2013

Predictive Modeling of S.F.P.D. Crime: Falling Short

Turning the crank in Python and R, here's my first long-term forecast incorporating the tailoring detailed in the last few posts. I didn't spend much time on the plot, hence the funky axis units. Left of the red line are the actual daily volumes from 2003-2013; right of it are the forecast volumes.


So far, a long-term decreasing trend and a day-of-the-week modifier have been included. By inspection, the long-term decrease seems to be doing what it needs to: the forecast shows a clear decreasing trend. What about the day-of-the-week modifier?
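To make the two components concrete, here is a minimal sketch of a daily forecast built from a linear trend, a day-of-the-week modifier, and normal noise. It is not the code behind the plot: the multiplicative form of the modifier, the linear trend, and every number below are assumptions picked purely for illustration.

```python
import random
from datetime import date, timedelta

# Hypothetical parameters, chosen only for illustration (not fitted values).
BASE_VOLUME = 350.0   # assumed mean daily volume on the first forecast day
DAILY_TREND = -0.01   # assumed change in mean volume per day (decreasing)
NOISE_SD = 25.0       # assumed standard deviation of the normal noise
# Assumed multiplicative modifiers, indexed Monday=0 ... Sunday=6.
DOW_MODIFIER = [0.97, 1.01, 1.01, 1.02, 1.04, 0.97, 0.91]


def expected_volume(start, i):
    """Mean volume for the i-th day after start: trend times weekday modifier."""
    day = start + timedelta(days=i)
    return (BASE_VOLUME + DAILY_TREND * i) * DOW_MODIFIER[day.weekday()]


def forecast(start, n_days, rng):
    """Return a list of (date, volume) pairs with normal noise around the mean."""
    out = []
    for i in range(n_days):
        day = start + timedelta(days=i)
        volume = rng.gauss(expected_volume(start, i), NOISE_SD)
        out.append((day, max(0.0, volume)))  # a daily count cannot be negative
    return out


rng = random.Random(0)  # fixed seed so the sketch is reproducible
series = forecast(date(2013, 10, 1), 365, rng)
```

An additive modifier (a fixed offset per weekday) would be just as easy to write; the multiplicative form is used here only because it scales naturally as the trend pulls the base volume down.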

Percentile results show that the forecast volumes are lower and more tightly distributed than historical volumes. For each day, the first summary is historical and the second is the forecast.

Day        Series       Min.  1st Qu.  Median   Mean  3rd Qu.   Max.
Wednesday  Historical  202.0    339.0   372.0  377.6    414.0  666.0
           Forecast    277.0    326.2   344.0  344.7    362.0  422.0
Thursday   Historical  197.0    352.0   379.0  382.9    413.0  576.0
           Forecast      249      311     331    331      349    406
Friday     Historical    6.0    350.0   388.0  388.8    429.0  552.0
           Forecast    292.0    337.0   353.0  356.0    371.8  427.0
Saturday   Historical  149.0    325.0   363.0  361.6    393.0  566.0
           Forecast    272.0    316.0   334.0  335.4    355.0  394.0
Sunday     Historical    2.0    313.0   340.0  341.3    369.0  588.0
           Forecast    237.0    291.8   308.0  309.6    326.2  376.0
Monday     Historical  190.0    329.0   362.0  362.6    395.0  522.0
           Forecast    256.0    298.0   317.5  318.5    338.0  395.0
Tuesday    Historical  210.0    347.0   376.0  379.4    409.0  553.0
           Forecast    274.0    311.8   328.0  333.9    352.0  421.0



For both the historical and the forecast volumes, Friday is the peak day and Sunday is the minimum. At first glance, the modifier looks like it has successfully shifted the daily volumes to better fit historical patterns.
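For anyone who wants to reproduce this kind of per-weekday summary outside R, here is a rough Python sketch. It assumes the data is a list of (date, volume) pairs; the function names and the toy series are made up for the example. The "inclusive" quantile method in Python's statistics module uses the same interpolation as R's default quantile type, which is what summary() relies on.

```python
import statistics
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def six_number_summary(values):
    """Min, quartiles, mean and max, in the spirit of R's summary()."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


def summary_by_weekday(series):
    """Group (date, volume) pairs by weekday name and summarize each group."""
    groups = defaultdict(list)
    for day, volume in series:
        groups[WEEKDAYS[day.weekday()]].append(volume)
    return {name: six_number_summary(vals) for name, vals in groups.items()}


# Toy data only: four weeks starting on a Monday, volume rising by 1 per day.
start = date(2013, 9, 2)
toy_series = [(start + timedelta(days=i), 100 + i) for i in range(28)]
toy_summary = summary_by_weekday(toy_series)
```

Running the same function over the historical series and over the forecast series gives the two rows per day that are compared above.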

Takeaways:
  • The forecast volumes hug the mean more closely than historical volumes.
    More technically, switching from a normal distribution to a more fat-tailed distribution is probably warranted. A fatter tail in the probability distribution will produce "extreme" values more often, and so forecast volumes will spread further from the mean. When I incorporate the Monte Carlo aspect, this should lead to more accurate percentile results.
  • The means of the forecast volumes are lower than the historical means.
    I have added a decreasing trend, so volumes would be expected to be lower. Additionally, the historical means include the last ten years of crime and so reflect the higher volumes early on in the sample. A fairer comparison would be against a more recent set, like the three-year period from 2009-2012. Turning the crank on that, though, the means increase, so something seems amiss: that is not consistent with what inspection shows.
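To illustrate the first takeaway, here is a small sketch comparing normal draws with draws from a fat-tailed distribution. The Student-t distribution with 3 degrees of freedom is just one possible fat-tailed choice, assumed here for the example, and the mean and scale are placeholder numbers rather than fitted values. The t draw is built from the standard construction: a standard normal divided by the square root of a chi-square over its degrees of freedom.

```python
import math
import random


def student_t(rng, df):
    """One Student-t draw: standard normal over sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) as Gamma(df/2, scale 2)
    return z / math.sqrt(chi2 / df)


def simulate(mean, scale, n, rng, df=None):
    """Draw n volumes; normal noise if df is None, Student-t noise otherwise."""
    if df is None:
        return [mean + scale * rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mean + scale * student_t(rng, df) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the comparison is reproducible
# Placeholder mean and scale, assumed for illustration only.
normal_draws = simulate(350.0, 25.0, 20000, rng)
fat_draws = simulate(350.0, 25.0, 20000, rng, df=3)

normal_range = max(normal_draws) - min(normal_draws)
fat_range = max(fat_draws) - min(fat_draws)
print(f"normal range: {normal_range:.1f}, fat-tailed range: {fat_range:.1f}")
```

With the same center and scale, the fat-tailed draws reach much further from the mean, which is the behavior the historical minimum and maximum columns suggest the forecast is currently missing. A real forecast would also need to clip or otherwise handle negative draws.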
