Saturday, March 15, 2014

Horse Racing: Introductory Analysis

I want to start getting my head around the basic information that could influence outcomes.

I have these race-level fields sloppily copy-pasted from the R terminal. I've bolded the fields that scream to me as being relevant.
 [1] "Track"           "Date"            "Race Number"     "Race Type"     
 [5] "HorseType"       "ClaimPriceHigh"  "ClaimPriceLow"   "Race Length"   
 [9] "Record Time"     "Record Date"     "Purse"           "Plus"          
[13] "Available Money" "Valueist"        "Value2nd"        "Value3rd"      
[17] "Weather"         "Track"           "Time"            "Start"         
[21] "2FTime1"         "2FTime2"         "3FTime1"         "3FTime2"       
[25] "3FTime3"         "4FTime1"         "4FTime2"         "4FTime3"       
[29] "4FTime4"         "t_Final"         "2SplT1"          "2SplT2"        
[33] "3SplT1"          "3SplT2"          "3SplT3"          "4SplT1"        
[37] "4SplT2"          "4SplT3"          "4SplT4"          "Run-up"        
[41] "WPS_Pool"        "ID"


Horse Type:

Of the 41 unique strings with horse type information, here are those that occur more than 10 times.
Most horses are three or more years old. Most are fillies. This information might need to be coerced into similar groups to be useful, but until I know what all of this means, I don't want to lose information.

I've only seen fillies run since I've been out, so I'm assuming this is the norm. The highest-frequency entry at the top doesn't say whether the horses are male or female, so I'm going to assume female unless otherwise specified.

                                    For Thoroughbred Three Year Old and Upward
                                                                    59
          For Thoroughbred Three Year Old and Upward Fillies and Mares
                                                                    27
                               For Thoroughbred Three Year Old Fillies
                                                                    23
                             For Thoroughbred Four Year Old and Upward
                                                                    23
                                         For Thoroughbred Two Year Old
                                                                    21
                                       For Thoroughbred Three Year Old
                                                                    20
                                 For Thoroughbred Two Year Old Fillies
                                                                    16
                    For Thoroughbred Three Year Old and Upward (NW2 L)
                                                                    15
  For Thoroughbred Three Year Old and Upward Fillies and Mares (NW2 L)
                                                                    14
For Thoroughbred Three Year Old and Upward Fillies and Mares (NW2 L X)
                                                                    12
           For Thoroughbred Four Year Old and Upward Fillies and Mares
                                                                    11
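One way to group these strings without losing information is to split each into its parts and keep the raw text alongside. This is a rough Python sketch, not something I've run on the full dataset: the function name and output field names are made up for illustration, the pattern is only based on the eleven strings shown above, and it flags a race as females-only only when the string says so explicitly (it does not apply my "assume female" guess).

```python
import re

# Word-to-number lookup for the ages seen in the table above (an assumption:
# the other 30 unique strings may use ages or wording not covered here).
AGES = {"Two": 2, "Three": 3, "Four": 4}

PATTERN = re.compile(
    r"^For Thoroughbred (?P<age>\w+) Year Old"
    r"(?P<upward> and Upward)?"
    r"(?P<sex> Fillies(?: and Mares)?)?"
    r"(?: \((?P<cond>[^)]*)\))?$"
)

def parse_horse_type(text):
    """Split a HorseType string into hypothetical fields; None if it doesn't fit."""
    raw = text.strip()
    m = PATTERN.match(raw)
    if m is None:
        return None  # unknown wording: leave it for manual inspection
    return {
        "raw": raw,                                   # keep the original string
        "min_age": AGES.get(m.group("age")),          # None if the age word is unknown
        "and_upward": m.group("upward") is not None,
        "females_only": m.group("sex") is not None,   # only when stated explicitly
        "condition": m.group("cond"),                 # e.g. "NW2 L", else None
    }

print(parse_horse_type("For Thoroughbred Three Year Old and Upward Fillies and Mares (NW2 L)"))
```

Anything that returns None is a string the pattern doesn't understand yet, which is a useful list to look at before deciding on final groups.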


Claim Price:

As an incentive against running superior horses, horses can be bought at the "claim price" after each race (at least, that's my understanding). This keeps a competitive~ish market. I'm not sure if this name is a relic of days past or not and it warrants further investigation. Most of the races are under 10k with a decent showing from 35-40k.



Race Length:

 For conversion purposes, one mile is eight furlongs. Most races are significantly shorter than a mile. I'd guess that the shorter races have more variable outcomes as it's easy to see that the odds out horses lose their pace in the longer races but compete well in the shorter runs.

                                                        Six Furlongs On The All Weather Track
                                                                                                 111
                                                                   One Mile On The All Weather Track
                                                                                                  71
                                                 Five And One Half Furlongs On The All Weather Track
                                                                                                  68
                                                                                One Mile On The Turf
                                                                                                  47
                                                             One And One Sixteenth Miles On The Turf
                                                                                                  30
                                                One And One Sixteenth Miles On The All Weather Track
                                                                                                  17
                                                              Five Furlongs On The All Weather Track
                                                                                                  12
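Since the race lengths come as text, comparing them means converting to a single unit first. Here is a small Python sketch that turns the strings above into furlongs plus a surface label, using the eight-furlongs-per-mile conversion. The function name and the word lookups are my own and only cover the wording seen in this table, so other strings in the data may come back as None.

```python
import re

FURLONGS_PER_MILE = 8

# Lookups limited to words that plausibly appear; extend as new strings turn up.
WHOLE = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5, "Six": 6, "Seven": 7}
FRACTION = {"Half": 1 / 2, "Quarter": 1 / 4, "Eighth": 1 / 8, "Sixteenth": 1 / 16}

PATTERN = re.compile(
    r"^(?P<whole>\w+)(?: And One (?P<frac>\w+))? (?P<unit>Furlongs?|Miles?)"
    r" On The (?P<surface>.+)$"
)

def parse_race_length(text):
    """Return (length_in_furlongs, surface), or None if the wording is unfamiliar."""
    m = PATTERN.match(text.strip())
    if m is None or m.group("whole") not in WHOLE:
        return None
    length = WHOLE[m.group("whole")]
    if m.group("frac") is not None:
        if m.group("frac") not in FRACTION:
            return None
        length += FRACTION[m.group("frac")]
    if m.group("unit").startswith("Mile"):
        length *= FURLONGS_PER_MILE
    return float(length), m.group("surface")

print(parse_race_length("One And One Sixteenth Miles On The Turf"))
# -> (8.5, 'Turf')
```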
 
Win, Place, Show Values:

 
  
A more relevant range of interest is from 20k down (the majority of races' win values lie in that range). For this price range, win amounts look linear with a favorable slope for 3rd. As the price range goes up, the results skew nonlinear, with 3rd offering less % value. Incentive for a conservative strategy, maybe? Worth noting even if I don't have the racing strategies down yet.



Start:

90% of races are flagged as having "a good start for all". The horse in PP1 starts worst most frequently at 2% but this isn't really fair given that the outside PP changes so I would have to compare race group by race group and I don't want to do that right now.



So that's a start. More to come.

The Bukowski Sequence:

Like any other 17-year-old guy, I found Bukowski a really fun read. Different era, different perspective. Slums, women, booze, fights, all make for a potent youth catnip. It got me back into reading, so I have been grateful for that. More recently, and partially due to a location change, I've chased down another Bukowski fixture: the horses. As I like uncertainty and probabilistic estimates, I figured I'd throw my hat into the ring.

First step was to acquire the dataset. Copy, pastes and the following script helped out with that:

Monday, December 9, 2013

The Benefits of Aggressive Driving: First Result

I've included the ability to change lanes. The aggressive car checks how much room it has in front of it in the current lane. If an obstruction is present within a distance, it checks the next lane for obstructions. If none exists in the second lane, it'll change lanes and hopefully improve its situation and elapsed course time.
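The lane-change rule described above boils down to two gap checks. This is a stripped-down Python sketch of that decision only, not the actual simulation code: the function names are invented, lanes are represented as plain lists of car positions instead of pixels, and it assumes a two-lane road and ignores cars beside or behind the aggressive car.

```python
def gap_ahead(position, lane_positions):
    """Distance to the nearest car ahead in a lane, or None if nothing is ahead."""
    gaps = [p - position for p in lane_positions if p > position]
    return min(gaps) if gaps else None

def choose_lane(position, current_lane, lanes, look_ahead):
    """Return the lane index (0 or 1) the aggressive car should use next."""
    gap = gap_ahead(position, lanes[current_lane])
    if gap is None or gap > look_ahead:
        return current_lane            # no obstruction close enough to matter
    other_lane = 1 - current_lane      # two-lane road assumed
    other_gap = gap_ahead(position, lanes[other_lane])
    if other_gap is None or other_gap > look_ahead:
        return other_lane              # other lane is clear: switch
    return current_lane                # both lanes blocked: stay put

# Car at position 100 in lane 0, blocked by a car at 120; lane 1 is clear nearby.
print(choose_lane(100, 0, [[120, 300], [400]], look_ahead=50))
# -> 1
```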

I ran this program 500 times with the following conditions:
1. the top speed of the aggressive car is 33% faster than the other cars.
2. the acceleration of the aggressive car is twice that of the other cars.
3. the aggressive car will change lanes to improve its situation.
4. the number of cars in the simulation is 22, which makes the density look something like this.




The differences in elapsed time between the aggressive car and a test car starting at the same speed and position are in the histogram below.

According to this simulation, an aggressive driver usually benefits from his strategy but the degree of the benefit varies substantially. By percentage, the aggressive car finishes the course on average 22% faster than the test car.

Next step is to vary the number of obstacle cars and see how that shifts the distribution.

Thursday, December 5, 2013

The Benefits of Aggressive Driving: Simulation Employing Python OOP

Two weeks ago on the way home from work, an excessively aggressive driver was dodging through traffic behind me. It was night, but like all obnoxious drivers, the headlights were of the luminous, distracting blue-white ilk. Jumping lanes, aggressive acceleration, higher top speed. At the next light, we were lined up with three or so cars in front of both of us, on a two lane road.

On the green, I accelerated gently and kept my pace at the speed of traffic. The blue-white headlight car jumped on the bumper of the car ahead, accelerating aggressively and quickly jumping lanes (to no advantage). At the next red light, he was only one car ahead despite his strategy.

Got me thinking. Does an aggressive driving strategy pay off on surface streets?




Using the pygame module (the Python equivalent of Java's Processing), I've modeled a surface street as six stoplight objects spread at random distances apart. The function that turns the crank here is screen.get_at, which returns the color of the pixel at a specified (x,y) location on the screen. Each car object (white rectangles) looks at its current location and ahead of it to find potential obstacles and modifies its speed to avoid crashes.
"Stoplights" are implemented through the redzones, which slow the speed of the car. If the car reaches the end of the redzone, it stops. The green or red lines to the left of the lane indicate the light status.
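To make the look-ahead idea concrete without needing pygame installed, here is a simplified Python sketch. It is not the real code: a nested list of color tuples stands in for the screen, a local get_at function stands in for screen.get_at, and the colors, the function names, the left-to-right direction of travel and the exact slowing rule are all assumptions. It also assumes the car's own pixels are not drawn on the grid.

```python
CAR = (255, 255, 255)     # other cars (white)
RED_ZONE = (255, 0, 0)    # stretch of road before a red light
ROAD = (0, 0, 0)

def get_at(screen, x, y):
    """Stand-in for pygame's Surface.get_at: the color stored at pixel (x, y)."""
    return screen[y][x]

def next_speed(screen, x, y, speed, accel, max_speed, look_ahead):
    """Pick the next speed (pixels per step) for a car whose front is at (x, y)."""
    row_len = len(screen[y])
    # 1. Another car within look_ahead pixels: never travel far enough to hit it.
    for d in range(1, look_ahead + 1):
        if x + d >= row_len:
            break
        if get_at(screen, x + d, y) == CAR:
            return min(speed, d - 1)
    # 2. Inside a red zone: slow down, and stop once the end of the zone is reached.
    if get_at(screen, x, y) == RED_ZONE:
        remaining = 0
        while x + remaining + 1 < row_len and get_at(screen, x + remaining + 1, y) == RED_ZONE:
            remaining += 1
        return min(max(speed - accel, 1), remaining)
    # 3. Open road: accelerate up to the car's top speed.
    return min(speed + accel, max_speed)

road = [[ROAD] * 20]
road[0][5] = CAR
print(next_speed(road, 2, 0, speed=4, accel=1, max_speed=6, look_ahead=5))
# -> 2
```

Giving the two racing cars different accel and max_speed values is all it takes to turn this rule into a "gentle" and an "aggressive" driver.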

The cars with the small blue rectangles are the "racing" cars. They start at the same location and speed, but have different initialization values for acceleration and max speed. At the end of the course, the time elapsed from start to finish is recorded.

The results of five simulations:
(slow left lane car, fast right lane car)
[279, 282], [257, 290], [203, 316], [208, 298], [239, 291]
how much faster?
[1% faster, 12% faster, 56% faster, 43% faster, 22% faster]
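For reference, the percentages can be recomputed from the pairs. This short Python sketch assumes each figure is the difference between the two numbers in a pair relative to the first number; that formula is my reading of the results, not something stated above. With that assumption, it lands within one percentage point of every figure quoted.

```python
# The five result pairs as listed above.
results = [(279, 282), (257, 290), (203, 316), (208, 298), (239, 291)]

def percent_difference(first, second):
    """Difference between the two values, as a percentage of the first."""
    return 100 * (second - first) / first

percentages = [round(percent_difference(a, b), 1) for a, b in results]
print(percentages)
# -> [1.1, 12.8, 55.7, 43.3, 21.8]
```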

I still need to include the ability of the car to change lanes. Comparing a stupid driver with a stupid and aggressive driver isn't very interesting. 

The end questions I want to answer:
1. How much does varying traffic volume, speed differential between fast and slow cars and the length of red lights affect the course time?
2. What metric weighs the value of quickly completing the course against moderate acceleration and top speed, and what strategy optimizes that metric?



Here is the code. I'll clean it up later.

Tuesday, December 3, 2013

Postmortem on General Moly Speculation





I keep an eye on the molybdenum market. The newsfeed contained both "December" and "General Moly", which got me thinking about an earlier post on GMO futures expiring in December 2013.

Relevant summary:
  • General Moly (GMO) is a development stage mining company. They want to dig up molybdenum. They have the land to do so and had the money, until...
  • The financing required for mine construction fell through due to an unfortunately timed detaining of a Chinese bank chairman.
  • The stock price fell accordingly.
My assumption was that if GMO was able to secure financing elsewhere, the stock price would rebound. Futures for GMO expiring in September and December amounted to a bet on whether GMO would be able to secure financing. To date, GMO has not secured financing, and the futures expired worthless.

Mt. Hope, the focal point of GMO, has proven and probable reserves of ~1.5 billion pounds of molybdenum (and a healthy dose of copper), according to GMO's website. At current prices of $10/lb, GMO is sitting on $15B in molybdenum that isn't going anywhere, for better and for worse.

Thursday, November 21, 2013

Aggregating Aggregation: "Exception Handling or: How I Learned to Stop Worrying and Let R Handle Imperfections"

The code for parsing Indeed.com by keyword and area in the previous post has some primitive features. It looks for exact patterns and, based on the usual syntax of Indeed HTML, subtracts a fixed number of characters to make a usable link which I can re-paste.
I think the error I have experienced comes from this fixed-number subtraction either cutting off a character that belongs to the link or leaving in one that doesn't. It works for most links, but not all. Ideally, I would just skip the exceptions and let the loop continue through.

R does this through tryCatch(), which has confusing documentation. Reading through this posting from the Working With Data blog helped me in writing a function that loads as many pages of Indeed as specified.





tryCatch takes your code and runs it. If an error or warning occurs, it redirects to the warning/error "handler", below the tryCatch{}. In redirecting, it doesn't stop the main loop with an error, so my problem seems to be avoided, although I don't entirely understand what happens. I ran through 5 pages of Indeed postings for analyst, and it spit out 50 instances of the expected output (as shown in the previous post).
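The same skip-and-continue pattern exists in most languages. As an analogue rather than the R code itself, here is how it might look in Python with try/except. The names load_pages, fetch_page and flaky_parser are made up for this example, and the fake parser simply fails on one page to show that the loop carries on.

```python
def load_pages(page_ids, fetch_page):
    """Run fetch_page on each id; record failures and keep looping."""
    results, failures = [], []
    for page_id in page_ids:
        try:
            results.append(fetch_page(page_id))
        except Exception as err:
            # The "handler": note what went wrong, then move on to the next page.
            failures.append((page_id, repr(err)))
    return results, failures

def flaky_parser(page_id):
    """Hypothetical stand-in for the link parser: fails on page 3."""
    if page_id == 3:
        raise ValueError("link pattern not found")
    return "parsed page %d" % page_id

results, failures = load_pages([1, 2, 3, 4, 5], flaky_parser)
print(len(results), len(failures))
# -> 4 1
```

Keeping the failures in a list, instead of silently dropping them, makes it possible to go back later and see which links the fixed-offset trick got wrong.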

Now I need to figure out what kind of output I want this to give, how I can speed up the execution, and how to automate a list of user inputs so that it handles multiple search terms, i.e., "analyst", "physics", ... rather than the single "analyst".

But the legs are operational!