Thursday, November 21, 2013

Aggregating Aggregation: "Exception Handling or: How I learned to Stop Worrying and Let R Handle Imperfections"

The code for parsing Indeed.com by keyword and area in the previous post is still fairly primitive. It looks for exact patterns and, based on the usual syntax of Indeed's HTML, subtracts a fixed number of characters to make a usable link which I can re-paste.
I think the error I have experienced comes from this fixed-number subtraction either cutting off a character that belongs to the link or leaving in one that doesn't. It works for most links, but not all. Ideally, I would just skip the exceptions and let the loop continue.

R does this through tryCatch(), which has confusing documentation. Reading through this posting from the Working With Data blog helped me write a function that loads as many pages of Indeed as specified.





tryCatch() takes your code and runs it. If an error or warning occurs, control is redirected to the matching warning/error "handler", a function passed as an argument after the code inside the tryCatch() call. Because the error is handled rather than propagated, it doesn't stop the main loop, so my problem seems avoided, although I don't entirely understand what happens. I ran through 5 pages of Indeed postings for "analyst", and it spit out 50 instances of the expected output (as shown in the previous post).
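As a rough illustration of the skip-and-continue pattern described above, here is a minimal sketch. It is not the code from the previous post: parse_link(), the link strings, and the base URL prefix are all made-up stand-ins, used only so that one item in the loop fails.

```r
# Hypothetical stand-in for the real parsing step (an assumption, not the
# original code): it raises an error when a link lacks the expected prefix.
parse_link <- function(link) {
  if (!grepl("^/rc/", link)) stop("unexpected link format: ", link)
  paste0("http://www.indeed.com", link)
}

# Made-up inputs; the middle one triggers the error on purpose.
links <- c("/rc/clk?jk=abc", "bad-link", "/rc/clk?jk=def")
results <- character(0)

for (link in links) {
  parsed <- tryCatch(
    parse_link(link),              # the code to try
    error = function(e) {          # handler: runs only if an error is raised
      message("Skipping: ", conditionMessage(e))
      NA_character_                # returned as the value of tryCatch()
    }
  )
  # Keep only the links that parsed; the loop carries on either way.
  if (!is.na(parsed)) results <- c(results, parsed)
}

print(results)
```

The handler's return value becomes the value of the whole tryCatch() call, which is why the loop can test for NA and move on to the next link instead of halting.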

Now I need to figure out what kind of output I want this to give, how I can speed up the execution, and how to automate a list of user inputs so that it handles multiple search terms, i.e., "analyst", "physics", ... rather than the single "analyst".

But the legs are operational!
