Pass the parser please

I've been tinkering with a new search engine. The first trick was to build a parser to turn a recipe into an orderly format. The good side: all I need to parse out is the ingredients. The bad side: funky recipes don't lend themselves to parsing. My site doesn't really store recipes: it just organizes the data so that searches are relevant and the results link to the offsite resource.
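To give a feel for what parsing out an ingredient involves, here is a rough sketch in Python. It is an illustration only, not the parser behind the site: the function name, the unit list, and the quantity pattern are all assumptions made up for the example, and real recipes will break it in plenty of ways.

```python
import re
from fractions import Fraction

# Hypothetical unit list for illustration; a real parser needs far more.
UNITS = {"cup", "cups", "tbsp", "tsp", "g", "kg", "ml", "oz", "lb"}

# Leading quantity: "1 1/2", "1/2", "2" or "0.5" (assumed formats).
QTY = re.compile(r"^\s*(\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s*")

def parse_ingredient(line):
    """Split one ingredient line into quantity, unit and name."""
    qty = None
    rest = line
    m = QTY.match(line)
    if m:
        # "1 1/2" becomes 1 + 1/2; a single number is just itself.
        qty = sum(Fraction(part) for part in m.group(1).split())
        rest = line[m.end():]
    words = rest.split()
    unit = None
    if words and words[0].lower().rstrip(".") in UNITS:
        unit = words[0].lower().rstrip(".")
        words = words[1:]
    return {"quantity": qty, "unit": unit, "name": " ".join(words)}

print(parse_ingredient("1 1/2 cups flour"))
print(parse_ingredient("salt to taste"))
```

Lines with no leading number, like "salt to taste", come back with no quantity or unit and the whole line as the name, which is about the best a simple pattern can do with the funky ones.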

When I looked to see if anyone else was doing it, I found a recipe XML format (which is very easy to parse). I had some issues with that format, so I modified it a little.

Then I looked for a natural language parser. One site, Recipezaar, toiled at this task for two years (http://www.decafbad.com/blog/2003/11/14/the_recipe_web). I got version 0.1 working in two weeks: a parser for foodtv.ca recipes. All of the sites are a little bit different, so I'm going to build a modified parser for each site I hope to crawl.
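One way to organize a parser per site is a small lookup keyed by hostname. The sketch below is only an assumption about how that could be laid out, not the actual code: the registry, the decorator, and the function names are hypothetical, and the foodtv.ca parser body is a placeholder.

```python
from urllib.parse import urlparse

# Hypothetical registry: hostname -> parser function for that site.
PARSERS = {}

def site_parser(host):
    """Decorator that registers a parser function for one hostname."""
    def register(func):
        PARSERS[host] = func
        return func
    return register

@site_parser("foodtv.ca")
def parse_foodtv(html):
    # Placeholder: the real site-specific extraction would go here.
    return []

def parser_for(url):
    """Return the parser registered for the URL's site, or None."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return PARSERS.get(host)
```

Adding a newly submitted site then means writing one more decorated function, and a crawler can skip any URL for which `parser_for` returns None.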

If you want to add a recipe resource, please feel free: http://dewolfe001.dotgeek.org/EmptyCupboard/addsite.php

If you want to see a recipe of mine and its recipe XML counterpart, be my guest.
