Patent applications and grants are all public domain and searchable through a variety of sources (Google Patent Search, USPTO, NBER, and several proprietary sites). The problem with these sources is that they are of little use for detailed patent research. Although they can help you find and read patents, and offer full-text and abstract search, they cannot compile patent statistics or calculate indicators. You can’t even generate and export a list of patent numbers based on your search criteria. To calculate the kinds of figures that meaningful research and patent management require, you need the complete data in an SQL database (MS SQL Server or MySQL, for example). And while the USPTO data is available for download in a few formats, it’s no small task to download these files, parse them according to their changing formats, and then assemble a database.
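To make concrete what a database buys you over a search page, here is a minimal sketch of the two things the sites above can’t do: count patents by some criterion and dump a list of patent numbers. It uses Python’s built-in sqlite3 module as a stand-in for MySQL or MS SQL Server, and the `patents` table, its columns, and the sample rows are all made up for illustration; a real USPTO database will have a different and much larger schema.

```python
import sqlite3

# Placeholder rows for illustration only: (patent_number, grant_year, assignee).
# These are NOT real patent records.
SAMPLE_ROWS = [
    ("0000001", 2008, "Example Corp"),
    ("0000002", 2008, "Example Corp"),
    ("0000003", 2009, "Another Co"),
    ("0000004", 2009, "Example Corp"),
]

def build_demo_db(rows):
    """Create an in-memory database with a hypothetical 'patents' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patents ("
        "patent_number TEXT PRIMARY KEY, grant_year INTEGER, assignee TEXT)"
    )
    conn.executemany("INSERT INTO patents VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def grants_per_year(conn):
    """A simple patent statistic: number of grants in each year."""
    cur = conn.execute(
        "SELECT grant_year, COUNT(*) FROM patents "
        "GROUP BY grant_year ORDER BY grant_year"
    )
    return cur.fetchall()

def patent_numbers_for(conn, assignee):
    """The exportable list: patent numbers matching a search criterion."""
    cur = conn.execute(
        "SELECT patent_number FROM patents "
        "WHERE assignee = ? ORDER BY patent_number",
        (assignee,),
    )
    return [row[0] for row in cur]

if __name__ == "__main__":
    conn = build_demo_db(SAMPLE_ROWS)
    print(grants_per_year(conn))
    print(patent_numbers_for(conn, "Example Corp"))
```

The same two queries should carry over to MySQL more or less unchanged; the hard part, as noted above, is getting the data into the tables in the first place.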
Luckily, someone has taken on the saintly task of not only compiling a complete instruction manual on how this should be done, but also providing Python and MySQL scripts to import, parse, and build the holy grail of patent analysis. That someone is Qiyuan Liu (刘启元), an MS student at the Graduate School of Library and Information Science. After reading the construction documentation provided by Liu, I was in awe at the number of bugs he encountered. That being said, it’s a shame that the USPTO data remains in the shabby condition that it is.
Here is a link to the tutorial on how to build a USPTO research database for statistical analysis:
My simple point is that it’s time for the USPTO to clean up its act and pay someone like Mr. Liu to improve their website. At the very least, your search results should be exportable in CSV or tab-delimited format, with the patent numbers included. That way, researchers could take advantage of the full-text and bibliographic search capabilities of the USPTO site itself, and then export that list of patent numbers to an SQL database for further analysis.
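If such an export existed, getting it into a database would take only a few lines. The sketch below assumes a CSV file with a header row and a column named `patent_number`; that file layout, the column name, and the `search_results` table are all hypothetical, since the USPTO site offers no such export. Again sqlite3 stands in for a real SQL server.

```python
import csv
import io
import sqlite3

def import_search_results(conn, csv_file):
    """Load patent numbers from a (hypothetical) CSV export into a table.

    Expects a header row containing a 'patent_number' column.
    Returns the number of distinct patent numbers stored.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_results "
        "(patent_number TEXT PRIMARY KEY)"
    )
    reader = csv.DictReader(csv_file)
    numbers = [
        (row["patent_number"].strip(),)
        for row in reader
        if row.get("patent_number")  # skip rows with a blank number
    ]
    # INSERT OR IGNORE drops duplicates that appear more than once in the file.
    conn.executemany("INSERT OR IGNORE INTO search_results VALUES (?)", numbers)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM search_results").fetchone()[0]

if __name__ == "__main__":
    # Placeholder export for illustration; not real search output.
    fake_export = io.StringIO(
        "patent_number,title\n"
        "0000001,First placeholder\n"
        "0000002,Second placeholder\n"
        "0000001,First placeholder\n"
    )
    conn = sqlite3.connect(":memory:")
    print(import_search_results(conn, fake_export))
```

Once the numbers are in a table, they can be joined against the rest of a patent database for whatever analysis you have in mind.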
Anyhow, many thanks to Mr. Liu for providing such an excellent resource for building a USPTO database!