The USPTO Patent Litigation Dataset: Open Source, Extensive Docket and Patent Number Data
https://ift.tt/3oXWhTr
Guest Post by Prof. Ted Sichelman, University of San Diego School of Law
Many online services provide district court patent litigation dockets, documents, and associated patent numbers. However, none of these services offer comprehensive, hand-coded patent numbers and case types, plus full dockets and key documents (complaints, summary judgments, verdicts), downloadable in bulk at no charge and with no license restrictions.
After more than three years of extensive automated and manual review of patent dockets, the USPTO’s Office of the Chief Economist, in conjunction with researchers from the University of San Diego’s Center for IP Law & Markets (myself) and Northwestern Law School (David L. Schwartz), has built just such a resource, expanding upon the patent litigation dataset the USPTO had released in 2015.
Currently, the dataset (available here) includes:
- Dockets: The complete docket for every lawsuit filed in district courts tagged as a patent action in PACER (and many other patent cases tagged under non-patent PACER codes) from the first patent case logged in PACER through the end of 2016 (over 80,000 case dockets).
- Attorneys & Parties: Full lists of parties by type (e.g., plaintiff, defendant, intervener) and their attorneys, with full contact information for the attorneys gathered from public records.
- Patent Numbers: Comprehensive patent numbers, hand-coded by a team of over 30 law students from all electronically available complaints in PACER in cases filed from 2003 through the end of 2016.
- Based on testing against several of the leading commercial services, plus publicly available data from the Stanford NPE Litigation Database, the dataset’s patent number information is substantially more complete and accurate than that of any of these services (which often use automated methods for determining patents-in-suit).
- Case Types: Every case in PACER filed from 2003 through the end of 2016 is identified with one of 15 fine-grained case types, including patent infringement (non-declaratory judgment [DJ]), DJ on non-infringement and invalidity, DJ on non-infringement only, DJ on invalidity only, false marking, inventorship, malpractice, regulatory challenge, and others.
In the next few months, the USPTO will make available:
- Documents: Initial complaints, summary judgment orders, and verdicts (bench and jury) that are electronically available for all patent cases in PACER filed from 2003 through the end of 2016.
The data is downloadable only in bulk and is not searchable at the USPTO website. However, once downloaded, the patent number, case type, and attorney data are relatively straightforward to search in Microsoft Excel or in other database and statistical packages.
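For readers who would rather script the search than use a spreadsheet, here is a minimal sketch in Python of looking up cases by patent number. It assumes the bulk download includes a CSV file with one row per case-patent pair. The column names `case_row_id` and `patent`, and the sample rows, are placeholders invented for illustration; check the dataset documentation for the actual file layout.

```python
import csv
import io
from collections import defaultdict

# Invented stand-in for a bulk file: the column names and rows below are
# placeholders for illustration, not actual records from the dataset.
SAMPLE_CSV = """case_row_id,patent
c-001,1234567
c-001,2345678
c-002,1234567
c-003,3456789
"""


def index_cases_by_patent(csv_file, case_col="case_row_id", patent_col="patent"):
    """Build a patent-number -> list-of-case-ids lookup from an open CSV file."""
    index = defaultdict(list)
    for row in csv.DictReader(csv_file):
        patent = row[patent_col].strip()
        case_id = row[case_col].strip()
        # Skip repeats so each case appears once per patent.
        if case_id not in index[patent]:
            index[patent].append(case_id)
    return dict(index)


index = index_cases_by_patent(io.StringIO(SAMPLE_CSV))
print(index.get("1234567", []))  # cases asserting the placeholder patent 1234567
```

To run this against a downloaded file instead of the sample text, open the file with `open(path, newline="")` and pass the file object to `index_cases_by_patent`, adjusting the two column-name arguments to match the real headers.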
Importantly, there are no licensing restrictions whatsoever on the use of the data, and the research team and USPTO expect that commercial and non-commercial services will add the information to their search interfaces in the coming year. Additionally, the research team hopes to release an update sometime next year that extends all of the data to patent cases filed through the end of 2020. Further down the road, we hope to code cases for outcomes and add appeals by supplementing Jason Rantanen’s comprehensive Compendium of Federal Circuit Decisions with full dockets and key documents.
In examining litigation trends, many researchers across the academic, public, and private sectors have used proprietary datasets, which generally could not be disclosed to other researchers for study replication and testing. Hopefully, academics and others will now use the USPTO’s fully open dataset for studies on the U.S. patent litigation system to allow for meaningful review of empirical studies.
Documentation on the database is available here and at the USPTO webpage. Anyone interested in using the data is also welcome to contact me (tsichelman@sandiego.edu) with technical and other questions.
via Patent Law Blog (Patently-O) https://patentlyo.com
December 16, 2020 at 11:41AM