Notice - Changes to bulk download files

The PatentsView Team has begun a long-term effort to reparse the patent long-text data in order to address formatting errors, fill in missing data, and retain line breaks. For consistency, we have updated the format of the bulk download files. The fields are still tab-separated ("\t"), but all non-numeric fields are now enclosed in double quotes.

Double quotes occurring within text fields are escaped according to RFC 4180, that is, each one is preceded by another double quote. For a detailed list of changes, see the Release Notes.
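The format described above (tab-separated fields, double-quoted text, quotes escaped by doubling) can be read with Python's standard csv module. The following is a minimal sketch, not the PatentsView Team's own script: the column names and sample record are made up for illustration, and raising the field size limit is an assumption that some long-text fields exceed the module's default of 131,072 characters.

```python
import csv
import io

# Assumption: long text fields may exceed the csv module's default
# per-field limit (131072 characters), so raise it to a large value.
csv.field_size_limit(2**31 - 1)


def read_rows(handle):
    """Yield each record of a tab-separated, double-quoted file as a dict."""
    reader = csv.DictReader(
        handle,
        delimiter="\t",    # fields are tab-separated
        quotechar='"',     # non-numeric fields are enclosed in double quotes
        doublequote=True,  # "" inside a quoted field means one literal "
    )
    for row in reader:
        yield row


# Hypothetical sample: one record whose text field contains both an
# escaped double quote and an embedded line break.
sample = (
    '"id"\t"text"\t"sequence"\n'
    '"1234567"\t"A ""smart"" widget\nwith a second line"\t0\n'
)

rows = list(read_rows(io.StringIO(sample, newline="")))
print(rows[0]["text"])
```

For a real download, the same function could be given a file opened with newline="" so that line breaks inside quoted fields are preserved. The file name and a UTF-8 encoding would both be assumptions to check against the actual data. Because the reader yields one record at a time, the multi-gigabyte tables do not need to fit in memory.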

Additionally, we are releasing a set of Python and R scripts to assist with reading in the bulk download files. The scripts are posted on GitHub: https://github.com/CSSIP-AIR/PatentsView-Code-Snippets/tree/master/03_bulk_download_read_in

Please contact the PatentsView Team at contact@patentsview.org if you have any questions or suggestions.

Data Download Tables

Table Name | File Sizes | Description | # of Rows | Origin | Data Last Updated
inventor_gender | zip: 7.9 MiB, tsv: 64.7 MiB | Gender assignment of disambiguated inventors (see Methods Report) | 1,698,216 | raw | June 10, 2020
application | zip: 87.7 MiB, tsv: 391.9 MiB | Information on the applications for granted patents | 7,330,227 | raw | June 10, 2020
assignee | zip: 13.7 MiB, tsv: 30.3 MiB | Disambiguated assignee data | 488,265 | disamb | June 10, 2020
botanic | zip: 566.6 KiB, tsv: 1.1 MiB | Botanic information for plant patents | 16,168 | raw | June 10, 2020
brf_sum_text | zip: 14.2 GiB, tsv: 55.4 GiB | Brief summary text | 6,567,133 | raw | June 10, 2020
claim | zip: 12.3 GiB, tsv: 39.2 GiB | Full text of patent claims, including dependency and sequence | 101,535,737 | raw |
cpc_current | zip: 1.4 GiB, tsv: 3.8 GiB | Current CPC classification data for all patents (applied retrospectively to all patents) | 43,047,816 | raw (from separate classification files) | June 10, 2020
cpc_group | zip: 21.5 KiB, tsv: 67.8 KiB | Lookup table of current CPC groups | 673 | raw (from separate classification files) | June 10, 2020
cpc_subgroup | zip: 5.4 MiB, tsv: 60.9 MiB | Lookup table of current CPC subgroups | 260,269 | raw (from separate classification files) | June 10, 2020
cpc_subsection | zip: 3.2 KiB, tsv: 7.9 KiB | Lookup table of current CPC subsections | 137 | raw (from separate classification files) | June 10, 2020
detail_desc_text | 39.40 GB | Detailed patent description text | 6,260,847 | raw |
draw_desc_text | zip: 4.6 GiB, tsv: 12.3 GiB | Drawing description text | 78,383,486 | raw | June 10, 2020
foreign_priority | zip: 117.5 MiB, tsv: 280.4 MiB | Foreign priority data | 3,415,996 | raw | June 10, 2020
figures | zip: 156.6 MiB, tsv: 277.5 MiB | Number of figures and sheets | 6,808,933 | raw | June 10, 2020
foreigncitation | zip: 972.8 MiB, tsv: 2.4 GiB | Citations made to foreign patents by US patents | 29,011,345 | raw | June 10, 2020
government_interest | zip: 4.6 MiB, tsv: 32.7 MiB | Raw government interest statements on all patents (where available) | 148,115 | raw | June 10, 2020
government_organization | zip: 5.7 KiB, tsv: 32.9 KiB | Organization names and related agency hierarchy parsed from the government interest statements on all patents (where available) | 288 | processed | June 10, 2020
inventor | zip: 45.0 MiB, tsv: 122.0 MiB | Disambiguated inventor data | 3,934,993 | disamb | June 10, 2020
ipcr | zip: 537.8 MiB, tsv: 1.6 GiB | International Patent Classification data for all patents (as of publication date) | 16,363,372 | raw | June 10, 2020
lawyer | zip: 5.5 MiB, tsv: 11.8 MiB | Disambiguated lawyer data | 171,525 | disamb | June 10, 2020
location | zip: 5.8 MiB, tsv: 12.1 MiB | Disambiguated location data, including latitude and longitude | 143,854 | disamb | June 10, 2020
location_assignee | zip: 11.6 MiB, tsv: 39.8 MiB | Metadata table for many-to-many relationships | 631,571 | disamb (linking table) | June 10, 2020
location_inventor | zip: 29.7 MiB, tsv: 268.5 MiB | Metadata table for many-to-many relationships | 5,513,196 | disamb (linking table) | June 10, 2020
mainclass | zip: 2.4 KiB, tsv: 7.1 KiB | Lookup table of original USPC main classes (as of patent publication date) | 1,239 | raw | June 10, 2020
mainclass_current | zip: 7.5 KiB, tsv: 21.5 KiB | Lookup table of current USPC main technology classes (applied retrospectively to all patents) | 511 | raw (from separate classification files) | June 10, 2020
nber | zip: 115.3 MiB, tsv: 228.9 MiB | NBER classification data for all patents up to May 2015 | 5,105,938 | raw (from separate classification files) | June 10, 2020
nber_category | zip: 208.0 B, tsv: 92.0 B | Lookup table for NBER categories | 7 | raw (from separate classification files) | June 10, 2020
nber_subcategory | zip: 611.0 B, tsv: 906.0 B | Lookup table for NBER subcategories | 38 | raw (from separate classification files) | June 10, 2020
non_inventor_applicant | zip: 217.4 MiB, tsv: 462.2 MiB | Non-inventor applicant information | 4,123,669 | raw | June 10, 2020
otherreference | zip: 3.4 GiB, tsv: 7.0 GiB | Non-patent citations mentioned in patents (e.g. articles, papers, etc.) | 41,515,879 | raw | June 10, 2020
patent | zip: 1.4 GiB, tsv: 5.4 GiB | Data on granted patents | 7,330,227 | raw | June 10, 2020
patent_assignee | zip: 153.5 MiB, tsv: 476.3 MiB | Metadata table for many-to-many relationships | 6,568,539 | disamb (linking table) | June 10, 2020
patent_contractawardnumber | zip: 1.3 MiB, tsv: 4.3 MiB | Contract or award numbers parsed from the government interest statements on all patents (where available) | 177,867 | processed | June 10, 2020
patent_govintorg | zip: 596.9 KiB, tsv: 2.2 MiB | Metadata table with patent-to-organization relationships linked to the government_organization table | 178,554 | processed | June 10, 2020
patent_inventor | zip: 324.8 MiB, tsv: 1.0 GiB | Metadata table for many-to-many relationships | 17,699,849 | disamb (linking table) | June 10, 2020
patent_lawyer | zip: 113.2 MiB, tsv: 354.1 MiB | Metadata table for many-to-many relationships | 8,297,695 | disamb (linking table) | June 10, 2020
pct_data | zip: 51.1 MiB, tsv: 144.7 MiB | PCT data | 1,453,801 | raw | June 10, 2020
persistent_assignee_disambig | zip: 671.1 MiB, tsv: 1.3 GiB | Persistent assignee disambiguation | 6,568,539 | raw | June 10, 2020
persistent_inventor_disambig | zip: 555.8 MiB, tsv: 2.8 GiB | Persistent inventor disambiguation | 17,699,849 | raw | June 10, 2020
rawassignee | zip: 438.2 MiB, tsv: 837.1 MiB | Raw assignee information as it appears in the source text and XML files | 6,568,539 | raw | June 10, 2020
rawexaminer | zip: 326.9 MiB, tsv: 703.7 MiB | Raw examiner information | 9,970,204 | raw | June 10, 2020
rawinventor | zip: 937.7 MiB, tsv: 1.9 GiB | Raw inventor information as it appears in the source text and XML files | 17,699,849 | raw | June 10, 2020
rawlawyer | zip: 432.8 MiB, tsv: 874.4 MiB | Raw lawyer information as it appears in the source text and XML files | 8,317,054 | raw | June 10, 2020
rawlocation | zip: 1.2 GiB, tsv: 2.8 GiB | Raw location data for inventors and assignees, as it appears in the XML and text source files | 28,382,931 | raw | June 10, 2020
rel_app_text | zip: 192.6 MiB, tsv: 815.0 MiB | Related applications text | 1,857,166 | raw | June 10, 2020
subclass | zip: 599.3 KiB, tsv: 2.6 MiB | Lookup table of original USPC subclasses (as of patent publication date) | 272,492 | raw | June 10, 2020
subclass_current | zip: 2.1 MiB, tsv: 7.3 MiB | Lookup table of current USPC subclasses (applied retrospectively to all patents) | 168,049 | raw (from separate classification files) | June 10, 2020
us_term_of_grant | zip: 85.9 MiB, tsv: 201.9 MiB | U.S. term of grant data | 3,549,398 | raw | June 10, 2020
usapplicationcitation | zip: 1.5 GiB, tsv: 3.9 GiB | Citations made to US patent applications by US patents | 40,550,104 | raw | June 10, 2020
uspatentcitation | zip: 4.1 GiB, tsv: 10.4 GiB | Citations made to US granted patents by US patents | 108,913,837 | raw | June 10, 2020
uspc | zip: 490.3 MiB, tsv: 963.2 MiB | USPC classification data for all patents | 18,053,911 | raw | June 10, 2020
uspc_current | zip: 619.0 MiB, tsv: 1.2 GiB | Current USPC classification data for all patents up to May 2015 | 22,852,959 | raw (from separate classification files) | June 10, 2020
usreldoc | zip: 354.4 MiB, tsv: 1.0 GiB | U.S. related documents (post-2005 patents only) | 10,619,573 | raw | June 10, 2020
wipo | zip: 27.0 MiB, tsv: 153.2 MiB | WIPO technology fields for all patents | 9,621,182 | raw (from separate classification files) | June 10, 2020
wipo_field | zip: 1.5 KiB, tsv: 3.7 KiB | Lookup table of WIPO technology fields | 71 | raw (from separate classification files) | June 10, 2020
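
Several of the tables listed above (for example patent_inventor) are linking tables for many-to-many relationships. The sketch below shows one way such a table might be used to connect records from two other tables. The column names (id, title, name_last, patent_id, inventor_id) and the sample rows are hypothetical and chosen only for illustration; check the actual file headers before relying on them.

```python
import csv
import io
from collections import defaultdict


def load(text):
    """Parse tab-separated, double-quoted text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text, newline=""), delimiter="\t"))


# Hypothetical miniature versions of three tables.
patent = load('"id"\t"title"\n"P1"\t"Widget"\n"P2"\t"Gadget"\n')
inventor = load('"id"\t"name_last"\n"I1"\t"Doe"\n"I2"\t"Roe"\n')
patent_inventor = load(
    '"patent_id"\t"inventor_id"\n"P1"\t"I1"\n"P1"\t"I2"\n"P2"\t"I1"\n'
)

# Index inventors by ID so each link row can be resolved with one lookup.
last_name = {row["id"]: row["name_last"] for row in inventor}

# Walk the linking table: each row pairs one patent with one inventor.
inventors_by_patent = defaultdict(list)
for link in patent_inventor:
    inventors_by_patent[link["patent_id"]].append(last_name[link["inventor_id"]])

titles_with_inventors = {p["title"]: inventors_by_patent[p["id"]] for p in patent}
print(titles_with_inventors)
```

With the full-size files, building the lookup from the smaller table and streaming the linking table row by row keeps memory use bounded by the size of the lookup.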

The PatentsView database is sourced from USPTO-provided text and XML data on published patent applications (2001 to the most recent update) and granted patents (1976 to the most recent update). The current PatentsView database MySQL dump is available for download upon request. The patent applications database, which contains all granted and non-granted applications, is also available upon request. After March 2016, the applications database will not contain the same inventor IDs as the PatentsView database. Only inventors on granted applications can be matched between the PatentsView and applications databases, via the granted application ID.

This work was created through a government contract funded by the Office of the Chief Economist in the US Patent and Trademark Office. Users are free to use, share, or adapt the material for any purpose, subject to the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

Attribution should be given to PatentsView (www.patentsview.org) for use, distribution, or derivative works.

From the PatentsView database, simple assignee and lawyer disambiguations are performed, and the patents are geocoded with a location-based disambiguation. Data are then fed into the inventor disambiguation algorithm in order to identify clusters of inventor names that are determined to be the same individual. Because the disambiguation of inventor identities is an ongoing effort, there are likely to be errors observable in the PatentsView data tables. The team welcomes feedback as we continue to improve our disambiguation methodology.

For more information, visit the Methods and Sources section of the website.