Notice - Changes to bulk download files

The PatentsView Team has begun a long-term effort to re-parse the patent long-text data in order to fix formatting errors, recover missing data, and retain line breaks. For consistency, we have also updated the format of the bulk download files: fields are still tab-separated ("\t"), but all non-numeric fields are now enclosed in double quotes.

Double quotes that occur within text fields are escaped as specified in RFC 4180, that is, by preceding each one with another double quote (a literal " is written as ""). For a detailed list of changes, see the Release Notes.
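As a rough illustration of this format, Python's standard-library `csv` module can parse such a file when it is configured for tab delimiters and RFC 4180-style quote doubling. This is a minimal sketch, not one of the official read-in scripts; the helper name `parse_bulk_tsv`, the sample record, and its column names (`patent_id`, `text`, `sequence`) are hypothetical.

```python
import csv
import io


def parse_bulk_tsv(text_stream):
    """Yield each record of a PatentsView-style bulk TSV as a dict.

    Assumes tab-delimited fields, non-numeric fields wrapped in double
    quotes, and embedded double quotes escaped by doubling (RFC 4180).
    The stream should be opened with newline="" so that line breaks
    inside quoted fields are preserved.
    """
    reader = csv.DictReader(
        text_stream,
        delimiter="\t",
        quotechar='"',
        doublequote=True,  # '""' inside a quoted field becomes a literal '"'
    )
    yield from reader


# Hypothetical sample: a header plus one record whose quoted text field
# contains doubled quotes and an embedded line break.
sample = (
    '"patent_id"\t"text"\t"sequence"\n'
    '"1234567"\t"First line says ""hello"".\nSecond line."\t3\n'
)

rows = list(parse_bulk_tsv(io.StringIO(sample, newline="")))
print(repr(rows[0]["text"]))     # 'First line says "hello".\nSecond line.'
print(int(rows[0]["sequence"]))  # 3
```

Because `csv` returns every field as a string, numeric columns such as `sequence` still need explicit conversion, for example with `int`.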

Additionally, we are releasing a set of Python and R scripts to help read in the bulk download files. These scripts are posted on GitHub: https://github.com/CSSIP-AIR/PatentsView-Code-Snippets/tree/master/03_bulk_download_read_in

Please contact the PatentsView Team at contact@patentsview.org if you have any questions or suggestions.

Data Download Tables

Table Name | Download Size | Description | # of Rows | Origin | Data Last Updated
inventor_gender | zip: 17.8 MiB, tsv: 90.3 MiB | Gender assignment of disambiguated inventors (see Methods Report) | 3,623,158 | processed | August 27, 2020
application | zip: 89.0 MiB, tsv: 397.3 MiB | Information on the applications for granted patents | 7,430,874 | raw | August 27, 2020
assignee | zip: 28.3 MiB, tsv: 62.7 MiB | Disambiguated assignee data | 1,013,119 | disamb | August 27, 2020
botanic | zip: 577.5 KiB, tsv: 1.1 MiB | Botanic information for plant patents | 16,465 | raw | August 27, 2020
brf_sum_text | zip: 0.0 B, tsv: 55.4 GiB | Brief summary text | 6,567,133 | raw | August 27, 2020
claim | zip: 12.3 GiB, tsv: 39.2 GiB | Full text of patent claims, including dependency and sequence | 101,535,737 | raw |
cpc_current | zip: 720.9 MiB, tsv: 3.6 GiB | Current CPC classification data for all patents (applied retrospectively to all patents) | 40,945,066 | raw (from separate classification files) | September 05, 2020
cpc_group | zip: 21.5 KiB, tsv: 67.8 KiB | Lookup table of current CPC groups | 673 | raw (from separate classification files) | August 27, 2020
cpc_subgroup | zip: 5.4 MiB, tsv: 60.9 MiB | Lookup table of current CPC subgroups | 260,388 | raw (from separate classification files) | August 27, 2020
cpc_subsection | zip: 3.2 KiB, tsv: 7.9 KiB | Lookup table of current CPC subsections | 137 | raw (from separate classification files) | August 27, 2020
detail_desc_text | 39.40 GB | Detailed patent description text | 6,260,847 | raw |
draw_desc_text | zip: 4.7 GiB, tsv: 12.3 GiB | Drawing description text | 78,383,486 | raw | August 27, 2020
foreign_priority | zip: 119.0 MiB, tsv: 284.2 MiB | Foreign priority data | 3,461,029 | raw | August 27, 2020
figures | zip: 158.9 MiB, tsv: 281.6 MiB | Number of figures and sheets | 6,906,688 | raw | August 27, 2020
foreigncitation | zip: 994.3 MiB, tsv: 2.5 GiB | Citations made to foreign patents by US patents | 29,719,902 | raw | August 27, 2020
government_interest | zip: 4.7 MiB, tsv: 33.2 MiB | Raw government interest statements on all patents (where available) | 150,114 | raw | August 27, 2020
government_organization | zip: 5.7 KiB, tsv: 33.2 KiB | Organization names and related agency hierarchy parsed from the government interest statements on all patents (where available) | 290 | processed | August 27, 2020
inventor | zip: 45.5 MiB, tsv: 123.3 MiB | Disambiguated inventor data | 3,977,357 | disamb | August 27, 2020
ipcr | zip: 555.1 MiB, tsv: 1.6 GiB | International Patent Classification data for all patents (as of publication date) | 16,877,462 | raw | August 27, 2020
lawyer | zip: 5.6 MiB, tsv: 12.0 MiB | Disambiguated lawyer data | 173,849 | disamb | August 27, 2020
location | zip: 5.9 MiB, tsv: 12.1 MiB | Disambiguated location data, including latitude and longitude | 144,270 | disamb | August 27, 2020
location_assignee | zip: 23.0 MiB, tsv: 84.0 MiB | Metadata table for many-to-many relationships | 1,334,556 | disamb (linking table) | August 27, 2020
location_inventor | zip: 30.0 MiB, tsv: 271.7 MiB | Metadata table for many-to-many relationships | 5,579,126 | disamb (linking table) | August 27, 2020
mainclass | zip: 2.4 KiB, tsv: 7.1 KiB | Lookup table of original USPC main classes (as of patent publication date) | 1,239 | raw | August 27, 2020
mainclass_current | zip: 7.5 KiB, tsv: 21.5 KiB | Lookup table of current USPC main technology classes (applied retrospectively to all patents) | 511 | raw (from separate classification files) | August 27, 2020
nber | zip: 115.3 MiB, tsv: 228.9 MiB | NBER classification data for all patents up to May 2015 | 5,105,938 | raw (from separate classification files) | August 27, 2020
nber_category | zip: 208.0 B, tsv: 92.0 B | Lookup table for NBER categories | 7 | raw (from separate classification files) | August 27, 2020
nber_subcategory | zip: 611.0 B, tsv: 906.0 B | Lookup table for NBER subcategories | 38 | raw (from separate classification files) | August 27, 2020
non_inventor_applicant | zip: 223.5 MiB, tsv: 475.4 MiB | Non-inventor applicant information | 4,234,889 | raw | August 27, 2020
otherreference | zip: 3.5 GiB, tsv: 7.2 GiB | Non-patent citations mentioned in patents (e.g., articles, papers) | 42,597,887 | raw | August 27, 2020
patent | zip: 1.4 GiB, tsv: 5.5 GiB | Data on granted patents | 7,430,874 | raw | August 27, 2020
patent_assignee | zip: 204.6 MiB, tsv: 492.4 MiB | Metadata table for many-to-many relationships | 6,789,245 | disamb (linking table) | August 27, 2020
patent_contractawardnumber | zip: 1.4 MiB, tsv: 4.4 MiB | Contract or award numbers parsed from the government interest statements on all patents (where available) | 180,675 | processed | August 27, 2020
patent_govintorg | zip: 606.3 KiB, tsv: 2.3 MiB | Metadata table with patent-to-organization relationships linked to the government_organization table | 181,443 | processed | August 27, 2020
patent_inventor | zip: 440.0 MiB, tsv: 1.0 GiB | Metadata table for many-to-many relationships | 17,991,899 | disamb (linking table) | August 27, 2020
patent_lawyer | zip: 115.9 MiB, tsv: 362.5 MiB | Metadata table for many-to-many relationships | 8,430,261 | disamb (linking table) | August 27, 2020
pct_data | zip: 52.3 MiB, tsv: 148.3 MiB | PCT data | 1,490,120 | raw | August 27, 2020
persistent_assignee_disambig | zip: 734.4 MiB, tsv: 1.3 GiB | Persistent assignee disambiguation | 6,789,245 | raw | August 27, 2020
persistent_inventor_disambig | zip: 527.9 MiB, tsv: 2.5 GiB | Persistent inventor disambiguation | 17,991,899 | raw | August 27, 2020
rawassignee | zip: 455.0 MiB, tsv: 867.8 MiB | Raw assignee information as it appears in the source text and XML files | 6,789,245 | raw | August 27, 2020
rawexaminer | zip: 331.1 MiB, tsv: 712.5 MiB | Raw examiner information | 10,091,698 | raw | August 27, 2020
rawinventor | zip: 1008.4 MiB, tsv: 1.9 GiB | Raw inventor information as it appears in the source text and XML files | 17,991,899 | raw | August 27, 2020
rawlawyer | zip: 440.3 MiB, tsv: 889.6 MiB | Raw lawyer information as it appears in the source text and XML files | 8,430,276 | raw | August 27, 2020
rawlocation | zip: 1.3 GiB, tsv: 2.9 GiB | Raw location data for inventors and assignees, as it appears in the source text and XML files | 29,032,922 | raw | August 27, 2020
rel_app_text | zip: 199.0 MiB, tsv: 845.3 MiB | Related applications text | 1,905,356 | raw | August 27, 2020
subclass | zip: 599.3 KiB, tsv: 2.6 MiB | Lookup table of original USPC subclasses (as of patent publication date) | 272,503 | raw | August 27, 2020
subclass_current | zip: 2.1 MiB, tsv: 7.3 MiB | Lookup table of current USPC subclasses (applied retrospectively to all patents) | 168,049 | raw (from separate classification files) | August 27, 2020
us_term_of_grant | zip: 87.4 MiB, tsv: 205.9 MiB | U.S. term of grant data | 3,614,702 | raw | August 27, 2020
usapplicationcitation | zip: 1.6 GiB, tsv: 4.0 GiB | Citations made to US patent applications by US patents | 42,268,759 | raw | August 27, 2020
uspatentcitation | zip: 4.1 GiB, tsv: 10.6 GiB | Citations made to US granted patents by US patents | 111,026,440 | raw | August 27, 2020
uspc | zip: 490.4 MiB, tsv: 963.4 MiB | USPC classification data for all patents | 18,056,787 | raw | August 27, 2020
uspc_current | zip: 619.0 MiB, tsv: 1.2 GiB | Current USPC classification data for all patents up to May 2015 | 22,852,959 | raw (from separate classification files) | August 27, 2020
usreldoc | zip: 364.0 MiB, tsv: 1.1 GiB | U.S. related documents (post-2005 patents only) | 10,906,090 | raw | August 27, 2020
wipo | zip: 25.6 MiB, tsv: 145.4 MiB | WIPO technology fields for all patents | 9,122,913 | raw (from separate classification files) | September 05, 2020
wipo_field | zip: 1.5 KiB, tsv: 3.7 KiB | Lookup table of WIPO technology fields | 71 | raw (from separate classification files) | August 27, 2020
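Several of the text tables above (for example, brf_sum_text and claim) are tens of gigabytes uncompressed, so it can help to stream rows straight out of the zip archive rather than extracting it first. The sketch below assumes each archive holds a single UTF-8 `.tsv` member in the quoted, tab-delimited format described in the notice; the helper names `raise_csv_field_limit` and `iter_zipped_tsv` and the encoding are assumptions, so check them against the downloaded files. It also raises Python's `csv` per-field size limit, whose default (131,072 characters) may be smaller than some long-text fields.

```python
import csv
import io
import sys
import zipfile


def raise_csv_field_limit():
    """Raise csv's per-field size cap as high as this platform accepts.

    sys.maxsize can overflow the C long used internally on some platforms
    (e.g. Windows), so divide by 10 until the value is accepted.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            limit //= 10


def iter_zipped_tsv(zip_source, encoding="utf-8"):
    """Stream dict rows from the first .tsv member of a zip archive.

    Hypothetical helper: assumes one quoted, tab-delimited .tsv member.
    zip_source may be a filesystem path or a binary file-like object.
    """
    raise_csv_field_limit()
    with zipfile.ZipFile(zip_source) as archive:
        member = next((n for n in archive.namelist() if n.endswith(".tsv")), None)
        if member is None:
            raise ValueError("no .tsv member found in archive")
        # Decode on the fly; newline="" keeps line breaks inside quoted fields.
        with archive.open(member) as raw, io.TextIOWrapper(
            raw, encoding=encoding, newline=""
        ) as text:
            reader = csv.DictReader(
                text, delimiter="\t", quotechar='"', doublequote=True
            )
            yield from reader
```

For example, `sum(1 for _ in iter_zipped_tsv("claim.tsv.zip"))` would count rows without writing the decompressed file to disk; the filename here is illustrative only.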

The PatentsView database is sourced from USPTO-provided text and XML data on published patent applications (2001 through the most recent update) and granted patents (1976 through the most recent update). A MySQL dump of the current PatentsView database is available for download upon request. The patent applications database, currently in beta, contains all granted and non-granted applications and is also available upon request. This beta database does not yet contain all years of data or any of the disambiguated elements.

This work was created through a government contract funded by the Office of the Chief Economist at the US Patent and Trademark Office. Users are free to use, share, or adapt the material for any purpose, subject to the standards of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

Attribution should be given to PatentsView (www.patentsview.org) for use, distribution, or derivative works.

From the PatentsView database, simple assignee and lawyer disambiguations are performed, and the patents are geocoded with a location-based disambiguation. The data are then fed into the inventor disambiguation algorithm, which identifies clusters of inventor names that are determined to belong to the same individual. Because inventor disambiguation is an ongoing effort, the PatentsView data tables are likely to contain some errors. The team welcomes feedback as we continue to improve our disambiguation methodology.
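One way to see the disambiguation output in the tables is to group raw inventor records by the disambiguated inventor ID they were assigned and list the distinct name spellings in each cluster. This only illustrates consuming the results, not the disambiguation algorithm itself. The column names `inventor_id`, `name_first`, and `name_last` are assumed for the rawinventor table and should be checked against the actual file header, and the sample records are made up.

```python
from collections import defaultdict


def name_variants_by_inventor(raw_rows):
    """Map each disambiguated inventor_id to the set of raw name spellings.

    Assumes each row is a dict with 'inventor_id', 'name_first', and
    'name_last' keys (column names are assumptions, not confirmed).
    """
    clusters = defaultdict(set)
    for row in raw_rows:
        full_name = f"{row['name_first'].strip()} {row['name_last'].strip()}".strip()
        clusters[row["inventor_id"]].add(full_name)
    return dict(clusters)


# Hypothetical raw records: two spellings assigned to the same person,
# plus an unrelated inventor.
sample_rows = [
    {"inventor_id": "inv-1", "name_first": "Jane A.", "name_last": "Doe"},
    {"inventor_id": "inv-1", "name_first": "Jane", "name_last": "Doe"},
    {"inventor_id": "inv-2", "name_first": "John", "name_last": "Roe"},
]

for inventor_id, names in sorted(name_variants_by_inventor(sample_rows).items()):
    print(inventor_id, sorted(names))
# inv-1 ['Jane A. Doe', 'Jane Doe']
# inv-2 ['John Roe']
```

Clusters whose spellings differ widely can be a practical place to start when looking for the kind of disambiguation errors mentioned above.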

For more information, visit the Methods and Sources section of the website.