Classifying patents based on their semantic content

In this paper, we extend some usual techniques of classification resulting from a large-scale data-mining and network approach. This new technology, which in particular is designed to be suitable for big data, is used to construct an open consolidated database from raw data on 4 million patents taken from the US Patent Office from 1976 onward. To build the pattern network, not only do we look at each patent title, but we also examine its full abstract and extract the relevant keywords accordingly. We refer to this classification as the semantic approach, in contrast with the more common technological approach, which consists in taking the topology when considering US Patent Office technological classes. Moreover, we document that the two approaches have highly different topological measures, and we provide strong statistical evidence that they feature a different model. This suggests that our method is a useful tool to extract endogenous information.
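
As a rough illustration of the idea in the abstract (keywords extracted from patent text, then linked into a network), the following minimal sketch builds a keyword co-occurrence count from a few invented abstracts. It is not the authors' pipeline: the stopword list, the function names `extract_keywords` and `build_cooccurrence`, the length filter, and the toy abstracts are all assumptions made for this example, and the keyword-relevance criterion used in the paper is not reproduced here.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "with", "to", "in", "on", "using"}


def extract_keywords(text):
    """Return the set of lowercase non-stopword tokens of length >= 3."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if len(t) >= 3 and t not in STOPWORDS}


def build_cooccurrence(abstracts):
    """Count, for each keyword pair, how many abstracts contain both keywords."""
    edges = Counter()
    for text in abstracts:
        # Sorting makes each pair a canonical (smaller, larger) tuple.
        for pair in combinations(sorted(extract_keywords(text)), 2):
            edges[pair] += 1
    return edges


# Toy abstracts (invented, not taken from the patent data).
toy_abstracts = [
    "A battery electrode with a lithium coating",
    "Method for charging a lithium battery",
    "Optical sensor using a polymer coating",
]

edges = build_cooccurrence(toy_abstracts)
print(edges[("battery", "lithium")])
```

Each key of `edges` is an undirected link between two keywords, weighted by the number of abstracts in which they appear together. A classification could then, for instance, be derived by running a community-detection algorithm on this weighted graph; that step is left out of the sketch.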

Domains

Geography

Full metadata list

Deposit format	File
Deposit type	Journal article
Author(s)	Antonin Bergeaud (1, 2), Yoann Potiron (3), Juste Raimbault (4)
  1 PSE - Paris School of Economics (301309) - 48 boulevard Jourdan 75014 Paris - France
    Université Paris 1 Panthéon-Sorbonne (7550); École normale supérieure - Paris (59704); Université Paris Sciences et Lettres (564132); École des hautes études en sciences sociales (99539); École des Ponts ParisTech (301545); Centre National de la Recherche Scientifique (441569); Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (577435)
  2 PJSE - Paris Jourdan Sciences Economiques (578027) - 48 boulevard Jourdan 75014 Paris - France
    Université Paris 1 Panthéon-Sorbonne UMR8545 (7550); École normale supérieure - Paris (59704); Université Paris Sciences et Lettres (564132); École des hautes études en sciences sociales (99539); École des Ponts ParisTech (301545); Centre National de la Recherche Scientifique (441569); Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement UMR1393 (577435)
  3 Keio University [Tokyo] (232340) - Keiō Gijuku Daigaku, Tokyo - Japan
  4 LVMT - Laboratoire Ville, Mobilité, Transport (222115) - 6 et 8 avenue Blaise Pascal - Cité Descartes, Champs sur Marne - F-77447 Marne la Vallée Cedex 2 - France
    Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux UMRT9403 (222110); Université Paris-Est Marne-la-Vallée (301243); École des Ponts ParisTech (301545)
License	Attribution
Target audience	Scientific
Page/Identifier	e0176310
Document language	English
Journal name	PLoS ONE (ISSN: 1932-6203, electronic ISSN: 1932-6203). Published by Public Library of Science. http://www.plosone.org/
Popular science	No
Peer-reviewed	Yes
Audience	International
Publication date	2017-04-26
Volume	12
Issue	4
Domain(s)	Humanities and Social Sciences/Geography
DOI	10.1371/journal.pone.0176310
Pubmed Id	28445550
UT key WOS	000400309200044

Main file

journal.pone.0176310.pdf (10.94 MB)

Origin: Publisher files allowed on an open archive
License:

Attribution - CC BY 4.0

Contributor: Juste Raimbault

https://shs.hal.science/halshs-01788574

Submitted on: Wednesday, May 9, 2018 at 11:08:41

Last modified on: Friday, April 19, 2024 at 16:18:58

Long-term archiving on: Monday, September 24, 2018 at 10:37:42

Dates and versions

halshs-01788574, version 1 (09-05-2018)

Identifiers

HAL Id: halshs-01788574, version 1
DOI: 10.1371/journal.pone.0176310
PUBMED: 28445550
WOS: 000400309200044

Cite

Antonin Bergeaud, Yoann Potiron, Juste Raimbault. Classifying patents based on their semantic content. PLoS ONE, 2017, 12 (4), pp.e0176310. ⟨10.1371/journal.pone.0176310⟩. ⟨halshs-01788574⟩

Collections

UNIV-PARIS1 ENS-PARIS ENPC PJSE PSE CNRS EHESS PARISTECH ENPC-LVMT IFSTTAR PSL INRAE PSE-POST-PRINT UNIV-EIFFEL JSE2024

247 views

210 downloads

Last updated on 7 April 2024