Journal article: Lato Sensu, revue de la Société de philosophie des sciences. Year: 2021

The Agnostic Structure of Data Science Methods

Abstract

In this paper we argue that data science is a coherent approach to empirical problems that, in its most general form, does not build understanding about phenomena. We start by exploring the broad structure of mathematization methods in data science, organized around the belief that if enough and sufficiently diverse data are collected regarding a certain phenomenon, it is possible to answer all relevant questions about it. We call this belief ‘the microarray paradigm’ and the approach to empirical phenomena based on it ‘agnostic science’. Not all computational methods dealing with large data sets are properly within the domain of agnostic science, and we give an example of an algorithm, PageRank, that relies on large data processing but whose output has a readily intelligible significance. Within the new type of mathematization at work in agnostic science, mathematical methods are not selected because of any particular relevance for a problem at hand. Rather, mathematical methods are applied to a specific problem only on the basis of their ability to reorganize the data for further analysis and the intrinsic richness of their mathematical structure. We refer to this type of mathematization as ‘forcing’. We then show that optimization methods are used in data science by forcing them on problems. This is particularly significant since virtually all methods of data science can be reinterpreted as types of optimization methods. In particular, we argue that deep learning neural networks are best understood within the context of forcing optimality. We finally explore the broader question of the appropriateness of data science methods in solving problems. We argue that this question should not be interpreted as a search for a correspondence between phenomena and specific solutions found by data science methods. Rather, it is the internal structure of data science methods that is open to forms of understanding. As an example, we offer an analysis of ensemble methods, where distinct data science methods are combined in the search for the solution of a problem, and we speculate on the general structure of the data sets that are most appropriate for such methods.
Main file
TheAgnosticStructure.pdf (311.84 KB)
Origin: Files produced by the author(s)

Dates and versions

halshs-03122389, version 1 (27-01-2021)

Identifiers

  • HAL Id: halshs-03122389, version 1

Cite

Domenico Napoletani, Marco Panza, Daniele Struppa. The Agnostic Structure of Data Science Methods. Lato Sensu, revue de la Société de philosophie des sciences, In press. ⟨halshs-03122389⟩