TL;DR: A list of essential reads for one reason or another. Not necessarily an endorsement.
Broader Audience
- 50 years of Data Science
David Donoho, 2017. [link] [Video: Related Lecture]
Abstract. More than 50 years ago, John Tukey called for a reformation of academic statistics. In “The Future of Data Analysis,” he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or “data analysis.” Ten to 20 years ago, John Chambers, Jeff Wu, Bill Cleveland, and Leo Breiman independently once again urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics; Chambers called for more emphasis on data preparation and presentation rather than statistical modeling; and Breiman called for emphasis on prediction rather than inference. Cleveland and Wu even suggested the catchy name “data science” for this envisioned field. […] The now-contemplated field of data science amounts to a superset of the fields of statistics and machine learning, which adds some technology for “scaling up” to “big data.” This chosen superset is motivated by commercial rather than intellectual developments. Choosing in this way is likely to miss out on the really important intellectual event of the next 50 years. […] Drawing on work by Tukey, Cleveland, Chambers, and Breiman, I present a vision of data science based on the activities of people who are “learning from data,” and I describe an academic field dedicated to improving that activity in an evidence-based manner. This new field is a better academic enlargement of statistics and machine learning than today’s data science initiatives, while being able to accommodate the same short-term goals.
- The decline of unfettered research
Andrew Odlyzko, 1995. [link]
Abstract. We are going through a period of technological change that is unprecedented in extent and speed. The success of corporations and even nations depends more than ever on rapid adoption of new technologies and operating methods. It is widely acknowledged that science made this transformation possible. At the same time, scientific research is under stress, with pressures to change, to turn away from investigation of fundamental scientific problems, and to focus on short-term projects. The aim of this essay is to discuss the reasons for this paradox, and especially for the decline of unfettered research.
- Statistical Modeling: The Two Cultures
Leo Breiman, 2001. [link]
Abstract. There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.
- NIPS 2017 test-of-time speech
Ali Rahimi and Ben Recht, 2017. [Transcript] [Video: NIPS 2017 presentation] [Addendum]
Abstract. (There is no official abstract. Here are my two cents.) As part of their acceptance speech for the test-of-time award, Ali and Ben meditate on the changes in the ML community and the need for rigor.
- Awful AI
David Dao, 2017. [link]
Abstract. Awful AI is a curated list to track current scary usages of AI, hoping to raise awareness of its misuses in society. Artificial intelligence in its current state is unfair, easily susceptible to attacks and notoriously difficult to control. Nevertheless, more and more concerning uses of AI technology are appearing in the wild. This list aims to track all of them. We hope that Awful AI can be a platform to spur discussion for the development of possible contestational technology (to fight back!).