Press seeks contributions to the ‘Very Short History of Data Science’
What’s The Big Data?, a blog that charts “the evolving IT landscape”, has appealed for contributions to its attempt to describe the origin and evolution of data science as a discipline and a profession.
Its post entitled A Very Short History of Data Science reports various milestones that mark the evolution of the term ‘data science’, and attempts to define the term and related developments.
The blogger behind What’s The Big Data? is Gil Press, a marketing and research consultant who worked previously at DEC and EMC. Press recognises how data science draws upon numerous longer-established disciplines, straddling and blurring the lines between them. This has spawned successive attempts to define the term concisely.
The earliest reference to ‘data science’ listed by Press is the 1974 book Concise Survey of Computer Methods in Sweden and the United States by Danish computer scientist Peter Naur. Data science, suggested Naur, is “the science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.”
Three years later the International Association for Statistical Computing (IASC) was founded as a section of the International Statistical Institute (ISI). IASC’s mission is “to link traditional statistical methodology, modern computer technology, and the knowledge of domain experts in order to convert data into information and knowledge.” By 1996 the International Federation of Classification Societies used the term ‘data science’ in the title of its conference Data science, classification, and related methods.
In 2001 Bell Labs’ William S. Cleveland published Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics, whose aim was “to enlarge the major areas of technical work of the field of statistics. Because the plan is ambitious and implies substantial change, the altered field will be called ‘data science.’”
The following years saw the launch of the Data Science Journal and the Journal of Data Science; the latter explained that “By ‘Data Science’ we mean almost everything that has something to do with data: Collecting, analyzing, modeling … yet the most important part is its applications — all sorts of applications.”
By 2005 the focus on professionalisation was increasing. In its report Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century, the US National Science Board recommended that the National Science Foundation, “working in partnership with collection managers and the community at large”, “should act to develop and mature the career path for data scientists and to ensure that the research enterprise includes a sufficient number of high-quality data scientists.”
This theme was picked up three years later in the UK by the JISC report Skills, Role & Career Structure of Data Scientists & Curators: Assessment of Current Practice & Future Needs, which made recommendations “on the role and career development of data scientists … and the associated supply of specialist data curation skills to the research community”. The JISC reported that data scientists “may be involved in creative enquiry and analysis, enabling others to work with digital data, and developments in database technology.”
In 2009 the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council published Harnessing the Power of Digital Data for Science and Society, which recognised that data scientists “are key to the current and future success of the scientific enterprise. However, these individuals often receive little recognition for their contributions and have limited career paths”. It reported that “Critical challenges in achieving our strategic vision include providing an effective pipeline of data professionals to ensure that the needs and opportunities of the future can be met and providing these professionals with appropriate rewards and recognition.”
The same year, Flowing Data’s Nathan Yau commented that “computational information design edges closer to reality. We’re seeing data scientists – people who can do it all – emerge from the rest of the pack.” And those people began to congregate at the Data Scientist group on LinkedIn.
Mainstream recognition that the data scientist is “a new kind of professional … who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data” came in Kenneth Cukier’s February 2010 Economist special report, Data, data everywhere. Three months later Mike Loukides added entrepreneurship to the data scientist’s skillset in What is Data Science?.
In September that year Drew Conway attempted to “simplify the discussion” of what skills data scientists bring together with his Data Science Venn Diagram. But not everyone accepted this neat approach. Pete Warden’s blog post Why the term ‘data science’ is flawed but useful observed “that the recent abundance of data has sparked something new in the world, and when I look around I see people with shared characteristics who don’t fit into traditional categories … They also seem to start by looking at what the data can tell them, and then picking interesting threads to follow, rather than the traditional scientist’s approach of choosing the problem first and then finding data to shed light on it.”
A few months later Harlan Harris commented that “Data Science is defined by its practitioners, that it’s a career path rather than a category of activities … it seems that people who consider themselves Data Scientists typically have eclectic career paths, that might in some ways seem not to make much sense.”
DJ Patil, the most recent entry on Press’s list, contrasts the role of research scientist at tech companies with that of the data scientist. According to Patil, “It might take years for lab research to affect key products, if it ever did. Instead, the focus of our teams was to work on data applications that would have an immediate and massive impact on the business. The term that seemed to fit best was data scientist: those who use both data and science to create something new.”