Big Data: A Revolution That Will Transform How We Live, Work, and Think | Author: Viktor Mayer-Schönberger and Kenneth Cukier | Language: English | ASIN: B00D3MZDDQ | Format: PDF
Big Data: A Revolution That Will Transform How We Live, Work, and Think Description
Oxford professor and author Viktor Mayer-Schönberger joins Economist data editor and commentator Kenneth Cukier to deliver insight into the hottest trend in technology. "Big data" makes it possible to instantly analyze and draw conclusions from vast stores of information, enabling revolutionary breakthroughs in business, health, politics, and education. But big data also raises troubling social and privacy concerns sure to be a major talking point in the years ahead.
- Audible Audio Edition
- Listening Length: 8 hours and 36 minutes
- Program Type: Audiobook
- Version: Unabridged
- Publisher: Recorded Books
- Audible.com Release Date: May 30, 2013
- Whispersync for Voice: Ready
- Language: English
- ASIN: B00D3MZDDQ
No precise definition of big data exists; the term refers to capturing enormous amounts of data of many different types that often seem unrelated. Yet that imprecise definition is part of the strength of using big data to make better decisions.
In the days when cost limited analysis to small samples, everything had to be done right: the items in the sample had to be chosen at random, and care had to be taken to eliminate any extreme outliers that would skew the result. It also meant that the models had to be very well constructed, because if a model did not fit the problem, the final results could be worthless or even have negative consequences.
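The outlier argument comes down to simple arithmetic: adding one extreme value x to n observations with mean m moves the mean by (x - m)/(n + 1), so the same outlier that distorts a small sample barely registers in a very large dataset. The sketch below is a hypothetical illustration, assuming normally distributed readings centered on 50 and a single outlier of 1,000; none of these numbers come from the book.

```python
import random
import statistics

def outlier_shift(n, outlier=1_000.0, seed=0):
    """Return how far one outlier moves the mean of n typical readings."""
    rng = random.Random(seed)
    # Hypothetical "typical" readings: Gaussian around 50 with spread 5.
    values = [rng.gauss(50.0, 5.0) for _ in range(n)]
    base = statistics.fmean(values)
    with_outlier = statistics.fmean(values + [outlier])
    # Algebraically this equals (outlier - base) / (n + 1).
    return with_outlier - base

for n in (10, 1_000, 100_000):
    print(f"n={n:>7}: mean shifted by {outlier_shift(n):.4f}")
```

With ten readings the single outlier drags the mean far from 50; with a hundred thousand the shift is a small fraction of one unit, which is why small-sample work had to remove outliers while an "all the data" approach can often leave them in.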
The concept of big data basically means that all the data is examined to look for common characteristics. Outliers are included but matter less, because they are drowned out by the enormous number of data points in the middle. One example of big data in use is predicting high fevers in infants. Rather than building a model of the events, which would rest on many assumptions, not all of them correct, analysts look at the immediate history of the children who develop high fevers. All of the vital signs and other data collected about those infants are searched for common indicators that could serve as predictors. The analysts are not trying to establish causality, only to find traits present before the events.
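The screening idea, scoring every recorded signal by how strongly it co-varies with a later event without proposing any causal model, can be sketched as a simple correlation ranking. This is a minimal sketch, not the method used in the study the book describes: the signal names, the 2,000 synthetic records, and their distributions are all invented for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_indicators(records, label_key="fever_later"):
    """Rank every signal by |correlation| with the later event.

    No causal model is assumed: each signal is scored only by how strongly
    it moves together with the outcome across all records.
    """
    signals = [k for k in records[0] if k != label_key]
    labels = [float(r[label_key]) for r in records]
    scores = {s: pearson([r[s] for r in records], labels) for s in signals}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical synthetic data: heart-rate variability is lower before a
# fever, while the other two signals are pure noise.
rng = random.Random(1)
records = []
for _ in range(2_000):
    fever = rng.random() < 0.2
    records.append({
        "hr_variability": rng.gauss(30 if fever else 45, 8),
        "resp_rate": rng.gauss(40, 6),
        "oxygen_sat": rng.gauss(96, 2),
        "fever_later": fever,
    })

ranked = rank_indicators(records)
for name, r in ranked:
    print(f"{name:15s} {r:+.3f}")
```

The top-ranked signal is a candidate early-warning indicator, nothing more: a high correlation says the trait tends to appear before the event, not that it causes it.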
Working this way means that only the data matters; emotion and experience become almost insignificant. The authors describe many examples of big data being used to predict and prioritize; one of the most interesting is the development of translation software.
The book opens by relating how Google, on its own initiative, devised a means to track the spread and intensity of flu prior to the 2009 flu season. Its methodology began by comparing the 50 million most common American search terms with CDC data on the spread of seasonal flu between 2003 and 2008. Google's software found a combination of search terms that, appropriately weighted, strongly correlated with official data. However, unlike the CDC, Google was able to make those assessments in real time, not a week or two later.
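The correlate, select, and fit idea can be shown at toy scale: rank candidate search terms by their correlation with the official flu series, keep the best few, and fit a line from their combined volume to the flu rate, so the current week can be estimated from searches alone. This is a simplified sketch, not Google's actual model. The term names, weekly counts, the choice of k=2, and the equal-weight sum followed by a single straight-line fit are all assumptions made for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def build_nowcaster(term_series, official, k=2):
    """Keep the k terms most correlated with the official series, sum them
    into one signal (an assumed equal weighting), and fit signal -> rate."""
    ranked = sorted(term_series,
                    key=lambda t: pearson(term_series[t], official),
                    reverse=True)
    chosen = ranked[:k]
    signal = [sum(term_series[t][w] for t in chosen) for w in range(len(official))]
    a, b = fit_line(signal, official)

    def nowcast(this_week):
        # this_week: {term: count} for the current week, available immediately.
        return a + b * sum(this_week[t] for t in chosen)

    return chosen, nowcast

# Hypothetical data: two years of weekly flu rates with a yearly cycle,
# plus weekly counts for three invented search terms.
rng = random.Random(7)
official = [2.0 + 1.5 * math.sin(2 * math.pi * w / 52) for w in range(104)]
terms = {
    "flu symptoms": [100 * r + rng.gauss(0, 10) for r in official],
    "fever remedy": [60 * r + rng.gauss(0, 6) for r in official],
    "basketball": [rng.gauss(500, 50) for _ in official],
}

chosen, nowcast = build_nowcaster(terms, official)
print("selected terms:", chosen)
print("nowcast:", round(nowcast({"flu symptoms": 300, "fever remedy": 180, "basketball": 480}), 2))
```

The unrelated term is discarded because its counts do not track the flu series, which mirrors, in miniature, the point that the software only had to find terms that correlated, not terms that made medical sense.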
Oren Etzioni was frustrated to learn that many passengers who booked a flight after he had paid less, contrary to conventional wisdom. He then 'scraped' price information from a travel website over a 41-day period to forecast whether a given fare was a good deal or not, founding Farecast to offer this new ability. Etzioni next improved the system by digesting data from a travel site that covered most American commercial routes for a year: nearly 200 billion flight-price records. Before he could expand the approach to hotel rooms, concert tickets, and used cars, Microsoft bought his firm for $110 million and incorporated it into Bing.
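The "is this a good deal?" question can be illustrated with a deliberately naive buy-or-wait rule: look at historical fares for the same route quoted closer to departure than today, and advise waiting if most of them were cheaper than the current price. The book does not describe Farecast's actual model; the `Observation` type, the `advise` function, the 0.5 threshold, and the price history below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    days_before: int  # days between the price check and departure
    price: float      # fare seen at that moment

def advise(history, days_before, current_price, threshold=0.5):
    """Hypothetical rule: return 'wait' if, historically, most fares quoted
    closer to departure were cheaper than the current price, else 'buy'."""
    later = [o.price for o in history if o.days_before < days_before]
    if not later:
        return "buy"  # no historical evidence that waiting helps
    cheaper = sum(1 for p in later if p < current_price)
    return "wait" if cheaper / len(later) > threshold else "buy"

# Invented history for one route: fares dip about two to three weeks out,
# then climb sharply in the final week.
history = [
    Observation(40, 320), Observation(35, 310), Observation(28, 300),
    Observation(21, 260), Observation(14, 250), Observation(7, 340),
    Observation(3, 390),
]

print(advise(history, 30, 305))  # a month out, most later fares were lower
print(advise(history, 10, 300))  # ten days out, later fares only rose
```

A real system would condition on far more than days before departure (route, season, carrier, day of week), which is exactly why the larger dataset of nearly 200 billion records mattered.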
New processing technologies such as the open-source Hadoop make it possible to manage far larger quantities of data. Hadoop uses a computational paradigm called MapReduce, originally developed at Google, to divide an application into many small fragments, each of which may be executed on any node in a cluster. Using Hadoop, Visa reduced the processing time for two years' worth of data (73 billion transactions) from one month to 13 minutes.
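The MapReduce pattern itself is small enough to simulate in one process: split the input into fragments, map each fragment to key/value pairs, shuffle the pairs so that all values for a key end up together, and reduce each group to a result. This is a single-machine sketch of the pattern, not Hadoop's Java API; the merchant-total job and the transaction data are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    """Map: turn one fragment of (merchant, amount) records into key/value pairs."""
    for merchant, amount in split:
        yield merchant, amount

def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def map_reduce(records, n_splits=3):
    # Divide the input into independent fragments; a cluster framework would
    # hand each fragment to a different node, here they run sequentially.
    splits = [records[i::n_splits] for i in range(n_splits)]
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# Hypothetical transactions: (merchant, amount).
transactions = [("grocer", 42.0), ("airline", 310.0), ("grocer", 18.5),
                ("cafe", 4.25), ("airline", 95.0), ("cafe", 3.75)]
print(map_reduce(transactions))
```

Because each map call touches only its own fragment and each reduce call touches only one key, both phases can run in parallel across machines; that independence is what lets a framework like Hadoop spread a job such as Visa's over a cluster.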
The authors define 'big data' as things that can be done on a large scale that cannot be done on a smaller one, and see it as offering a major transformation.