big data - case study collection

Case Study Collection
7 Amazing Companies That Really Get Big Data
Bernard Marr

Big Data is a big thing, and this case study collection will give you a good overview of how some companies really leverage big data to drive business performance. They range from industry giants like Google, Amazon, Facebook, GE and Microsoft to smaller businesses which have put big data at the centre of their business model, like Kaggle and Cornerstone.

This case study collection is based on articles published by Bernard Marr on his LinkedIn Influencer blog.

Copyright 2015 Bernard Marr

1 Google

Big data and big business go hand in hand – this is the first in a series where I will examine the different uses that the world’s leading corporations are making of the endless amount of digital information the world is producing every day.

Google has not only significantly influenced the way we can now analyse big data (think MapReduce, BigQuery, etc.) – it is probably more responsible than anyone else for making big data part of our everyday lives. I believe that many of the innovative things Google is doing today, most companies will be doing in years to come.

Many people, particularly those who didn’t get online until after the turn of the century, will have had their first direct experience of manipulating big data through Google. Although these days Google’s big data innovation goes well beyond basic search, search is still its core business. It processes 3.5 billion requests per day, and each request queries a database of 20 billion web pages.

This index is refreshed daily, as Google’s bots crawl the web, copying down what they see and taking it back to be stored in Google’s index database. What pushed Google in front of other search engines has been its ability to analyse wider data sets for its search.

Initially it was PageRank, which included information about the sites that linked to a particular site in the index, to help measure that site’s importance in the grand scheme of things. Previously, leading search engines worked almost entirely on the principle of matching relevant keywords in the search query to sites containing those words. PageRank revolutionized search by incorporating other elements alongside keyword analysis.

Google’s aim has always been to make as much of the world’s information available to as many people as possible (and get rich trying, of course), and the way Google search works has been constantly revised and updated to keep up with this mission.

Moving further away from keyword-based search and towards semantic search is the current aim. This involves analysing not just the “objects” (words) in the query, but the connections between them, to determine what it means as accurately as possible.

To this end, Google throws a whole heap of other information into the mix. In 2007 it launched Universal Search, which pulls in data from hundreds of sources including language databases, weather forecasts and historical data, financial data, travel information, currency exchange rates, sports statistics and a database of mathematical functions.

This evolved further in 2012 into the Knowledge Graph, which displays information on the subject of the search from a wide range of resources directly in the search results.

Google then mixes in what it knows about you from your previous search history (if you are signed in), which can include information about your location, as well as data from your Google profile and Gmail messages, to come up with its best guess at what you are looking for.

The ultimate aim is undoubtedly to build the kind of machine we have become used to seeing in science fiction for decades – a computer which you can have a conversation with in your native tongue, and which will answer you with precisely the information you want.

Search is by no means all of what Google does, though. After all, it’s free, right? And Google is one of the most profitable businesses on the planet. That profit comes from what it gets in return for its searches – information about you.

Google builds up vast amounts of data about the people using it. Essentially, it then matches up companies with potential customers through its AdSense algorithm. The companies pay handsomely for these introductions, which appear as adverts in the customers’ browsers.

In 2010 it launched BigQuery, its commercial service allowing companies to store and analyse big data sets on its cloud platforms. Companies pay for the storage space and computer time taken in running the queries.

Another big data project Google is working on is the self-driving car. Using and generating massive amounts of data from sensors, cameras and tracking devices, and coupling this with on-board and real-time data analysis from Google Maps, Street View and other sources, allows the Google car to drive safely on the roads without any input from a human driver.

Perhaps the most astounding use Google has found for its enormous data, though, is predicting the future.

In 2008 the company published a paper in the science journal Nature claiming that its technology had the capability to detect outbreaks of flu with more accuracy than current medical techniques for detecting the spread of epidemics.

The results were controversial – debate continues over the accuracy of the predictions. But the incident unveiled the possibility of “crowd prediction”, which in my opinion is likely to become a reality as analytics becomes more sophisticated.

Google may not quite yet be ready to predict the future – but its position as a main player and innovator in the big data space seems like a safe bet.
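PageRank, mentioned earlier in this chapter, scores a page by the importance of the pages that link to it rather than by keywords alone. The sketch below is a minimal textbook version of that idea, assuming the classic damped power iteration with the commonly cited damping factor of 0.85, run on a tiny hypothetical four-page web. It is not Google’s production ranking, which combines many more signals; the function name, page names and parameter defaults are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Textbook damped PageRank via power iteration (illustrative sketch).

    links maps each page to the list of pages it links to.
    """
    # Every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Pages with no outgoing links ("dangling" pages) spread their score evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Each page passes a damped share of its current score to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# A tiny hypothetical web: each key links to the pages in its list.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],  # links out, but nothing links to it
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:7s} {score:.3f}")
```

Here "home", which every other page links to, comes out on top, while "orphan" receives no links and keeps only the baseline score of (1 - 0.85) / 4; the scores always sum to 1.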

2 GE

General Electric – a literal powerhouse of a corporation involved in virtually every area of industry – has been laying the foundations of what it grandly calls the Industrial Internet for some time now. But what exactly is it? Here’s a basic overview of the ideas which GE hopes will transform industry, and how it’s all built around big data.

If you’ve heard about the Internet of Things, which I’ve written about previously, a simple way to think of the industrial internet is as a subset of that, which includes all the data-gathering, communicating and analysis done in industry.

In essence, the idea is that all the separate machines and tools which make an industry possible will be “smart” – connected, data-enabled and constantly reporting their status to each other in ways as creative as their engineers and data scientists can devise.

This will increase efficiency by allowing every aspect of an industrial operation to be monitored and tweaked for optimal performance, and reduce downtime – machinery will break down less often if we know exactly the best time to replace a worn part.

Data is behind this transformation, specifically the new tools that technology is giving us to record and analyse every aspect of a machine’s operation. And GE is certainly not data poor – according to Wikipedia, its 2005 tax return extended across 24,000 pages when printed out.

And pioneering is deeply ingrained in its corporate culture – it was established by Thomas Edison, and was the first private company in the world to own its own computer system, in the 1950s. So of all the industrial giants of the pre-online world, it isn’t surprising that GE is blazing a trail into the brave new world of big data.

GE generates power at its plants which is used to drive the manufacturing that goes on in its factories, and its financial divisions enable the multi-million-dollar transactions involved when those goods are bought and sold. With fingers in this many pies, it’s clearly in a position to generate, analyse and act on a great deal of data.

Sensors embedded in its power turbines, jet engines and hospital scanners will collect the data – it’s estimated that one typical gas turbine will generate 500GB of data every day. And if that data can be used to improve efficiency by just 1% across five of the key sectors GE sells to, those sectors stand to make combined savings of $300 billion.

With those kinds of savings within sight, it isn’t surprising that GE is investing heavily. In 2012 it announced that $1 billion was being invested over four years in its state-of-the-art analytics centre in San Ramon, California, in order to attract pioneering data talent to lay the software foundations of the Industrial Internet.

In aviation, GE is aiming to improve fuel economy, cut maintenance costs, reduce delays and cancellations and optimize flight scheduling – while also improving safety.

Abu Dhabi-based Etihad Airways was the first to deploy GE’s Taleris Intelligent Operations technology, developed in partnership with Accenture.

Huge amounts of data are recorded from every aircraft and every aspect of ground operations, which is reported in real time and targeted specifically at recovering from disruption and returning to the regular schedule.

And last year GE launched its Hadoop-based database system to allow its industrial customers to move their data to the cloud. It claims it has built the first infrastructure which is solid enough to meet the demands of big industry, and this works with its GE Predictivity service to allow real-time automated analysis. This means machines can order new parts for themselves and expensive downtime can be minimized – GE estimates that its contractors lose an average of $8 million per year due to unplanned downtime.

Green industries are benefitting too – GE’s 22,000 wind turbines across the globe are rigged with sensors which stream constant data to the cloud, which operators can use to remotely fine-tune the pitch, speed and direction the blades are facing, to capture as much of the energy from the wind as possible.

Each turbine will speak to others around it, too – allowing automated responses such as adapting their behaviour to mimic more efficient neighbours, and pooling of resources (e.g. wind speed monitors) if the device on one turbine should fail.

GE’s data gathering extends into homes too – millions are fitted with its smart meters, which record data on power consumption that is analysed together with weather and even social media data to predict when power cuts or shortages will occur.

GE has come further and faster into the world of big data than most of its old-school tech competitors. It’s clear it believes the financial incentive is there – chairman and CEO Jeff Immelt estimates that the Industrial Internet could add $10 trillion to $15 trillion to the world’s economy over the next two decades. In industry, where everything including resources is finite, efficiency is of utmost importance – and with the Industrial Internet, GE is demonstrating its belief that big data is the key to unlocking that potential.
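GE does not describe how its turbines actually pool their sensors, so the following is only a hypothetical sketch of the fallback mentioned above: when one turbine’s wind speed monitor fails, it borrows readings from its neighbours. The `Turbine` class, the neighbour lists and the use of the median (so that one faulty neighbour cannot skew the substitute value) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Turbine:
    """Hypothetical model of one turbine and the names of its nearest neighbours."""
    name: str
    wind_speed: Optional[float]  # metres per second; None means the local sensor has failed
    neighbours: List[str]


def effective_wind_speed(name: str, farm: Dict[str, Turbine]) -> Optional[float]:
    """Use the turbine's own reading if it has one; otherwise fall back to the
    median of its neighbours' working sensors. Returns None if nothing is available."""
    turbine = farm[name]
    if turbine.wind_speed is not None:
        return turbine.wind_speed
    readings = [
        farm[n].wind_speed
        for n in turbine.neighbours
        if n in farm and farm[n].wind_speed is not None
    ]
    return median(readings) if readings else None


farm = {t.name: t for t in [
    Turbine("T1", 11.8, ["T2", "T3"]),
    Turbine("T2", None, ["T1", "T3", "T4"]),  # T2's wind speed sensor has failed
    Turbine("T3", 12.4, ["T1", "T2"]),
    Turbine("T4", 10.9, ["T2"]),
]}
print(effective_wind_speed("T2", farm))  # median of 11.8, 12.4 and 10.9 -> 11.8
```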

3 Cornerstone

Employees are both a business’s greatest asset and its greatest expense. So hitting on the right formula for selecting them, and keeping them in place, is absolutely essential. One company offering unique solutions to help others tackle this challenge is Cornerstone. I will give a brief overview of what it does, and why it’s an important – but controversial – example of big data analysis driving business growth.

Cornerstone is a software tool which helps assess and understand employees and candidates by crunching half a billion data points on everything from gas prices and unemployment rates to social media use.

Clients such as Xerox use it to predict, for example, how long an employee is likely to stay in his or her job, and remarkable insights gleaned include the fact that in some careers, such as call centre work, employees with criminal records perform better than those without.

Its prowess has made Cornerstone a huge success, with sales growing by 150% from 2012 to 2013 and the software being put to use by 20 of the Fortune 100 companies.

The “data points” are measurements taken from employees working across 18 industries in 13 different countries, providing information on everything from how long they take to travel to work to how often they speak to their managers. Data collection methods include the controversial “smart badges” that monitor employee movements and track which employees interact with each other.

Cornerstone has certainly brought about positive change in companies using it – Bank of America reportedly improved performance metrics by 23% and decreased stress levels (measured by analysing workers’ voices) by 19%, simply by allowing more staff to take their breaks together.

And Xerox reduced call centre turnover by 20% by applying analytics to prospective candidates – finding, among other things, that creative people were more likely than inquisitive people to remain with the company for the 6 months necessary to recoup the $6,000 cost of their training.

So far, data gathering and analysis has focused mainly on customer-facing members of staff, who in larger organizations will tend to be those with less responsibility and decision-making power. Could even greater benefits be gained by applying the same principles to the movers and shakers in the boardroom, who hold the keys to wider-reaching business change? Certainly some companies are starting to think that way.

The director of research and strategy at one firm that uses the software – David Lathrop of Steelcase – told the Financial Times this year that improving the performance of top executives has a “disproportionate effect on the company”. Although he did not disclose precise details of methods or results, much research is being carried out in the name of finding exactly what it is that makes high fliers tick. This will inevitably find its way into analytical projects at big companies which spend millions hiring executives.

Crunching employee data at this level plainly has the potential to bring huge benefits, but it could also prove disastrous if a company gets it wrong.

Failing to take proper account of individuals’ rights to privacy in some jurisdictions (e.g. Europe) can lead to severe legal penalties. In my opinion, any company thinking about carrying out data gathering and analysis for these purposes needs to take great care. In workplaces where morale is low or relationships between workers and managers are not good, it could very easily be seen as a case of taking snooping too far.

Interestingly, Cornerstone’s privacy policy makes it clear that information on applicants, including names, work history and contact details, is provided to it by its clients. How many people know that simply by applying for a job with one of these clients, their personal data will be made available for analysis? It appears that Cornerstone absolves itself of responsibility here by declaring itself a “mere data processor” – putting the onus on the client businesses to gain permission to distribute their applicants’ and employees’ data.

It is vitally important that staff are made aware of precisely what data is being gathered from them, and what it is being used for. Everyone (and certainly those running the operation) needs to be aware that the purpose is to increase overall company efficiency, rather than to assess or monitor individual members of staff.

With more than half of human resources departments reporting an increase in data analytics since 2010, according to a report by the Economist Intelligence Unit, it’s obvious that, like it or not, it’s here to stay. Companies that use it well, with respect for their employees’ privacy and an understanding of the vital principle mentioned above, are likely to prosper. Those who don’t – be warned!

4 Microsoft

Since it was founded in 1975 by Bill Gates and Paul Allen, Microsoft has been a key player in just about every major advance in the use of computers, at home and in business.

Just as it anticipated the rise of the personal computer, the graphical operating system and the internet, it wasn’t taken by surprise by the dawn of the big data era. It might not always be the principal source of innovation, but it has always excelled at bringing innovation to the masses, and packaging it into a user-friendly product (even though many would argue against this).

It has caused controversy along the way, though, and at one time was called an “abusive monopoly” by the US Department of Justice over its bundling of Internet Explorer with Windows operating systems. And in 2004 it was fined over $600m by the European Union following anti-trust action.

The company’s fortunes have wavered in recent years – notably, it was slow to come up with a solid plan for capturing a significant share of the booming mobile market, causing it to lose ground (and brand recognition) to competitors Apple and Google.

However, it remains a market leader in business and home computer operating systems, office productivity software, web browsers, games consoles and search – Bing having overtaken Yahoo as the second most-used search engine.

It is now angling to become a key player in big data, too – offering businesses a suite of services and tools, including data hosting and analytics services based on Hadoop.

But Microsoft had a substantial head start over the competition – in fact its first forays into the world of big data started well before even the first version of MS-DOS. Gates and Allen’s first business venture, two years before Microsoft, was a service providing real-time reports for traffic engineers using data from roadside traffic counters. It’s clear that the founders of what would grow into the world’s biggest software company knew how important information (specifically, getting the right information to the right people at the right time) would become in the digital age.

Microsoft competed in the search engine wars from the beginning, rebranding its engine along the way from MSN Search to Windows Live Search and Live Search, before finally arriving at Bing in 2009. Although most of the changes it brought in appeared designed to ape the undisputed champion of search, Google (such as incorporating various indexes, public records and relevant paid advertising into its results), there are differences. Bing places more importance on how widely information is shared on social networks when ranking it, as well as on geographical locations associated with the data.

Microsoft’s Kinect device for the Xbox aims to capture more data than ever from our own living rooms. It uses an array of sensors to capture minute movements and is already able to monitor and record the heart rate of users, as well as activity levels. Patent applications suggest there are plans for much wider use, including monitoring the behaviour of television viewers, to provide a more interactive watching experience. The move fits in with Microsoft’s strategy of rebranding the Xbox – generally thought of as a games console – into an intelligent living room activity hub which monitors, records and adapts to users’ behaviour. No, you are not the only person who finds that idea a little bit scary!

In the business-to-business market, where Microsoft made its first fortunes with its OS and office software, it is now throwing all of its considerable weight into big data-related services for enterprise.

Like Google with its AdWords, Bing Ads provides pay-per-click advertising services which are targeted at a precise audience segment, identified through data collected about our browsing habits.

And like competitors Google and Amazon, it offers its own “big data in a box” solutions, combining open-source with proprietary software to offer large-scale data analytics operations to businesses of all sizes.

Its Analytics Platform System marries Hadoop with its industry-standard SQL Server database management technology, while its ubiquitous Office 365 will soon make data analytics available to an even wider audience with the inclusion of Power BI – adding basic analytics functions to the world’s most widely used office productivity software.

It is also looking to stake its claim on the Internet of Things with Azure Intelligent Systems Service. This is a cloud-based framework built to handle streaming information from the growing number of online-enabled industrial and domestic devices, from manufacturing machinery to bathroom scales.

It may have missed a trick with mobile – prompting many premature declarations that Microsoft was falling behind the competition – but its keen embrace of data and analytics services shows that it is still a key player.

When CEO Satya Nadella took up his post at the start of this year, he emailed all employees letting them know he expected huge change in the industry, and the wider world, very soon, prompted by “an ever-growing network of connected devices, incredible computing capacity from the cloud, insights from big data and intelligence from machine learning.”

So it’s clear that Microsoft aims to put big data at the heart of its business activities for the foreseeable future, and to provide (relatively) simple software solutions to help the rest of us do the same.

5 Kaggle

If you’re looking for a company which seems to embody all the principles of big data entrepreneurship under one roof, then look no further than Kaggle.

Crowdsourcing, predictive modelling, gamification – Kaggle has it all, and has worked out how to turn a profit from them.

The San Francisco-based business awards cash prizes to its teams of “citizen scientists” who compete to untangle big data challenges of all shapes and sizes.

And it isn’t just businesses which are benefitting – by applying the concept of crowdsourcing to data analytics, Kaggle is helping to further scientific and medical research. Its projects include looking deep into the cosmos for traces of dark matter, and furthering research into HIV treatment.

Hal Varian, chief economist at Google (which has itself benefitted from Kaggle’s research) and a Kaggle investor, describes it as “a way to organize the brainpower of the world’s most talented data scientists and make it accessible to organizations of every size.”

And that’s certainly an intriguing aim – as well as a highly profitable one – in a world where businesses of all sizes are beginning to cotton on to the benefits of big data. Even if every company could afford to set up its own data analytics department, there aren’t nearly enough people trained to do the job to go around!

As with all emerging sciences, there is a shortage of trained data scientists at the moment – but Kaggle has 150,000 of them, ready to farm out to the highest bidder.

As well as charging the companies it works with (including Amazon, Facebook, Microsoft and Wikipedia) up to $300 per hour for consultancy work, Kaggle organizes competitions – which is where the gamification comes in.

I’ve written about gamification before, and Kaggle works along the same lines, the theory being that it is easier to get people to take part in something if it is presented to them as a challenge or competition of some sort.

Current challenges include assisting with schizophrenia diagnosis by identifying the condition from MRI neuroimaging data, and finding the Higgs boson amidst the mountains of data collected by CERN’s ATLAS particle physics experiment.

They are open to anybody to take part in, and all the information (as well as the necessary data sets) can be found on Kaggle’s website.

Although it is frequently reported that Kaggle has “over 100,000 data scientists”, these are actually registered users and competitors rather than employees. There are no qualification or experience barriers to registering as a Kaggle data scientist; previous winners have ranged from data science academics and professionals to enthusiastic, knowledgeable amateurs. However, certain competitions are occasionally reserved for “masters” – those who have shown they have the right stuff through their previous work with Kaggle.

The company also recruits its own staff to work on internal projects. In fact, it is advertising for recruits now – and although no requirements are listed, other than that applicants be “experienced”, two questions on the application form ask for the mean and standard deviation of two sets of numbers.

The concept is undoubtedly inspired by earlier pioneering work in crowdsourcing data analysis, such as the Search for Extraterrestrial Intelligence at Home (SETI@home) project, and a competition run by Netflix, which in 2009 awarded $1 million to the team that came up with a better algorithm for providing movie recommendations.

Kaggle has basically taken those ideas and expanded on them – it acts as the middle man, with companies or organizations bringing their problems, and Kaggle packaging them into competitions, gathering the contestants and sharing out the rewards.

The data itself is often simulated – and contestants are challenged to come up with methods or algorithms which are more efficient than existing methods at solving the problem in hand. Using simulated data means that issues surrounding access to sensitive data can be sidestepped. Once that is done, the reward – currently up to $30,000, although occasionally much larger for the top projects – is paid.

One of its best-known competitions was the Heritage Health Prize, which offered a $3 million grand prize for an algorithm that could accurately predict, from a set of medical data, which patients would be admitted to hospital in the coming 12 months.

It also offers the Kaggle In Class service – an academic spin-off of the main brand which offers free data processing tools and simulated challenges. It is intended for use in schools and colleges struggling to meet the challenge of training the first generations of professional data scientists.

Of course, like anything new it isn’t without its critics. In particular, questions have been asked about how valuable the research it leads to actually is – often, critics say, the biggest challenges in data analysis revolve around what data is needed, and what questions should be asked. Kaggle’s pre-packaged competitions take this element out of the equation. The crowdsourced data scientists might be working on the solution to a particular problem – but is it the correct one? And might there be more relevant data elsewhere, other than that supplied in the competition package?

This might be a fundamental limitation of the competition model, until data collection and distribution evolve to the point where data can be made available to contestants in real time – and then, of course, there will be serious privacy and data protection issues to overcome.

But as it stands today, Kaggle is one of the more forward-thinking innovations in big data, and has done much to raise awareness of the power that crowdsourced data analysis can bring to businesses and organizations of all sizes.
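The screening question on Kaggle’s application form, the mean and standard deviation of two sets of numbers, can be answered in a few lines with Python’s standard `statistics` module. The numbers below are hypothetical, since the actual sets on the form are not given here. The question as reported also doesn’t say whether it wants the population or the sample standard deviation, so this sketch computes both.

```python
from statistics import mean, pstdev, stdev


def summarize(values):
    """Return the mean plus both common definitions of standard deviation."""
    return {
        "mean": mean(values),
        "population_sd": pstdev(values),  # divides the squared deviations by n
        "sample_sd": stdev(values),       # divides by n - 1 (Bessel's correction)
    }


# Hypothetical numbers; the sets used on the real form are not given in the text.
set_a = [2, 4, 4, 4, 5, 5, 7, 9]
set_b = [10, 12, 23, 23, 16, 23, 21, 16]
for label, values in (("A", set_a), ("B", set_b)):
    print(label, summarize(values))
```

For set A this gives a mean of 5 and a population standard deviation of exactly 2; the sample figure is slightly larger because it divides by 7 rather than 8.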

6 Facebook

Facebook is the world’s biggest social network by a huge margin, and most of us are used to using it to share details of our everyday lives with our friends and families. It’s no secret now that we’re also sharing those details with its advertisers, but that hasn’t put most of us off using it! So here’s a brief rundown of how Facebook has been one of the most successful companies in the world at gathering our data and turning it into profit – and why some think its business practices sometimes overstep the mark.

Recently, Facebook has been causing a stir amongst those interested in online privacy and data protection. The latest accusations are that it has been carrying out unethical psychological research – effectively experimenting on its users without their permission. Critics have said that by attempting to alter people’s moods by showing them specific posts with either a positive or negative vibe, and then measuring their response, Facebook has broken several ethical guidelines.

The truth, though, is that Facebook (and the internet at large) is making its own rules as it goes along. Putting 1.25 billion people – that’s getting on for one fifth of the world’s population, if we pretend for a second that none of the accounts are duplicates – within a mouse click of each other was always going to have far-reaching consequences. And with hindsight it was a bit silly ever to have expected it to be manageable within established social and legal boundaries.

Of course, those of us who love social media believe the potential benefits far outweigh the hazards. Putting aside how much easier it makes keeping in touch with our friends and family, there’s clearly a lot to be learned from studying the data generated during that communication. And gathering data from us is the foundation of Facebook’s business model.

Don’t forget, though – although it now seems to be dipping its toes into psychological experiments, Facebook’s main motivation for collecting and analysing our data has always been to sell us adverts. Advertisers benefit from the highly detailed profiles users build up over time as they use the site – meaning their messages can be targeted precisely at “women over 40 who love books” or “men under 25 living in the UK who love football”.

The huge and speedy success of Facebook was prompted by its simple interface and, somewhat ironically given how things have developed, its emphasis on user privacy. This helped it quickly become more popular than other early social networks such as Myspace and Bebo. But with hindsight, it’s clear to see it was always gunning for bigger targets.

A big difference between Google and Facebook is that Google’s information on who we are is often a “best guess” based on what sites we are visiting. From the start, Facebook explicitly asks us who we are, where we live and what we are interested in. Yes, Google eventually started
