
FROM BIG DATA TO BETTER DECISIONS
The ultimate guide to business intelligence today.

Introduction: What You’re Going to Learn
Ch. 1: A Flood of Data, and How BI Addresses It
Ch. 2: The Business Intelligence Market
Ch. 3: The BI Process Step 1 - Ingestion
Ch. 4: The BI Process Step 2 - Analysis
Ch. 5: The BI Process Step 3 - Delivery
Ch. 6: The Benefits of BI
Ch. 7: The Challenges of BI
Ch. 8: The Future of BI

THE BI GUIDE: WHAT YOU’RE GOING TO LEARN.

From sales opportunities to supply chain logistics, and from accounting software to social media stats, your organization is bursting at the seams with data. Business intelligence (BI) is the combination of tools, processes, and skills that help turn that vast amount of data into digestible information.

With information coming from every part of your organization, everyone needs better access to data to do their job well. Chances are, you need it, too. It’s why you’re reading this guide.

Browse our list of chapters to go straight to information specific to your needs, or feel free to read it cover to cover for a holistic view of how you can use BI to shape your work.

BI HAS OUTGROWN SPREADSHEETS AND DATA WAREHOUSES.

Before business intelligence was “business intelligence,” it was nothing but numbers written on spreadsheets (the actual paper variety). But as technology grew, little changed in how business leaders consumed information—they moved from paper to digital, and when the volume got to be big enough, they moved the data from desktop spreadsheets to a massive table known as a database. In the end, the results were the same: static information presented in a document, maybe with a few graphs thrown in for good measure.

But you don’t need new ways to replicate antiquated business practices. If you can get real-time updates on obscure college friends’ lives through social media, then you should be able to access information from your business anytime, anywhere.

Now it’s time to learn how to access the right data at the right time. Keep reading—you’re in the right place.

Five reasons to read this guide:
- You see BI as important and want more info as part of your professional development.
- You’d like to pursue a career in BI.
- You’ve got the gist of BI and want to brush up on some details.
- You want to know what the BI team in your organization really deals with day to day.
- You’ve heard so many buzzwords—“big data,” “data science,” “business analytics,” “predictive analytics,” “BI,” etc.—and want to know what all the fuss is about.

WHY YOU SHOULD READ THIS GUIDE.
Data is on everybody’s minds—from executives pushing their teams to take advantage of all the data the business collects, to consumers worrying about sharing too much of their personal lives. This guide cuts through the buzzwords and the technical jargon to give you an overview of business intelligence—the tools, processes, and skills that help us harness the data explosion to make better and faster decisions. A state-of-the-art BI environment ensures the shortest and most reliable path from data to decisions that make your business more successful.

CHAPTER ONE
A flood of data, and how BI addresses it.

A FLOOD OF DATA.
We are living in a data deluge. The amount of new data created annually will grow ten-fold between 2013 and 2020, according to IDC, from 4.4 trillion gigabytes to 44 trillion gigabytes.

If you can swim in this flood of data, you win. According to MIT researchers, companies that excel in data-driven decision-making are 5% more productive and 6% more profitable than their competitors, on average. A study by IDC found that users of big data and analytics who use diverse data sources, diverse analytical tools, and diverse metrics were five times more likely to exceed expectations for their projects than those who don’t.

DATA WITH NO ANALYSIS HAS NO VALUE.
Navigating the flood of data is much easier said than done. IDC predicts that companies will continue to waste 80% of the customer data they have collected. More broadly, IDC estimates that in 2013 only 22% of all data in the world was useful (i.e., could be analyzed) and less than 5% of that was actually analyzed.

A University of Texas at Austin study put these general estimates in a business context: it found that for the median Fortune 1000 company, a 10% increase in the usability of its data translates to an increase of $2.01 billion in annual revenues, and a 10% increase in remote accessibility to data translates into an additional $65.67 million in net income per year.

$2 BILLION ON THE LINE, BUT NOTHING NEW FROM BI.
With $2 billion on the line, CIOs have reported in Gartner’s surveys that business intelligence has been a top priority for the last nine years. CEOs are also getting on the bandwagon, demanding more and more access to more and more data.

While the need for timely, accurate, and accessible business intelligence is greater than ever, the use of business intelligence tools has plateaued at about 20%–25% of business users in a typical organization over the past few years.

As Gartner recently observed, “despite the strong interest in BI and analytics, confusion around big data is inhibiting spending on BI and analytics software.”

The frustration is widespread, according to surveys conducted by businessintelligence.com:
- Only 25% of CEOs say their reports contain the information they need and want.
- 44% of executives say that too many of their critical decisions were based on incomplete or inaccurate data.
- 75% of vice presidents surveyed said that they were dissatisfied with their access to the data they need, and 69% were not happy with the speed of information delivery.

These data management challenges are compounded by bloated solutions, complex deployments, and overly complicated user interfaces. The emergence of new tools and technologies for harnessing the data deluge, aimed at solving these issues, may actually slow down the adoption and widespread use of business intelligence.

Want to learn more? Read the executive brief, “The big BI disappointment: Troubling gaps between BI expectations and reality.”

WHAT BUSINESS LEADERS NEED FROM BI.
Today’s leaders no longer make decisions based primarily on intuition. Instead, making decisions today is a team sport, involving all the relevant people in the organization and taking advantage of new technologies to collect and analyze all the relevant data. In this all-hands-on-data environment, decision makers expect:
- Data from all relevant sources in one place.
- Real-time data, without having to wait for an analyst to deliver it or for IT to respond.
- Data that is accessible anytime and anywhere, on any mobile device.
- Data that represents one version of the truth.
- Self-service data and analysis, reducing the reliance on experts.
- Data and analytics that help predict what’s coming.

Today’s business intelligence is embedded in all levels of the organization, allowing anyone who needs to make a decision—operational, tactical, or strategic—to make it based on the best data available. Business intelligence is the combination of tools, processes, and skills that help us turn the data deluge into better and faster decisions.

Want to learn more? Read the analyst report, “7 Steps to Making Big Data Accessible to Executives.”

CHAPTER TWO
The business intelligence market.

HOW BIG IS THE BI MARKET?
Gartner estimates that the worldwide business intelligence and analytics market was $14.4 billion in 2013, growing at 8% annually. Assessing the larger market for business analytics, IDC estimates it had reached $104.1 billion in 2013, at a growth rate of 10.8%. The big data segment of this market was $12.6 billion in 2013, with a growth rate of 27%.

The business intelligence market is dominated by a few large players—SAP, Oracle, IBM, SAS, Microsoft, Teradata—accounting for about 70% of worldwide revenues. The balance of the market is accounted for by hundreds of small players, including numerous new startups, most of them focused on one or two segments of the market. Established business intelligence-focused companies include Actuate, Information Builders, Panorama, MicroStrategy, QlikTech, Tableau Software, and Tibco Software. New startups include Alteryx, Birst, Domo, Good Data, and SiSense.

HOW IS THE BI MARKET CHANGING?
In older tools—and even in most current solutions—BI tells you what happened in a specific segment of your business. With how quickly business is moving today, that kind of BI is as problematic as driving down the freeway by looking only in your rear-view mirror.

With new technology and new expectations, BI is moving toward a more predictive model that shows you what will happen. New BI systems are now beginning to show how all the various parts of your organization work together to produce an outcome, and business leaders can finally see the big picture and make faster, better-informed decisions.

This transformation started over a decade ago as more and more firms started to compete on the basis of statistical analysis and data management prowess. It’s what drove today’s online giants like Netflix, Google, and Amazon—each with a reputation for mastering data, measurement, testing, and analysis—to become what seem like unstoppable forces.

In response to the success of these giants—which barely existed 20 years ago—many established companies now invest in statisticians and operations research personnel, build business analytics departments, weave modeling, prediction, and forecasting into their processes, and acquire new hardware and software tools to support these activities.

[Chart: Global business intelligence market size by technology (traditional BI, cloud BI, mobile BI, social BI), 2013–2018, in $ billions. Sources: Gartner, Redwood Capital]

HOW ARE ORGANIZATIONS MEETING THE DEMAND?
More recently, another new layer of the business intelligence market has emerged and become known by the somewhat misleading name of big data. Again the main culprits were online firms such as Google, Yahoo, and LinkedIn, but this time the new layer of the market was created around the new technologies (e.g., Hadoop) and the new roles (e.g., data scientists) that these companies invented to support data-driven decision-making and turn their data into revenue streams.

Now, every organization has to reconcile itself to the rapid growth of available data, the competitive pressures to excel in data mining and analysis, and the increasing need to bring these capabilities to all levels of the organization.

But no economy had enough trained talent—data scientists, analysts, systems managers, etc.—to meet the sudden demand, prompting a burst of new technologies meant to fill the void. Thus, investment in BI tools and technologies today is primarily driven by the trend toward wider adoption of BI, giving end users easy-to-use tools for accessing, viewing, analyzing, and manipulating data.

This “democratization of business intelligence,” or “self-service BI,” is accompanied by growing investments in embedding BI capabilities in various business processes and applications. These new applications, leveraging new data types and new types of analysis, are increasingly installed on mobile devices, drawing on data that resides in the cloud, supporting users anywhere, anytime.

“Major changes are imminent to the world of BI and analytics, including the dominance of data discovery techniques, wider use of real-time streaming event data and the eventual acceleration in BI and analytics spending when big data finally matures.” —Gartner

In developing their business intelligence capabilities, organizations have always had the option to buy outside services to supplement their own in-house activities. They could buy specialized skills, consulting, or even specific data from data aggregators. This segment of the market, now called “data-as-a-service,” has recently grown rapidly with the emergence of new players providing data services with embedded BI and analytic capabilities. To alleviate the analytics and data science talent shortage, some vendors focus on providing the required skills on a project-by-project basis.

WHAT ARE THE KEY COMPONENTS OF THE BI MARKET?
The BI market is typically segmented according to product functionality, such as “query and reporting,” “online analytical processing (OLAP),” and “dashboards.” It is easier, however, to understand the BI market if we look at the process of business intelligence, or the steps required to get from data to decisions. In a nutshell, the process of business intelligence has three steps: Ingestion, Analysis, and Delivery.

“Experts often possess more data than judgment.” —Colin Powell

It starts with the ingestion of data—identifying the right data sources and preparing the data for analysis; continues through the analysis stage, including processing the data and applying analytical models to it; and concludes with delivery—presenting the results of the analysis in an easy-to-consume manner and at the most convenient point of consumption for the user.

These three steps will be covered in detail in the following three chapters.

See the infographic: “The World Needs Data Scientists.”

CHAPTER THREE: THE BI PROCESS
Step 1 - Ingestion

INGESTION:
The process of business intelligence starts with identifying the data sources and the type of data that can support specific decisions and business objectives. Once you have the data, you need to make sure it is ready for processing and analysis.

WHERE DOES THE DATA COME FROM?
Before an organization can take in data, business leaders need to understand where it’s coming from, what format it’s in, and how to turn raw data into something useful. Here are some of the basics:

The data for business intelligence comes from a variety of sources, internal and external to your company. Internal sources include engineering and manufacturing processes, Enterprise Resource Planning (ERP) systems, sales force automation and customer relationship management (CRM) software, and financial and accounting activities. External sources include supply-chain and logistics systems, business and distribution partners, social networks, websites, location/GPS systems, mobile and stationary sensors, and click streams. There are also many “open data” sources on the Web that make data collected by government agencies, non-profits, and businesses available at no charge.

DEALING WITH DATA STRUCTURE.
The data coming from these disparate sources arrives in many types and formats, including rows and columns in traditional databases, images, text documents, video, PowerPoint and HTML files, email messages, sensor data, web-based transactions, and IT systems logs. These data types are usually classified into three broad categories: structured, semi-structured, and unstructured data.

Structured data (e.g., the numbers in a customer invoice) can be easily ordered in the rows and columns of a traditional database table (e.g., customer account number, invoiced amount) or some other type of database with a defined structure.

Semi-structured data (e.g., HTML or email files) conforms to a partial structure or a standard format and contains specific markers that give it some type of organization.

Unstructured data (e.g., an image) is not organized in any pre-defined manner.

THE STRUCTURING OF DATA: A HISTORY.
“Structured” and “unstructured” are somewhat misleading terms. All forms of human communication have some structure (e.g., language), and machine-generated data typically has a structure because it is designed to have one. What we have is a continuum that extends from a highly rigid structure, which is defined before the processing and mining of the data, to a highly flexible structure, which is defined after the processing and mining of the data.

The “highly rigid” end of the continuum gave rise in the 1970s to technologies such as relational databases that exploited the structure imposed on the data. The focus on “structured” data (i.e., data with a predefined structure) continued until the 2000s. At that point, online search and web analytics companies started digging into “unstructured” data (i.e., data without a predefined structure). New techniques are now available that take in data that has loose structure (e.g., log files) or implicit structure (e.g., natural language) and extract that structure rapidly and at scale, making it available for analysis in a time frame where it is still useful.
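As a small illustration of that last point, here is a sketch in Python of pulling structure out of a loosely structured log line so it can be analyzed (the log line, pattern, and field names are invented for illustration; real systems do this at much larger scale):

```python
# Extracting implicit structure from a loosely structured source (a made-up
# web server log line) into named fields that are ready for analysis.
import re

log_line = '203.0.113.7 - [12/Mar/2014:10:14:55] "GET /pricing HTTP/1.1" 200'
pattern = r'(?P<ip>\S+) - \[(?P<time>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>\d+)'

record = re.match(pattern, log_line).groupdict()
print(record)  # {'ip': '203.0.113.7', 'time': '12/Mar/2014:10:14:55', ...}
```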

DATA PREPARATION.
Given the variety of sources and types of data, a lot of work needs to go into preparing the data before it is stored and analyzed. The data could be of varying quality (e.g., an address may be missing a ZIP code or may contain a spelling mistake), may not be consistently recorded in the same manner in different sources, and may have a different format. All of these issues are dealt with, and the inconsistencies and imperfections of the data reconciled, in the process of data preparation. It is usually referred to as the Extract-Transform-Load (ETL) process, where the data is taken from its source, changed to fit certain rules or standards, and then moved to where it is stored, typically a data warehouse.

“It is a very sad thing that nowadays there is so little useless information.” —Oscar Wilde

Following the Garbage In, Garbage Out (GIGO) principle, the “cleansing” of the data turns out to be one of the most crucial steps in the BI process and requires careful attention. It has become especially important recently with the rise in the quantity and variety of data sources, and it is often said that 80% of a data scientist’s time is spent on cleaning the data. Being the new kids on the data mining block, data scientists have recently invented new terms for it, such as “data munging” and “data wrangling.”

But cleaning the data is a small part of a very large process. The proliferation of data sources requires that data scientists find ways to reconcile all those data sources to each other in a process called “data integration.” Data integration refers to tools that are part of the ETL process and help combine data from different sources to ensure a single, unified representation of the data.

MANAGEMENT, GOVERNANCE, AND VALUE.
The rules and standards for cleaning, transforming, and integrating the data are defined in what is called Master Data Management (MDM). Master data is the standard description of the people, things, places, or concepts that are important to the business (e.g., customers, products, or sales regions). Master Data Management is the combination of tools and processes that create and maintain consistent, accurate, and comprehensive lists of master data.
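To make the idea concrete, here is a minimal sketch in Python of cleansing source values and reconciling them against a master list (the names and rules are hypothetical, not taken from any particular MDM tool):

```python
# Toy master-data lookup: variants of a customer name from different source
# systems are cleansed and mapped to one standard, "master" description.
MASTER_CUSTOMERS = {"acme corp": "Acme Corporation"}  # hypothetical master list

def to_master(raw_name):
    """Cleanse a raw source value, then map it to the master description."""
    key = raw_name.strip().lower()  # fix casing and stray whitespace
    return MASTER_CUSTOMERS.get(key, raw_name.strip())  # fall back to cleansed value

for source_value in ["ACME Corp ", "acme corp", "Globex"]:
    print(repr(source_value), "->", to_master(source_value))
```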

[Diagram: Data Governance encompasses People (Master Data Management), Processes (cleaning, transforming, integrating), and Tools (ETL, ERP).]

Master Data Management is an important component, and one of the key deliverables, of a much larger, often enterprise-wide activity called Data Governance. Data Governance is an umbrella term covering all the people, processes, and tools required to create consistent and appropriate handling and management of an organization’s data. You will find Data Governance in action especially in business activities that require compliance with government regulation (e.g., financial services). More attention is being paid to Data Governance in a variety of industries today, however, given the increased privacy and security concerns regarding consumer data (and, in many cases, the mishandling of it).

THE BUSINESS IMPACT.
Finally, the increase in the quantity and variety of data sources has been linked to an increased need to support BI tasks in near real time or real time, leading to faster decisions. The goal of what is known as Complex Event Processing (CEP) is to identify meaningful events that may serve as opportunities or threats to the organization and to respond to them as quickly as possible. CEP represents a unique data preparation challenge in that it is based on real-time data and as such is not part of the established ETL process.

Businesses today track and process streams of data about events that may impact their fortunes in the near or long term. These may be internal events, such as sales leads, customer orders, or customer service calls; external events, such as news items, text messages, social media posts, stock market feeds, traffic reports, or weather reports; or events that signal a change of state, when a measurement exceeds a predefined threshold of time, temperature, or other value. Streamlining all the data streams by integrating them into one coherent and manageable body of data is key to streamlined data processing and analysis, which is the next step in the process of business intelligence.
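The threshold-style events just described can be sketched in a few lines of Python (the readings, threshold, and response are invented for illustration; real CEP engines handle high-volume streams with far richer rules):

```python
# Toy complex-event-processing rule: flag any reading that crosses a
# predefined threshold so the organization can respond immediately.
THRESHOLD = 75.0  # hypothetical temperature limit

def detect_events(readings, threshold=THRESHOLD):
    """Yield an alert for each measurement that exceeds the threshold."""
    for timestamp, value in readings:
        if value > threshold:
            yield (timestamp, value, "threshold exceeded")

stream = [("09:00", 71.2), ("09:01", 76.8), ("09:02", 74.9), ("09:03", 80.1)]
for event in detect_events(stream):
    print(event)  # e.g., notify an operator or trigger a workflow
```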

CHAPTER FOUR: THE BI PROCESS
Step 2 - Analysis

ANALYSIS:
Integrated, standardized, and “clean” data is stored and processed in databases or other specialized data management systems and analyzed by applying statistical models and methods to the data.

DATA STORAGE AND PROCESSING.
The Extract-Transform-Load (ETL) process typically loads the data into a data warehouse, which is a specialized database used for data storage, reporting, and analysis. Traditionally, the database of choice for these tasks has been of the relational database management system (RDBMS) variety, with a popular query language called SQL (for Structured Query Language).

[Diagram: Extract (from the sources) → Transform (on the ETL server) → Load (into the data warehouse).]

Relational databases use tables to store information. The data is represented as columns (fields) and rows (records) in a table. With a relational database, the user can easily find specific information (e.g., a customer’s address), sort the data based on any field (e.g., customer’s name, address, type of purchase, etc.), and generate reports that contain only certain fields from each record (e.g., a record may contain all the data for a specific customer).
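Those three operations (find, sort, and report selected fields) look like this in practice. The sketch below uses Python’s built-in sqlite3 module, and the table and rows are invented for illustration:

```python
# A tiny relational table: columns (fields) and rows (records), queried with SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, address TEXT, purchase TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    ("Acme Corp", "12 Main St", "software"),
    ("Globex", "9 Oak Ave", "hardware"),
])

# Find specific information: one customer's address.
print(conn.execute(
    "SELECT address FROM customers WHERE name = ?", ("Acme Corp",)).fetchone())

# Sort on any field and report only certain fields from each record.
for row in conn.execute("SELECT name, purchase FROM customers ORDER BY name"):
    print(row)
```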

With a relational database, the user can quickly compare information because of the arrangement of data in columns. The relational database model takes advantage of this uniformity to build completely new tables out of required information from existing tables. In other words, it uses the relationship of similar data to increase the speed and versatility of the database.

Want to learn more? Check out the executive brief, “The Data Warehouse Dilemma.”

ONLINE ANALYTICAL PROCESSING (OLAP).
A more specialized type of database, or data storage and processing system, is the Online Analytical Processing (OLAP) tool. OLAP tools expose a multidimensional view of data to applications and enable BI operations such as consolidation, drill-down, filtering, and slicing and dicing. Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with rapid execution time.

“Information that is imperfectly acquired is generally as imperfectly retained.” —William Playfair, inventor of the pie and bar charts, 1786

OLAP databases have typically run on disk-based storage. Recently, however, as the cost of computer memory continues to decrease, analytics processing is more and more performed in-memory, i.e., over data that resides in computer memory rather than on a hard drive. This results in faster analysis and greater flexibility in using data from a variety of sources.

NoSQL AS A STOP-GAP.
In the early 2000s, a new type of database started to gain popularity because it facilitated the storage and retrieval of data that is not organized in the tables used by relational databases. Collectively called NoSQL, the new databases of this non-relational type successfully managed “unstructured” data such as documents and graphs.
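A rough sketch of why document-style storage suits such data: records need not share a schema. The toy example below uses a plain Python dictionary to stand in for a document database (real NoSQL systems add distribution, indexing, and query languages):

```python
# Toy "document store": each record is free-form, with no predefined columns.
documents = {}

documents["cust:1001"] = {"name": "Acme Corp", "tags": ["vip"], "orders": [17, 42]}
documents["cust:1002"] = {"name": "Globex", "notes": "no ZIP code on file"}  # different fields

# Retrieval is by key, not by table structure.
print(documents["cust:1001"]["orders"])    # [17, 42]
print(documents["cust:1002"].get("tags"))  # None; missing fields are fine
```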

SQL vs. NoSQL:
- SQL is a programming language that forms the basis for relational database solutions; NoSQL is a broad class of data management technologies.
- In SQL solutions, data is stored in a single structure for consistency in operations; in NoSQL solutions, a distributed file system stores objects across a pool of commodity resources.
- In SQL, specific instructions are used to query and manipulate data in a defined table; in NoSQL, different algorithms are used to query and manipulate data, depending on the solution.

Source: CompTIA, “Big Data Insights & Opportunities,” Sept. 2013.

But even this new type of database could not deal effectively with the rapidly growing Web and the requirements of search engines. Google, the company at the forefront of indexing and analyzing the Web, invented a completely new approach to storing and processing unstructured data.

Want to learn more? Check out the analyst report, “What Business Leaders Hate about Big Data.”

MAPREDUCE.
The third approach, called MapReduce, solved the problem of waiting a long time to read lots of data (later to be called “big data”) from disk drives. It did so by distributing the data over many commodity servers and their disk drives and then reading and writing the data in parallel. This new approach (often described as a “framework”) to storing and processing data was developed further as the open-source project Hadoop and has become the foundational technology for managing big data.

MapReduce is a batch query processor (i.e., it runs over the entire dataset), and it does so at reasonable speeds. As such, MapReduce is a good fit for applications where the data, typically unstructured data (i.e., data that does not conform to a predefined schema or structure), is written once and read many times. In contrast, relational databases are good for structured data that is continuously updated. Today, the differences between relational databases and MapReduce/Hadoop are blurring as many vendors bring to market data management solutions that combine attributes of both approaches.
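The classic illustration of the idea is counting words. Here is a single-machine sketch of the map, shuffle, and reduce phases in Python (real frameworks such as Hadoop run these phases in parallel across many servers):

```python
# Minimal MapReduce-style word count on one machine.
from collections import defaultdict

documents = ["big data big decisions", "data drives decisions"]

# Map: emit (key, value) pairs from each input record.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group into a single result.
counts = {word: sum(values) for word, values in groups.items()}
print(counts)  # {'big': 2, 'data': 2, 'decisions': 2, 'drives': 1}
```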

DATA ANALYSIS – DESCRIPTIVE.
Once the data is stored and processed in an optimal data management solution based on the business need and the type of decisions supported by the data, it is ready for analysis. Analytics (or business analytics) is the application of descriptive, diagnostic, predictive, and prescriptive models to data in order to answer specific questions or discover new insights. Analysis techniques range from historical reporting, which tells the decision maker what happened recently, to looking at the future, predicting what is going to happen and recommending the best course of action.

An example of descriptive and diagnostic modeling in widespread use is the concept of Key Performance Indicators (KPIs). KPIs define a set of values against which the performance of the entire organization, a business unit or function, or specific projects or employees is measured on a regular basis. By establishing KPIs, the business defines for its various constituencies what “success” means and sets a list of clear priorities. The periodic assessment of the performance of the business against its performance indicators often leads to identifying potential problems and areas for improvement.

“Big data is nothing without its little brother—traditional KPIs.” —Bernard Marr

DATA ANALYSIS – PREDICTIVE.
Predictive and prescriptive modeling makes use of statistical methods that identify trends and recurring patterns in a set of data. Largely known today as predictive analytics, these methods can be applied to any type of unknown, whether it is in the past, present, or future. Predictive analytics uncovers the relationships between the factors that explain the situation we are trying to understand and a similar outcome, based on observed past occurrences.

Lately there has been more emphasis on analyzing the future than the past, and the new tools and techniques supporting this shift are sometimes referred to as advanced analytics. This future orientation and the growing use of new tools for optimization and simulation have been spurred by the arrival of big data and its practitioners—data scientists.

Watch the webinar, “Choosing the right BI solution: Overcoming 5 common concerns.”
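At its simplest, predictive modeling means fitting a pattern to past observations and extrapolating it. A deliberately tiny sketch (the sales figures are made up, and a straight-line fit stands in for far richer real-world models):

```python
# Fit a linear trend to past monthly sales and forecast the next month.
from statistics import linear_regression  # requires Python 3.10+

months = [1, 2, 3, 4, 5, 6]
sales = [100, 104, 111, 115, 122, 128]  # hypothetical past observations

slope, intercept = linear_regression(months, sales)
forecast = slope * 7 + intercept  # extrapolate the trend to month 7
print(f"Forecast for month 7: {forecast:.1f}")
```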

DATA ANALYSIS – MACHINE LEARNING.
The new discipline of data science, touted by the Harvard Business Review as the “sexiest job of the 21st century,” is analytics on steroids. It combines statistics with computer science and knowledge of a specific business domain. Data scientists typically approach data without a preconceived notion of what could be found in it and analyze it to discover the “unknown unknowns” (as opposed to the “known unknowns”)—what we don’t know we don’t know.

Another important aspect of data scientists’ work is automating the analysis of data using computer technology. They do so primarily by using machine learning techniques.

Machine learning is a branch of artificial intelligence and is best thought of as the application of computer technology to learning. Similar to our basic learning process, the computer is “trained” on data that is labeled or classified based on previous outcomes, and its software algorithms “learn” how to predict the classification of new data that is not labeled or classified. For example, after a period of training in which the computer is presented with spam and non-spam email messages, a good machine learning program will successfully identify and predict which email messages are spam and which are not, without human intervention.

Even when machine learning methods are used, humans still call the shots regarding what they want the machine to learn and what data should be used. In the next and final step of the business intelligence process, humans play an even bigger role as the recipients of the analysis on which they base their decisions.

Definition of Machine Learning: “A scientific discipline that deals with the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs and using that to make predictions.”
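The spam example above can be sketched with the scikit-learn library (assumed to be installed); the training messages are invented, and a real spam filter would train on far more data:

```python
# Train a simple classifier on labeled messages, then predict a new one.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_messages = [
    "win a free prize now",             # labeled spam
    "claim your free money today",      # labeled spam
    "meeting agenda for monday",        # labeled non-spam
    "quarterly sales report attached",  # labeled non-spam
]
labels = ["spam", "spam", "not spam", "not spam"]

# "Training": turn the labeled messages into word counts and fit the model.
vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(train_messages), labels)

# Predict the classification of new, unlabeled data.
print(model.predict(vectorizer.transform(["free money prize inside"])))  # ['spam']
```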
