
Drive Analytic Innovation Through SAS and Open Source Integration

PURPOSE OF THIS E-BOOK

This e-book is intended for those who want to learn about how to use SAS with open source to drive analytic value and achieve trusted decisions. Whether you are a SAS user interested in dabbling in open source or an open source user who wants to work with SAS, this e-book will help you get started.

CONTENT

WHAT DRIVES ANALYTIC INNOVATION TODAY?
INTEGRATION ACROSS THE ANALYTICAL LIFE CYCLE
BUILDING MODELS
MANAGING MODELS
DEPLOYING MODELS
NEXT STEPS

WHAT DRIVES ANALYTIC INNOVATION TODAY?

AI & ANALYTICS CHALLENGES TODAY

Four main challenges influence business outcomes, increase the cost of analytics and slow the pace of innovation.

Data: In many organizations, we need more collaboration between businesses, analytic teams, application developers and IT operations. These teams often work with data in silos and end up duplicating efforts, failing to integrate, or missing opportunities to deliver value from data.

In addition to siloed efforts, data scientists are also faced with ever-increasing volumes and speeds of data. And the reality is they’re expected to answer questions just as fast as, or faster than, before.

It’s important that we’re using the right data and the right techniques to ensure optimal outcomes.

Technology: What we learned in college is that data doesn’t magically appear in analytically ready formats. There’s a ton of transformation work required to take relational data and put it into formats that support model development.

It comes down to how quickly you can explore and identify the right algorithms, and ultimately train one or more models to achieve your analytic goals.

Okay, so you’ve created a model. But this won’t affect business outcomes until it’s deployed and integrated into an operational environment.

The longer this entire process takes, the greater the risk that your models in production are working on assumptions in the data that are no longer valid.

If the market conditions have shifted, how quickly can you determine if your models are deteriorating? How quickly can you retrain? How quickly can you redeploy?

Data scientists must answer all of these questions and more.
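One way to make the question "are the assumptions in my data still valid?" concrete is a drift check on a model input. The sketch below is not a SAS feature or API; it is a plain Python illustration of the population stability index (PSI), one common drift measure. The function names, the ten equal-width bins and the 0.25 retraining threshold are assumptions chosen for illustration (0.25 is a widely used rule of thumb, not a value from this e-book).

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a recent sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if all values are equal.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(psi, threshold=0.25):
    """Illustrative rule: flag the model for retraining when drift exceeds the threshold."""
    return psi > threshold


baseline = [i / 100 for i in range(1000)]   # feature values seen at training time
recent = [v + 5 for v in baseline]          # the same feature after a market shift
print(needs_retraining(population_stability_index(baseline, recent)))
```

Running a check like this on a schedule gives a measurable trigger for the retrain and redeploy steps, rather than waiting for business results to degrade.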

Process: When it comes to development, data scientists are usually not interested in spending time on mundane activities such as documentation, lineage, traceability, versioning, or explainability. Yet these activities provide visibility into what models are doing and how they are performing.

Neglecting them increases the likelihood of a failed model experiment, one that never makes it into production.

But putting in place an effective process to ensure governance and transparency will increase your overall efficiency, productivity, and ultimately repeatability.

People: An effective process only matters if you can put people in a position to be successful and give them access to the right tools to support innovation. Data scientists must be able to rapidly adapt and refresh these tools over time in a consistent way, allowing them to spend more time applying the tools to achieve value rather than maintaining the tools.

[Figure: OBSTACLES]
DATA: Siloed, duplicative data; increasing speed and volume of data
TECHNOLOGY: Variety of tools; scalability
PROCESS: Governance; repeatability and transparency
PEOPLE: Specialized skillsets; recruiting costs

SEAMLESS INTEGRATION WITH OPEN SOURCE AS SUCCESS FACTOR

There’s a vibrant ecosystem of choices available for data scientists.

This can span languages like SAS, Python or R, integrated development environments, deployment technologies, virtual machines, Kubernetes and more.

However, this can create option fatigue, resulting in an inconsistent landscape that makes it difficult to scale analytics. There is increasing recognition among companies that it may be helpful to draw in other technology to integrate open source and other software (such as analytics platforms like SAS Viya) to create interoperability and utility from open source.

[Figure: EXPECTATIONS AND BUSINESS NEEDS. Labels: speed; cost; accelerate innovation; operationalize analytics; access and understand the data quicker; build better models, faster; collaborate seamlessly across teams; manage model inventory; monitor model effectiveness; scale to enterprise level; centralized analytics governance; enable real-time decisioning.]

WHY USE AN AI, ANALYTICAL AND DATA MANAGEMENT PLATFORM?

In her article “Six trends that will influence Data Science practitioners’ priorities in 2021”, Marinela Profi, Product Marketing Manager for Data Science at SAS, mentions “an ongoing reliance on open source, and a focus on its integration” as one of the trends in 2021.

Organizations are depending on open source (like Python and R), and it has proven to be a valuable tool for specific analytical tasks. However, when it comes to building a long-term analytic strategy, that is, a self-sustaining and seamlessly integrated life cycle, open source is creating massive challenges around coordination, integration and, as a result, delivering business value.

There is increasing recognition among companies that it may be helpful to draw in other technology to integrate open source and other software (such as analytics platforms like Viya) to create interoperability and utility from open source.

A critical success factor for these platforms is not limiting the languages that data scientists or IT developers can use, including open source. They also need to integrate via open APIs and ensure endless scalability.

No matter which technologies you choose, you must work on simplifying and automating this process. You need to figure this out for your own ecosystem. And when you do, the payoff is huge.

HOW DATA SCIENTISTS WILL BENEFIT

Data scientists, according to interviews and expert estimates, spend 50 percent to 80 percent of their time mired in the mundane labor of collecting and preparing unruly digital data before it can be explored for useful nuggets. And, what’s more, once they have built the model, deployment can be another nightmare.

Citizen data scientists will bring their work to the business faster, being appreciated by their organization and recognized as innovators.

BENEFITS OF USING OPEN SOURCE WITH SAS:

• Harness the cloud-native high-performance architecture of SAS Viya
• Massively distributed parallel processing for endless scalability
• API-first development strategy coupled with containerized microservices architecture
• Composite AI to combine different AI and analytics techniques in the same environment (e.g. computer vision and optimization)
• Data lineage and auditability so you can see how data moves through the entire system
• Automated feature engineering with ML-powered data preparation
• Monitor open source and SAS models in the same repository and build customized performance reports
• Use Python or R directly with SAS or integrate SAS into applications using REST APIs
• Business rule and formal decisioning integration
• Run open source models without recoding
• Go from code to point-and-click, and then back to code if desired
• Deploy SAS, R and Python models in batch, streaming, cloud or on edge devices
• Container deployment of SAS and open source models
• Ensure explainability of your data science projects through natural language-powered built-in reports
• Use automation in continuous integration and continuous deployment pipelines to manage code artifacts

INTEGRATION ACROSS THE ANALYTICAL LIFE CYCLE

As the analytical life cycle is a broad topic, for the rest of this booklet, we will focus on a specific perspective: the Model Life Cycle.

The Model Life Cycle is a methodological process that data scientists and practitioners apply to build, manage and deploy analytical models to generate business value.

The Model Life Cycle is where we see the most interactions and integration possibilities between open source technology and SAS technology, and hence it will be our focal point of discussion.

MODEL LIFE CYCLE PROCESS FLOW

[Figure: Model life cycle process flow. Steps shown: Build a model; Compare to other models; Deploy the best model; Test, validate, monitor model performance, etc.; Improve the model.]

Note: Modelling is an iterative process. Continuous model building, testing and monitoring is needed for any healthy model life cycle.
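The iterative nature of this flow can be sketched in a few lines of plain Python. This is only an illustration of the loop (build, compare, deploy, monitor, improve), not a SAS or open source library API: the threshold "models", the function names and the 0.9 accuracy floor are all hypothetical, and for brevity the candidates are compared on the same rows they were built from, which a real project would avoid by holding out validation data.

```python
def accuracy(model, rows):
    """Share of (x, label) rows that the model classifies correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def build_candidates(rows):
    """Build step: one simple threshold classifier per distinct x value."""
    thresholds = sorted({x for x, _ in rows})
    return {f"threshold_{t}": (lambda x, t=t: int(x >= t)) for t in thresholds}


def select_champion(candidates, rows):
    """Compare step: return (name, model) for the most accurate candidate."""
    name = max(candidates, key=lambda n: accuracy(candidates[n], rows))
    return name, candidates[name]


def life_cycle(batches, min_accuracy=0.9):
    """Deploy the best model, then monitor each new batch and rebuild on decay."""
    name, champion = select_champion(build_candidates(batches[0]), batches[0])
    history = [("deploy", name)]
    for rows in batches[1:]:
        if accuracy(champion, rows) < min_accuracy:   # monitor
            # Improve: rebuild and compare again on the fresh data.
            name, champion = select_champion(build_candidates(rows), rows)
            history.append(("redeploy", name))
        else:
            history.append(("keep", name))
    return history


batches = [
    [(1, 0), (2, 0), (3, 1), (4, 1)],   # initial training data
    [(1, 0), (2, 0), (5, 1), (6, 1)],   # similar data: the champion still holds
    [(4, 0), (5, 0), (6, 1), (7, 1)],   # shifted data: the champion decays
]
print(life_cycle(batches))
```

The returned history records each decision, which is the kind of trace a governed life cycle needs.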

INTEGRATION IN MODELING

Integration in modelling provides more capabilities and flexibility to users. The next question is, how can integration be achieved? To answer this, we will look at two important areas before getting into the more technical aspects of integration.

1. How does integration occur?
2. How do I integrate in the model life cycle?

Empowering Open Integration With Viya: Introduction and Overview

HOW DOES INTEGRATION OCCUR?

There are two approaches to integration: going from open source to SAS and from SAS to open source. The benefit of having two options is that it opens and extends analytics in your organization to multiple types of users, regardless of their technical background.

OPEN SOURCE TO SAS: For users who have an open source background and are looking to explore SAS capabilities right from the open source interface.

SAS TO OPEN SOURCE: For users who want to utilize both SAS and open source assets in SAS’ visual interface.

VALUE: Users can share and collaborate irrespective of which approach they’ve taken.

HOW DO I INTEGRATE IN THE MODEL LIFE CYCLE?

SAS can integrate with open source technologies at any point of the model life cycle via APIs. Your integration path is dependent on how you prefer to integrate, and where in the model life cycle you would like to integrate.

OPEN SOURCE TO SAS (start here)
BUILD MODELS: Build models (open source and SAS) from an open source interface.
MANAGE MODELS: Register and manage models to a SAS model repository from an open source interface.

SAS TO OPEN SOURCE (start here)
BUILD MODELS: Build open source models in a SAS model pipeline, accessible in the visual interface.
MANAGE MODELS: Import open source models into a SAS model repository for model management.

DEPLOY MODELS: Publish models into a range of SAS and open source execution engines for batch, single call or real-time processing.

BUILDING MODELS

Open Source to SAS
Building Models 1: Open Source to SAS via SWAT and DLPy

SAS to Open Source
Building Models 2: SAS to Open Source via Model Studio

MANAGING MODELS

Open Source to SAS
Managing Models 1: Open Source to SAS via SAS Ctl/Pzmm

SAS to Open Source
Managing Models 2: SAS to Open Source via Model Manager

I HAVE MY MODEL RUNNING, WHAT’S NEXT?

Once you have a Python or R model created, you can register it to SAS Model Manager to compare, evaluate and monitor its performance against other models before publishing it to a test or production environment.

Model management is an important step after building models. The benefits of using SAS Model Manager include:

• It ensures you are consistently running your best model at any given time, minimizing the impact of model decay on the business.
• It can deploy models with just a few clicks, both in batch and in real time, for quicker value realization.
• It allows for complete traceability and analytics governance through a centralized model repository, with version control for tighter governance of a model workflow.

Note: To manage models in SAS, you need to license SAS Model Manager in your environment.
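To make the ideas of a centralized repository, version control and traceability more tangible, here is a minimal in-memory sketch in plain Python. It is not SAS Model Manager and does not use any SAS API: the `ModelRepository` and `ModelVersion` classes, their methods and the example model names and metric values are all hypothetical, invented for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ModelVersion:
    name: str
    version: int
    language: str        # e.g. "python", "r" or "sas"
    metrics: dict
    registered_at: str


@dataclass
class ModelRepository:
    """Hypothetical in-memory stand-in for a centralized model repository."""
    versions: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)
    champion: Optional[ModelVersion] = None

    def register(self, name, language, metrics):
        """Store a new version; version numbers increase per model name."""
        version = 1 + sum(v.name == name for v in self.versions)
        entry = ModelVersion(name, version, language, dict(metrics),
                             datetime.now(timezone.utc).isoformat())
        self.versions.append(entry)
        self.audit_log.append(f"registered {name} v{version}")
        return entry

    def promote_best(self, metric):
        """Make the version with the highest value of `metric` the champion."""
        best = max(self.versions, key=lambda v: v.metrics[metric])
        if best is not self.champion:
            self.champion = best
            self.audit_log.append(f"champion set to {best.name} v{best.version}")
        return best


repo = ModelRepository()
repo.register("churn_gb", "python", {"auc": 0.81})    # illustrative values only
repo.register("churn_glm", "r", {"auc": 0.78})
repo.register("churn_gb", "python", {"auc": 0.84})
best = repo.promote_best("auc")
print(best.name, best.version, repo.audit_log)
```

Because models written in different languages sit in one repository with one audit log, comparing them and explaining why a given version is in production becomes a lookup rather than an investigation.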

DEPLOYING MODELS

MY MODEL IS READY! WHAT’S NEXT?

Now that you have your model running, it’s time to deploy it. Deploying your model simply means having it used in a production environment.

This is usually done by the DevOps or IT team in your organization.

In these videos, we will demonstrate deployment methods using SAS tools. Combining SAS and open source technologies makes deployment of both SAS and open source models more robust for production environments.

Understanding the open source ecosystem
Deploying Models: SAS Cloud Analytic Services
Deploying Models: SAS Micro Analytic Service
Deploying Models: Docker/Kubernetes
Deploying Models: SAS Event Stream Processing

NEXT STEPS

By this point, you should have a good grasp of why SAS integrates with open source, the benefits of doing so, and the various ways in which you can do it.

However, to have a successful integration, you need to give careful consideration to whether an integration is needed in your context, and if so, to what extent.

Here are some sample questions to ask:

• What are the analytics problems that you are trying to solve? Are they big? Are they complex? Are they urgent?
• What is the complexity and/or volume growth of your data?
• What is the mix of skillsets in your team/organization?
• How does your IT environment operate? Will you generate more technical debt?

For further information please visit the sas.com/viya page.

Copyright 2021, SAS Institute Inc. All rights reserved. 112134 G149360.0521
