Transcription

Big Data Ecosystem Reference Architecture
Orit Levin, Microsoft
July 18th, 2013

RA Objectives
- Audience: Useful to industry, policy makers, and users (or “data owners”)
- Scope: Encompasses the whole data life cycle
- Focus: Exposes projected “interoperability surfaces”
- Assists in identifying security and privacy issues
- Addresses radically different Big Data use cases
  - An ecosystem comprised of independent stakeholders (e.g., advertising industry)
  - A stand-alone Enterprise Data Warehouse
  - Outsourcing of selected or all data transformations to SaaS providers
  - Outsourcing of data storage and/or computing to IaaS providers
  - Etc.
- Agnostic to any specific technologies
- Shows the mapping to NIST CC RA (slide # 4)
7/10/2013 Microsoft 2

Big Data Ecosystem RA (slide 3)
[Diagram; only its labels survive in the transcription.]
Individual Data Transfer; Big Data Transfer; Selected Data Storage and Retrieval; Big Data Storage and Retrieval
Data Sources; Data Transformation; Data Infrastructure; Data Usage
Collection; Aggregation; Matching; Conditioning; Data Mining; Storage & Retrieval
PII; Pseudo-anonymized; Anonymized
VOLUME; VARIETY; VELOCITY
Management; Security
Network Operators / Telecom; Industries / Businesses; Government (incl. health & financial institutions); Academia
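The diagram labels three data states: PII, pseudo-anonymized, and anonymized. The slides do not say how data moves between them, so the following is only a minimal sketch under stated assumptions: the record fields (`email`, `region`, `purchases`), the key, and both helper functions are hypothetical, a keyed hash stands in for pseudonymization, and per-region counts stand in for anonymization (which in practice needs stronger guarantees than simple aggregation).

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret held by the data owner; not taken from the slides.
SECRET_KEY = b"example-key-held-by-the-data-owner"


def pseudonymize(record: dict) -> dict:
    """Replace the direct identifier with a keyed hash (a stable pseudonym)."""
    token = hmac.new(
        SECRET_KEY, record["email"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    out = {k: v for k, v in record.items() if k != "email"}
    out["subject_id"] = token
    return out


def anonymize(records: list) -> dict:
    """Drop subject-level rows and keep only record counts per region."""
    return dict(Counter(r["region"] for r in records))


# Hypothetical PII records.
pii_records = [
    {"email": "alice@example.com", "region": "EU", "purchases": 3},
    {"email": "bob@example.com", "region": "US", "purchases": 1},
    {"email": "alice@example.com", "region": "EU", "purchases": 2},
]

pseudonymized = [pseudonymize(r) for r in pii_records]
anonymized = anonymize(pseudonymized)
```

The same person maps to the same `subject_id` in the pseudonymized records, so records can still be matched without exposing the identifier; the aggregated result no longer carries any per-subject row.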

An Example of Cloud Computing Usage in Big Data Ecosystem (slide 4)
[Diagram; only its labels survive in the transcription.]
Individual Data Transfer; Big Data Transfer; Selected Data Storage and Retrieval; Big Data Storage and Retrieval
Data Sources; Data Objects
VOLUME; VARIETY; VELOCITY
Data Transformation; Data Infrastructure; Data Cloud Provider / Service Layer; Data Mining; Data Usage
Network Operators / Telecom; Industries / Businesses; Government (incl. health & financial institutions); Academia

Use Case: Advertising (slide 5)
[Diagram; only its labels survive in the transcription.]
Control; Individual Data Transfer; Big Data Transfer
Offline Sources; Online Sources
Data Subject / Person; UI: Do Not Track (DNT); Networks
End User devices incl. OS (mobile phones, etc.); DPI; Collection
Analytic Cookie; DMP Container Tag or Pixel request; Applications (search, publishers, etc.); Match Cookie
Data Management Platforms (DMPs); Internal Records; Public Records (commons, government, etc.)
DMP Cookie; Match Container Tag or Pixel request; Appl. with customers (communications, social network, etc.); Online Data Aggregator
PII; De-identified; Aggregated
Web Browsers; HTTP: DNT; Network Operators; Other devices (Smart Grid, surveillance, scientific, etc.)
1st Party; 2nd Party; 3rd Party
Industries / Businesses; Match/Bridge Service; Government, health, financial institutions, academia
Contextual; Data Collection; Offline Data Aggregator; Behavioral; Data Creation; Data Mining; Person Attribution
Users; Advertising Industry Agency; Advertiser
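The advertising diagram places "HTTP: DNT" at the web browser and names analytic and match cookies on the collection side. As a rough illustration, a collector could check the request for the `DNT: 1` header before setting any tracking cookie. The function names and the cookie policy below are assumptions made for this sketch; the slides do not prescribe how a collector should react to the signal.

```python
def dnt_enabled(headers: dict) -> bool:
    """Return True when the request carries the header `DNT: 1`."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def choose_cookies(headers: dict) -> list:
    """Hypothetical policy: set no tracking cookies when DNT is on."""
    if dnt_enabled(headers):
        return []
    return ["analytic", "match"]


# Example requests (header values are illustrative).
opted_out = choose_cookies({"DNT": "1", "User-Agent": "ExampleBrowser"})
no_signal = choose_cookies({"User-Agent": "ExampleBrowser"})
```

Here `opted_out` is an empty list, while `no_signal` still receives both cookie types because the header is absent.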

Use Case: Enterprise Data Warehouse (slide 6)
[Diagram; only its labels survive in the transcription.]
Individual Data Transfer; Big Data Transfer; Selected Data Storage and Retrieval; Big Data Storage and Retrieval
Data Sources; Data Objects; Files; MS Office Documents; Manual; Online Transaction Processing (OLTP) Systems
Data Transformation; Extraction, Transformation, and Loading (ETL); Online Analytical Processing (OLAP); Managed Report Environment (MRE); Data Mining / Knowledge Discovery in Databases (KDD)
Data Infrastructure; Central Data Warehouse; Archives; Operational Data Store; Staging Area
Management; Security
Data Usage; Subject Data Mart; Regional Data Mart; Department Data Mart; Application Data Mart; Functional Data Mart
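The warehouse diagram names OLTP systems as a source, a staging area, ETL, a central data warehouse, and several data marts. One plausible reading of that flow can be sketched in a few lines. Everything concrete here (the row fields, the cleaning rules, the in-memory lists standing in for storage) is hypothetical and only meant to show the order of the stages.

```python
from collections import defaultdict

# Hypothetical rows as they might come out of an OLTP system.
oltp_rows = [
    {"order_id": 1, "region": " emea ", "amount": "120.50"},
    {"order_id": 2, "region": "APAC", "amount": "80.00"},
    {"order_id": 3, "region": "EMEA", "amount": "19.50"},
]


def extract(source: list) -> list:
    """Extraction: copy source rows into the staging area unchanged."""
    return [dict(row) for row in source]


def transform(staged: list) -> list:
    """Transformation: normalize region codes and convert amounts to numbers."""
    return [
        {
            "order_id": row["order_id"],
            "region": row["region"].strip().upper(),
            "amount": float(row["amount"]),
        }
        for row in staged
    ]


def load(warehouse: list, rows: list) -> None:
    """Loading: append the cleaned rows to the central warehouse."""
    warehouse.extend(rows)


def regional_data_mart(warehouse: list) -> dict:
    """Derive a regional data mart: total order amount per region."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["region"]] += row["amount"]
    return dict(totals)


staging_area = extract(oltp_rows)
central_warehouse = []
load(central_warehouse, transform(staging_area))
mart = regional_data_mart(central_warehouse)
```

Keeping the staging copy separate from the transformed rows mirrors the diagram's distinct Staging Area and Central Data Warehouse boxes; the other data marts on the slide would be further views derived from the same warehouse.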
