
Overview: NIST Big Data Working Group Activities and Big Data Architecture Framework (BDAF) by UvA
Yuri Demchenko, SNE Group, University of Amsterdam
Big Data Analytics Interest Group
17 September 2013, 2nd RDA Plenary
Outline
Overview of NIST Big Data Working Group (NBD-WG) activities and deliverables
Proposed Big Data Architecture Framework (BDAF)
– Data Models and Big Data Lifecycle
– Big Data Infrastructure (BDI)
Discussion: Liaison and information exchange with NIST BD-WG
Disclaimer: The information presented here about the NIST Big Data Working Group (NBD-WG) and the images from the NBD-WG working documents are not the official position of the NBD-WG and are solely the author's opinion.
17 September 2013, RDA-CWG-BDA, NIST BD-WG and UvA BDAF, Slide 2
NIST Big Data Working Group (NBD-WG)
Deliverables target: September 2013
– 26 September: initial draft documents
– 30 September: Workshop and F2F meeting
Activities: conference calls every day, 17:00-19:00 (CET), by subgroup
– Big Data Definition and Taxonomies
– Requirements (chair: Geoffrey Fox, Indiana Univ)
– Big Data Security
– Reference Architecture
– Technology Roadmap
BigdataWG mailing list and useful documents
– Input documents: http://bigdatawg.nist.gov/show InputDoc2.php
– Big Data Reference Architecture: http://bigdatawg.nist.gov/ uploadfiles/M0226 v2 1885676266.docx
– Requirements for 21 use cases: http://bigdatawg.nist.gov/ uploadfiles/M0224 v1 1076079077.xlsx
NIST Proposed Reference Architecture (before July 2013)
Obviously not data centric
Doesn't make data (lifecycle) management clear
[ref] NIST Big Data WG mailing list discussion: http://bigdatawg.nist.gov/ uploadfiles/M0010 v1 6762570643.pdf
Big Data Ecosystem Reference Architecture (by Microsoft) [ref]
– Initial contribution, July 2013
[ref] Big Data Ecosystem Reference Architecture (Microsoft): http://bigdatawg.nist.gov/ uploadfiles/M0015 v1 1596737703.docx
NIST Reference Architecture version 0.0 (August 2013)
[Figure: architecture diagram. Recoverable labels: Volume, Variety, Velocity; Data Sources; Data Usage; Retrieve, Report, Rendering; Usage Service Abstraction; Capability Service Abstraction; Data Service Abstraction; Transitional Framework; Cloud C. Framework; Capability Management; Lifecycle Management; System Management; Security & Privacy Management; Hardware (Storage, Networking, etc.) Management]
NIST Reference Architecture version 0.1 (September 2013)
[Figure: architecture diagram. Recoverable labels: Data Provider; Data Consumer; Transformation Provider; Capabilities Provider; Big Data Framework; System Manager or Vertical Orchestrator; Data Service Abstraction; Capabilities Service Abstraction (analytic tools, etc.); System Service Abstraction; Usage Service Abstraction; Legacy Applications; Scalable Platforms (databases, etc.); Legacy Platforms; Scalable Infrastructures (VM cluster, etc.); Legacy Infrastructures; Hardware (Storage, Networking, etc.). Axes: Information Flow / Value Chain; IT Stack / Value Chain. Key: Service Use; DATA: Big Data Information Flow; SW: SW Tools and Algorithms Transfer]
Big Data Architecture Framework (BDAF) by the University of Amsterdam
Big Data definition: from 5+1 Vs to 5 parts
Big Data Architecture Framework (BDAF) components
Improved: 5+1 V's of Big Data
[Figure: the 6 Vs of Big Data. Recoverable labels: Volume (Terabytes, Records/Arch, Tables, Files, Distributed); Variety (Dynamic; other labels lost in extraction); Velocity (Batch, Real/near-time, Processes, Streams); Value (Correlations, Statistical, Events, Hypothetical); Variability (Changing data, Changing model, Linkage); Veracity (Trustworthiness, Authenticity, Origin, Reputation, Availability, Accountability)]
Generic Big Data Properties
– Volume
– Variety
– Velocity
Acquired Properties (after entering system)
– Value
– Veracity
– Variability
Big Data Definition: From 5+1 V to 5 Parts (1)
(1) Big Data Properties: 5V
– Volume, Variety, Velocity, Value, Veracity
– Additionally: Data Dynamicity (Variability)
(2) New Data Models
– Data Lifecycle and Variability
– Data linking, provenance and referral integrity
(3) New Analytics
– Real-time/streaming analytics, interactive and machine learning analytics
(4) New Infrastructure and Tools
– High performance Computing, Storage, Network
– Heterogeneous multi-provider services integration
– New Data Centric (multi-stakeholder) service models
– New Data Centric security models for trusted infrastructure and data processing and storage
(5) Source and Target
– High velocity/speed data capture from variety of sensors and data sources
– Data delivery to different visualisation and actionable systems and consumers
– Full digitised input and output, (ubiquitous) sensor networks, full digital control
Big Data Definition: From 5V to 5 Parts (2)
Refining Gartner definition
Big Data (Data Intensive) Technologies target the processing of (1) high-volume, high-velocity, high-variety data (sets/assets) to extract the intended data value and ensure high veracity of the original data and the obtained information. This demands cost-effective, innovative forms of data and information processing (analytics) for enhanced insight, decision making, and process control. All of these demand (should be supported by) new data models (supporting all data states and stages during the whole data lifecycle) and new infrastructure services and tools that also allow obtaining (and processing) data from a variety of sources (including sensor networks) and delivering data in a variety of forms to different data and information consumers and devices.
(1) Big Data Properties: 5V
(2) New Data Models
(3) New Analytics
(4) New Infrastructure and Tools
(5) Source and Target
Big Data Nature: Origin and consumers (target)
Big Data Origin
– Science
– Telecom
– Industry
– Business
– Living Environment, Cities
– Social media and networks
– Healthcare
Big Data Target Use
– Scientific discovery
– New technologies
– Manufacturing, processes, transport
– Personal services, campaigns
– Living environment support
– Healthcare support
Big Data Nature: Origin and consumers
[Table: Big Data origins (Science; Telecom; Industry; Business; Living environment, Cities; Social media, networks; Healthcare) against target uses; the column headers are only partly recoverable (Infrastruct, Utility; Healthcare support) and the cell markings did not survive extraction]
Rich information on use cases is available from the NIST document store: http://bigdatawg.nist.gov/show InputDoc.php
Moving to Data-Centric Models and Technologies
Current IT and communication technologies are host based or host centric
– Any communication or processing is bound to the host/computer that runs the software
– Especially in security: all security models are host/client based
Big Data requires new data-centric models
– Data location, search, access
– Data variability and lifecycle
– Data integrity and identification
– Data centric security and access control
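One way to read the contrast between host-centric and data-centric security is sketched below in Python. This is only an illustration under stated assumptions: the names `AccessPolicy`, `DataObject`, `read` and `migrate` are hypothetical and do not come from the slides or from any NBD-WG document. The point it tries to show is that the policy is attached to the data object, so moving the object to another host does not change who may read it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy that travels with the data object."""
    readers: frozenset  # identities allowed to read the payload


@dataclass
class DataObject:
    pid: str              # identifier that does not depend on the location
    payload: bytes
    policy: AccessPolicy  # the access decision is made from this, not from the host
    location: str         # current host; changing it leaves the policy untouched


def read(obj: DataObject, requester: str) -> bytes:
    """Return the payload if the object's own policy allows the requester."""
    if requester not in obj.policy.readers:
        raise PermissionError(f"{requester} may not read {obj.pid}")
    return obj.payload


def migrate(obj: DataObject, new_host: str) -> DataObject:
    """Move the object to another host; identifier and policy stay the same."""
    obj.location = new_host
    return obj


if __name__ == "__main__":
    record = DataObject(
        pid="pid:example-1",  # made-up identifier for illustration
        payload=b"sensor readings",
        policy=AccessPolicy(readers=frozenset({"alice"})),
        location="host-a",
    )
    migrate(record, "host-b")
    print(read(record, "alice"))
```

In a host-centric design the check in `read` would instead depend on which machine served the request; here it depends only on the object.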
Defining Big Data Architecture Framework
Existing attempts don't converge to a consistent view: ODCA, TMF, NIST
– See http://bigdatawg.nist.gov/ uploadfiles/M0055 v1 7606723276.pdf
Big Data Architecture Framework (BDAF) by UvA
– Architecture Framework and Components for the Big Data Ecosystem. Draft Version -2013-02-techreport-bdafdraft02.pdf
Architecture vs Ecosystem
– Big Data undergo a number of transformations during their lifecycle
– Big Data fuel the whole transformation chain: data sources and data consumers, target data usage
– Multi-dimensional relations between data models and data driven processes, and between infrastructure components and data centric services
Architecture vs Architecture Framework (Stack)
– Separates concerns and factors: Control and Management functions, orthogonal factors
– Architecture Framework components are inter-related
Big Data Architecture Framework (BDAF) for Big Data Ecosystem (BDE)
(1) Data Models, Structures, Types
– Data formats, non/relational, file systems, etc.
(2) Big Data Management
– Big Data Lifecycle (Management) Model: Big Data transformation/staging
– Provenance, Curation, Archiving
(3) Big Data Analytics and Tools
– Big Data Applications: target use, presentation, visualisation
(4) Big Data Infrastructure (BDI)
– Storage, Compute, (High Performance Computing,) Network
– Sensor network, target/actionable devices
– Big Data Operational support
(5) Big Data Security
– Data security in-rest, in-move, trusted processing environments
Big Data Architecture Framework (BDAF) – Aggregated – Relations between components (2)
[Table: relations matrix (Col: Used By; Row: Requires This) over the five components: Data Models & Structures; Data Management & Lifecycle; Big Data Infrastructure & Operations; Big Data Analytics & Applications; Big Data Security. The cell markings did not survive extraction.]
Big Data Ecosystem: Data, Lifecycle, Infrastructure
[Figure: ecosystem diagram. Recoverable labels: Big Data Source/Origin (sensor, experiment, log data, behavioral data); Big Data Target/Customer: Actionable/Usable Data (target users, processes, objects, behavior); Data Delivery, Visualisation; Consumer; Big Data Analytic/Tools; Storage, General Purpose; Analytics DB, in memory, operational; Data categories: metadata, (un)structured, (non)identifiable; Intercloud multi-provider heterogeneous Infrastructure; Security Infrastructure; Network]
Big Data Infrastructure and Analytic Tools
[Figure: infrastructure diagram. Recoverable labels: Big Data Source/Origin (sensor, experiment, log data, behavioral data); Big Data Target/Customer: Actionable/Usable Data (target users, processes, objects, behavior, etc.); Big Data Analytic/Tools; Analytics: Refinery, Linking, Fusion; Analytics: Realtime, Interactive, Batch, Streaming; Analytics Applications: Link Analysis, Cluster Analysis, Entity Resolution; Storage, General Purpose; Compute, General Purpose; Data Management; Databases; Archives; Data categories: metadata, (un)structured, (non)identifiable; Intercloud multi-provider heterogeneous Infrastructure; Security Infrastructure; Network]
Data Transformation/Lifecycle Model
[Figure: lifecycle diagram. Recoverable labels: Data Source; Data Analytics Application; Data Delivery, Visualisation; Consumer; Data Model (1), Data Model (3), Data Model (4); Data repurposing, Analytics re-factoring, Secondary processing]
Common Data Model?
– Data Variety and Variability
– Semantic Interoperability
Data (inter)linking?
– Persistent ID
– Identification
– Privacy, Opacity
Does the Data Model change along the lifecycle or data evolution?
Identifying and linking data
– Persistent identifier
– Traceability vs Opacity
– Referral integrity
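The questions on persistent identifiers, traceability and referral integrity can be made concrete with a small Python sketch. It is a hedged illustration, not a design taken from the slides: `DataRecord`, `new_record` and `lineage` are hypothetical names, the stage names and data model labels are placeholders, and a random UUID stands in for a real persistent identifier scheme. Each record keeps its own identifier and a link to the record it was derived from, while its data model is allowed to differ from stage to stage.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataRecord:
    pid: str                            # stand-in for a persistent identifier
    stage: str                          # lifecycle stage (placeholder names)
    data_model: str                     # the model may change between stages
    derived_from: Optional[str] = None  # pid of the parent record, if any


def new_record(stage: str, data_model: str,
               parent: Optional[DataRecord] = None) -> DataRecord:
    """Create a record with a fresh identifier, linked to its parent."""
    return DataRecord(
        pid=f"pid:{uuid.uuid4()}",
        stage=stage,
        data_model=data_model,
        derived_from=parent.pid if parent else None,
    )


def lineage(record: DataRecord, registry: Dict[str, DataRecord]) -> List[str]:
    """Follow derived_from links back to the source (traceability).

    A link that cannot be resolved raises KeyError, which is one simple
    way to surface a referral integrity problem.
    """
    chain = [record.pid]
    current = record
    while current.derived_from is not None:
        parent = registry.get(current.derived_from)
        if parent is None:
            raise KeyError(f"broken link: {current.derived_from}")
        chain.append(parent.pid)
        current = parent
    return chain


if __name__ == "__main__":
    source = new_record("source", "model-1")
    analysed = new_record("analytics", "model-3", parent=source)
    delivered = new_record("delivery", "model-4", parent=analysed)
    registry = {r.pid: r for r in (source, analysed, delivered)}
    print(lineage(delivered, registry))
```

Opacity would pull in the other direction: a privacy-preserving variant might deliberately drop or obscure `derived_from`, at the cost of traceability.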
Scientific Data Lifecycle Management (SDLM) Model
Data Lifecycle Model in e-Science
[Figure: lifecycle diagram. Recoverable labels: User, Researcher; Project/Experiment Planning; Data collection and filtering; Raw Data; Experimental Data; Structured Scientific Data; Data analysis; DB; Data discovery; Data Curation (including retirement and clean up); Data recycling; Data archiving; Data sharing/Data publishing; Data linkage to papers; Data Re-purpose; Data Links; Metadata & Mngnt; End of project; Open Public Use]
Data Linkage Issues
– Persistent Identifiers (PID)
– ORCID (Open Researcher and Contributor ID)
– Linked Data
Data Clean up and Retirement
– Ownership and authority
– Data Detainment
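A lifecycle like the one on this slide can be expressed as a small state machine. The Python sketch below is an assumption-laden illustration: the diagram's arrows did not survive extraction, so the stage names and the allowed transitions in `TRANSITIONS` are a plausible reading of the labels, not the model as drawn, and the `Dataset` class is hypothetical.

```python
# Assumed transitions between lifecycle stages; the original diagram's
# arrows are not recoverable, so this ordering is illustrative only.
TRANSITIONS = {
    "planning": {"collection"},
    "collection": {"analysis"},
    "analysis": {"publishing"},
    "publishing": {"archiving"},
    "archiving": {"repurpose", "retired"},
    "repurpose": {"planning"},  # re-purposed data feeds a new project
    "retired": set(),           # clean-up and retirement end the lifecycle
}


class Dataset:
    """Tracks which lifecycle stage a dataset is in and how it got there."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = "planning"
        self.history = ["planning"]

    def advance(self, next_stage: str) -> None:
        """Move to next_stage if the assumed lifecycle allows it."""
        if next_stage not in TRANSITIONS[self.stage]:
            raise ValueError(f"{self.stage} -> {next_stage} is not allowed")
        self.stage = next_stage
        self.history.append(next_stage)


if __name__ == "__main__":
    ds = Dataset("experiment-42")  # made-up name for illustration
    for stage in ("collection", "analysis", "publishing", "archiving"):
        ds.advance(stage)
    print(ds.history)
```

Recording `history` alongside the current stage is what would later support questions of ownership, authority and retirement.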
Evolutional/Hierarchical Data Model
[Figure: layered data model. Recoverable labels: Actionable Data; Papers/Reports; Archival Data; Usable Data; Processed Data (for target use); Classified/Structured Data; Raw Data; Metadata; Referrals; Control information; Policy; Data patterns]
Common Data Model?
Data interlinking?
Fits to Graph data type?
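The question "Fits to Graph data type?" can be explored with a minimal directed graph in Python. This is a sketch under assumptions: the bottom-to-top layer order in `LAYERS` is inferred from the slide's labels, and `DataGraph` with its methods is a hypothetical structure, not something defined in the slides. Nodes are data items tagged with a layer, and edges point from a source item to the items derived from it.

```python
from collections import deque
from typing import Dict, List, Optional

# Assumed bottom-to-top order of the layers named on the slide.
LAYERS = ["raw", "classified", "processed", "usable"]


class DataGraph:
    """Directed graph of data items; edges go from source to derived item."""

    def __init__(self) -> None:
        self.layer: Dict[str, str] = {}          # item id -> layer name
        self.derived: Dict[str, List[str]] = {}  # item id -> derived item ids

    def add(self, item: str, layer: str, source: Optional[str] = None) -> None:
        """Register an item, optionally as derived from an existing item."""
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        if source is not None:
            if LAYERS.index(layer) <= LAYERS.index(self.layer[source]):
                raise ValueError("derived data must sit in a higher layer")
            self.derived[source].append(item)
        self.layer[item] = layer
        self.derived[item] = []

    def descendants(self, item: str, layer: str) -> List[str]:
        """Breadth-first search for items in `layer` derived from `item`."""
        found: List[str] = []
        seen = set()
        queue = deque(self.derived[item])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            if self.layer[node] == layer:
                found.append(node)
            queue.extend(self.derived[node])
        return found


if __name__ == "__main__":
    g = DataGraph()
    g.add("raw-1", "raw")
    g.add("cls-1", "classified", source="raw-1")
    g.add("proc-1", "processed", source="cls-1")
    g.add("proc-2", "processed", source="cls-1")
    g.add("use-1", "usable", source="proc-1")
    print(g.descendants("raw-1", "processed"))
```

The repeated "Processed Data (for target use)" boxes on the slide correspond here to one classified item fanning out into several processed items.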
Additional Information
Existing proposed Big Data architectures
Industry Initiatives to define Big Data (Architecture)
Open Data Center Alliance (ODCA) Information as a Service (INFOaaS)
TMF Big Data Analytics Reference Architecture
Research Data Alliance (RDA)
– All data related aspects, but not Infrastructure and tools
LexisNexis HPCC Systems
ODCA INFOaaS – Information as a Service
Using integrated/unified storage
– New DB/storage technologies allow storing data during all lifecycle
[ref] Open Data Center Alliance Master Usage model: Information as a Service, Rev ormation as a Service Master Usage Model Rev1.0.pdf
ODCA Example INFOaaS Architecture
Core Data and Information Components
Data Integration and Distribution Components
Presentation and Information Delivery Components
Control and Support Components
TMF Big Data Analytics Architecture
[ref] TR202 Big Data Analytics Reference Model. Version 1.9, April 2013.
LexisNexis Vision for Data Analytics Supercomputer (DAS) [ref]
[ref] HPCC Systems: Introduction to HPCC (High Performance Computer Cluster), Author: A.M. Middleton, LexisNexis Risk Solutions, Date: May 24, 2011
LexisNexis HPCC System Architecture
ECL – Enterprise Data Control Language
THOR Processing Cluster (Data Refinery)
Roxie Rapid Data Delivery Engine
[ref] HPCC Systems: Introduction to HPCC (High Performance Computer Cluster), Author: A.M. Middleton, LexisNexis Risk Solutions, Date: May 24, 2011
IBM GBS Business Analytics and Optimisation