Transcription

DP-203 Microsoft Azure Data Engineer
Day 8 – Azure HDInsight
1st Aug 2021
Vinodkumar Bhovi

Data
Data Storage
Azure Storage Accounts
Data Transformation
Azure Cosmos DB
Azure Data Factory
Azure Stream Analytics
Azure SQL
Azure Data Lake
Azure Databricks
Azure Synapse Analytics
Azure HDInsight
2021 databag.ai – Proprietary and Confidential

Agenda
Why is Hadoop hard?
Why Hadoop on the cloud?
How does HDInsight make Hadoop easy?
Important aspects of Hadoop
HDInsight Architecture

3 Challenges with Hadoop?

HDInsight makes Hadoop easy

What is Azure HDInsight?
HDInsight is a cloud distribution of Hadoop components.
Easy
Cost-effective
Scalable
Secure
Fast
Managed

HDInsight makes Hadoop easy

Important aspects of HDInsight
Integration
Services
Visual Studio
Power BI
H vices
Security
Monitoring

Hierarchy

Demo overview

Ambari
“Ambari is a Hadoop management platform responsible for cluster administration, monitoring and configuration”

Visualized reports
System monitoring
Change configuration
Manage authentication
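The monitoring and configuration tasks listed for Ambari can also be scripted. On HDInsight, Ambari exposes a REST API under the cluster's public endpoint, secured with the cluster login (HTTP basic authentication). The following is a minimal sketch, not a definitive client: the helper name build_ambari_request, the cluster name and the credentials are all hypothetical, and it only builds the request so that it can be inspected before anything is sent.

```python
import base64
import urllib.request


def build_ambari_request(cluster_name, username, password, path=""):
    """Build (but do not send) a GET request against the Ambari REST API.

    Assumes the standard HDInsight endpoint layout:
    https://<cluster>.azurehdinsight.net/api/v1/clusters/<cluster><path>
    """
    url = (
        f"https://{cluster_name}.azurehdinsight.net"
        f"/api/v1/clusters/{cluster_name}{path}"
    )
    # Ambari on HDInsight uses HTTP basic authentication with the cluster login.
    raw = f"{username}:{password}".encode("utf-8")
    credentials = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {credentials}"}
    )


# Hypothetical cluster name and credentials, for illustration only.
services_request = build_ambari_request(
    "mycluster", "admin", "secret", path="/services"
)
```

Passing services_request to urllib.request.urlopen from a machine that can reach the cluster would return a JSON document listing the cluster's services. An empty path returns the cluster summary instead.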

Managed Identity
Managed identities are used by Azure services to authenticate to other Azure services that support Azure AD authentication.

Managed Identity
Azure virtual machine
Upload files
Azure Blob Storage

Managed Identity – 2 steps
Authentication
Authorization

Two types of Managed Identity
System-assigned
  Enabled directly on an Azure service instance
  Lifecycle is tied to the service instance
User-assigned
  Created as a standalone Azure resource
  Lifecycle is managed separately
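The virtual machine to Blob Storage upload and the two steps can be sketched in code. For authentication, code on an Azure VM asks the Instance Metadata Service (IMDS) for a token; a user-assigned identity is selected by passing its client ID, while a system-assigned identity needs no extra parameter. Authorization is assumed to be granted separately, for example through a role assignment on the storage account. This is a minimal sketch under those assumptions: the function names, storage account, container and file name are hypothetical, and the requests are only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

# IMDS token endpoint; reachable only from inside an Azure VM.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource="https://storage.azure.com/", client_id=None):
    """Authentication step: build the IMDS request for an access token."""
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id is not None:
        # Only needed to pick a specific user-assigned identity.
        params["client_id"] = client_id
    url = IMDS_TOKEN_URL + "?" + urllib.parse.urlencode(params)
    # IMDS rejects requests that lack the "Metadata: true" header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def fetch_access_token(request, timeout=5):
    """Send the token request; this works only when run on an Azure VM."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]


def build_upload_request(account, container, blob_name, data, token):
    """Build a Put Blob request that presents the token as a bearer token.

    The service accepts it only if the identity has been authorized
    on the storage account (the authorization step).
    """
    url = (
        f"https://{account}.blob.core.windows.net/"
        f"{container}/{urllib.parse.quote(blob_name)}"
    )
    headers = {
        "Authorization": f"Bearer {token}",
        "x-ms-version": "2019-12-12",
        "x-ms-blob-type": "BlockBlob",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

On a VM with a managed identity, the flow would be: token = fetch_access_token(build_token_request()), then urllib.request.urlopen(build_upload_request("mystorageacct", "uploads", "report.csv", b"a,b\n1,2\n", token)). No secret is stored in the code or on the VM; the platform issues the token.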

HDInsight cluster types
Hadoop – Batch query and analysis of HDFS-stored data
HBase – Processing for large schemaless NoSQL data
Interactive Query – In-memory caching for fast Hive queries
Kafka – Distributed streaming data platform
ML Services – Predictive modeling and machine learning
Spark – In-memory processing and interactive queries
Storm – Real-time event processing

