Cloudera Enterprise Reference Architecture for Bare Metal Deployments

Important Notice

© 2010-2019 Cloudera, Inc. All rights reserved.

Cloudera, the Cloudera logo, and any other product or service names or slogans contained in this document, except as otherwise disclaimed, are trademarks of Cloudera and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of Cloudera or the applicable trademark holder.

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other trademarks, registered trademarks, product names and company names or logos mentioned in this document are the property of their respective owners. Reference to any products, services, processes or other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute or imply endorsement, sponsorship or recommendation thereof by us.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Cloudera.

Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Cloudera, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

The information in this document is subject to change without notice.
Cloudera shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.

Cloudera, Inc.
395 Page Mill Road
Palo Alto, CA
[email protected]
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com

Release Information
Version: 6.2
Date: 20190404

Cloudera Reference Architecture for Bare Metal Deployments

Table of Contents

Abstract
Infrastructure
System Architecture Best Practices
Java
Right-size Server Configurations
Deployment Topology
Physical Component List
Network Specification
Cloudera Manager
Cluster Sizing Best Practices
Cluster Hardware Selection Best Practices
Number of Spindles
Disk Layout
Data Density Per Drive
Number of Cores and Multithreading
RAM
Power Supplies
Operating System Best Practices
Hostname Naming Convention
Hostname Resolution
Functional Accounts
Time
Name Service Caching
SELinux
IPv6
iptables
Startup Services
Process Memory
Kernel and OS Tuning
Filesystems
Cluster Configuration
Teragen and Terasort Performance Baseline
Cluster Configuration Best Practices
Security Integration
Common Questions
References
Acknowledgements

Abstract

An organization's requirements for a big-data solution are simple: acquire and combine any amount or type of data in its original fidelity, in one place, for as long as necessary, and deliver insights to all kinds of users, as quickly as possible.

Cloudera, an enterprise data management company, introduced the concept of the enterprise data hub (EDH): a central system to store and work with all data. The EDH has the flexibility to run a variety of enterprise workloads (for example, batch processing, interactive SQL, enterprise search, and advanced analytics) while meeting enterprise requirements such as integrations to existing systems, robust security, governance, data protection, and management. The EDH is the emerging center of enterprise data management. EDH builds on Cloudera Enterprise, which consists of the open source Cloudera Distribution including Apache Hadoop (CDH), a suite of management software, and enterprise-class support.

As organizations embrace Hadoop-powered big data deployments, they also want enterprise-grade security, management tools, and technical support--all of which are part of Cloudera Enterprise.

Cloudera Reference Architecture documents illustrate example cluster configurations and certified partner products. The RAs are not replacements for official statements of supportability; rather, they are guides to assist with deployment and sizing options. Statements regarding supported configurations in the RA are informational and should be cross-referenced with the latest documentation.

This document is a high-level design and best-practices guide for deploying Cloudera Enterprise on bare metal.

Infrastructure

System Architecture Best Practices

This section describes Cloudera's recommendations and best practices applicable to Hadoop cluster system architecture.

Java

Cloudera Manager and CDH are certified to run on Oracle JDK. OpenJDK is also supported for Cloudera Manager and CDH 5.16 and higher 5.x releases. See Upgrading the JDK for more information.

Cloudera distributes a compatible version of the Oracle JDK through the Cloudera Manager repository. Customers are also free to install a compatible version of the Oracle JDK distributed by Oracle.

Refer to CDH and Cloudera Manager Supported JDK Versions for a list of supported JDK versions.

Right-size Server Configurations

Cloudera recommends deploying three or four machine types into production:

- Master Node. Runs the Hadoop master daemons: NameNode, Standby NameNode, YARN ResourceManager and History Server, the HBase Master daemon, Sentry server, and the Impala StateStore Server and Catalog Server. Master nodes are also the location where ZooKeeper and JournalNodes are installed. The daemons can often share a single pool of servers. Depending on the cluster size, the roles can instead each be run on a dedicated server. Kudu Master Servers should also be deployed on master nodes.
- Worker Node. Runs the HDFS DataNode, YARN NodeManager, HBase RegionServer, Impala impalad, Search worker daemons, and Kudu Tablet Servers.
- Utility Node. Runs Cloudera Manager and the Cloudera Management Services. It can also host a MySQL (or another supported) database instance, which is used by Cloudera Manager, Hive, Sentry, and other Hadoop-related projects.
- Edge Node. Contains all client-facing configurations and services, including gateway configurations for HDFS, YARN, Impala, Hive, and HBase. The edge node is also a good place for Hue, Oozie, HiveServer2, and Impala HAProxy. HiveServer2 and Impala HAProxy serve as a gateway to external applications such as Business Intelligence (BI) tools.

For more information refer to Recommended Cluster Hosts and Role Distribution.

Note:
The edge and utility nodes can be combined in smaller clusters.
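The role layout above can be sanity-checked mechanically before installation. The sketch below is an illustrative helper, not Cloudera tooling: it assumes a simple whitespace-separated inventory format (host role, one pair per line, a hypothetical convention) and flags any host assigned both master and worker roles, which the node types above keep separate.

```shell
# Sketch: flag hosts that mix master and worker roles in a planned layout.
# The inventory format and role names here are illustrative assumptions.
list_conflicts() {
  awk '
    $2 == "namenode" || $2 == "resourcemanager" || $2 == "journalnode" { m[$1] = 1 }
    $2 == "datanode" || $2 == "nodemanager"                            { w[$1] = 1 }
    END { for (h in m) if (h in w) print h }   # hosts holding both kinds of role
  ' "$1"
}

# Example inventory: a clean layout produces no output.
printf 'master1 namenode\nworker1 datanode\nworker1 nodemanager\n' > inventory.txt
list_conflicts inventory.txt
```

A real deployment would drive this from the actual host list; the point is only that the master/worker split is easy to verify automatically.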

Deployment Topology

The graphic below depicts a cluster deployed across several racks (Rack 1, Rack 2, ... Rack n). Each host is networked to two top-of-rack (TOR) switches, which are in turn connected to a collection of spine switches, which are then connected to the enterprise network. This deployment model gives each host maximum throughput and minimizes latency, while encouraging scalability. The specifics of the network topology are described in the subsequent sections.

Physical Component List

The following table describes the physical components recommended to deploy an EDH:

Physical servers
  Configuration: Two-socket, 8-14 cores per socket, 2 GHz; minimally 128 GB RAM.
  Description: Hosts that house the various cluster components.
  Quantity: Based on cluster design.

NICs
  Configuration: 10 Gbps Ethernet NICs preferred.
  Description: Provide the data network services for the cluster.
  Quantity: At least one per server, although two NICs can be bonded for additional throughput.

Internal HDDs
  Configuration: 500 GB HDD or SSD recommended for operating system and logs; HDD for data disks (size varies with data volume requirements).
  Description: These ensure continuity of service on server resets and contain the cluster data.
  Quantity: 10-24 disks per physical server.

Ethernet ToR/leaf switches
  Configuration: Minimally 10 Gbps switches with sufficient port density to accommodate the cluster. These require enough ports to create a realistic spine-leaf topology providing ISL bandwidth above a 1:4 oversubscription ratio (preferably 1:1).
  Description: Although most enterprises have mature data network practices, consider building a dedicated data network for the Hadoop cluster.
  Quantity: At least two per rack.

Ethernet spine switches
  Configuration: Minimally 10 Gbps switches with sufficient port density to accommodate incoming ISL links and ensure required throughput over the spine (for inter-rack traffic).
  Description: Same considerations as for ToR switches.
  Quantity: Depends on the number of racks.

Network Specification

Dedicated Network Hardware

Hadoop can consume all available network bandwidth. For this reason, Cloudera recommends that Hadoop be placed in a separate physical network with its own core switch.

Switch Per Rack

Hadoop supports the concept of rack locality and takes advantage of the network topology to minimize network congestion. Ideally, nodes in one rack should connect to a single physical switch. Two top-of-rack (TOR) switches can be used for high availability. Each rack switch (that is, TOR switch) uplinks to a core switch with a significantly bigger backplane. Cloudera recommends 10 GbE (or faster) connections between the servers and TOR switches. TOR uplink bandwidth to the core switch (two switches in an HA configuration) will often be oversubscribed to some extent.

Uplink Oversubscription

How much oversubscription is appropriate is workload dependent. Cloudera's recommendation is that the ratio between the total access port bandwidth and uplink bandwidth be as close to 1:1 as is possible.

This is especially important for heavy ETL workloads and MapReduce jobs that have a lot of data sent to reducers.

Oversubscription ratios up to 4:1 are generally fine for balanced workloads, but network monitoring is needed to ensure uplink bandwidth is not the bottleneck for Hadoop. The following table provides some examples as a point of reference:

Access Port Bandwidth (In Use)    Uplink Port Bandwidth (Bonded)    Ratio
48 x 1 GbE = 48 Gbit/s            4 x 10 GbE = 40 Gbit/s            1.2:1
24 x 10 GbE = 240 Gbit/s          2 x 40 Gig CFP = 80 Gbit/s        3:1
48 x 10 GbE = 480 Gbit/s          4 x 40 Gig CFP = 160 Gbit/s       3:1

Important:
Do not exceed a 4:1 oversubscription ratio. For example, if a TOR has 20 x 10 GbE ports used, the uplink should be at least 50 Gbps. Different switches have dedicated uplink ports of specific bandwidth (often 40 Gbps or 100 Gbps), so careful planning needs to be done in order to choose the right switch types.

Redundant Network Switches

Having redundant core switches in a full mesh configuration will allow the cluster to continue operating in the event of a core switch failure. Redundant TOR switches will prevent the loss of an entire rack of processing and storage capacity in the event of a TOR switch failure. General cluster availability can still be maintained in the event of the loss of a rack, as long as master nodes are distributed across multiple racks.

Accessibility

The accessibility of your Cloudera Enterprise cluster is defined by the network configuration and depends on the security requirements and the workload. Typically, there are edge/client nodes that have direct access to the cluster. Users go through these edge nodes via client applications to interact with the cluster and the data residing there. These edge nodes could be running a web application for real-time serving workloads, BI tools, or simply the Hadoop command-line client used to submit or interact with HDFS.

Cloudera recommends allowing access to the Cloudera Enterprise cluster via edge nodes only. You can configure this in the security groups for the hosts that you provision. The rest of this document describes the various options in detail.

Internet Connectivity

Clusters that do not require heavy data transfer between the Internet or services outside of the immediate network and HDFS still might need access to services like software repositories for updates or other low-volume outside data sources.

If you completely disconnect the cluster from the Internet, you block access for software updates, which makes maintenance difficult.
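The oversubscription ratios discussed above are simple arithmetic, and it can be handy to script the check when planning racks. The sketch below is an illustrative helper (not part of any Cloudera tooling); arguments are access port count, access port speed, uplink port count, and uplink port speed, all in Gbit/s.

```shell
# Sketch: compute a ToR uplink oversubscription ratio from port counts and
# speeds (Gbit/s). The values passed in are examples, not recommendations.
oversub_ratio() {
  local access_gbps=$(( $1 * $2 )) uplink_gbps=$(( $3 * $4 ))
  awk -v a="$access_gbps" -v u="$uplink_gbps" 'BEGIN { printf "%.1f:1\n", a / u }'
}

oversub_ratio 48 1 4 10     # 48 x 1 GbE over 4 x 10 GbE  -> 1.2:1
oversub_ratio 24 10 2 40    # 24 x 10 GbE over 2 x 40 Gig -> 3.0:1
```

Anything the helper reports above 4.0:1 would violate the guidance in the Important note above.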

Cloudera Manager

Cloudera strongly recommends installing CDH using Cloudera Manager (CM). During the CDH installation via CM there is the choice to install using parcels or native packages. A parcel is a binary distribution format. Parcels offer a number of benefits including consistency, flexible installation location, installation without sudo, reduced upgrade downtime, rolling upgrades, and easy downgrades. Cloudera recommends using parcels, though using packages is supported.

Cluster Sizing Best Practices

Each worker node typically has several physical disks dedicated to raw storage for Hadoop. This number will be used to calculate the total available storage for each cluster. Also, the calculations listed below assume 10% disk space allocated for YARN temporary storage. Cloudera recommends allocating between 10-25% of the raw disk space for temporary storage as a general guideline. This can be changed within Cloudera Manager and should be adjusted after analyzing production workloads. For example, MapReduce jobs that send little data to reducers allow for adjusting this percentage down considerably.

The following table contains example calculations for a cluster that contains 17 worker nodes. Each server has twelve 3 TB drives available for use by Hadoop. The table below outlines the Hadoop storage available based upon the number of worker nodes:

Default Replication Factor
  Raw Storage: 612 TB
  HDFS Storage (Configurable): 550.8 TB
  HDFS Unique Storage (default replication factor): 183.6 TB
  MapReduce Intermediate Storage (Configurable): 61.2 TB

Erasure Coding RS-6-3
  Raw Storage: 612 TB
  HDFS Storage (Configurable): 550.8 TB
  HDFS Unique Storage (EC RS-6-3 -- 1.5x overhead): 367.2 TB
  MapReduce Intermediate Storage (Configurable): 61.2 TB

Erasure Coding RS-10-4
  Raw Storage: 612 TB
  HDFS Storage (Configurable): 550.8 TB
  HDFS Unique Storage (EC RS-10-4 -- 1.4x overhead): 393.4 TB
  MapReduce Intermediate Storage (Configurable): 61.2 TB

Note:
HDFS Unique Storage will vary depending on the amount of data stored in EC directories and the RS policies chosen. The tables above are merely examples of how different policies can affect HDFS Unique Storage.
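The table values above follow directly from the raw capacity. The sketch below reproduces the arithmetic; the function name and the fixed 10% temporary-storage reservation are assumptions for illustration, matching the example rather than a general recommendation.

```shell
# Sketch: reproduce the sizing math above. Arguments are worker count,
# drives per worker, and drive size in TB; 10% of raw space is assumed
# reserved for YARN temporary storage, as in the example cluster.
cluster_storage() {
  awk -v raw=$(( $1 * $2 * $3 )) 'BEGIN {
    hdfs = raw * 0.9     # raw capacity minus 10% temporary storage
    printf "raw=%d hdfs=%.1f temp=%.1f rep3=%.1f rs63=%.1f rs104=%.1f\n",
           raw, hdfs, raw * 0.1, hdfs / 3, hdfs / 1.5, hdfs / 1.4
  }'
}

cluster_storage 17 12 3   # the 17-worker, 12 x 3 TB example above
```

Running the example reproduces the table: 612 TB raw, 550.8 TB HDFS, 61.2 TB temporary, and 183.6 / 367.2 / 393.4 TB unique storage for 3x replication, RS-6-3, and RS-10-4 respectively.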

Note:
Compressing raw data can effectively increase HDFS storage capacity.

While Cloudera Manager provides tools such as Static Resource Pools, which utilize Linux Cgroups, to allow multiple components to share hardware, in high volume production clusters it can be beneficial to allocate dedicated hosts for roles such as Solr, HBase, and Kafka.

Cluster Hardware Selection Best Practices

This section gives a high-level overview of how different hardware component selections impact the performance of a Hadoop cluster.

Refer to the Hardware Requirements Guide for detailed workload-specific practices.

Number of Spindles

Traditionally, Hadoop has been thought of as a large I/O platform. While there are many new types of workloads being run on Cloudera clusters that may not be as I/O bound as traditional MapReduce applications, it is still useful to consider the I/O performance when architecting a Cloudera cluster.

Unlike the number of cores in a CPU and the density of RAM, the speed at which data can be read from a spinning hard drive (spindle) has not changed much in the last 10 years.[1] In order to counter the limited performance of hard drive read/write operations, Hadoop reads and writes from many drives in parallel. Every additional spindle added per node increases the overall read/write speed of the cluster.

[1] SSDs have dramatically changed the persistent storage performance landscape, but the price per GB of spinning disks is still significantly less than that of SSD storage. As SSDs come down in cost and technologies such as Intel's Optane enter the market, workloads may swing back towards being CPU bound. Most Cloudera customers are still deploying clusters that store data on spinning hard disks.

Additional spindles also come with the likelihood of more network traffic in the cluster. For the majority of cases, network traffic between nodes is generally limited by how fast data can be written to or read from a node. Therefore, the rule normally follows that with more spindles, network speed requirements increase.

Generally speaking, the more spindles a node has, the lower the cost per TB. However, the more data stored on a single node, the longer the re-replication time if that node goes down. Hadoop clusters are designed to have many nodes. It is generally better to have more average nodes than fewer super nodes. This has a lot to do with both data protection and increased parallelism for distributed computation engines such as MapReduce and Spark.

Lastly, the number of drives per node will impact the number of YARN containers configured for a node. YARN configuration and performance tuning is a complicated topic, but for I/O bound applications, the number of physical drives per host may be a limiting factor in determining the number of container slots configured per node.

Kafka clusters are often run on dedicated servers that do not run HDFS data nodes or processing components such as YARN and Impala. Because Kafka is a message-based system, fast storage and network I/O are critical to performance. Although Kafka does persist messages to disk, it is not generally necessary to store the entire contents of a Kafka topic log on the Kafka cluster indefinitely. Kafka brokers should be configured with dedicated spinning hard drives for the log data directories. Using SSDs instead of spinning disks has not been shown to provide a significant performance improvement for Kafka. Kafka drives should also be configured as RAID 10, because the loss of a single drive on a Kafka broker will cause the broker to experience an outage.

Disk Layout

For Master nodes, the following layout is recommended:

- 2 x Disks (capacity at least 500 GB) in RAID 1 (software or hardware) for OS and logs
- 4 x Disks (1 TB each) in RAID 10 for database data (see Note)
- 2 x Disks (capacity at least 1 TB) in RAID 1 (software or hardware) for NameNode metadata
- 1 x Disk in JBOD/RAID 0 for ZooKeeper (1 TB) (see Note); ZooKeeper disks must be HDD, not SSD
- 1 x Disk in JBOD/RAID 0 for Quorum JournalNode (1 TB)

Note:
Ideally, databases should be run on an external host rather than running on the master node(s).

Note:
If the customer has experienced fsync delays and other I/O related issues with ZooKeeper, ZooKeeper's dataDir and dataLogDir can be configured to use separate disks. It is hard to determine ahead of time whether this will be necessary; even a small cluster can result in heavy ZooKeeper activity.

For Worker nodes, the following layout is recommended:

- 2 x Disks (capacity at least 500 GB) in RAID 1 (software or hardware) for OS and logs
- 15-24 SATA disks in JBOD mode (or as multiple single-drive RAID 0 arrays if using a RAID controller incapable of JBOD passthrough), no larger than 4 TB in capacity. If the RAID controller has cache, use it for write caching (preferably with battery backup) and disable read caching. Follow your hardware vendor's best practices where available.
- For a higher performance profile, use 10K RPM SATA or faster SAS drives; these often have lower capacity, but capacity considerations can be offset by adding more data nodes.

SAS drives are supported but typically do not provide significant enough performance or reliability benefits to justify the additional costs. Hadoop is designed to be fault-tolerant, and therefore drive failure can easily be tolerated. In order to achieve a good price point, SATA drives should typically be used.

RAID controllers should be configured to disable any optimization settings for the RAID 0 arrays.

Data Density Per Drive

Hard drives today come in many sizes. Popular drive sizes are 1-4 TB, although larger drives are becoming more common. When picking a drive size, the following points need to be considered.

- Lower Cost Per TB – Generally speaking, the larger the drive, the cheaper the cost per TB, which makes for a lower TCO.
- Replication Storms – Larger drives mean drive failures will produce larger re-replication storms, which can take longer and saturate the network while impacting in-flight workloads.
- Cluster Performance – In general, drive size has little impact on cluster performance. The exception is when drives have different read/write speeds and a use case that leverages this gain. MapReduce is designed for long sequential reads and writes, so latency timings are generally not as important. HBase can potentially benefit from faster drives, but that is dependent on a variety of factors, such as HBase access patterns and schema design; this also implies acquisition of more nodes. Impala and Cloudera Search workloads can also potentially benefit from faster drives, but for those applications the ideal architecture is to maintain as much data in memory as possible.

Cloudera does not support exceeding 100 TB per data node. You could use 12 x 8 TB spindles or 24 x 4 TB spindles. Cloudera does not support drives larger than 8 TB.[2]

[2] Larger disks offer increased capacity but not increased I/O. Clusters with larger disks can easily result in capacities exceeding 100 TB per worker, contributing to the replication storms mentioned above. Clusters with larger disks that observe the 100 TB limit end up having fewer spindles, which reduces HDFS throughput.

Warning:
Running CDH on storage platforms other than direct-attached physical disks can provide suboptimal performance. Cloudera Enterprise and the majority of the Hadoop platform are optimized to provide high performance by distributing work across a cluster that can utilize data locality and fast local I/O. Refer to the Cloudera Enterprise Storage Device Acceptance Criteria Guide for more information about using non-local storage.

Number of Cores and Multithreading

Other than cost, there is no downside to buying more and better CPUs; however, the ROI on additional CPU power must be evaluated carefully. Here are some points to consider:

- Cluster Bottleneck – In general, CPU resources (and lack thereof) do not bottleneck MapReduce and HBase. The bottleneck will almost always be drive and/or network performance. There are certainly exceptions to this, such as inefficient Hive queries. Other compute frameworks like Impala, Spark, and Cloudera Search may be CPU-bound depending on the workload.
- Additional Cores/Threads – Within a given MapReduce job, a single task will typically use one thread at a time. As outlined earlier, the number of slots allocated per node may be a function of the number of drives in the node. As long as there is not a huge disparity between the number of cores (threads) and the number of drives, it does not make sense to pay for additional cores. In addition, a MapReduce task is going to be I/O bound for typical jobs; thus, a given thread used by the task will have a large amount of idle time while waiting for I/O response.
- Clock Speed – Because Cloudera clusters often begin with a small number of use cases and associated workloads and grow over time, it makes sense to purchase the fastest CPUs available. Actual CPU usage is use case and workload dependent; for instance, computationally intensive Spark jobs would benefit more from faster CPUs than I/O bound MapReduce applications.

Important:
Allocate two vCPUs for the operating system and other non-Hadoop use (although this amount may need to be higher if additional non-Hadoop applications are running on the cluster nodes, such as third-party active monitoring/alerting tools). The more services you are running, the more vCPUs will be required; you will need to use more capable hosts to accommodate these needs.

For worker nodes, a mid-range 12-14 core CPU running at 2.4-2.5 GHz would typically provide a good cost/performance tradeoff. For master nodes, a mid-range 8 core CPU with a slightly faster clock speed (e.g. 2.6 GHz) would suffice. Where available, Simultaneous Multi-Threading implementations should be enabled (for example, Intel's Hyper-Threading). BIOS settings for CPU and memory should be set to Maximum Performance mode or equivalent.

Refer to the Hardware Requirements Guide for detailed workload-specific practices.

RAM

More memory is always good, and it is recommended to purchase as much as the budget allows. Applications such as Impala and Cloudera Search are often configured to use large amounts of heap, and a mixed workload cluster supporting both services should have sufficient RAM to allow all required services to run.

Refer to the Hardware Requirements Guide for detailed workload-specific practices.

Important:
Allocate at least 4 GB of memory for the operating system and other non-Hadoop use (although this amount may need to be higher if additional non-Hadoop applications are running on the cluster nodes, such as third-party active monitoring/alerting tools). The more services you are running, the more memory will be required; you will need to use more capable hosts to accommodate these needs.

It is critical to performance that the total memory allocated to all Hadoop-related processes (including processes such as HBase) is less than the total memory on the node, taking into account the operating system and non-Hadoop processes. Oversubscribing the memory on a system can lead to the Linux kernel's out-of-memory process killer being invoked and important processes being terminated. It can also be harmful to performance to unnecessarily over-allocate memory to a Hadoop process, as this can lead to long Java garbage collection pauses.

For optimum performance, one should aim to populate all of the memory channels available on the given CPUs. This may mean a greater number of smaller DIMMs, but it means that both memory and CPU are operating at their best performance. Confer with your hardware vendor when defining the optimal memory configuration layout.
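The memory budget described above can be checked with simple arithmetic at planning time. The sketch below is an illustrative helper: it assumes allocations are expressed in whole GB as role=size pairs, and the role names and sizes shown are made-up examples, not a recommended layout.

```shell
# Sketch: verify that planned per-node memory allocations (GB) leave the
# 4 GB operating-system reserve recommended above. Role names and sizes
# are illustrative assumptions.
check_mem_budget() {
  local total_gb=$1 reserve_gb=4 sum=0 alloc
  shift
  for alloc in "$@"; do
    sum=$(( sum + ${alloc#*=} ))   # take the number after "role="
  done
  if [ $(( sum + reserve_gb )) -le "$total_gb" ]; then
    echo "OK: ${sum} GB allocated, ${reserve_gb} GB reserved, ${total_gb} GB total"
  else
    echo "OVERSUBSCRIBED: ${sum} + ${reserve_gb} GB reserve exceeds ${total_gb} GB"
    return 1
  fi
}

check_mem_budget 256 datanode=4 nodemanager=2 yarn_containers=180 impalad=48
```

A failing check is exactly the oversubscription scenario described above, where the kernel's out-of-memory killer can be invoked.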

Whilst 128 GB RAM can be accommodated, this typically constrains the amount of memory allocated to services such as YARN and Impala, therefore reducing the query capacity of the cluster. A value of 256 GB would typically be recommended, with higher values also possible.

Power Supplies

Hadoop software is designed around the expectation that nodes will fail. Redundant hot-swap power supplies are not necessary for worker nodes, but should be used for master, utility, and edge nodes.

Operating System Best Practices

Cloudera currently supports running the EDH platform on several Linux distributions. To receive support from Cloudera, a supported version of the operating system must be in use. The Requirements and Supported Versions guide lists the supported operating systems for each version of Cloudera Manager and CDH.

Hostname Naming Convention

Cloudera recommends using a hostname convention that allows for easy recognition of roles and/or physical connectivity. This is especially important for easily configuring rack awareness within Cloudera Manager. Using a project name identifier, followed by the rack ID, the machine class, and a machine ID is an easy way to encode useful information about the cluster. For example:

acme-test-r01m01

This hostname would represent the ACME customer's test project, rack #1, master node #1.

Hostname Resolution

Cloudera recommends using DNS for hostname resolution. The usage of /etc/hosts becomes cumbersome quickly and routinely is the source of hard-to-diagnose problems. /etc/hosts should only contain an entry for 127.0.0.1, and localhost should be the only name that resolves to it. The machine name must not resolve to the 127.0.0.1 address. All hosts in the cluster must have forward and reverse lookups be the inverse of each other for Hadoop to function properly. An easy test to perform on the hosts to ensure proper DNS resolution is to execute:

  dig <hostname>
  dig -x <ip address returned from hostname lookup>

For example, if dig <hostname> returns an A record of 140.211.11.105, then dig -x 140.211.11.105 must return a PTR record with the original hostname. This matching forward/reverse behavior is what we should see for every host in the cluster.
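The dig checks above can be scripted across the whole cluster rather than run by hand. The sketch below is illustrative: it assumes hostnames are listed one per line in a file named cluster_hosts.txt (a hypothetical name) and that dig is installed (bind-utils on RHEL-family systems, dnsutils on Debian-family systems).

```shell
# Sketch: verify forward (A) and reverse (PTR) lookups are inverses for
# every host in the cluster. File name and layout are assumptions.

# PTR answers come back fully qualified with a trailing dot; strip it
# before comparing against the hostname we started from.
ptr_matches() { [ "${2%.}" = "$1" ]; }

check_host() {
  local host=$1 ip ptr
  ip=$(dig +short "$host" A | head -n1)
  [ -n "$ip" ] || { echo "FAIL $host: no A record"; return 1; }
  ptr=$(dig +short -x "$ip" | head -n1)
  ptr_matches "$host" "$ptr" || { echo "FAIL $host: $ip reverses to '$ptr'"; return 1; }
  echo "OK   $host <-> $ip"
}

if [ -f cluster_hosts.txt ]; then
  while read -r h; do check_host "$h" || true; done < cluster_hosts.txt
fi
```

Any FAIL line indicates a host whose forward and reverse records are not inverses, which must be corrected before Hadoop will function properly.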

Functional Accounts

Cloudera Manager and CDH make use of dedicated functional accounts for the associated daemon processes. By default these accounts are created as local accounts on every machine in the cluster that needs them, if they do not already exist (locally or from a directory service, such as LDAP). The Requirements and Supported Versions guide includes a table showing the UNIX user and group associated with each service.

Note:
Kerberos deployment models (including identity integration with Active Directory) are covered in detail within the Authentication documentation.

As of Cloudera Manager 5.3, it is also possible to install the cluster in single user mode, where all services share a single service account. This feature is provided for customers who have policy requirements that prevent the use of multiple service accounts. Cloudera does not recommend the use of this feature unless the customer has this requirement, as CDH uses separate accounts to achieve proper security isolation, and removing that isolation will reduce the overall security of the installation. Additional information about single user mode can be found in the Cloudera Installation and Upgrade manual: Configuring Single User Mode.

Time

All machines in the cluster need to have the same time and date settings, including time zones.
