
Hitachi Analytics Infrastructure for Splunk
Reference Architecture
MK-SL-204-01
November 2021

Legal Notices

© 2021 Hitachi Vantara LLC. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including copying and recording, or stored in a database or retrieval system for commercial purposes without the express written permission of Hitachi, Ltd., or Hitachi Vantara LLC (collectively “Hitachi”). Licensee may make copies of the Materials provided that any such copy is: (i) created as an essential step in utilization of the Software as licensed and is used in no other manner; or (ii) used for archival purposes. Licensee may not make any other copies of the Materials. “Materials” mean text, data, photographs, graphics, audio, video and documents.

Hitachi reserves the right to make changes to this Material at any time without notice and assumes no responsibility for its use. The Materials contain the most current information available at the time of publication.

Some of the features described in the Materials might not be currently available. Refer to the most recent product announcement for information about feature and product availability, or contact Hitachi Vantara LLC at https://support.hitachivantara.com/en_us/contact-us.html.

Notice: Hitachi products and services can be ordered only under the terms and conditions of the applicable Hitachi agreements. The use of Hitachi products is governed by the terms of your agreements with Hitachi Vantara LLC.

By using this software, you agree that you are responsible for:
1. Acquiring the relevant consents as may be required under local privacy laws or otherwise from authorized employees and other individuals; and
2. Verifying that your data continues to be held, retrieved, deleted, or otherwise processed in accordance with relevant laws.

Notice on Export Controls. The technical data and technology inherent in this Document may be subject to U.S. export control laws, including the U.S. Export Administration Act and its associated regulations, and may be subject to export or import regulations in other countries. Reader agrees to comply strictly with all such regulations and acknowledges that Reader has the responsibility to obtain licenses to export, re-export, or import the Document and any Compliant Products.

Hitachi and Lumada are trademarks or registered trademarks of Hitachi, Ltd., in the United States and other countries.

AIX, AS/400e, DB2, Domino, DS6000, DS8000, Enterprise Storage Server, eServer, FICON, FlashCopy, GDPS, HyperSwap, IBM, Lotus, MVS, OS/390, PowerHA, PowerPC, RS/6000, S/390, System z9, System z10, Tivoli, z/OS, z9, z10, z13, z14, z/VM, and z/VSE are registered trademarks or trademarks of International Business Machines Corporation.

Active Directory, ActiveX, Bing, Edge, Excel, Hyper-V, Internet Explorer, the Internet Explorer logo, Microsoft, the Microsoft corporate logo, the Microsoft Edge logo, MS-DOS, Outlook, PowerPoint, SharePoint, Silverlight, SmartScreen, SQL Server, Visual Basic, Visual C++, Visual Studio, Windows, the Windows logo, Windows Azure, Windows PowerShell, Windows Server, the Windows start button, and Windows Vista are registered trademarks or trademarks of Microsoft Corporation. Microsoft product screen shots are reprinted with permission from Microsoft Corporation.

All other trademarks, service marks, and company names in this document or website are properties of their respective owners.

Copyright and license information for third-party and open source software used in Hitachi Vantara products can be found at .html.

Feedback

Hitachi Vantara welcomes your feedback. Please share your thoughts by sending an email message to [email protected]. To assist the routing of this message, use the paper number in the subject and the title of this white paper in the text.

Revision history
• November 1, 2021: Support for Whitley processors on Hitachi Advanced Server DS120 G2 and Hitachi Advanced Server DS220 G2 servers.
• July 23, 2020: Initial release.

Reference Architecture

Hitachi Analytics Infrastructure for Splunk provides guidelines for deploying Splunk 8. Use this guide to implement an architecture that maximizes the return on your investment.

Splunk Enterprise is a software technology that is used for monitoring, searching, analyzing, and visualizing machine-generated data in real time. It monitors and reads different types of log files and stores data as events in indexers. This tool allows you to visualize data in dashboards.

Splunk can consist of many different components, such as the following:
• Indexer: This component stores the data.
• Indexer Cluster Master: This component coordinates the activities of an indexer cluster and distributes application configurations to the indexers.
• Forwarder: This component gathers the data and sends it to the indexers.
• Universal Forwarder: This lightweight component runs as a process on existing machines and forwards the data to the indexers.
• Heavy Forwarder: This is a full Splunk Enterprise instance, with some features disabled, that gathers data and forwards it to the indexers. It can store and manipulate the data before forwarding it.
• Search Head: This component is an instance of Splunk that distributes searches to the indexers.
• Search Head Captain: When configuring the Splunk deployment with a search head cluster, this component coordinates job and replication activities among the search heads.
• Search Head Deployer: This component distributes applications and configurations to the search head cluster members.
• Deployment Server: This component distributes applications and configurations to other components, primarily forwarders.
• License Master: This component handles Splunk Enterprise licensing.
• HTTP Event Collector (HEC): This component provides a means for Splunk to receive data over HTTP or HTTPS.
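To make the HTTP Event Collector entry above more concrete, the following Python sketch builds (but does not send) an HEC request. The host name, token value, and sample event are placeholders invented for this illustration. The port (8088) and the `/services/collector/event` path are Splunk's documented defaults, but an actual deployment may configure them differently.

```python
import json
import urllib.request


def build_hec_request(host, token, event, port=8088, sourcetype="_json"):
    """Build (but do not send) an HTTPS POST for the HTTP Event Collector.

    host, token, and event are caller-supplied; the values used below are
    placeholders for illustration only.
    """
    url = f"https://{host}:{port}/services/collector/event"
    body = json.dumps({"event": event, "sourcetype": sourcetype}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # HEC authenticates with a token passed in the "Splunk" scheme.
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host and token, used only to show the call shape.
req = build_hec_request(
    "splunk.example.com",
    "00000000-0000-0000-0000-000000000000",
    {"message": "disk check ok"},
)
# urllib.request.urlopen(req) would deliver the event to a reachable collector.
```

Building the request separately from sending it keeps the example runnable without a Splunk instance and makes the URL, headers, and payload easy to inspect.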

Splunk can be deployed in multiple formats: standalone node, clustered node, and distributed with multiple independent nodes. It can consist of single-site or multi-site configurations.

Some common deployments are as follows:
• Single Server Deployment: One server gathers the data, stores the data, and is used to search the data. Separate forwarders can be deployed.
• Distributed Indexer Non-Clustered Deployment: This deployment has a single search head, with multiple indexers working independently of each other. The forwarder forwards data directly to the individual indexers.
• Distributed Indexer Clustered Deployment: This deployment adds an indexer cluster master that coordinates the indexer cluster. Forwarders send data directly to the indexers, and the cluster master manages how that data is replicated among them.
• Distributed Indexer Clustered Deployment with a Search Head Cluster: Multiple search heads are added to the indexer cluster.

The following figure shows an example layout for a distributed non-cluster indexer with a single search head deployment.

Splunk indexed data is stored in buckets, which are directories containing the data and its indexes. Splunk storage is built around the concept of data temperature and has the following five data bucket tiers:
• Hot data: Data that is actively being written.
• Warm data: Data that is active but not being written to. Warm data uses the same storage as hot data. When a trigger is reached, either index size or an indexer restart, the hot data is rolled in place to a warm bucket.
• Cold data: After a condition is reached, data is moved from a warm bucket to a cold bucket. This move can be across different storage devices.
• Frozen data: After a condition is reached, the data is moved to a frozen bucket. Frozen data is not searchable. The default behavior is to delete frozen data. You can archive the frozen data for future retrieval.
• Thawed data: Data retrieved from the frozen bucket. Depending on how the data is archived, retrieving this data can take a long time.

The following figure shows the data flow through the system.
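The bucket tiers described above can be read as a small state machine. The following Python sketch is an illustrative model only, not Splunk code: the names `Tier`, `roll`, `thaw`, and `is_searchable` are invented here, and the conditions that trigger each roll are deliberately left out because they depend on index configuration.

```python
from enum import Enum


class Tier(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"
    FROZEN = "frozen"
    THAWED = "thawed"


# Automatic roll order described in the text: hot -> warm -> cold -> frozen.
NEXT_TIER = {Tier.HOT: Tier.WARM, Tier.WARM: Tier.COLD, Tier.COLD: Tier.FROZEN}

# Frozen data is not searchable; every other tier can serve queries.
SEARCHABLE = {Tier.HOT, Tier.WARM, Tier.COLD, Tier.THAWED}


def roll(tier):
    """Return the tier a bucket moves to when its roll condition is met."""
    if tier not in NEXT_TIER:
        raise ValueError(f"{tier.value} buckets do not roll automatically")
    return NEXT_TIER[tier]


def thaw(tier):
    """Manually restore a frozen bucket.

    This models only the archived case; by default frozen data is deleted
    and there is nothing left to thaw.
    """
    if tier is not Tier.FROZEN:
        raise ValueError("only frozen buckets can be thawed")
    return Tier.THAWED


def is_searchable(tier):
    """Report whether buckets in this tier can be used to resolve a query."""
    return tier in SEARCHABLE
```

For example, `roll(Tier.WARM)` yields `Tier.COLD`, while `thaw(Tier.COLD)` raises an error because only frozen buckets are thawed.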

Data flow sequence

The sequence of the data flow through the system is as follows:
1. Forwarders send data to Splunk.
2. The data goes to an index on the indexer nodes. This stores the data in a bucket on the hot tier.
3. When certain conditions are met, Splunk performs an automatic logical move of data from the hot tier to the warm tier. This move keeps the data on the same storage device.
4. When another set of conditions is met, Splunk performs an automatic move of data from warm to cold. This move can be a physical move to different storage devices.
5. When the next set of conditions is met, the data moves to frozen storage. The default behavior is to delete this data. If it is being moved, it is usually to different storage devices, which could be remote. Frozen data cannot be queried.
6. To retrieve the data, Splunk uses a manual process to move the data from frozen to thawed and, when necessary, back again to frozen.

Other Splunk actions include the following:
• A query comes in from the search head asking the indexers to retrieve and process data.
• An indexer pulls data from buckets in the hot, warm, cold, and thawed tiers to resolve the query.

Designing and sizing a cluster is complex and depends on the following factors:
• Data ingestion rate
• Data retention rate
• Number of concurrent queries
• High availability and disaster recovery requirements
• Query response requirements

These and many other factors must be considered for any cluster design and architecture.

Data is moved from one tier to the next. To guarantee that you can still write data to a tier, the write performance of each tier must match the overall write performance of the previous tier. If the hot data is written at 500 MBps, the system must be able to write cold data at 500 MBps or the hot/warm tier will fill up. In this example, the frozen tier must also be written at 500 MBps or the cold tier will fill up.

The hot/warm tier will have more reads than the cold tier, and the frozen tier is not read in normal processing. Taking this into account, the combined read/write performance of a tier can be lower than that of the previous tiers.

See Splunk Validated Architectures, Splunk Enterprise Capacity Planning Manual, and Splunk Distributed Deployment Manual for details on designing a cluster and different cluster layouts.

Note: This reference architecture does not cover Splunk SmartStore, running Splunk in containers, or any external storage.
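The write-rate rule above (each tier must be able to absorb writes as fast as the tier before it) can be checked with a few lines of Python. This is a simplified sketch: the function name and the tier list are invented for illustration, and it assumes sustained rates with data flowing through every tier, including an archived frozen tier.

```python
def tiers_that_fill_up(tiers):
    """Return the tiers at risk of filling up.

    tiers is an ordered list of (name, sustained write rate in MBps),
    starting with the hot/warm tier. A tier is at risk when the tier
    after it accepts writes more slowly than it does.
    """
    at_risk = []
    for (prev_name, prev_rate), (_next_name, next_rate) in zip(tiers, tiers[1:]):
        if next_rate < prev_rate:
            at_risk.append(prev_name)
    return at_risk


# The 500 MBps example from the text, with an undersized frozen tier.
example = [("hot/warm", 500), ("cold", 500), ("frozen", 300)]
print(tiers_that_fill_up(example))
```

With the sample values, the cold tier is reported because the frozen tier cannot keep up with it. If frozen data is deleted rather than archived, the frozen entry can be left out of the list.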

Key solution components

These are the key components of this solution.

Hardware components

The following table describes standard configurations for indexers. See your Hitachi Vantara sales representative for a complete list.

Hardware: DS120 G2 tiered
• 2 × Intel 6342 (24C, 2.7G, 220W)
• 2 × PSU 1600w AC Platinum x2
• 8 × 32 GB 3200 RDIMM
• 2 × 256 GB m2 drives
• 1 × QS-3916 RAID 16i
• 1 × Intel VROC Standard License
• 1 × Mellanox CX-6 Lx EN Dual Port 25 GbE LP QSFP2
• 2 × SSD 3DWPD 1.92 TB
• 10 × SFF SAS HDD 10K RPM 2.4 TB

Hardware: DS120 G2 Flash
• 2 × Intel 6342 (24C, 2.7G, 220W)
• 2 × PSU 1600w AC Platinum x2
• 8 × 32 GB 3200 RDIMM
• 2 × 256 GB m2 drives
• 1 × Intel VROC Premium license
• 1 × Mellanox CX-6 Lx EN Dual Port 25 GbE LP QSFP2
• 12 × NVMe 3DWPD 2.0 TB

Hardware: DS220 Tiered
• 2 × Intel 6342 (24C, 2.7G, 220W)
• 2 × PSU 1600w AC Platinum x2
• 8 × 32 GB 3200 RDIMM
• 2 × 256 GB m2 drives
• 1 × SAS3916 4G RAID Mezzanine
• 1 × Intel VROC Premium license
• 1 × Mellanox CX-6 Lx EN Dual Port 25 GbE LP QSFP2
• 2 × SSD 3DWPD 1.92 TB
• 20 × SFF SAS HDD 10K RPM 2.4 TB

Hardware: DS220 Flash
• 2 × Intel 6342 (24C, 2.7G, 220W)
• 2 × PSU 1600w AC Platinum x2
• 8 × 32 GB 3200 RDIMM
• 2 × 256 GB m2 drives
• 1 × Mellanox CX-6 Lx EN Dual Port 25 GbE LP QSFP2
• 1 × Intel VROC Premium License
• 24 × NVMe 3DWPD 2.0 TB

Hardware: Cisco Nexus 93180YC-FX switch
• 2 ToR switches for data network per rack

Hardware: Cisco Nexus 3332 switch
• 2 Aggregate data networks as needed

Hardware: Cisco Nexus 92348 switch
• 1 Management network switch per rack

Hardware: Power Supply Units
• 6 units per rack determined by region

Hardware: Rack
• 1 rack

Software components

The following table lists the solution software components.

Software                                    Version
Red Hat Enterprise Linux (RHEL)             8.2
SUSE Linux Enterprise Server (SLES)         15 SP2
Splunk Enterprise                           8.2.1

Solution design

This section describes the design used for this solution.

Storage architecture

Splunk storage configuration depends on the individual deployment. Some of the considerations are as follows:
• Data size of each temperature tier
• Performance requirements for each temperature tier
• Retention for each temperature tier
• Storage devices used for each temperature tier
• The Splunk component deployed on the node

The following table lists the recommended storage types for different components.

Usage: Search Head
Storage type: SSD or HDD
Description: SSDs are recommended. The storage should support at least 800 sustained IOPS and at least 300 GB of dedicated storage.

Usage: Indexer: Hot/Warm Tier
Storage type: SSD or NVMe
Description: This is the primary storage area in standard deployments. For availability purposes this storage should be RAID. Hot and warm tiers share the same storage area as RAID devices. SSD or NVMe drives are recommended.

Usage: Indexer: Cold Tiers
Storage type: SSD/HDD, SAN, NAS, network file systems
Description: This data is not used as often and can have lower performance requirements. This allows more storage options.

Usage: Indexer: Frozen storage
Storage type: SAN, NAS, network file systems, HDD, archival devices
Description: Frozen data is archived from the system. The default action for frozen data is to be deleted.

Usage: Indexer: Thawed Storage
Storage type: SAN, NAS, network file systems, HDD
Description: Thawed data storage requirements are similar to the cold data requirements, except that thawed data storage is usually short-lived and then deleted.

Usage: Forwarders
Storage type: Any
Description: Depending on your deployment, forwarders can be deployed on existing devices with no extra storage requirements. Forwarders can also have their own devices and store data.

Application architecture

Sizing a Splunk solution is deployment specific. See Splunk's capacity planning documentation for information on sizing. The following table shows a starting point for machine sizing.

[...] configuration
• 12 physical CPU cores, or 24 vCPUs at 2 GHz
• 12 GB RAM
• 10/25 Gb NIC

Mid-range configuration
• 24 physical CPU cores, or 48 vCPUs at 2 GHz
• 64 GB RAM
• 10/25 Gb NIC

High performance
• 48 physical CPU cores, or 96 vCPUs at 2 GHz
• 128 GB RAM
• 10/25 Gb NIC

Search Head (any)
• 16 physical CPU cores, or 32 vCPUs at 2 GHz or greater speed per core
• 12 GB RAM
• 10/25 Gb NIC

Forwarders (any)
Depending on your deployment, forwarders can be deployed on existing devices with no extra storage requirements. Forwarders can also have their own devices and store data.

The actual number of machines depends on the following factors:
• Data indexing volume
• Number of searches
• Total data size
• Replication factor
• Deployment architecture
• Number of sites

The following table provides a reference starting point based on the number of users and the volume of new data.

16 users
• Indexing volume 1 TB per day: 2 search heads, 4 indexers
• Indexing volume 1 TB to 2 TB per day: 2 search heads, 10 indexers
• Indexing volume 2 TB to 3 TB per day: 2 search heads, 15 indexers

24 users
• Indexing volume 1 TB per day: 2 search heads, 6 indexers
• Indexing volume 1 TB to 2 TB per day: 2 search heads, 12 indexers
• Indexing volume 2 TB to 3 TB per day: 3 search heads, 18 indexers

48 users
• Indexing volume 1 TB per day: 2 search heads, 7 indexers
• Indexing volume 1 TB to 2 TB per day: 3 search heads, 14 indexers
• Indexing volume 2 TB to 3 TB per day: 3 search heads, 21 indexers
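The starting-point table above lends itself to a simple lookup. The Python sketch below is one possible reading of that table, with several assumptions: the first volume column is treated as "up to 1 TB per day", a user count between two rows is rounded up to the next listed row, and the function and variable names are invented for this example. It is a planning aid only and does not replace Splunk's capacity planning guidance.

```python
# (search heads, indexers) per volume band, keyed by number of users.
# Bands, in order: up to 1 TB/day, 1 to 2 TB/day, 2 to 3 TB/day.
STARTING_POINT = {
    16: [(2, 4), (2, 10), (2, 15)],
    24: [(2, 6), (2, 12), (3, 18)],
    48: [(2, 7), (3, 14), (3, 21)],
}


def starting_point(users, tb_per_day):
    """Return (search_heads, indexers) for a user count and daily volume.

    Raises ValueError when the inputs fall outside the table.
    """
    if users < 1 or tb_per_day <= 0:
        raise ValueError("users and tb_per_day must be positive")
    if tb_per_day > 3:
        raise ValueError("table covers at most 3 TB per day")
    band = 0 if tb_per_day <= 1 else 1 if tb_per_day <= 2 else 2
    for row_users in sorted(STARTING_POINT):
        # Assumption: round up to the next listed user count.
        if users <= row_users:
            return STARTING_POINT[row_users][band]
    raise ValueError("table covers at most 48 users")


print(starting_point(24, 1.5))
```

Raising an error for out-of-range inputs, rather than extrapolating, keeps the sketch from suggesting figures the table does not contain.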

Network architecture

This solution uses two networks: a data network and an out-of-band management network. The data network is uplinked to the rest of the client network. Because forwarders are deployed near the site where the data is generated, they use the client network to connect back to the indexers. The following figure illustrates this configuration.

Engineering validation

Test hardware

This test was performed using one search head, three indexers, and multiple machines for the test driver. Its purpose is to validate the hardware, not the performance of the system. The configuration is listed in the following table.

Use: Search head
Machine type: 1 DS220 G2
Configuration:
• 2 m2 NVMe drives, RAID 1, for boot
• 6 SSD for storage
• 512 GB memory
• 1 Dual Port 25 Gb NIC

Use: Indexer
Machine type: 3 DS220 G2
Configuration:
• 2 m2 NVMe drives, RAID 1, for boot
• 512 GB memory
• 1 Dual Port 25 Gb NIC
• Storage option 1: 4 NVMe drives, RAID 0
• Storage option 2: 6 SSD, RAID 0
• Storage option 3: 6 SAS HDD, RAID 0

Use: Test drivers
Machine type:
• 1 test driver node
• 4 physical forwarder nodes
• 3 virtual machines
Configuration: Multiple configurations using both physical machines and virtual machines

Test methodology

The purpose of this test was to generate a load for a small Splunk deployment. Different storage architectures were used to validate the design feasibility. Because large computers were used for the test, they were often idle during the test, with many of their resources under-utilized.

The test harness used the following configuration:
• 3 nodes used as indexers.
• 1 node used as a search head.
• 1 node used as a test controller node.
• 7 nodes used as data generators/forwarders.
  • Each of these nodes ran 50 forwarders.
  • 350 total forwarders were used.
• Data generation.
  • Each test generated data for 20 minutes.
  • No data met the requirements to move from the hot/warm tier to cold storage.
• A single type of storage was used for Splunk hot, warm, and cold storage.
  • Run 1 used NVMe storage.
  • Run 2 used SSD storage.
  • Run 3 used HDD storage.
  • For all runs, the storage performance was well beyond the Splunk recommended minimum of 1200 IOPS.
• Searches.
  • These tests ran with no searches being performed.
  • These tests were repeated with searches being performed for every 10,000 records generated.

The following figure illustrates the test configuration.

Test results

When results display stored data, an estimated value is displayed. This estimate is based upon extending the rate to show the sustained data stored and is calculated based on uncompressed data.

Splunk is typically used for log or machine-generated data that is in text format. The forwarders send the data in a compressed format and the indexer stores it in a compressed format. It is very common to see a 95% reduction in size.

The test results displayed are calculated by the test tool kit.

Note: These results are useful to show that the system is working; however, they do not provide performance information to compare the different configurations. Because of the size of the system being used, simplicity of the test, and the data set size, these results are skewed.

Results without performing searches

The following figures show the results of the test for all three storage types when searches were not performed.

The following figure shows the data ingested per day.

The following figure shows the average number of disk writes.

The following figure shows the average CPU usage. As you can see, the system CPU usage is low.

The following figure shows the average inbound network traffic.

Results when performing searches

The following figures show the results of the tests for all three storage types when searches were performed over 10,000 events. Again, these are not performance results. Instead, they verify that the system works.

The following figure shows the data ingested per day.

The following figure shows the average number of disk writes.

The following figure shows the average CPU usage. While the CPU usage is higher when performing searches, most of the cores are still idle.

The following figure shows the average inbound network traffic.

Product descriptions

This is information about the hardware and software components used in this solution for Hitachi Analytics Infrastructure for Splunk.

Splunk Enterprise

Splunk Enterprise is a data platform that allows you to investigate, monitor, analyze, and act on your data with ease for enhanced security and operational efficiency.

Hitachi Advanced Server DS120 G2

With support for two Intel Xeon Scalable processors in just 1U of rack space, the Hitachi Advanced Server DS120 G2 delivers exceptional compute density. It provides flexible memory and storage options to meet the needs of converged and hyperconverged infrastructure solutions, as well as for dedicated application platforms such as internet of things (IoT) and data appliances.

The Intel Xeon Scalable processor family is optimized to address the growing demands on today’s IT infrastructure. The server provides 32 slots for high-speed DDR4 memory, allowing up to 4 TB memory capacity with RDIMM population (128 GB × 32) or 8 TB (512 GB × 16) of Intel Optane Persistent Memory. DS120 G2 supports up to 12 hot-pluggable, front-side accessible 2.5-inch non-volatile memory express (NVMe), serial-attached SCSI (SAS), serial ATA (SATA) hard disk drive (HDD), or solid-state drives (SSD). The system also offers 2 onboard M.2 slots.

With these options, DS120 G2 can be flexibly configured to address both I/O performance and capacity requirements for a wide range of applications and solutions.

Hitachi Advanced Server DS220 G2

With a combination of two Intel Xeon Scalable processors and high storage capacity in a 2U rack-space package, Hitachi Advanced Server DS220 G2 delivers the storage and I/O to meet the needs of converged solutions and high-performance applications in the data center.

The Intel Xeon Scalable processor family is optimized to address the growing demands on today’s IT infrastructure. The server provides 32 slots for high-speed DDR4 memory, allowing up to 4 TB memory capacity with RDIMM population (128 GB × 32) or 8 TB (512 GB × 16) with Intel Optane Persistent Memory population.

DS220 G2 comes in three storage configurations to allow for end user flexibility. The first configuration supports 24 2.5-inch non-volatile memory express (NVMe) drives, the second supports 24 2.5-inch serial-attached SCSI (SAS), serial-ATA (SATA) and up to 8 NVMe drives, and the third supports 12 3.5-inch SAS or SATA and up to 8 NVMe drives. All the configurations support hot-pluggable, front-side-accessible drives as well as 2 optional 2.5-inch rear mounted drives. The DS220 G2 delivers high I/O performance and high capacity for demanding applications and solutions.

Cisco Nexus switches

The Cisco Nexus switch product line provides a series of solutions that make it easier to connect and manage disparate data center resources with software-defined networking (SDN). Leveraging the Cisco Unified Fabric, which unifies storage, data and networking (Ethernet/IP) services, the Nexus switches create an open, programmable network foundation built to support a virtualized data center environment.

Hitachi Unified Compute Platform Advisor

Hitachi Unified Compute Platform Advisor (UCP Advisor) is a comprehensive cloud infrastructure management and automation software that enables IT agility and simplifies day 0-N operations for edge, core, and cloud environments. The fourth-generation UCP Advisor accelerates application deployment and drastically simplifies converged and hyperconverged infrastructure deployment, configuration, life cycle management, and ongoing operations with advanced policy-based automation and orchestration for private and hybrid cloud environments.

The centralized management plane enables remote, federated management for the entire portfolio of converged, hyperconverged, and storage data center infrastructure solutions to improve operational efficiency and reduce management complexity. Its intelligent automation services accelerate infrastructure deployment and configuration, significantly minimizing deployment risk and reducing provisioning time and complexity, automating hundreds of mandatory tasks.

Red Hat Enterprise Linux

Using the stability and flexibility of Red Hat Enterprise Linux, reallocate your resources towards meeting the next challenges instead of maintaining the status quo. Deliver meaningful business results by providing exceptional reliability on military-grade security. Use Enterprise Linux to tailor your infrastructure as markets shift and technologies evolve.

SUSE Linux Enterprise High Availability Extension

Compete more effectively through improved uptime, better efficiency, and accelerated innovation using SUSE Linux Enterprise Server. This is a versatile server operating system for efficiently deploying highly available enterprise-class IT services in mixed IT environments with performance and reduced risk.

SUSE Linux Enterprise Server was the first Linux operating system certified for use with SAP HANA. It remains the operating system of choice for the vast majority of SAP HANA customers.

Hitachi Vantara
Corporate Headquarters: 2535 Augustine Drive, Santa Clara, CA 95054 USA
Contact Information: USA: 1-800-446-0744, Global: 1-858-547-4526
HitachiVantara.com tact
