
CA NSM Inside Event Management and Alert Management r11.2 SP2

This documentation and any related computer software help programs (hereinafter referred to as the "Documentation") are for your informational purposes only and are subject to change or withdrawal by CA at any time.

This Documentation may not be copied, transferred, reproduced, disclosed, modified or duplicated, in whole or in part, without the prior written consent of CA. This Documentation is confidential and proprietary information of CA and may not be used or disclosed by you except as may be permitted in a separate confidentiality agreement between you and CA.

Notwithstanding the foregoing, if you are a licensed user of the software product(s) addressed in the Documentation, you may print a reasonable number of copies of the Documentation for internal use by you and your employees in connection with that software, provided that all CA copyright notices and legends are affixed to each reproduced copy.

The right to print copies of the Documentation is limited to the period during which the applicable license for such software remains in full force and effect. Should the license terminate for any reason, it is your responsibility to certify in writing to CA that all copies and partial copies of the Documentation have been returned to CA or destroyed.

TO THE EXTENT PERMITTED BY APPLICABLE LAW, CA PROVIDES THIS DOCUMENTATION "AS IS" WITHOUT WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. IN NO EVENT WILL CA BE LIABLE TO THE END USER OR ANY THIRD PARTY FOR ANY LOSS OR DAMAGE, DIRECT OR INDIRECT, FROM THE USE OF THIS DOCUMENTATION, INCLUDING WITHOUT LIMITATION, LOST PROFITS, LOST INVESTMENT, BUSINESS INTERRUPTION, GOODWILL, OR LOST DATA, EVEN IF CA IS EXPRESSLY ADVISED IN ADVANCE OF THE POSSIBILITY OF SUCH LOSS OR DAMAGE.

The use of any software product referenced in the Documentation is governed by the applicable license agreement and is not modified in any way by the terms of this notice.

The manufacturer of this Documentation is CA.

Provided with "Restricted Rights." Use, duplication or disclosure by the United States Government is subject to the restrictions set forth in FAR Sections 12.212, 52.227-14, and 52.227-19(c)(1) - (2) and DFARS Section 252.227-7014(b)(3), as applicable, or their successors.

Copyright 2010 CA. All rights reserved. All trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.

CA Product References

This document references the following CA components and products:

  CA 7 Workload Automation
  CA Access Control
  CA ADS (CA ADS)
  CA Advanced Systems Management (CA ASM)
  CA Cohesion Application Configuration Manager (ACM)
  CA ASM2 Backup and Restore
  CA eHealth Performance Manager
  CA Jobtrac Job Management (CA Jobtrac JM Workstation)
  CA NSM
  CA NSM Job Management Option (CA NSM JMO)
  CA San Manager
  CA Scheduler Job Management (CA Scheduler JM)
  CA Security Command Center (CA SCC)
  CA Service Desk
  CA Service Desk Knowledge Tools
  CA Software Delivery
  CA Spectrum Infrastructure Manager
  CA Virtual Performance Management (CA VPM)

Contact CA

Contact Technical Support

For your convenience, CA provides one site where you can access the information you need for your Home Office, Small Business, and Enterprise CA products. At http://ca.com/support, you can access the following:

  Online and telephone contact information for technical assistance and customer services
  Information about user communities and forums
  Product and documentation downloads
  CA Support policies and guidelines
  Other helpful resources appropriate for your product

Provide Feedback

If you have comments or questions about CA product documentation, you can send a message to [email protected].

If you would like to provide feedback about CA product documentation, complete our short customer survey, which is also available on the CA Support website, found at http://ca.com/docs.

Contents

Chapter 1: Introduction to Events, Correlation, and Alerts
  Event, Correlation, and Alerts Overview
  Event Management
  Advanced Event Correlation
  Alert Management System
  Related Publications

Chapter 2: Managing Enterprise Events with Event Management
  Event Management System
  What Is an Event?
  Event Sources
  Reading Syslog Messages
  Life Cycle of an Event
  Event Managers and Agents
  Event Management Configuration
  Configure a Linux Manager and UNIX/Linux Agents
  Configure a Windows Manager and UNIX/Linux Agents
  Add Agent Machines to the Administrative Configuration
  Verify Event Agent Installation
  Authorize Users to Run Commands
  Configure the Event Agent
  Non-Root Event Agent
  High Availability
  How You Configure Event Management in a Cluster Environment
  Event Policy
  Message Records
  Message Actions
  Using Wildcards to Build Message
  Define a Calendar
  Using Regular Expressions
  Using Variables to Enhance the Current Action
  Use Back-Quote Processing in a Message Action
  Create Multiple Message Actions
  Replicate Message Records and Actions
  Restrict Message Actions
  Test Policy by Simulating Messages
  View Event Messages
  Message Formats
  Manage Console Messages
  Held Messages
  Console Log Files
  Store and Forward
  SAF Configuration
  Send Notification
  Reports from the Console Log
  SNMP Traps
  Support for SNMP Version 3 Traps
  How catrap Issues Traps
  How catrapd Formats Traps
  Enable Automatic Formatting of Traps
  Binary and Hex Octet String Varbinds
  Secure Event Management
  Users Authorized to Run Commands
  Authorize Users to Run Commands
  Access to EM Database Tables
  Console Views
  Event Policy Packs
  Message Record/Action Policy Packs
  Advanced Event Correlation Policy Packs

Chapter 3: Improving Event Processing with Advanced Event Correlation
  Advanced Event Correlation
  High Availability
  Why Use AEC?
  How AEC Works
  Alert Management Integration
  Event Definitions
  Configure AEC
  Start the IDE Policy Editor
  Start the Web Policy Editor
  Tutor Pane
  Event Pick List
  Components of a Correlation Rule
  Event Pipeline Rule Components
  Boolean Logic Rule Components
  Timing Parameters
  Tokens
  Internal Tokens
  User-Defined Tokens
  Global Constants
  Calendar Support
  Template Rules
  Example: Template Rule
  Regular Expressions
  Impact Analysis
  Impact Events
  Aggregate Impact Messages
  Implement AEC
  Deploy Policy
  Check the AEC Engine Status
  Check Policy Status and Utilization
  Event Log Player
  Advanced Template String Editor
  Advanced Configuration
  Override Maturity
  Suppress Processed Messages
  Reformat Processed Messages
  Reset Rules Dynamically
  Correlation among Rules and RC Suppression Flag
  Flexible Configuration of Certain Reported Fields
  Event Filtering
  AND Concept for Pipeline
  Calendar Support
  Event Counters and Thresholds
  Event Sequencing
  Generate Root Cause for Every Matched Event
  Restart Timer on Match
  Rule Chaining
  Rolling Event Window
  Examples: AEC Applications
  Ping Failure (Single Rule)
  Ping Failure (Multiple Rules)
  Operator Server Shutdown (Three Item Pipeline)
  Operator Service Shutdown (Multiple Tokens)

Chapter 4: Monitoring Your Enterprise with the Alert Management System
  Alert Management System
  What Are Alerts?
  Alert Sources
  Life Cycle of an Alert
  Alert Management Configuration
  Event Agent, Event and Alert Manager
  Multiple Event Agents with Multiple Event Managers
  Multiple Agents, DSM, Event Agent, Event Manager
  Event Manager and Alert Manager
  High Availability
  How Alert Management Works
  Define Alert Policy
  Display Attributes
  User Actions
  Action Menus
  Escalation Policies
  Alert Global Definition
  Make Remote Nodes Available to the Alert Global Definition
  User Data
  Alert Queues
  Alert Classes
  Define EMS and AEC Policy for Alerts
  Message Policy for Alerts
  Correlation Rules for Alerts
  Alerts in the Management Command Center
  Alert Queue
  Chart of Alert Statistics
  For a Managed Object
  Maintain Alert Policy
  Export and Import AMS Policies
  Archive Alerts
  Purge Alerts
  caamsalertcsv Command—Create a CSV File
  Integrate with Unicenter Service Desk
  How the Integration with Service Desk Works
  Scenarios
  Affected End User
  Service Desk Tags in Event and AEC Policy
  Sample Policy with Service Desk Tags
  Configuration of AMS and Service Desk
  Manage Service Desk Requests
  Diagnostics and Troubleshooting
  Cannot Delete an Alert Policy
  Error Messages Appear When Viewing Alerts for a Managed Object
  User Action Does Not Run
  Access Denied to Event Messages and Alerts
  Cannot Display Alerts for a Managed Object
  Security Error Occurs When Closing an Alert
  AMS Cannot Close a Service Desk Ticket
  AMS Closes Alerts When Service Desk Tickets Have a Status Change
  No Synchronization Between Alert Closure and Service Desk Ticket Closure
  Alert Management Does Not Always Create Service Desk Tickets

Index

Chapter 1: Introduction to Events, Correlation, and Alerts

This section contains the following topics:

  Event, Correlation, and Alerts Overview
  Related Publications

Event, Correlation, and Alerts Overview

This chapter provides a brief introduction to the Event Management System (EMS), Advanced Event Correlation (AEC), and the Alert Management System (AMS). The remaining chapters explain how to use these components throughout your enterprise to:

  Monitor and consolidate event activity from a variety of sources
  Group associated events for further processing
  Focus on and manage the highest severity events

Each chapter contains detailed explanations and examples.

Note: The examples in this guide focus on the Unicenter MCC interface. You can also perform some of the same actions with Unicenter Management Portal. Unicenter MP provides a framework for accessing enterprise management data, but not for generating the data. For example, you can view and acknowledge events, but not define message records and actions.

Event Management

The Event Management System is the focal point for managing enterprise events from a variety of sources throughout your network. Through the console log, you can monitor event activity and immediately respond to events as they occur. By filtering messages that appear on each console, you can retrieve specific information about a particular node, user, or workstation.

By defining console log views, you can restrict message access to authorized users and user groups. By defining console view objects to the database, you can filter messages from the console log, thereby limiting access to sensitive messages.
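As a rough illustration of the filtering idea described above, the following Python sketch selects console log messages by node and user. It is not the Event Management console or any CA NSM API; the `ConsoleMessage` fields, the `filter_messages` helper, and the sample data are hypothetical names and values chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConsoleMessage:
    node: str   # machine that reported the event (hypothetical field)
    user: str   # user associated with the event (hypothetical field)
    text: str   # message text


def filter_messages(log: List[ConsoleMessage],
                    node: Optional[str] = None,
                    user: Optional[str] = None) -> List[ConsoleMessage]:
    """Keep only the messages that match every criterion that was supplied."""
    return [
        m for m in log
        if (node is None or m.node == node)
        and (user is None or m.user == user)
    ]


# Hypothetical sample data, for illustration only.
log = [
    ConsoleMessage("srv01", "root", "File system /var is 95% full"),
    ConsoleMessage("srv02", "oper1", "Backup job completed"),
    ConsoleMessage("srv01", "oper1", "Service restarted"),
]

srv01_messages = filter_messages(log, node="srv01")
oper1_on_srv01 = filter_messages(log, node="srv01", user="oper1")
```

Omitting a criterion leaves that field unrestricted, so the same helper covers "everything from one node" and "one user on one node".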

By defining calendars, you can establish date and time controls for automated event processing. Determining a course of action based on when an event occurs can be critical to its proper handling.

By defining message record and action profiles, you can identify events that are important to your operation and define the special processing that Unicenter Network and Systems Management (CA NSM) performs when encountering them. You can enhance your message record and action policy by using AEC to identify a set of events that you want to monitor and correlate, and what actions should be performed if correlation exists or does not exist.

Advanced Event Correlation

Advanced Event Correlation, an extension to EMS, provides a powerful event correlation, root cause, and impact analysis capability. When used with existing CA NSM features, it can increase the quality and reduce the quantity of information reported on the console log. It groups associated events for further processing. For example, you can suppress events, combine multiple events into one, extract data from events, reformat events, and detect the absence of scheduled events.

Root cause analysis helps you drastically reduce the number and frequency of events seen by console operators, eliminate message flooding, and reduce false notifications.

Impact analysis helps you alert users of an impending problem, thus reducing the load on your help desk. It also helps you initiate failover or recovery procedures for the dependent systems, or alert operations staff that they need not address a particular problem.

To use AEC, you must first identify a set of events that you want to monitor and correlate and identify actions to be performed if correlation exists or does not exist. The events to be monitored are reported to the console log as messages that act as input to AEC and are intercepted and analyzed. Then you configure AEC to act on the input messages it receives to produce the desired output, which are the messages that are actually sent to the console log. AEC uses correlation rules to analyze the input messages in relation to each other and identify the root cause messages from those incoming messages. Once AEC has identified a root cause message, it applies the formatting specified in the correlation rule and reports the resulting message to the console log.
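The flow described above (input messages are intercepted, a correlation rule relates them to each other, and one formatted root cause message is reported) can be illustrated with a minimal Python sketch. This is not AEC rule syntax or a CA NSM API. The `Event` and `CorrelationRule` classes, the `correlate` function, the single time window, and the sample messages are all assumptions made for this example, and real AEC rules offer far more than this.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    time: float   # seconds from an arbitrary reference point
    node: str
    text: str


@dataclass
class CorrelationRule:          # hypothetical structure, not AEC's format
    root_pattern: str           # regex identifying the root cause message
    symptom_pattern: str        # regex identifying dependent (symptom) messages
    window: float               # seconds after the root cause in which symptoms correlate
    template: str               # format applied to the reported root cause message


def correlate(events: List[Event], rule: CorrelationRule) -> List[str]:
    """Return the messages that would reach the console log in this sketch."""
    output = []
    root = None  # most recent root cause event still inside its window
    for ev in sorted(events, key=lambda e: e.time):
        if root is not None and ev.time - root.time > rule.window:
            root = None  # correlation window has expired
        if re.search(rule.root_pattern, ev.text):
            root = ev
            output.append(rule.template.format(node=ev.node, text=ev.text))
        elif root is not None and re.search(rule.symptom_pattern, ev.text):
            continue  # symptom explained by the root cause: suppress it
        else:
            output.append(ev.text)  # uncorrelated message passes through unchanged
    return output


# Hypothetical input messages, for illustration only.
rule = CorrelationRule(
    root_pattern=r"router .* down",
    symptom_pattern=r"ping failed",
    window=60.0,
    template="ROOT CAUSE on {node}: {text}",
)
events = [
    Event(0.0, "rtr1", "router rtr1 down"),
    Event(5.0, "hostA", "ping failed for hostA"),
    Event(9.0, "hostB", "ping failed for hostB"),
    Event(120.0, "hostC", "ping failed for hostC"),
]
console = correlate(events, rule)
```

In this sketch the two ping failures that arrive within 60 seconds of the router message are suppressed, the router message is reported once in the rule's format, and the later ping failure is reported as is because no root cause is active at that point.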

Alert Management System

The Alert Management System, a tool for organizing and tracking the most important events in an enterprise or logical segment of an enterprise, lets you focus on and manage the highest severity IT events. It provides tools for defining alert policies and multiple panes in the Unicenter MCC for viewing alerts.

To use AMS, you must first define policies that control how alerts are displayed and indicate which event messages are alerts. You do this by defining alert profiles, creating message record actions for alerts, and defining AEC correlation rules for alerts. The alert policies define configuration settings for all alerts, group alerts for viewing in the Unicenter MCC, and more. The message record actions and correlation rules indicate which serious situations lead to alert creation.

After defining alert policies, you can view and manage alerts in the Management Command Center. You can view all alerts, alerts of a specific type, and alerts associated with a managed object.

AMS also lets you link to Unicenter Service Desk, which is a customer support application that manages calls and IT assets, tracks problem resolutions, and shares corporate knowledge. Interaction with the Service Desk reduces the workload of the customer support staff because some manual tasks are eliminated. For example, you can open, update, and close Service Desk requests automatically when an AMS alert is created, escalated, or closed.

Related Publications

The following guides provide information that you will find useful. Most are available on the CA NSM installation media.

Administration Guide
  Is intended for use by system administrators and contains general information and procedures about how to secure, customize, configure, and maintain CA NSM after installation and implementation. Individual chapters describe the components that are included with or that can be integrated with your CA NSM installation.

Agent Technology Support for SNMPv3
  Provides information about how Agent Technology can take advantage of the SNMPv3 protocol. Documents how the security information is handled on the manager and agent side as well as how it is applied to the managed systems. SNMPv3 configuration and usage details are provided in this guide.

CA Procedures
  Contains procedures and processes for all components of CA NSM, including WorldView, Agent Technology, Enterprise Management, Event Management, CAICCI, Data Scoping, Discovery, Notification Services, Wireless Messaging, Security Management, and CA NSM Job Management Option.

CA Reference
  Contains commands, parameters, and environment variables for all components of CA NSM, including Advanced Event Correlation, Agent Technology, Enterprise Management, Event Management, CAICCI, Data Scoping, Discovery, Notification Services, Wireless Messaging, Security Management, CA NSM Job Management Option, and WorldView.

Implementation Guide
  Contains architecture considerations, pre-installation tasks, installation instructions, post-installation configuration information, and implementation scenarios. Appendixes include in-depth information about Distributed Intelligence Architecture (DIA), the MDB, and the CA High Availability Service (HAS) for cluster aware environments. This guide is intended for users who are implementing CA NSM on a new system.

Inside Active Directory Management
  Provides general information, installation scenarios, and configuration procedures for Active Directory Management.

Inside Event Management and Alert Management
  Provides detailed information about Event Management (message records and actions), Advanced Event Correlation, and Alert Management.

Inside the Performance Agent
  Contains detailed information about the configuration and use of the Performance Agent.

Inside Systems Management
  Describes systems management from the CA NSM architecture perspective. The guide describes the different layers (WorldView, Management Layer, Monitoring Layer) and associated components, for example: Distributed State Machine (DSM), Unicenter Configuration Manager, dashboards, and so on.

Inside Systems Monitoring
  Explores how to use and configure the system agents of CA NSM to monitor the system resources in your environment. The chapters guide you through the process of configuring and optimizing the agent for your special requirements.

Inside Systems Performance
  Contains detailed information about the three architectural layers of Systems Performance, and provides guidance in the deployment, configuration, use, and best practices of the Systems Performance components.

MDB Overview
  Provides a generic overview of the Management Database (MDB), a common enterprise data repository that integrates CA product suites. The MDB provides a unified database schema for the management data stored by all CA products (mainframe and distributed). The MDB integrates management data from all IT disciplines and CA products. The guide includes implementation considerations for the database systems that support the MDB and information specific to the CA NSM implementation of the MDB.

MIB Reference Guide
  Provides detailed information about each MIB attribute of the CA NSM system agents.

Migration Guide
  Provides detailed upgrade and migration instructions. This guide is only available on the CA Support website: http://ca.com/support

Programming Guide
  Provides details for constructing applications by CA development teams and by third parties and their clients. The guide is intended for developers who use one or more of the application programming interfaces (APIs) in the SDK to develop applications for use with CA NSM. Key among these APIs are the WorldView API, the Agent Technology API, and the Enterprise Management API.

Readme Files
  Provides information about known issues and information discovered after CA NSM publication. The following readme files are available:
    The CA NSM r11.2 SP2 for UNIX and Linux readme
    The CA NSM r11.2 SP2 Windows readme
    The Unicenter Management Portal readme

Release Notes
  Provides information about operating system support, system requirements, new and changed features, published fixes, international suppor
