
Zabbix & JavaEE Platform Monitoring: A Good Match?
RIGA, 22 Sep 2012 Marek Neumann

Company Facts
- Jesta Digital is a leading global provider of next generation entertainment content and services for the digital consumer.
- subsidiary of Jesta Group, a diversified company with holdings in real estate, manufacturing, technology and aviation.
- home to established brands Jamba, Jamster, iLove and Mobizzo and mobile subscription, payment and ad monetization technologies

Who am I?
- more than 10 years of experience in various areas of Java and JavaEE
- 6 years of work for different consulting companies
- JBoss support and training pioneer
- strategy and architecture team @ Jesta Digital
  – technical guidelines, software infrastructure
  – Application Monitoring is one aspect of our work
- settled near Berlin with my family (2 kids)
- spending much of my spare time on marathon training

Agenda
- Jesta Digital application monitoring architecture
- Performance problems and how we tackled them
- Zabbix API: Automation of the monitoring configuration using an in-house application
- Zabbix API: Automation of the service monitoring within the UltraESB
- Zabbix Server monitoring in a public cloud
- Summary

Application Monitoring Architecture
- Zabbix 1.8.5 (supported version)
- Server with passive Java agents
- Passive proxy for cloud hosts
- MySQL 5.5 backend
- separate installation for test and internal systems (CI, Staging, etc.)
- 24x7 Monitoring team (SOC) with access to Zabbix and other monitoring tools

Application Monitoring Architecture
[architecture diagram]

Application Monitoring Architecture
- JVMs monitored
  – JBoss Application Server 4.3.0 EAP / 6.0.0 EAP
  – UltraESB 1.7.1
  – Apache Tomcat 7.x
  – (BEA WebLogic 8.x)
- WHAT is monitored
  – basic JVM metrics (heap/perm gen memory usage, garbage collection, file descriptors, etc.)
  – business metrics (content index size, subscription reminder SMS count, etc.)
  – host availability (HTTP port check)
  – database query executions
  – exception counts
  – log level counts (WARN, ERROR, FATAL)

Application Monitoring Architecture
- JMX-based architecture (standard way to gather metrics from a JVM by using queries and requests)
- implemented by many application server vendors
- is part of the JDK since version 5
- Zabbix agent is an essential part of the appserver installations
[diagram: Java Agent, "Zabbix Port", JVM, JMX MBeanServer]

Application Monitoring Architecture
- Zabbix Agent is a modified implementation of the former "Zapcat"
  – extended to support more complicated object structures and method calls
- easily deployable in the application server (JBoss, UltraESB, Tomcat, etc.)
- transformation of Zabbix protocol to JMX syntax and vice versa
- local "in-VM" calls to ensure good performance
- a JMX client is provided from Zabbix 2.0 onwards, so no agent is necessary anymore
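
The translation between a Zabbix item key and a JMX lookup can be pictured as a small parsing step. The sketch below is an illustration only, not the agent's actual code: it assumes the Zapcat-style key layout `jmx[<object name>][<attribute path>]` that appears in the ESB templates later in this deck, and the names `KEY_PATTERN` and `parse_jmx_key` are hypothetical.

```python
import re

# Assumed key layout (Zapcat style): jmx[<JMX object name>][<attribute path>]
# The attribute path may be dotted to reach into a composite value.
KEY_PATTERN = re.compile(
    r"^jmx\[(?P<object_name>.+)\]\[(?P<attribute>[^\[\]]+)\]$"
)


def parse_jmx_key(key):
    """Split a Zabbix item key into (object_name, attribute_path_list)."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError("not a jmx[...][...] key: %r" % key)
    return match.group("object_name"), match.group("attribute").split(".")


# java.lang:type=Memory / HeapMemoryUsage is a standard platform MBean.
object_name, path = parse_jmx_key(
    "jmx[java.lang:type=Memory][HeapMemoryUsage.used]"
)
```

An agent running inside the JVM would then look up `object_name` on the local MBeanServer and walk `path` through the returned attribute, which is what keeps the call "in-VM".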

Application Monitoring Architecture
- JVM Availability Check
  – first version was implemented using the noData() function
  – flood of false positives when server performance degraded
  – changed to a simple TCP check with DISASTER alerts (90s)
({JVM AVAILABLE last(#5)}=0) &
({JVM AVAILABLE last(#4)}=0) &
({JVM AVAILABLE last(#3)}=0) &
({JVM AVAILABLE last(#2)}=0) &
({JVM AVAILABLE last(#1)}=0)
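
The trigger on this slide only fires when several consecutive checks have failed, which is what suppresses the false positives. A minimal sketch of that rule, assuming a check value of 1 means the TCP port answered and 0 means it did not (the function name and the default of five checks are illustrative):

```python
def jvm_unavailable(check_results, required_failures=5):
    """Return True when the newest `required_failures` checks all failed.

    check_results is ordered oldest to newest; 1 = port answered, 0 = no answer.
    """
    recent = check_results[-required_failures:]
    # Not enough history yet: do not alert.
    if len(recent) < required_failures:
        return False
    return all(value == 0 for value in recent)


# One slow response in the middle of the window keeps the alert quiet.
flapping = jvm_unavailable([0, 0, 1, 0, 0])
down = jvm_unavailable([1, 0, 0, 0, 0, 0])
```

Requiring every check in the window to fail trades a slower alert for far fewer false alarms when the server is merely slow.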

Application Monitoring Architecture
- all templates are provided by the S&A team to support infrastructure monitoring requirements
- developers can easily add new "business" monitoring items by implementing JMX MBeans
  – no special knowledge of Zabbix is required
- the configuration process has a lot of manual steps right now
  – high workload for the Operations team

Performance problems and how we tackled them
- Zabbix was introduced at Jesta Digital in 2008 (v1.6)
  – decision based on the architecture and the frontend capabilities
- monitoring for a big new customer was required
- existing monitoring was based on complicated custom implementations that nobody wanted to maintain
- over the last years we faced some severe performance problems
- let's go through our "Zabbix performance learning curve"!

Performance problems and how we tackled them
Performance Problem #1 - Virtualized server setup
- very first installation was on a virtualized server, for both the Zabbix Server and the MySQL backend
- I/O throughput was temporarily poor and degraded without any visible reason
- server queue was filling quickly; the noData() function reported alerts because of the exhausted queue and delayed item processing
Solution:
- Zabbix database was moved to physical hardware (16 cores, 32 GB RAM, Linux 64bit)
- availability check was changed to a simple TCP check

Performance problems and how we tackled them
Performance Problem #2 - Zabbix Housekeeper
- housekeeper was configured to run every hour
- concurrent write processes were blocked during that time (transaction timeouts), slow frontend
- queueing problems
  – template import, host deletion, mass changes
Solution:
- stop Zabbix housekeeping, introduce MySQL partitioning for the history_uint and trends_uint tables
- deletion of obsolete historical data is now much quicker and has no measurable impact on the Zabbix performance
- partitions are cut on a daily and monthly basis
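
With range partitioning, dropping old data becomes a metadata operation rather than a large DELETE. The sketch below only generates the SQL text and is not the scripts used at Jesta Digital. It assumes the table is already partitioned by RANGE on the integer `clock` column, that day boundaries are taken in UTC, and that partitions are named `pYYYY_MM_DD`; all of these are illustrative choices.

```python
import calendar
from datetime import date, timedelta


def add_daily_partition_sql(table, day):
    """Build the statement that adds the partition holding `day`.

    The partition takes rows with clock < midnight (UTC) of the next day.
    """
    next_day = day + timedelta(days=1)
    upper_bound = calendar.timegm(next_day.timetuple())
    name = "p" + day.strftime("%Y_%m_%d")
    return "ALTER TABLE %s ADD PARTITION (PARTITION %s VALUES LESS THAN (%d))" % (
        table, name, upper_bound)


def drop_partition_sql(table, day):
    """Build the statement that discards one day of history at once."""
    return "ALTER TABLE %s DROP PARTITION p%s" % (table, day.strftime("%Y_%m_%d"))


add_sql = add_daily_partition_sql("history_uint", date(2012, 9, 22))
drop_sql = drop_partition_sql("history_uint", date(2012, 6, 22))
```

A scheduled job would run the first statement ahead of time for upcoming days and the second for days that have passed the retention period.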

Performance problems and how we tackled them
Performance Problem #3 - MySQL configuration (Log Size)
- Symptom: Zabbix queue filled up without any clear cause, item processing delayed, no recovery without a database restart
- from an operational point of view all systems were working correctly
  – system load, CPU usage, memory, swap

Performance problems and how we tackled them
Solution:
- upgrade from 1.8.3 to 1.8.5
- good understanding of background processes
- introduce performance metrics to visualize Zabbix "internal" performance (thanks to support)
- cause was related to poor syncer thread performance (persist history)

Performance problems and how we tackled them
- experimented with several MySQL options
- innodb_log_file_size: the transaction log was set to 5MB before (causing high I/O overhead on the disk)
- a correct size can be easily calculated (depending on the current MySQL workload)
  – -to-calculate-agood-innodb-log-file-size/
- the log file size was then increased to 270MB
- no queue problems afterwards, normal and steady syncer thread usage
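
One widely used rule of thumb for sizing the redo log, given here as an assumption since the slide does not spell out the method, is to sample the `Innodb_os_log_written` status counter twice, extrapolate to about one hour of writes, and divide by the number of log files. The function below sketches that arithmetic. The one-hour target and two log files are assumed defaults, and the sample numbers are invented for illustration.

```python
def suggested_innodb_log_file_size(written_start, written_end,
                                   sample_seconds=60, target_seconds=3600,
                                   log_files_in_group=2):
    """Estimate a per-file redo log size from two counter samples (bytes)."""
    written = written_end - written_start
    if written < 0 or sample_seconds <= 0 or log_files_in_group <= 0:
        raise ValueError("invalid sample values")
    # Scale the sampled write volume up to the target window ...
    bytes_in_window = written * target_seconds / float(sample_seconds)
    # ... and spread it over the files in the log group.
    return int(bytes_in_window / log_files_in_group)


MIB = 1024 * 1024
# Invented sample: 4 MiB of redo written within one minute.
size_bytes = suggested_innodb_log_file_size(0, 4 * MIB)
size_mib = size_bytes // MIB
```

Sampling during peak load gives a more useful estimate than sampling during a quiet period.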

Performance problems and how we tackled them
Performance Problem #4 - MySQL configuration (Query Cache)
- Symptom: Zabbix queue filled up without any clear cause, slow but steady increase of syncer and poller usage, no self-recovery
- database restart needed every two months
- MySQL threads: "Waiting for query cache lock"

Performance problems and how we tackled them
Solution:
- decrease the MySQL query cache size limit from 8GB(!) to 256MB
- when the cache size is set too high, there is more and more thread contention during updates
- 400.000 results in the query cache, so the limit is sufficient
- long-term graph reveals that the problem is solved
[graph]

Performance problems and how we tackled them
Our "Lessons Learnt":
- do not virtualize the database server
- introduce Zabbix internal monitoring, esp. syncer usage
  – zabbix[process,history syncer,avg,busy]
  – Utilization of all history syncer threads more than 50%: {Template Internal avg(600)}>50
- adapt the database configuration to your requirements!
  – Zabbix server itself is quick enough for processing high throughputs
- use MySQL partitioning instead of the housekeeper to avoid concurrent write blocking

Zabbix API: Automation
- configuration changes were a manual process since we introduced Zabbix in 2008
- error-prone and time-consuming task for the operations team
- all templates must be created using a template generator and imported into Zabbix using the frontend
- User Macros were not available at that time, so a lot of templates had to be generated
- since 1.8 the Zabbix API introduces more flexibility when it comes to administrative tasks
- POC: use the API to temporarily (de)activate host availability checks
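
The Zabbix API is a JSON-RPC 2.0 interface, so a proof of concept like this mostly comes down to posting small JSON bodies. The sketch below only builds such a body and sends nothing. It toggles monitoring for a whole host through `host.update` (status 0 = monitored, 1 = not monitored); switching off a single availability check would follow the same pattern with the matching item or trigger method. The host id and token are placeholders, and the helper names are hypothetical.

```python
import json


def build_request(method, params, auth=None, request_id=1):
    """Build a JSON-RPC 2.0 request body in the shape the Zabbix API expects."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "auth": auth,          # session token returned by the login call
        "id": request_id,
    }


def set_host_monitoring(hostid, enabled, auth, request_id=1):
    """Enable or disable monitoring for one host."""
    status = 0 if enabled else 1   # Zabbix host status: 0 monitored, 1 unmonitored
    return build_request("host.update",
                         {"hostid": hostid, "status": status},
                         auth, request_id)


# Placeholder host id and token, for illustration only.
request = set_host_monitoring("10084", enabled=False, auth="<session token>")
body = json.dumps(request)
```

The resulting `body` would be sent as an HTTP POST with content type `application/json-rpc` to the frontend's `api_jsonrpc.php` endpoint.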

Zabbix API: Automation
- Overall goal: all administrative tasks can be done without the Zabbix frontend (read-only access)
- reduce the maintenance effort by 70%
- integration is done with an existing in-house application
  – the application manages the complete server infrastructure and is the service repository for the ESB
- templates can be created and assigned to different abstraction levels
- all changes are recorded and can be rolled back
- easier to change only single values
  – change the threshold of a trigger
  – add and remove items

Zabbix API: Automation
[diagram]

Zabbix API: Automation
- create/update/delete templates
- create/update/delete items/triggers on templates
- (de)activate hosts and availability checks
- bulk changes supported to avoid single remote operations over the network

Zabbix API: ESB Monitoring Automation
- ESB is the central part of the service-oriented architecture in the platform
- remote communication between software components is done through the ESB
- ESB was integrated into the monitoring long ago
  – manual configuration process
  – huge templates (1:1 mapping template-service)
  – long-lasting configuration updates due to long template import times
  – outdated monitoring configuration
- Requirement: update the monitoring configuration once the underlying ESB gets a new version
  – no manual intervention should be required

Zabbix API: ESB Monitoring Automation
- Usage of the Zabbix API for integration!
- Administration can be done either using the web console or the command line client
- Templates are located on the disk (JSON structure)

Zabbix API: ESB Monitoring Automation
{
  "uz meta": {"parent": "endpoint-item-parent"},
  "params": {
    "description": "{.*LIVE.*} endpoint - state READY address count",
    "key_": "jmx[org.adroitlogic.ultraesb.detail:Type=Endpoints,Name={.*LIVE.*}][Details.readyAddressCount]"
  }
}
{
  "uz meta": {"parent": "queue-item-parent"},
  "params": {
    "description": "Log queue defaultFault current size",
    "key_": "jmx[com.jamster.infra.appserver.esb:Type=Queue,ConnectorName=log queue defaultFault][CurrentSize]"
  }
}
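
The `params` part of such a template entry lines up with the item fields of the Zabbix 1.8 API (`description`, `key_`), so turning an entry into an API call can be sketched as a simple merge. How the parent entry contributes values is not shown on the slides; `ASSUMED_DEFAULTS` below stands in for it and is purely hypothetical, as are the sample entry, host id and token.

```python
import json

# Hypothetical stand-in for whatever the parent entry provides:
# type 0 = Zabbix agent item, value_type 3 = unsigned integer, delay in seconds.
ASSUMED_DEFAULTS = {"type": 0, "value_type": 3, "delay": 60}


def item_create_request(template_entry, hostid, auth, request_id=1):
    """Turn one template entry into a JSON-RPC body for item.create."""
    params = dict(ASSUMED_DEFAULTS)
    params.update(template_entry["params"])   # entry values override defaults
    params["hostid"] = hostid
    return {
        "jsonrpc": "2.0",
        "method": "item.create",
        "params": params,
        "auth": auth,
        "id": request_id,
    }


# Invented entry in the same shape as the templates on this slide.
entry = json.loads(
    '{"params": {"description": "Log queue current size",'
    ' "key_": "jmx[example:Type=Queue][CurrentSize]"}}'
)
request = item_create_request(entry, hostid="10084", auth="<session token>")
```

Keeping the templates as plain JSON on disk means the same files can feed both the command line client and the web console.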

Zabbix API: ESB Monitoring Automation
- predefined or custom items are possible
  – calls in progress, caches, service execution times, endpoints
- cluster update is also done (Zookeeper-based)
- graphs and screens can be updated through the API as well!

Zabbix API: ESB Monitoring Automation
{"uz meta": {"parent": "screens-parent","cluster": "true"},
"params": {"hsize":"2","vsize":"7","name":"UltraESB cluster cockpit for installation "Open File Descriptor Count","width":"500",
{SHARED ESB DIR}/bin/uterm.sh -configdir {HOME}/ {CLUSTER DIR}/conf -c zr-zu ZABBIX URL -u ZABBIX USER -p ZABBIX PASS -t {HOME}/ {CLUSTER DIR}/conf/hosts.properties -doyw

Zabbix API: ESB Monitoring Automation
Voilà: UltraESB Monitoring Cockpit
[screenshot]

Zabbix Server monitoring in a public cloud
- some of our services are hosted in a public cloud (Amazon)
- monitoring principle is similar to the corporate one
  – firewall restrictions exist: no access to cloud hosts from the corporate network
  – Zabbix server has no access to the DMZ
- Zabbix developed a supported feature: the passive proxy
  – Zabbix server is polling the Zabbix proxy for data
- Cloud instances (Apache Tomcat) are "equipped" with the Zabbix Java Agents for data retrieval

Zabbix Server monitoring in a public cloud
- Auto-Registration feature:
  – during startup the instance sends a register request to the proxy ("stolen" from the native agent)
ZBXD\x01<length>{"request":"active checks","host":"10.20.30.40","port":65535}
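
The register request uses the regular Zabbix wire format: the literal `ZBXD`, a protocol byte `0x01`, the payload length as an 8-byte little-endian integer, and then the JSON payload. A minimal sketch of framing and unframing such a message follows; it builds the bytes only and opens no socket, and the function names are illustrative.

```python
import json
import struct

ZBX_HEADER = b"ZBXD\x01"   # protocol signature plus version byte


def build_zabbix_packet(payload):
    """Frame a JSON payload with the Zabbix header and length field."""
    data = json.dumps(payload).encode("utf-8")
    return ZBX_HEADER + struct.pack("<Q", len(data)) + data


def parse_zabbix_packet(packet):
    """Validate the framing and return the decoded JSON payload."""
    if packet[:5] != ZBX_HEADER:
        raise ValueError("missing ZBXD header")
    (length,) = struct.unpack("<Q", packet[5:13])
    data = packet[13:13 + length]
    if len(data) != length:
        raise ValueError("truncated payload")
    return json.loads(data.decode("utf-8"))


# The request shown on the slide: an "active checks" request from the instance.
packet = build_zabbix_packet(
    {"request": "active checks", "host": "10.20.30.40", "port": 65535}
)
decoded = parse_zabbix_packet(packet)
```

Because this is the same request a native active agent sends, the proxy can treat the Java agent's message as an ordinary auto-registration trigger.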

Zabbix Server monitoring in a public cloud
- the instance is created based on the registration action
- all templates are assigned
- the monitoring is started automatically
- when the instance shuts down (regularly), a shutdown signal is sent to a self-written server on the proxy machine
- using a cronjob, the Zabbix server queries all unregistered instances on the proxy machine and disables them with a Zabbix API call
- Drawbacks:
  – crashed instances cannot be unregistered
  – no host availability checks
  – no historical data usage due to frequent VM recreation

Summary
- Zabbix can be an excellent tool for monitoring a huge Enterprise Java Platform
- just a question of the agent (native vs. Java)
  – JMX and Zabbix are a good marriage
- easily extensible for custom checks
- many monitoring tools cannot speak JMX
- avoid remote calls on the same machine
- performance problems could be successfully tackled with the help of the support
  – learning curve was really long and exhausting
- the API is an appropriate way to integrate Zabbix with other systems
  – reduce the number of tools for the operations team
  – at least in 1.8 the API leaves room for improvement
- cloud monitoring can be done using some simple workarounds

Q&A
