PERFORMANCE TUNING AND OPTIMIZATION OF J2EE APPLICATIONS ON THE JBOSS PLATFORM

Samuel Kounev, Björn Weis and Alejandro Buchmann
Department of Computer Science
Darmstadt University of Technology, Germany

Over the past couple of years the JBoss application server has established itself as a competitive open-source alternative to commercial J2EE platforms. Although it has been criticized for having poor scalability and performance, it keeps gaining in popularity and market share. In this paper, we present an experience report with a deployment of the industry-standard SPECjAppServer2004 benchmark on JBoss. Our goal is to study how the performance of JBoss applications can be improved by exploiting different deployment options offered by the JBoss platform. We consider a number of deployment alternatives including different JVMs, different Web containers, deployment descriptor settings and data access optimizations. We measure and analyze their effect on the overall system performance in both single-node and clustered environments. Finally, we discuss some general problems we encountered when running the benchmark under load.

1 Introduction

The JBoss Application Server is the world's most popular open-source J2EE application server. Combining a robust, yet flexible architecture with a free open-source license and extensive technical support from the JBoss Group, JBoss has quickly established itself as a competitive platform for e-business applications. However, like other open-source products, JBoss has often been criticized for having poor performance and scalability, failing to meet the requirements for mission-critical enterprise-level services.

In this paper, we study how the performance of J2EE applications running on JBoss can be improved by exploiting different deployment options offered by the platform. We use SPECjAppServer2004¹, the new industry-standard benchmark for measuring the performance and scalability of J2EE application servers.
However, our goal is not to measure the performance of JBoss or make any comparisons with other application servers. Rather, we use SPECjAppServer2004 as an example of a realistic application, in order to evaluate the effect of some typical JBoss deployment and configuration options on the overall system performance. The exact version of JBoss Server considered is 3.2.3, released on November 30, 2003. In addition to studying how JBoss performance can be improved, we report on problems we encountered during deployment of the benchmark as well as some scalability issues we noticed when testing under load.

¹ SPECjAppServer is a trademark of the Standard Performance Evaluation Corp. (SPEC). The SPECjAppServer2004 results or findings in this publication have not been reviewed by SPEC, therefore no comparison nor performance inference can be made against any published SPEC result. The official web site for SPECjAppServer2004 is located at

The rest of the paper is organized as follows. We start with an overview of SPECjAppServer2004, concentrating on the business domain and workload it models. We then describe the deployment environment in which we deployed the benchmark and the experimental setting for our performance analysis. After this we study a number of different configuration and deployment options and evaluate them in terms of the performance gains they bring. We start by comparing several alternative Web containers (servlet/JSP engines) that are typically used in JBoss applications, i.e. Tomcat 4, Tomcat 5 and Jetty. Following this, we evaluate the performance difference when using local interfaces, as opposed to remote interfaces, for communication between the presentation layer (Servlets/JSPs) and the business layer (EJBs) of the application. We measure the performance gains from several typical data access optimizations often used in JBoss applications and demonstrate that the choice of Java Virtual Machine (JVM) has a very significant impact on the overall system performance. Finally, we report on some scalability and reliability issues we noticed when doing stress testing.

2 The SPECjAppServer2004 Benchmark

SPECjAppServer2004 is a new industry-standard benchmark for measuring the performance and scalability of Java 2 Enterprise Edition (J2EE) technology-based application servers. SPECjAppServer2004 was developed by SPEC's Java subcommittee, which includes BEA, Borland, Darmstadt University of Technology, Hewlett-Packard, IBM, Intel, Oracle, Pramati, Sun Microsystems and Sybase. It is important to note that even though some parts of SPECjAppServer2004 look similar to SPECjAppServer2002, SPECjAppServer2004 is much more complex and substantially different from previous versions of SPECjAppServer. It implements a new enhanced workload that exercises all major services of the J2EE platform in a complete end-to-end application.

[Figure 1: SPECjAppServer2004 Business Model]

2.1 SPECjAppServer2004 Business Model

The SPECjAppServer2004 workload is based on a distributed application claimed to be large enough and complex enough to represent a real-world e-business system [Sta04]. The benchmark designers have chosen manufacturing, supply chain management, and order/inventory as the "storyline" of the business problem to be modeled. This is an industrial-strength distributed problem that is heavyweight, mission-critical and requires the use of a powerful and scalable infrastructure. The SPECjAppServer2004 workload has been specifically modeled after an automobile manufacturer whose main customers are automobile dealers. Dealers use a Web-based user interface to browse the automobile catalogue, purchase automobiles, sell automobiles and track dealership inventory.

As depicted in Figure 1, SPECjAppServer2004's business model comprises five domains: the customer domain dealing with customer orders and interactions, the dealer domain offering a Web-based interface to the services in the customer domain, the manufacturing domain performing "just in time" manufacturing operations, the supplier domain handling interactions with external suppliers, and the corporate domain managing all customer, product, and supplier information. Figure 2 shows some examples of typical transactions run in these domains (the dealer domain is omitted, since it does not provide any new services of its own).

Figure 2: SPECjAppServer2004 Business Domains

CUSTOMER DOMAIN (Order Entry Application)
- Place Order
- Get Order Status
- Get Customer Status
- Cancel Order

CORPORATE DOMAIN (Customer, Supplier and Parts Information)
- Register Customer
- Determine Discount
- Check Credit

MANUFACTURING DOMAIN (Manufacturing Application)
- Schedule Work Order
- Update Work Order
- Complete Work Order
- Create Large Order

SUPPLIER DOMAIN (Interactions with Suppliers)
- Select Supplier
- Send Purchase Order
- Deliver Purchase Order

The customer domain hosts an order entry application that provides some typical online ordering functionality. The latter includes placing new orders, retrieving the status of a particular order or all orders of a given customer, canceling orders and so on. Orders for more than 100 automobiles are called large orders.

The dealer domain hosts a Web application (called dealer application) that provides a Web-based interface to the services in the customer domain. It allows customers, in our case automobile dealers, to keep track of their accounts, keep track of dealership inventory, manage a shopping cart, and purchase and sell automobiles.

The manufacturing domain hosts a manufacturing application that models the activity of production lines in an automobile manufacturing plant. There are two types of production lines, namely planned lines and large order lines. Planned lines run on schedule and produce a predefined number of automobiles.
Large order lines run only when a large order is received in the customer domain. The unit of work in the manufacturing domain is a work order. Each work order is for a specific number of automobiles of a certain model. When a work order is created, the bill of materials for the corresponding type of automobile is retrieved and the required parts are taken out of inventory. As automobiles move through the assembly line, the work order status is updated to reflect progress. Once the work order is complete, it is marked as completed and inventory is updated. When inventory of parts gets depleted, suppliers need to be located and purchase orders (POs) need to be sent out. This is done by contacting the supplier domain, which is responsible for interactions with external suppliers.

2.2 SPECjAppServer2004 Application Design

All the activities and processes in the five domains described above are implemented using J2EE components (Enterprise Java Beans, Servlets and Java Server Pages) assembled into a single J2EE application that is deployed in an application server running on the System Under Test (SUT). The only exception is the interactions with suppliers, which are implemented using a separate Web application called Supplier Emulator. The latter is deployed in a Java-enabled Web server on a dedicated machine. The supplier emulator provides the supplier domain with a way to emulate the process of sending and receiving purchase orders to/from suppliers.

The workload generator is implemented using a multi-threaded Java application called SPECjAppServer Driver. The latter is designed to run on multiple client machines using an arbitrary number of Java Virtual Machines to ensure that it has no inherent scalability limitations. The driver is made of two components: the Manufacturing Driver and the DealerEntry Driver. The manufacturing driver drives the production lines (planned lines and large order lines) in the manufacturing domain and exercises the manufacturing application. It communicates with the SUT through the RMI (Remote Method Invocation) interface. The DealerEntry driver emulates automobile dealers that use the dealer application in the dealer domain to access the services of the order entry application in the customer domain.
It communicates with the SUT through HTTP and exercises the dealer and order entry applications using three operations referred to as business transactions:

1. Browse - browses through the vehicle catalogue
2. Purchase - places orders for new vehicles
3. Manage - manages the customer inventory (sells vehicles and/or cancels open orders)

Each business transaction emulates a specific type of client session comprising multiple round-trips to the server. For example, the browse transaction navigates to the vehicle catalogue Web page and then pages a total of thirteen times, ten forward and three backwards.

A relational database management system (DBMS) is used for data persistence and all data access operations use entity beans which are mapped to tables in the SPECjAppServer database. Data access components follow the guidelines in [KB02] to provide maximum scalability and performance. Communication across domains in SPECjAppServer2004 is implemented using asynchronous messaging exploiting the Java Messaging Service (JMS) and Message Driven Beans (MDBs). In particular, the placement and fulfillment of large orders (LOs), requiring communication between the customer domain and the manufacturing domain, is implemented asynchronously. Another example is the placement and delivery of supplier purchase orders, which requires communication between the manufacturing domain and the supplier domain. The latter is implemented according to the proposal in [KB02] to address performance and reliability issues.

The throughput of the benchmark is driven by the activity of the dealer and manufacturing applications. The throughput of both applications is directly related to the chosen Transaction Injection Rate. The latter determines the number of business transactions generated by the DealerEntry driver, and the number of work orders scheduled by the manufacturing driver per unit of time. The summarized performance metric provided after running the benchmark is called JOPS and it denotes the average number of successful JAppServer Operations Per Second completed during the measurement interval.

3 Experimental Setting

In our experimental analysis, we use two different deployment environments for SPECjAppServer2004, depicted in Figures 3 and 4, respectively. The first one is a single-node deployment, while the second one is a clustered deployment with four JBoss servers. Table 1 provides some details on the configuration of the machines used in the two deployment environments. Since JBoss exhibits different behavior in a clustered environment, the same deployment option (or tuning parameter) might have a different effect on performance when used in the clustered deployment, as opposed to the single-node deployment. Therefore, we consider both deployment environments in our analysis.

[Figure 3: Single-Node Deployment. Components: Driver Machine, JBoss Server, Database Server; links labeled HTTP, RMI and JDBC over a LAN.]

[Figure 4: Clustered Deployment. Components: Driver Machine, JBoss Server Cluster, Database Server; links labeled HTTP, RMI and JDBC over a LAN.]

Table 1: Deployment Environment Details

Node                  Description
Driver Machine        SPECjAppServer Driver & Supplier Emulator
                      2 x AMD XP2000 CPU, 2 GB, SuSE Linux 8
Single JBoss Server   JBoss 3.2.3 Server
                      2 x AMD XP2000 CPU, 2 GB, SuSE Linux 8
JBoss Cluster Nodes   JBoss 3.2.3 Server
                      1 x AMD XP2000 CPU, 1 GB, SuSE Linux 8
Database Server       Popular commercial DBMS
                      2 x AMD XP2000 CPU, 2 GB, SuSE Linux 8

JBoss is shipped with three standard server configurations: 'minimal', 'default' and 'all'. The 'default' configuration is typically used in single-server environments, while the 'all' configuration is meant for clustered environments. We use the 'default' configuration as a basis for the single JBoss server in our single-node deployment, and the 'all' configuration as a basis for the JBoss servers in our clustered deployment. For details on the changes made to the standard server configurations for deploying SPECjAppServer2004, the reader is referred to [Wei04].

The driver machine hosts the SPECjAppServer2004 driver and the supplier emulator. All entity beans are persisted in the database. The DBMS we use runs under the SQL isolation level READ COMMITTED by default. For entity beans required to run under the REPEATABLE READ isolation level, pessimistic SELECT FOR UPDATE locking is used. This is achieved by setting the row-locking option in the configuration file.

We adhere to the SPECjAppServer2004 Run Rules for most of the experiments in our study. However, since not all deployment options that we consider are allowed by the Run Rules, in some cases we have to slightly deviate from the latter. For example, when evaluating the performance of different entity bean commit options, in some cases we assume that the JBoss server has exclusive access to the underlying persistent store (storing entity bean data), which is disallowed by the Run Rules.
This is acceptable, since our aim is to evaluate the impact of the respective deployment options on performance, rather than to produce standard benchmark results to be published and compared with other results.

In both the single-node and the clustered deployment, all SPECjAppServer2004 components (EJBs, servlets, JSPs) are deployed on all JBoss servers. In the clustered deployment, client requests are evenly distributed over the JBoss servers in the cluster. For RMI requests (from the manufacturing driver), load balancing is done automatically by JBoss. Unfortunately, this is not the case for HTTP requests, since JBoss is shipped without a load balancer for HTTP traffic. Therefore, we had to modify the DealerEntry driver to evenly distribute HTTP requests over the cluster nodes. Although we could have alternatively used a third-party load balancer, we preferred not to do this, since its performance would have affected our analysis, whose main focus is on JBoss.

Another problem we encountered was that access to the ItemEnt entity bean was causing numerous deadlock exceptions. The ItemEnt bean represents items in the vehicle catalogue and is accessed very frequently. It was surprising that it caused deadlocks, since the bean is only read and never updated by the benchmark. Declaring the bean as read-only alleviated the problem. After additionally configuring it to use JBoss' Instance-Per-Transaction Policy, the problem was completely resolved. The Instance-Per-Transaction Policy allows multiple instances of an entity bean to be active at the same time [Sco03]. For each transaction a separate instance is allocated and therefore there is no need for transaction-based locking.

4 Performance Analysis

We now present the results from our experimental analysis. We look at a number of different JBoss deployment and configuration options and evaluate their impact on the overall system performance.
As a basis for comparison, a standard out-of-the-box configuration is used with all deployment parameters set to their default values. Hereafter, we refer to this configuration as Standard (shortened "Std"). For each deployment/configuration setting considered, its performance is compared against the performance of the standard configuration. Performance is measured in terms of the following metrics:

- CPU utilization of the JBoss server(s) and the database server
- Throughput of business transactions
- Mean response times of business transactions
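
To make the throughput and response-time metrics above concrete, the following is a minimal sketch of how they might be computed from a list of recorded transactions. It is not taken from the SPECjAppServer2004 driver: the class `BenchmarkMetrics`, the `Sample` type and the choice to count only successful transactions in the mean response time are assumptions made purely for illustration.

```java
import java.util.List;

public class BenchmarkMetrics {

    /** One completed business transaction as a driver might record it (hypothetical type). */
    public static final class Sample {
        final String type;         // e.g. "Browse", "Purchase", "Manage", "WorkOrder"
        final double responseSec;  // observed response time in seconds
        final boolean successful;  // whether the transaction completed successfully

        public Sample(String type, double responseSec, boolean successful) {
            this.type = type;
            this.responseSec = responseSec;
            this.successful = successful;
        }
    }

    /** Successful transactions per second over the measurement interval. */
    public static double throughput(List<Sample> samples, double intervalSec) {
        long ok = samples.stream().filter(s -> s.successful).count();
        return ok / intervalSec;
    }

    /**
     * Mean response time of the successful transactions of one type.
     * Returns NaN if no successful transaction of that type was recorded.
     * (Ignoring failed transactions is an assumption of this sketch.)
     */
    public static double meanResponseTime(List<Sample> samples, String type) {
        return samples.stream()
                .filter(s -> s.successful && s.type.equals(type))
                .mapToDouble(s -> s.responseSec)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Made-up sample data, not measurement results.
        List<Sample> samples = List.of(
                new Sample("Browse", 0.4, true),
                new Sample("Browse", 0.6, true),
                new Sample("Purchase", 1.0, true),
                new Sample("Manage", 2.0, false));

        System.out.println("Throughput (tx/s): " + throughput(samples, 2.0));
        System.out.println("Mean Browse response time (s): "
                + meanResponseTime(samples, "Browse"));
    }
}
```

Computing the mean per transaction type mirrors the way response times are reported separately for each business transaction in the analysis that follows.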

By business transactions, here, we mean the three dealer operations, Purchase, Manage and Browse (as defined in section 2.2) and the WorkOrder transaction running in the manufacturing domain.

It is important to note that the injection rate at which experiments in the single-node environment are conducted is different from the injection rate for experiments in the clustered environment. A higher injection rate is used for cluster experiments, so that the four JBoss servers are utilized to a reasonable level. Disclosing the exact injection rates at which experiments are run is not allowed by the SPECjAppServer2004 license agreement.

4.1 Use of Different Web Containers

JBoss allows a third-party Web container to be integrated into the application server framework. The most popular Web containers typically used are Tomcat [Apa] and Jetty [Mor]. By default Tomcat 4.1 is used. As of the time of writing, the integration of Tomcat 5 in JBoss is still in its beta stage. Therefore, when using it, numerous debug messages are output to the console and logged to files. This accounts for significant overhead that would not be incurred in production deployments. For this reason, we consider two Tomcat 5 configurations, the first one out-of-the-box and the second one with debugging turned off. It is the latter that is more representative and the former is only included to show the overhead of debugging.

Since the manufacturing application does not exercise the Web container, it is not run in the experiments of this section. Only the dealer application and the order entry application are run, so that the stress is put on the benchmark components that exercise the Web container.

We consider four different configurations:

1. Tomcat 4.1 (shortened Tom4)
2. Tomcat 5 out-of-the-box (shortened Tom5)
3. Tomcat 5 without debugging (shortened Tom5WD)
4. Jetty

4.1.1 Analysis in Single-node Environment

Comparing the four Web containers in the single-node deployment revealed no significant difference with respect to achieved transaction throughput and average CPU utilization. With the exception of Tom5WD, in all configurations the measured CPU utilization was about 90% for the JBoss server and 45% for the database server. The Tom5WD configuration exhibited 2% lower CPU utilization both for the JBoss server and the database server. As we can see from Figure 6, the lower CPU utilization resulted in Tom5WD achieving the best response times, followed by Jetty. It stands out that the response time improvement was most significant for the Browse transaction. The reason for this is that, while Purchase and Manage comprise only 5 round-trips to the server, Browse comprises a total of 17 round-trips, each going through the Web container. As mentioned, the effect on transaction throughput was negligible. This was expected since, for a given injection rate, SPECjAppServer2004 has a target throughput that is normally achieved unless there are some system bottlenecks.

[Figure 5: Legend for diagrams on Figures 6 and 7]

[Figure 6: Mean response times with different Web containers in the single-node environment]

4.1.2 Analysis in Clustered Environment

The four Web containers exhibited similar behavior in the clustered deployment. The only exception was the Tom5 configuration, which in this case performed much worse than the other configurations. The reason for this was that all four servers in the clustered deployment were logging their debug messages to the same network drive. Since having four servers means four times more debug information to be logged, the shared logging drive turned into a bottleneck. Figure 7 shows the response times of the three business transactions. Note that this diagram uses a different scale.

[Figure 7: Mean response times with different Web containers in the clustered environment]

4.2 Use of Local vs. Remote Interfaces

In SPECjAppServer2004, by default, remote interfaces are used to access business logic components (EJBs) from the presentation layer (Servlets/JSPs) of the application. However, since in both the single-node and clustered environments presentation components are co-located with business logic components, one can alternatively use local interfaces. This eliminates the overhead of remote network communication and is expected to improve performance. In this section, we evaluate the performance gains from using local interfaces to access EJB components from Servlets and JSPs in SPECjAppServer2004. Note that our standard configuration (i.e. Std) uses remote interfaces.

4.2.1 Analysis in Single-node Environment

Figure 9 shows the transaction response times with remote vs. local interfaces in the single-node deployment. As we can see, using local interfaces led to response times dropping by up to 35%. Again, most affected was the Browse transaction. In addition to this, the use of local interfaces led to lower CPU utilization of the JBoss server. It dropped from 82% to 73% when switching from remote to local interfaces. Again, differences in transaction throughputs were negligible.

[Figure 8: Legend for diagrams on Figures 9 and 10]

[Figure 9: Mean response times with remote vs. local interfaces in the single-node environment]

4.2.2 Analysis in Clustered Environment

As expected, switching to local interfaces brought performance gains also in the clustered deployment. However, in this case, the delays resulting from calls to the EJB layer were small compared to the overall response times. This is because in a clustered environment there is additional load balancing and synchronization overhead which contributes to the total response times. As a result, delays from calls to the EJB layer constitute a smaller portion of the overall response times than in the single-node case. Therefore, the performance improvement from using local interfaces was also smaller than in the single-node case. Figure 10 shows the measured response times of business transactions. The effect on transaction throughput and CPU utilization was negligible.

[Figure 10: Mean response times with remote vs. local interfaces in the clustered environment]

4.3 Data Access Optimizations

In this section, we measure the effect of several data access configuration options on the overall system performance. The latter are often exploited in JBoss applications to tune and optimize the way entity beans are persisted. We first discuss these options and then present the results from our analysis.

4.3.1 Description of Considered Optimizations

Entity Bean Commit Options: JBoss offers four entity bean persistent storage commit options, i.e. A, B, C and D [Sco03, BR01]. While the first three are defined in the EJB specification [Sun02], the last one is a JBoss-specific feature. Below we quickly summarize the four commit options:

- Commit Option A - the container caches entity bean state between transactions. This option assumes that the container has exclusive access to the persistent store and therefore it doesn't need to synchronize the in-memory bean state from the persistent store at the beginning of each transaction.
- Commit Option B - the container caches entity bean state between transactions, however unlike option A, the container is not assumed to have exclusive access to the persistent store. Therefore, the container has to synchronize the in-memory entity bean state at the beginning of each transaction. Thus, business methods executing in a transaction context don't see much benefit from the container caching the bean, whereas business methods executing outside a transaction context can take advantage of cached bean data.
- Commit Option C - the container does not cache bean instances.
- Commit Option D - bean state is cached between transactions as with option A, but the state is periodically synchronized with the persistent store.

Note that the standard configuration (i.e. Std) uses commit option B for all entity beans. We consider two modified configurations exploiting commit options A and C, respectively. In the first configuration (called CoOpA), commit option A is used. While in the single-node deployment commit option A can be configured for all SPECjAppServer2004 entity beans, doing so in the clustered deployment would introduce potential data inconsistency. The problem is that changes to entity data in different nodes of the cluster are normally not synchronized. Therefore, in the clustered deployment, commit option A is only used for the beans which are never modified by the benchmark (the read-only beans), i.e. AssemblyEnt, BomEnt, ComponentEnt, PartEnt, SupplierEnt, SupplierCompEnt and POEnt.

The second configuration that we consider (called CoOpC) uses commit option C for all SPECjAppServer2004 entity beans.

Entity-Bean-With-Cache-Invalidation Option: We mentioned that using commit option A in a clustered environment may introduce potential data inconsistency. This is because each server in the cluster would assume that it has exclusive access to the persistent store and cache entity bean state between transactions. Thus, when two servers update an entity at the same time, the changes of one of them could be lost. To address this problem, JBoss provides the so-called cache invalidation framework [Sac03].
The latter allows one to link the entity caches of servers in the cluster, so that when an entity is modified, all servers that have a cached copy of the entity are forced to invalidate it and reload it at the beginning of the next transaction. JBoss provides the so-called "Standard CMP 2.x EntityBean with cache invalidation" option for entities that should use this cache invalidation mechanism [Sac03]. In our analysis, we consider a configuration (called EnBeCaIn), which exploits this option for SPECjAppServer2004's entity beans. Unfortunately, in the clustered deployment, it was not possible to configure all entity beans with cache invalidation, since doing so led to numerous rollback exceptions being thrown when running the benchmark. The latter appears to be due to a bug in the cache invalidation mechanism. Therefore, we could only apply the cache invalidation mechanism to the read-only beans, i.e. AssemblyEnt, BomEnt, ComponentEnt, PartEnt, SupplierEnt, SupplierCompEnt and POEnt. Since read-only beans are never modified, this should be equivalent to simply using commit option A without cache invalidation. However, as we will see later, results showed that there was a slight performance difference.

Instance-Per-Transaction Policy: JBoss' default locking policy allows only one instance of an entity bean to be active at a time. Unfortunately, the latter often leads to deadlock and throughput problems. To address this, JBoss provides the so-called Instance-Per-Transaction Policy, which eliminates the above requirement and allows multiple instances of an entity bean to be active at the same time [Sco03]. To achieve this, a new instance is allocated for each transaction and it is dropped when the transaction finishes.
Since each transaction has its own copy of the bean, there is no need for transaction-based locking.

In our analysis, we consider a configuration (called InPeTr), which uses the Instance-Per-Transaction Policy for all SPECjAppServer2004 entity beans.

No-Select-Before-Insert Optimization: JBoss provides the so-called No-Select-Before-Insert entity command, which aims to optimize entity bean create operations [Sco03]. Normally, when an entity bean is created, JBoss first checks to make sure that no entity bean with the same primary key exists. When using the No-Select-Before-Insert option, this check is skipped. Since in SPECjAppServer2004 all primary keys issued are guaranteed to be unique, there is no need to perform the check for duplicate keys. To evaluate the performance gains from this optimization, we consider a configuration (called NoSeBeIn), which uses the No-Select-Before-Insert option for all SPECjAppServer2004 entity beans.

Sync-On-Commit-Only Optimization: Another optimization typically used is the so-called Sync-On-Commit-Only container configuration option. It causes JBoss to synchronize changes to entity beans with the persistent store only at commit time. Normally, dirty entities are synchronized whenever a finder is called. When using Sync-On-Commit-Only, synchronization is not done when finders are called, however it is still done after deletes/removes, to ensure that cascade deletes work correctly. We consider a configuration called SyCoOnly, in which Sync-On-Commit-Only is used for all SPECjAppServer2004 entity beans.

Prepared Statement Cache: In JBoss, by default, prepared statements are not cached. To improve performance one can configure a prepared statement cache of an arbitrary size [Sco03]. We consider a configuration called PrStCa, in which a prepared statement cache of size 100 is used.

In summary, we are going to comp
