Benchmarking the Session Initiation Protocol (SIP)

Yueqing Zhang, Arthur Clouet, Oluseyi S. Awotayo, Carol Davids
Illinois Institute of Technology
Email: [email protected]

Vijay K. Gurbani
Bell Laboratories, Alcatel-Lucent
Email: [email protected]

Abstract—Measuring and comparing the performance of an Internet multimedia signaling protocol across varying vendor implementations is a challenging task. Tests to measure the performance of the protocol as it is exhibited on a vendor device, and the conditions of the laboratory setup used to run the tests, have to be normalized such that no special favour is accorded to a particular vendor implementation. In this paper, we describe a performance benchmark to measure the performance of a device that includes a Session Initiation Protocol (SIP) proxy function. This benchmark is currently being standardized by the Internet Engineering Task Force (IETF). We implemented the algorithm that has been proposed in the IETF to measure the performance of a SIP server. We provide the test results of running the benchmark on Asterisk, the popular open source SIP private branch exchange (PBX).

I. INTRODUCTION

The Session Initiation Protocol (SIP [1]) is an IETF-standardized protocol for initiating, maintaining and disconnecting media sessions. The protocol has been adopted by many sectors of the telecommunications industry: IP Private Branch eXchanges (PBXes) and SIP Session Border Controllers (SBCs) are used for business VoIP phone services as well as in the delivery of Next Generation 911 services over the Emergency Services IP Network (ESINet [2]); LTE mobile phone systems are designed to use the SIP protocol and replace the circuit-switched telephone service currently provided over cellular networks; and SIP-to-SIP and SIP-PSTN-SIP ecosystems are used to reduce dependence on, and supplement, the traditional long-distance trunk circuits of the Public Switched Telephone Network (PSTN). Many commercial systems and solutions based on SIP are available today to meet the industry's requirements. As a result, there is a strong need for a vendor-neutral benchmarking methodology to allow a meaningful comparison of SIP implementations from different vendors.

The term performance has many meanings in many different contexts. To minimize ambiguity and provide a level field in which to interpret the results, we conduct our work in the Benchmarking Working Group (BMWG, http://datatracker.ietf.org/wg/bmwg/charter/) within the Internet Engineering Task Force (IETF, http://www.ietf.org). BMWG does not attempt to produce benchmarks in live, operational networks for the simple reason that conditions in such networks cannot be controlled as they would be in a laboratory environment. Furthermore, benchmarks produced by BMWG are vendor-neutral and have universal applicability to a given technology class.

The specific work on SIP in the BMWG consists of two Internet-Drafts, a terminology draft [3] and a methodology draft [4]. The methodology draft uses the established terminology from the terminology draft to define concrete test cases and to provide an algorithm to benchmark the test cases uniformly. We note that the latest versions of the terminology and methodology drafts are awaiting publication as Request For Comments (RFC) documents. The work we describe in this paper corresponds to earlier versions of these drafts (more specifically, to version -09 of each draft [5], [6]).

The remainder of this paper is organized as follows. Section II contains the problem statement. Section III describes related work.
Section IV contains a brief background on the SIP protocol. Section V describes the associated benchmarking algorithm from [6] and the script that we wrote to implement the algorithm. This script is available for public download at the URL indicated in [7]. Section VI contains a description of the test-bed. Section VII describes the test harness and the base results that can be obtained without the SIP server in the mix, and Section VIII takes a look at the results when a SIP server is added to the ecosystem. Section IX describes our plans for future work.

II. PROBLEM STATEMENT

Performance can be defined in many ways. Generally the term is used to describe the rate of consumption of system resources. Time, or processing latency, is often the metric used to quantify the resource consumption. The probability that an operation or transaction will succeed is another metric that has been proposed. The metrics are generally collected while the Device Under Test (DUT) is processing a well-defined offered load. The characteristics of the offered load — whether or not media is being used, the specific transport used for testing, the codecs used, etc. — are an important part of the metric definition.

In this study we use a different type of metric and a different type of load. Loads are chosen whose individual calls have a constant arrival rate and a constant duration. The value assigned to a given load is the number of calls per second, also referred to as the session attempt rate.

The loads are not necessarily designed to emulate any naturally occurring offered load; rather, they are designed to be easily reproduced and capable of exercising the DUT to the point of failure. We "test to failure", looking for the performance knee or breakpoint of the DUT while applying a well-calibrated, pre-defined load. The measure of performance is the value of the highest offered load that, when applied to the DUT, produces zero failures, the next highest rate attempted having produced at least one failure.

The rationale for such an approach has to do with the nature of SIP servers, the applications they are used to create, and the environment in which they are deployed. SIP is used to create a diverse and growing variety of applications. There will be a diverse set of user communities, each with its own characteristic use pattern. The characteristics of the offered loads will differ and are not easily predicted. Testing to zero failure using a standard pre-defined load is a reproducible way to learn how a system behaves under a steady load. The approach resembles the use of statistical measures such as means and standard deviations to predict future behavior. It is assumed that the resulting metric will be useful as a predictor of behavior under more realistic loads.

III. RELATED WORK

There is a growing body of research related to various aspects of the performance of SIP proxy servers. In many of these studies, performance is measured by the rate at which the DUT consumes resources when presented with a given work load. SIPstone [8] describes a benchmark by which to measure the request-handling capacity of a SIP proxy or cluster of proxies. The work outlines a method for creating an offered load that will exercise the various components of the proxy. A mix of registrations and call initiations, both successful and unsuccessful, is described, and the predicted performance is measured by the time it takes the proxy to process the various requests. SPEC SIP [9] is a software benchmark product that seeks to emulate the type of traffic offered to a SIP-enabled Voice over IP service provider's network, on which the servers play the role of SIP proxies and registrars. The metric defined by SPEC SIP is the number of users that complete 99.99% of their transactions successfully. Successful transactions are defined to be those that end before the expiration of the relevant SIP timers and with the appropriate 200, 302 and 487 final responses. Cortes et al. [10] define performance in terms of processing time, memory allocation, CPU usage, thread performance and call setup time.

A common theme in these studies is the creation of a realistic traffic load. In contrast, the work described in this paper defines a way to use simple, easily produced (and reproducible) loads to benchmark the performance of SIP proxies in a controlled environment. Such benchmarks can be used to compare performance across vendor implementations.

IV. SIP BACKGROUND

The base SIP protocol defines five methods, of which four — REGISTER, INVITE, ACK and BYE — are used in our work. The REGISTER method registers a user with the SIP network, while the INVITE method is used to establish a session. A third method, ACK, is used as the last method in a 3-way handshake. BYE is used to tear down a session.

Figure 1: SIP REGISTER and subsequent INVITE methods

The SIP ecosystem consists of SIP user agent clients (UACs) and user agent servers (UASs), registrars and proxies (see Figure 1). A UAS registers with a registrar, and a proxy, possibly co-located with the registrar, routes requests from a UAC towards the UAS when a session is established.
Proxies typically rendezvous the UAC with the UAS and then drop out of the chain; subsequent communications go directly between the UAC and UAS. The protocol, though, has means that allow all signaling messages to proceed through the proxy. Figure 1 also shows media traversing directly between the UAC and UAS; while this is preferred, in some cases it is necessary for the media to traverse a media gateway or media relay controlled by the proxy (e.g., for conferencing or upgrading a media codec).

SIP defines six response classes, with the 1xx class being provisional responses and the 2xx to 6xx class responses being considered final. A SIP transaction is defined as an aggregation of a request, one or more provisional responses and a final response (see Figure 1). While transactions are associated on a hop-by-hop basis, dialogues are end-to-end constructs. The dialogue is a signaling relationship necessary to manage the media session. For SIP networks where media traverses a proxy, there will be a dialogue established between the UAC and the proxy and another one between the proxy and the UAS.
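As a concrete illustration of these constructs, the following ladder sketches the call flow of Figure 1 for a proxied INVITE: one transaction comprising the request, provisional responses and a final response; the ACK that completes the 3-way handshake; media bypassing the proxy; and a BYE that tears down the dialogue. The diagram is schematic rather than a capture from the test bed.

    UAC                     Proxy                     UAS
     |---INVITE------------->|                         |   request
     |<--100 Trying----------|---INVITE--------------->|
     |                       |<--180 Ringing-----------|   provisional
     |<--180 Ringing---------|                         |   responses
     |                       |<--200 OK----------------|   final response;
     |<--200 OK--------------|                         |   transaction ends
     |---ACK------------------------------------------>|   3-way handshake
     |<=================RTP media=====================>|   media bypasses proxy
     |---BYE------------------------------------------>|   dialogue torn down
     |<--200 OK---------------------------------------|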

SIP can be used to create calls or sessions that resemble traditional telephone calls. SIP can be used to create other services and applications as well, but the present work only examines its use in creating audio/video sessions, which are also referred to as "calls". The terms "session" and "call" are used interchangeably in this paper, but the term session is more applicable in the context of the Internet and in recognition of the functionality beyond phone calls that SIP offers. One can consider a session to be a three-part vector with a signaling component (SIP), a media component (Real-time Protocol, or RTP [11]) and a management component (Real-time Control Protocol, or RTCP [11]).

Figure 2: SIP session as a 3-part vector

Figure 2 illustrates this interpretation. With this interpretation, a traditional "call" is a SIP session with a non-zero component in the media plane. This interpretation enables the study of maximum rates of SIP loads other than INVITEs. A load consisting only of REGISTER requests, for example, will create sessions with components only in the signaling plane. Such rates, particularly registration rates, are of great interest to the SIP community. A load consisting of INVITE requests and including media that traverses the DUT, on the other hand, will require more resources, since the DUT must handle the RTP as well as the SIP messages.

V. THE SESSION ESTABLISHMENT RATE ALGORITHM

The algorithm shown in Algorithm 1 was programmed using the Unix Bash shell scripting language [12] and is available for public download at [7]. The algorithm finds the largest session attempt rate at which the DUT can process requests successfully for a pre-defined, extended period of time with zero failures. The name given to this session rate is the "Session Establishment Rate" (SER [6]). The period of time is large enough that the DUT can process with zero errors while allowing the system to reach steady state.

The algorithm is defined as an iterative process. A starting rate of r = 100 sessions/second (sps) is used and calls are placed at that rate until n = 5000 calls have been placed. If all n calls are successful, the rate is increased to 150 sps and, again, calls continue at that rate until n = 5000 calls have been placed. The attempt rate is continuously ramped up until a failure is encountered before n = 5000 calls have been placed. Then a new attempt rate is calculated that is higher than the last successful attempt rate by a quantity equal to half the difference between the rate at which failures occurred and the last successful rate. If this new attempt rate also results in errors, the failing rate becomes the new upper bound and the next attempt rate is again the midpoint between the last successful rate and this new, lower failing rate. Continuing in this way, an attempt rate without errors is found. The tester can specify the margin of error using the parameter G, the granularity, which is measured in units of sps (sessions/sec). Any attempt rate that is within an acceptable tolerance of G can be used as a SER.

Algorithm 1: Benchmarking algorithm from [6]

    {Parameters of test; adjust as needed}
    n := 5000    {local maximum; used to figure out largest value
                  (number of sessions attempted)}
    N := 50000   {global maximum; once largest session rate has been
                  established, send this many requests before calling
                  the test a success}
    m := {...}   {other attributes affecting testing, media for instance}
    r := 100     {initial session attempt rate (sessions/sec)}
    G := 5       {granularity of results; margin of error in sessions/sec}
    C := 0.05    {calibration amount: how much to back down if we have
                  found candidate s but cannot send at rate s for time T
                  with zero failures}
    {End parameters of test.}
    f := false   {set when test is done}
    c := 0       {set when an upper bound has been found}
    repeat
      send_traffic(r, m, n)   {send r req/sec with m media
                               characteristics until n requests sent}
      if all requests succeeded then
        r' := r                     {save candidate value of metric}
        if (c == 0) then
          r := r + (0.5 * r)
        else if (c == 1) and ((r'' - r') > 2*G) then
          r := r + (0.5 * (r'' - r))
        else if (c == 1) and ((r'' - r') <= 2*G) then
          f := true
        end if
      else  {one or more requests fail}
        c := 1                      {found upper bound for metric}
        r'' := r                    {save new upper bound}
        r := r - (0.5 * (r - r'))
      end if
    until (f == true)

As an example, assume it is known (through the vendor) that a SIP proxy (the DUT) exhibits an SER of 269 sps. To independently verify this, the benchmark algorithm starts at r = 100 sps and keeps ramping the rate until failures are seen. Suppose the ramp last succeeded at 200 sps and then, say, at 400 sps the DUT exhibits failures; the algorithm will reduce the next attempted rate to 300 sps, the midpoint between the failing rate (400) and the last successful rate (200). The algorithm continues in this fashion until a stable state is reached.
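The script in [7] implements Algorithm 1 in Bash. The fragment below is a minimal sketch of the same search loop, not the published script: it assumes SIPp's built-in UAC scenario, a placeholder DUT address, and SIPp's convention of exiting with a non-zero status when at least one call fails.

    #!/usr/bin/env bash
    # Minimal sketch of Algorithm 1: zero-failure search for the SER.
    DUT_ADDR="192.0.2.10:5060"   # placeholder DUT address
    n=5000   # calls per iteration (local maximum)
    r=100    # initial session attempt rate (sps)
    G=5      # granularity (sps)
    c=0      # becomes 1 once a failing rate (upper bound) is seen
    r1=0     # r'  : last rate with zero failures
    r2=0     # r'' : lowest failing rate seen so far

    send_traffic() {  # returns 0 only if all $n calls at rate $1 succeed
      sipp -sn uac -r "$1" -m "$n" "$DUT_ADDR" > /dev/null 2>&1
    }

    while :; do
      if send_traffic "$r"; then
        r1=$r                            # new candidate SER
        if (( c == 0 )); then
          r=$(( r + r / 2 ))             # ramp up 50% until first failure
        elif (( r2 - r1 > 2 * G )); then
          r=$(( r + (r2 - r) / 2 ))      # bisect upward toward the bound
        else
          break                          # bracket is within 2G: done
        fi
      else
        c=1; r2=$r                       # record/lower the upper bound
        r=$(( r - (r - r1) / 2 ))        # bisect down toward last success
      fi
    done
    echo "Candidate SER: $r1 sps"

In the two-step methodology of Section VII, the candidate rate found by this loop seeds the longer confirmation run of N = 50,000 call attempts.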
VI. EXPERIMENTAL TEST BED

The test bed, shown in Figure 3, consists of five elements: a switch, two hosts generating SIP traffic (with and without RTP), a host acting as the DUT, and a host collecting traffic using Wireshark. The test bed is on a private LAN isolated from the Internet, and a mirror port is configured on the switch to capture traces of the calls in the load without contributing to the resource consumption of any of the functional elements of the test.
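On the trace host attached to the mirror port, the signaling can be collected with a stock packet-capture tool. The one-liner below is a hedged illustration; the interface name is an assumption, and tshark, the terminal front-end of Wireshark, would serve equally well.

    # Capture SIP signaling (default port 5060) seen on the mirror port.
    tcpdump -i eth0 -n 'udp port 5060' -w harness_run.pcap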

Figure 3: Physical architecture of the testbed

The SIP traffic is generated by SIPp [13], an open-source SIP traffic generator. One host acts as the SIP UAC and another host is the SIP UAS; a third host, the DUT, runs the open-source Asterisk server [14]. The UAC is an Intel Core 2 6420 with a 2.13 GHz clock speed and 4 GB of memory; the UAS and the DUT run on Intel Core 2 6320 machines with 1.86 GHz clock speeds and 4 GB of memory. The Wireshark trace host is a MacBook Pro with a 2 GHz Intel i7 and 8 GB of memory.

VII. TESTING THE TEST HARNESS

Before testing any DUT, it is necessary to discover the SER of the test harness itself. The harness includes the Bash test script, the SIPp load generator, the platforms upon which these run, and the switch that connects them all. Use of a slow machine to run the SIPp elements, for example, would limit the rate at which calls are generated, so that carrier-grade DUTs that operate at or near line speed might not be testable using a test harness unable to generate the load necessary to produce failures.

Following the methodology in [6], the SER of the test harness with and without RTP associated with the SIP calls was measured. The algorithm of Section V is a two-step process: a test with a short duration is run and a candidate SER is found. That candidate becomes the starting rate for a much longer duration test. The duration of the first test is the length of time it would take to attempt 5000 calls; the second test lasts the length of time it would take to attempt 50,000 calls. SIPp, the software used to implement our tests, offers two ways to end a single test run: in one, the test ends after a certain number of call attempts; in the other, the test ends after a fixed time has elapsed. The methodology described in [6] uses the first method. The tests described here use the second one.
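Both stopping modes map onto stock SIPp invocations. The lines below are a hedged sketch: the addresses are placeholders, and the time-bounded variant simply bounds SIPp's run with the coreutils timeout command rather than a SIPp-internal option.

    # UAS side: SIPp's built-in answering scenario.
    sipp -sn uas -i 10.0.0.2 -p 5060 &

    # Count-bounded run: stop after 5000 call attempts at 100 calls/sec.
    sipp -sn uac -r 100 -m 5000 10.0.0.2:5060

    # Time-bounded run: place calls at 100 calls/sec for a 90 s test-duration.
    timeout 90 sipp -sn uac -r 100 10.0.0.2:5060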
A. Testing the test harness without RTP

First the test harness was tested using SIP calls with no associated RTP. All SIP calls were set to last for 9 seconds, and a granularity of 3 was used (G = 3, cf. Section V). Test results are displayed in Figure 4 and analyzed below.

Figure 4: Test harness performance without RTP (Breakpoint on y-axis is SER)

1) Impact of the value of the granularity parameter: Comparing the results in Figure 4 for different granularities, we see that for a 90-second test duration, a granularity of 1 produced an SER of 2,874; a granularity of 2 produced an SER of 2,623; and a granularity of 3, an SER of 2,740. The granularity of 1 produced the highest SER. This is to be expected because the granularity of 1 produces an SER that is only one sps lower than the rate at which the first failure occurs. Note, however, that the granularity of 2 produced a lower value than the granularity of 3, a result we cannot yet explain.

Tests with a higher granularity find an SER more quickly than those with a granularity of 1. Testing to determine the SER takes a long time. Each individual test loop of the algorithm lasts a fixed time, and the number of iterations for a long test-duration can cause the test process to last more than 24 hours. As an example, if the test-duration is set to 20 minutes and the iterative process is begun using a call rate of 100 calls per second, the first pass through the loop lasts 20 minutes. If no errors occur, the rate is increased to 200 calls per second and the second pass through the loop lasts another 20 minutes. After another 20 minutes, the rate is again increased, to 400 calls per second, and so on. It takes many 20-minute periods to reach the 2,855 SER recorded in Figure 4. Even though the tests used a granularity of 3 and a relatively short test-duration, the time to obtain the SER was greater than 6 hours.
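A back-of-envelope iteration count is consistent with that observation. The sketch below assumes the 1.5x ramp and midpoint bisection of Algorithm 1, an SER near the 2,855 sps mentioned above, and G = 3; it is an estimate, not a measurement.

    # Rough iteration count: 20-minute iterations, r0 = 100 sps, G = 3.
    awk 'BEGIN {
      r = 100; steps = 0
      while (r < 2855) { r *= 1.5; steps++ }         # ramp until first failure
      width = r - r / 1.5                            # bracket left by the ramp
      while (width > 2 * 3) { width /= 2; steps++ }  # bisect to within 2G
      printf "~%d iterations, ~%.1f hours at 20 minutes each\n",
             steps, steps * 20 / 60
    }'
    # Prints roughly: ~17 iterations, ~5.7 hours -- of the order of the
    # "greater than 6 hours" reported above.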

2) Impact of longer test-durations: It is expected that longer test-durations will produce lower SERs. This is because the longer the DUT sustains a call load at any rate, the more likely it is that a failure such as a SIP time-out will occur. The DUT's operating system as well as its code and hardware are stressed, and the time to perform memory management and other functions will accumulate, making delays and eventual failures more likely the longer the test runs. The data show that for a granularity of 1 and test durations of 30s, 60s and 90s, the SERs were respectively 3,941 sps, 3,467 sps and 2,874 sps. Testing with a granularity of 2 and 3 for the same series of test durations produced a similar steep decline in the value of the SERs. Tests for a granularity of 3 were conducted using an extended series of test-durations: 5-, 10- and 20-minute test-durations were attempted and the SERs produced were 2,799, 2,803, and 2,855 respectively. The slight rise in values may be attributed to the non-deterministic nature of the operating systems of the platforms on which the SIPp applications ran.

The conclusion that can be drawn from these data is that the test harness can deliver a load of around 2,700 sps when there is no associated RTP and when the call duration is set to 9 seconds. This means that a system that can support a call rate higher than 2,700 sps cannot be tested with this harness.

B. Testing the test harness with RTP

The next results were obtained for the DUT-less test harness when RTP was associated with the SIP calls. To associate RTP with the SIP session, the "PCAP-replay" method provided by SIPp was used. In that method, SIPp replays a pre-recorded PCAP file. We recorded our file such that it consisted of 9 total seconds, broken down into 8 seconds of sound bytes plus a 1-second payload of Dual Tone Multi-Frequency (DTMF). The test results reported in this section were such that SIP calls with RTP lasted for 9 seconds.

The results when a granularity of 1, 2, and 3, a test-duration of 30s, 60s, and 90s, and a call-duration of 9 seconds were used are displayed in Figure 5, as is an extended series of test-durations of 5 and 10 minutes for a granularity of 3.

Figure 5: Test harness performance with associated RTP (Breakpoint on y-axis is SER)

All the SER values fall between 281 and 272. This is an order of magnitude less than the case when RTP was not associated with the calls. Also, the drop that we observed in Figure 4 as the test-duration increases does not appear in these data. The difference is attributable to limitations on the platforms on which SIPp runs and to the method that SIPp uses to generate the calls in the test. Regarding the platform limitations: the open-file limit on these machines is set to 4,096, so no more than 4,096 PCAP files can be open at any time. SIPp requires two file handles for each call that it sets up: one for the PCAP file that it sends, and one for the PCAP file it receives. In the trace taken while this scenario was running, the UAC sent RTP of 0.090 s instead of 1 s, and the UAS sent an RTP audio packet of 7.040 s instead of 8 s followed by an RTP DTMF packet of 0.21 s instead of 1 s. Thus, in the actual test there were (7.040 + 0.090 + 0.21) or 7.34 s of RTP, so the number of calls concurrently replaying RTP is 7.34 s * 277 sps, or about 2,033 calls. The actual number of file handles needed is therefore 2,033 * 2, or 4,066, which is close to, but less than, the 4,096 maximum that was configured on the platforms.
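This arithmetic, and the ceiling it approaches, can be checked directly on the load-generating hosts. A hedged sketch, using only the figures quoted above:

    ulimit -n    # per-process open-file limit; 4,096 on these machines
    awk 'BEGIN {
      rtp  = 7.040 + 0.090 + 0.21  # seconds of RTP actually replayed per call
      rate = 277                   # session attempt rate, sps
      calls = rtp * rate           # calls with RTP in flight at once
      printf "%.0f active calls -> %.0f PCAP file handles\n", calls, calls * 2
    }'
    # Prints: 2033 active calls -> 4066 PCAP file handles (limit: 4,096)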
In conclusion, the test harness is able to find the SER for devices whose SERs do not exceed 272 sps when RTP is associated with the test calls.

VIII. TESTING THE DUT

We used the Asterisk server [14] as a DUT to measure the SER. Asterisk is an open-source code distribution that supports different SIP-based telecommunications services, including IP PBX, conference servers and VoIP gateways. Asterisk can be configured to allow RTP to pass through the host it is running on, or it can be configured to enable RTP to bypass its host and flow directly between the UAC and the UAS. Performance tests involving RTP forced the RTP flows through the machine running Asterisk.

A. Testing the DUT without RTP

The results of testing for Asterisk's SER when there is no RTP associated with the call load are displayed in Figure 6. This test measures the ability of Asterisk to process SIP requests only. SIPp was configured to insert the 9s pause whether or not RTP was actually sent during the pause. A SIP BYE request is sent 9 seconds after the call started.

Figure 6: Asterisk performance without RTP (Breakpoint on y-axis is SER)

Comparing the results in Figure 6 for different granularities, we see that granularities 1, 2, and 3 all exhibit the expected drop in SER from test durations of 30s to test durations of 60s. For test-durations of more than 60s, the SERs for all three granularities showed convergence to a constant value: 97 in the case of the granularity of 1; 98 in the case of a granularity of 2; and 99 in the case of a granularity of 3. Tests for a granularity of 3 were extended to include test-durations of 5 minutes and 20 minutes, and the SER measured in each case was 99.
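The RTP tests in the next subsection force media through Asterisk. With Asterisk's chan_sip channel driver, that behavior is governed by the directmedia option (formerly canreinvite); the snippet below is a hedged sketch of such a setting, not the configuration actually used in the tests.

    ; sip.conf (chan_sip)
    [general]
    directmedia=no    ; keep RTP relayed through Asterisk, the DUT
    ;directmedia=yes  ; would let RTP flow directly between UAC and UAS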

B. Testing the DUT with RTP

The results of testing for Asterisk's SER when RTP is associated with the call load are displayed in Figure 7. This test measures the ability of Asterisk to process SIP requests as well as relay RTP between the UAS and UAC. SIPp was configured to insert a 9s pause and to send RTP during that pause. A SIP BYE request is sent 9 seconds after the call started.

Figure 7: Asterisk performance with associated RTP (Breakpoint on y-axis is SER)

Comparing the results in Figure 7 for different granularities, we see that granularities 1, 2, and 3 all exhibit the expected drop in SER as test durations rise from 30s to 60s to 90s, with values of 88, 85 and 76 respectively for a granularity of 1; 87, 86 and 77 for a granularity of 2; and 85, 85 and 79 for a granularity of 3. Tests for a granularity of 3 were extended to include test-durations of 5 minutes and 10 minutes, with resulting SERs of 80 and 77 respectively.

IX. CONCLUSION AND FUTURE WORK

The current work demonstrates the importance of establishing a baseline model (testing the test harness, Section VII) in order to understand its limits and behaviors and how these may impact the test results when an actual DUT is used. Looking at Figure 4 and Figure 5 for a granularity of 1 and a test duration of 90s, we observe that the performance capacity of the test harness drops almost 90% when RTP is used. Clearly, processing RTP impacts the SER. Interestingly enough, observing the same data — granularity of 1 and test duration of 90s — in Section VIII, Figures 6 and 7, we note that there is minimal drop in capacity when RTP is used in the DUT. We postulate that the uniformity in the SER when Asterisk is used as a DUT is attributable to the processing Asterisk does for each session. The complexity of processing each session is high enough when Asterisk acts as a back-to-back user agent that it is unable to achieve the SER rates the test harness achieves without RTP. Thus, upon the introduction of RTP, the SER remains roughly the same in Asterisk.

Next steps include additional tests using the existing harness and updates to the harness. Some directions for future work include applying the current test harness to a set of virtual implementations of Asterisk to compare the SERs of the Asterisk code running on different operating systems, each with a different speed and memory size. We plan to test for Asterisk's SER when call-associated RTP does not traverse the DUT. Comparison of the results obtained in Section VII with those results would be of interest: are Asterisk's resources used differently when RTP is associated with the call but does not pass through Asterisk than when there is no RTP at all associated with the call? We also plan to obtain the registration rate of Asterisk and to measure the SERs and registration rates of other open-source SIP servers, including Kamailio [15] and FreeSWITCH [16], as well as commercial SIP servers. And finally, we plan to update the test script to conform to the algorithm in [4]. The new version should enable the user to select a wider variety of SIPp options and to collect more data about the DUT and the test-bed elements, including memory usage and processing times.

REFERENCES

[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: Session Initiation Protocol," RFC 3261 (Proposed Standard), Internet Engineering Task Force, Jun. 2002. [Online]. Available: http://www.ietf.org/rfc/rfc3261.txt

[2] National Emergency Number Association (NENA), "Detailed Functional and Interface Standards for the NENA i3 Solution (NENA 08-003)." [Online]. Available: org/resources/resmgr/Standards/08003 Detailed Functional a.pdf
[3] C. Davids, V. K. Gurbani, and S. Poretsky, "Terminology for benchmarking Session Initiation Protocol devices: Basic session setup and registration," IETF Internet-Draft (work in progress), draft-ietf-bmwg-sip-bench-term-12, November 2014.

[4] ——, "Methodology for benchmarking Session Initiation Protocol devices: Basic session setup and registration," IETF Internet-Draft (work in progress), draft-ietf-bmwg-sip-bench-meth-12, November 2014.

[5] ——, "Terminology for benchmarking Session Initiation Protocol devices: Basic session setup and registration," IETF Internet-Draft (work in progress), draft-ietf-bmwg-sip-bench-term-09, February 2014.

[6] ——, "Methodology for benchmarking Session Initiation Protocol devices: Basic session setup and registration," IETF Internet-Draft (work in progress), draft-ietf-bmwg-sip-bench-meth-09, February 2014.

[7] A. Clouet, Y. Zhang, O. S. Awotayo, and C. Davids. (2015, Jan) SIPp script for benchmarking SIP servers. [Online]. /SIPpScript

[8] H. G. Schulzrinne, S. Narayanan, J. Lennox, and M. Doyle, "SIPstone: Benchmarking SIP server performance," Columbia University, Tech. Rep. CUCS-005-02, 2002.

[9] Standard Performance Evaluation Corporation. (2014, December) SPEC SIP Infrastructure. [Online]. Available: http://www.spec.org/specsip/

[10] M. Cortes, J. R. Ensor, and J. O. Esteban, "On SIP performance," Bell Labs Technical Journal, vol. 9, no. 3, pp. 155–172, 2004.

[11] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications," RFC 3550 (Standard), Internet Engineering Task Force, Jul. 2003, updated by RFCs 5506, 5761, 6051, 6222. [Online]. Available: http://www.ietf.org/rfc/rfc3550.txt

[12] Free Software Foundation. (2015, March) GNU Bash. [Online]. Available: http://www.gnu.org/software/bash

[13] (2014, Dec) SIPp. [Online]. Available: http://sipp.sourceforge.net/

[14] Asterisk. [Online]. Available: http://www.asterisk.org/

[15] Kamailio. [Online]. Available: http://www.kamailio.org/w/

[16] (2014, Dec) FreeSWITCH. [Online]. Available: https://freeswitch.org/
