High Definition Videoconferencing: Codec Performance, Security, and Collaboration Tools ‡

Gregg Trueb (Student), Suzanne Lammers (Student), Prasad Calyam (PI)
Ohio Supercomputer Center, USA; Email: {gtrueb, slammers, pcalyam}@osc.edu

Abstract

Internet videoconferencing has rapidly emerged as a beneficial technology in many application domains. Recent developments in videoconferencing technology allow users to videoconference using high definition video (HDVC). HDVC has many advantages over current standard definition videoconferencing (SDVC) and is increasingly seen as a vital technology in health care, education, judicial, and business applications. The goal of this report is to determine the differences between HDVC and SDVC in terms of end-user usability and reliability. Specifically, we compare their traffic characteristics and their subjective and objective Quality of Experience (QoE) measurements, and we study ways to secure them on the Internet. This report also informs potential users about the requirements, strengths, and weaknesses of HDVC and assesses its future as a means of communication. We conclude with a comparison of different collaboration tools and their future in HDVC.

‡ This work has been sponsored by the American Distance Consortium and the National Science Foundation under the Summer Research Experience for Undergraduates (REU) program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Table of Contents
1. Introduction
2. Background and Related Work
2.1. Internet Videoconferencing
2.2. HDVC compared with SDVC
2.3. Video Codecs
2.4. Security Concerns
2.5. Collaboration Tools
3. Methodology and Results
3.1. Codec Performance
3.1.1. Test bed setup
3.1.2. Traffic Characteristics
3.1.3. End-user Quality of Experience
3.1.4. Performance conclusions
3.2. Security
3.2.1. GNU Gatekeeper
3.2.2. Polycom V2IU solution
3.2.3. Cisco PIX firewall with “H.323 fixup”
3.3. Collaboration
3.4. HDVC Applications
3.4.1. Health Care
3.4.2. Education
3.4.3. Judicial
3.4.4. Business
4. The Future for HDVC
5. Conclusions
6. Acknowledgements
7. Bibliography
Appendix I: Annual Distance Teaching and Learning Conference – Trip Report
Appendix II: Features comparison of Collaboration tools
Appendix III: Research

1. Introduction

Arguably, the most defining change of the last few decades has been in the area of communication. Businesses, educators, and consumers alike are able to transfer ideas, pictures, and documents instantly using the Internet. This report deals with videoconferencing, specifically high definition videoconferencing (HDVC). The goal is to examine the feature sets, traffic characteristics, security issues, and future uses of HDVC in order to assess the usability and reliability of the technology.

Performance problems associated with videoconferencing include latency, jitter, and packet loss. Latency, otherwise known as delay, is the time it takes for a packet of information to travel from a source to a receiving system [1]. If a videoconferencing system begins to experience long latencies, the conference is no longer real time, which hinders its effectiveness: participants may unknowingly interrupt each other. Jitter is the inconsistency in latency over the course of a videoconferencing session, caused by changing network conditions [1]. Jitter can make a simple conversation difficult because the unpredictable delay makes it hard to size the buffers at the receiving end. Packet loss occurs when pieces of information are lost in the network [1] [13]. In videoconferencing, packet loss causes the video to tile or snippets of sound to drop out. These issues may be due to the amount of traffic in the network or to improperly configured or inadequate network equipment.

Also affecting videoconferencing performance are the security measures taken by network administrators. There are two conflicting forces in networking: security and usability. Because both are essential, engineers and software designers are forced to work in the middle ground when designing systems.
This report compares the ways VC administrators deal with common security measures.

The traffic characteristics and end-user Quality of Experience (QoE), expressed as Mean Opinion Scores (MOS), are used to compare HDVC and SDVC. Our research also focuses on collaboration tools in the context of HDVC applications. Examining their different features, this report discusses the future of collaboration tools and their integration into HDVC.

The remainder of this report is organized as follows: Section 2 reviews related research and literature on high definition video, codecs, collaboration tools, and the security concerns of HDVC and SDVC. Section 3 presents the methodology, testbed setup, and results of our experiments; the results show the differences in performance between HDVC and SDVC. Section 3 also describes how HDVC is being preferred over SDVC today in different application domains. Section 4 describes our vision of the future of HDVC, and Section 5 concludes the report.
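The three impairments defined in this introduction can be measured from per-packet timestamps. The sketch below is illustrative and is not part of the testbed described in Section 3: it assumes a hypothetical record format (sequence number, send time, receive time) and computes one-way latency, a smoothed jitter estimate using the interarrival-jitter formula from RFC 3550 (the RTP specification), and the loss rate. Latency requires synchronized sender and receiver clocks; the jitter estimate does not, because a constant clock offset cancels out of the transit-time difference.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    seq: int         # sender's sequence number (hypothetical record format)
    sent: float      # send time in seconds (sender clock)
    received: float  # arrival time in seconds (receiver clock)


def one_way_latency(packets: List[Packet]) -> List[float]:
    """Latency (delay) of each received packet; needs synchronized clocks."""
    return [p.received - p.sent for p in packets]


def interarrival_jitter(packets: List[Packet]) -> float:
    """Smoothed jitter per RFC 3550: J += (|D| - J) / 16, where D is the
    change in transit time between consecutive packets in arrival order.
    A constant clock offset cancels out of D."""
    jitter = 0.0
    for prev, cur in zip(packets, packets[1:]):
        d = (cur.received - cur.sent) - (prev.received - prev.sent)
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def loss_rate(packets: List[Packet], expected: int) -> float:
    """Fraction of the expected packets that never arrived."""
    received = len({p.seq for p in packets})
    return (expected - received) / expected


if __name__ == "__main__":
    # Hypothetical capture in which packet 2 was lost.
    capture = [Packet(0, 0.000, 0.050), Packet(1, 0.020, 0.072),
               Packet(3, 0.060, 0.115)]
    print(one_way_latency(capture))
    print(interarrival_jitter(capture))
    print(loss_rate(capture, expected=4))  # 0.25
```

The divisor of 16 is the RFC 3550 smoothing gain; it damps the estimate so that a single late packet does not dominate it.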

2. Background and Related Work

2.1. Internet Videoconferencing

Since its humble beginnings with AT&T in 1930 [1], videoconferencing (VC) has become a common form of communication. Unlike the telephone or text chat, videoconferencing allows for a more realistic conversation between two physically separated parties over the Internet. It facilitates communication across barriers such as clean rooms and nuclear installations, and across the distances involved in multinational corporations, the military, or distance learning. Aiding the development and growth of VC are continual improvements in network technology and infrastructure. Over 68% of the US population now has direct access to the Internet, a figure that continues to increase every year [24]. Even more instrumental to VC growth is the access that corporations, research institutions, and educational facilities have gained to increasingly capable networks. Corporations, researchers, and even families are now able to hold meetings and communicate over the Internet, saving time and travel costs. Educators are also taking advantage of the technology to reach a larger base of students in distance learning applications, sometimes including multinational classrooms. Improved broadband access is allowing more and more people to use VC technology.

As with any other form of communication technology, standards are in place to foster compatibility in VC. The most common standard for videoconferencing is H.323. In 1996, the International Telecommunication Union (ITU-T) completed the first version of H.323, based on the earlier H.32x standards. H.323 provides a means of communication between a variety of hosts such as endpoints, gatekeepers, and gateways. The protocol was designed to communicate over packet-based networks such as IP, local area networks (LANs), and wide area networks (WANs) [1]. This makes it well suited to the modern Internet, which is based on IP.
Rivaling H.323 is a standard developed by the Internet Engineering Task Force (IETF) around the same time: the Session Initiation Protocol (SIP), which initiates communication between endpoints. Unlike H.323, SIP is independent of the type of data being carried and acts only as a session initiator [1]. Many commercial VC systems are compatible with both protocols [18] [15].

Advances in audio and video display, compression/decompression, and transmission technology are being utilized in VC. High definition, after finding a home in movies and television, has started to be incorporated into videoconferencing. High definition videoconferencing (HDVC) provides a richer and, in some situations, a more informative video session. The combination of high definition video and sound provides participants with a more realistic setting. It has the benefits of decreasing meeting fatigue and loss of attention and of increasing the use of visually detailed material [25]. Combined with collaboration tools, these benefits allow effective and efficient meetings to be held across a campus or across the world with striking clarity. An example of a high definition video frame and a standard definition video frame is shown in Figure 2.1.

Figure 2.1: Visually apparent difference between HD (left) and SD (right)

2.2. HDVC compared with SDVC

The differences between HDVC and SDVC systems lie in a variety of areas, most notably image size and data requirements. Video images are made up of pixels, tiny pieces of color information laid out in a grid to form an image. High definition's increased resolution, shown in Figure 2.1 and Table 2.1, is due to its use of more pixels within an image than standard definition; this makes the images clearer and more representative of the actual scene [2]. High definition also requires a larger screen than a standard definition system because of its higher resolution [2]. Table 2.1 shows some of the differences between HDVC and SDVC.

Table 2.1: Comparison of HDVC and SDVC systems
Attribute | High Definition | Standard Definition
Cost | 12,000 - 300,000 [20] | 100 - 12,000 [20]
Vendors | LifeSize, Polycom, Sony, Tandberg | Polycom, Sony, Tandberg
Systems | LifeSize Room [15], HDX Series [18], PCS-HG90 [16], Edge 95/85/75 [17] | View Station, VSX 7000, PCS-TL30, Centric 150 MXP
Features | Application sharing, data sharing, security | Application sharing, data sharing, security
Resolution | 1280 x 720, 16:9 aspect ratio | 704 x 480, 4:3 aspect ratio
Primary video codecs | H.264, H.263 | H.263, H.264
Dialing speeds (Kb/s) | 128, 256, 384, 512, 768, 1024, 1152, 1472, 1920, 2500, 3000, 4000, 5000 | 128, 256, 384, 512, 768
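The difference in data requirements can be estimated with simple arithmetic. The following sketch uses the resolutions from Table 2.1; the frame rate of 30 frames/s and the 12 bits per pixel of 8-bit 4:2:0 sampling are assumptions for illustration, not measured settings of the tested units.

```python
FRAME_RATE = 30       # frames per second (assumed)
BITS_PER_PIXEL = 12   # 8-bit luma plus 4:2:0 subsampled chroma (assumed)

RESOLUTIONS = {       # from Table 2.1
    "HD": (1280, 720),
    "SD": (704, 480),
}


def pixels_per_frame(width: int, height: int) -> int:
    return width * height


def raw_bitrate_kbps(width: int, height: int,
                     fps: int = FRAME_RATE, bpp: int = BITS_PER_PIXEL) -> float:
    """Uncompressed video bitrate in Kbit/s."""
    return width * height * bpp * fps / 1000.0


def compression_ratio(width: int, height: int, dialing_kbps: float) -> float:
    """Compression needed to fit the raw video into a given dialing speed."""
    return raw_bitrate_kbps(width, height) / dialing_kbps


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        print(f"{name}: {pixels_per_frame(w, h):,} pixels/frame, "
              f"{raw_bitrate_kbps(w, h):,.0f} Kbit/s raw, "
              f"{compression_ratio(w, h, 768):.0f}:1 needed at 768 Kbit/s")
```

Under these assumptions an HD frame carries about 2.7 times as many pixels as an SD frame (921,600 versus 337,920), and fitting either stream into a 768 Kbit/s call requires compression of about 432:1 for HD versus about 158:1 for SD, which is why codec efficiency matters more for HDVC.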

2.3. Video Codecs

Videoconferencing relies on video codecs to compress and decompress the information being transmitted, because sending the raw data would use more network resources than necessary, or in most cases more than are available. Video codecs exploit static areas, irrelevant information, and predictive algorithms to reduce the number of bits needed for a sequence. Compression and decompression allow a much lower bandwidth than would otherwise be necessary [13]. Video codecs need to be patched regularly; otherwise they can cause functionality problems during a videoconference [12]. The performance of a videoconferencing system is directly related to its video codec's ability to cope with different network conditions on the Internet.

The two primary video codecs used in videoconferencing are H.263 and H.264. Both use a variety of schemes, such as static segments and inter-frame memory, to compress and decompress video streams to a manageable size. One point of comparison is the codecs' ability to compress the necessary data. Because of its larger frame size and resolution, HDVC requires more data, so the system's performance relies more heavily on the video codec's performance.

H.263 was developed in the mid-1990s, has undergone a few revisions, and was a precursor to the H.264 standard. H.264 was the first video codec designed to be network friendly, with many features that help to make up for lost or delayed packets in the stream. It also provides a higher compression ratio and is thus the codec of choice in HDVC systems. For more information on video codecs, see [33].

2.4. Security Concerns

A fundamental step that network administrators take in securing networks is to deploy a firewall. A firewall is a system that checks information entering and leaving a network against a set of rules and either denies or allows the information to pass to its destination.
This is a powerful step in preventing hackers and malicious software from gaining access to a network. However, it poses a problem for videoconferencing, and for H.323-based applications specifically. The problem arises from the way firewalls make decisions: they use address, port, and message type criteria stored in the header of each packet sent through a packet-switching network. H.323, which is designed to run on a packet-switching network, has two phases: a connection phase and a transmission phase. The connection phase uses a well-known port, i.e., 1720, and is easily allowed through a firewall [1]. However, during this connection process the ports for subsequent transmission are negotiated between the two VC units in the body of the control packets. A normal firewall has no knowledge of this negotiation, so when the transmitted packets reach the firewall on the newly negotiated ports, they are dropped according to the firewall rules. Alongside firewalls, another mechanism used in modern networks causes problems in VC situations: network address translation (NAT), performed by devices that eliminate the need for multiple public IP addresses by translating messages to and from internal private IP addresses. The problem posed by NAT, and a more detailed description of the firewall problem, are given in Section 3 of this report.
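The firewall problem described above can be illustrated with a deliberately simplified model. In the sketch below, a header-only firewall admits only port 1720, so media sent to a port that was negotiated inside the signaling payload is dropped; the second class mimics the idea behind an application-aware “fixup” (discussed for the Cisco PIX later in this section) by reading the signaling body and opening a pinhole for each announced media port. The message format, class names, and port numbers are hypothetical; real H.323 negotiation uses ASN.1-encoded H.225/H.245 messages, and the H.245 channel may itself use a dynamically negotiated port.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

H323_SIGNALING_PORT = 1720  # well-known H.323 call-signaling port (TCP)


@dataclass
class Packet:
    proto: str                      # "tcp" or "udp"
    dst_port: int
    payload: Optional[dict] = None  # parsed body of a control message (hypothetical)


@dataclass
class HeaderOnlyFirewall:
    """Decides from header fields (protocol, destination port) only."""
    allowed: Set[Tuple[str, int]] = field(
        default_factory=lambda: {("tcp", H323_SIGNALING_PORT)})

    def permits(self, pkt: Packet) -> bool:
        return (pkt.proto, pkt.dst_port) in self.allowed


class FixupFirewall(HeaderOnlyFirewall):
    """Also inspects the body of permitted signaling packets and opens a
    pinhole for every media port announced there (the "fixup" idea)."""

    def permits(self, pkt: Packet) -> bool:
        ok = super().permits(pkt)
        if ok and pkt.dst_port == H323_SIGNALING_PORT and pkt.payload:
            for port in pkt.payload.get("media_ports", []):
                self.allowed.add(("udp", port))
        return ok


def run(fw: HeaderOnlyFirewall) -> List[bool]:
    """Send a call-setup packet, then a video packet on a negotiated port."""
    setup = Packet("tcp", H323_SIGNALING_PORT, {"media_ports": [49170, 49171]})
    video = Packet("udp", 49170)
    return [fw.permits(setup), fw.permits(video)]
```

Calling run(HeaderOnlyFirewall()) returns [True, False]: call setup passes but the video packet is dropped. run(FixupFirewall()) returns [True, True] because the pinhole was opened while the setup packet was inspected.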

There are four main solutions to these problems. The first is to open the network during a videoconferencing session by disabling the firewall on the RTP port range. However, this is not an option where security is a high priority, and it does not solve NAT problems. The second, provided by the open-source community (in particular the OpenH323 project), is the GNU Gatekeeper software. This software runs on a dedicated server that facilitates connections between videoconferencing endpoints. Each endpoint that wants to engage in videoconferencing first connects to the gatekeeper. The gatekeeper is then responsible for forwarding all traffic between the two endpoints on either side of the firewall, effectively working around a closed firewall and NAT devices [27]. For the gatekeeper to bypass the firewall, it must have unrestricted access to both the inside and the outside of the network; it is therefore placed in a demilitarized zone (DMZ). The problem with the gatekeeper solution is that the gatekeeper itself is vulnerable to attacks and, if compromised, allows unrestricted access to the internal network. Users of this solution therefore need to harden the gatekeeper server as much as possible [27].

The remaining two are commercial solutions offered by Cisco and Polycom. The Cisco PIX firewall is, in normal operation, a powerful and versatile firewall. Its special feature is an implementation of “H.323 fixup”: the firewall can examine the body of H.323 control packets and make decisions based on their contents [28] [29]. The Polycom solution includes the V2IU system, which provides routing and firewall functions. The V2IU system is specially designed to let H.323 traffic through without compromising security [26]. It does this by examining H.323 traffic, relegating it to the unit's internal DMZ so that it is not checked by the firewall, and tunneling it through using the H.460 protocol.

2.5. Collaboration Tools

In addition to videoconferencing, a whole host of other tools is available for Internet collaboration. They provide more ways for people to work together than video and voice alone. The most widely used of these tools is email, though this report is interested in real-time collaboration tools. Available tools allow users to work jointly on documents and software, give presentations, and conduct online training and classes, increasing productivity and reducing the cost of travel and lost time.

Collaboration tools can be broken up into three main categories: chat, multi-point conferencing, and web conferencing. Chat allows users to communicate one on one using text, video, or voice. Multi-point conferencing places users in a many-to-many environment. Web conferencing is similar to multi-point conferencing but adds functionality such as application and desktop sharing. Depending on the situation, one or more of these types may be appropriate.

The main feature in most network collaboration tools is usually instant messaging, the sending of text messages from user to user. This is mainly used in the chat format but can also be used in a conference form where all users see all messages, i.e., chat rooms. Moving up the scale adds videoconferencing capabilities, i.e., voice and video chat. Internet collaboration also makes it possible to move beyond simple chat and conferencing to a richer framework for user participation. Features such as application and desktop sharing, annotation, and whiteboarding tools allow multiple users to work on the same project simultaneously over the Internet. Advanced features include recording of collaboration sessions, the ability to poll participants, and even remote control of instruments. Used together, these features allow for highly effective meetings over the Internet.

Internet collaboration tools are distinguished from each other by their platform, available features, and price, among other things. Table 2.2 and Table 2.3 compare some of the more popular collaboration tools available.

Table 2.2: Comparison of common high-end collaboration tools

Table 2.3: Comparison of common inexpensive collaboration tools (VSee, Skype, Unyte, and EVO)

3. Methodology and Results

3.1. Codec Performance

We gathered the information presented in this report from the Internet, electronic journals, interviews with experts and everyday users of HDVC systems, and experiments. System information was obtained from VC vendors' websites and online reviews. OhioLink was used for the electronic journals. Contacts for interviews and assessments for this project were made through the Megaconference mailing list, the H.323 forum mailing list, The Ohio State University, and OSC. Experimental data was collected through tests run at OSC, as described below. The goals of this research are to assess the quality of HDVC streams in different network conditions, to compare them with an SDVC stream measured in the same conditions, and to compare the traffic characteristics of both.

3.1.1. Test bed setup

Figure 3.1 shows a block diagram of the setup used to collect data from both the HDVC and SDVC systems. Packets leaving the source unit pass through a switch on VLAN 1. The switch routes the data through a network emulator and then to the other side of the switch, running VLAN 2, where it is sent to the destination unit. The network emulator can impose bandwidth caps, packet loss, jitter, and delay, which makes it possible to simulate different network conditions in real time.

Figure 3.1: Test bed setup for HD/SD comparison
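To make the emulator's role concrete, the following toy model applies the four impairments listed above to a packet stream. It is an illustrative simulation, not the emulator used at OSC (the report does not describe its internals): delay is fixed, jitter is uniformly distributed, loss is independent per packet, and the bandwidth cap is modeled by serializing packets at the configured rate. The function name and parameters are hypothetical, and packets are assumed to be listed in send order.

```python
import random
from typing import List, Optional, Tuple


def emulate(packets: List[Tuple[float, int]],  # (send time in s, size in bytes)
            delay_s: float, jitter_s: float, loss: float,
            rate_bps: float, seed: int = 1) -> List[Optional[float]]:
    """Return each packet's arrival time in seconds, or None if dropped."""
    rng = random.Random(seed)   # seeded so runs are repeatable
    link_free_at = 0.0          # time the capped link finishes the previous packet
    arrivals: List[Optional[float]] = []
    for sent, size in packets:
        if rng.random() < loss:                      # independent per-packet loss
            arrivals.append(None)
            continue
        start = max(sent, link_free_at)              # queue behind earlier packets
        link_free_at = start + size * 8 / rate_bps   # serialization at the cap
        variation = rng.uniform(-jitter_s, jitter_s) # uniform jitter
        arrivals.append(link_free_at + max(0.0, delay_s + variation))
    return arrivals
```

With loss and jitter set to zero, two 1000-byte packets sent together over an 8000 bit/s cap with 100 ms of delay arrive at 1.1 s and 2.1 s, since each takes one second to serialize and the second must wait for the first.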

Test procedure for traffic characteristics:
1. Connect a DVD player to the source SD and HD units.
2. Play the low motion clip (Streaming Kelly)*.
3. Gather packet information using the Wireshark packet capture program on the network monitor server.
4. Play the medium motion clip (Foreman sequence)**.
5. Repeat step 3.
6. Play the high motion clip (Soccer sequence)***.
7. Repeat step 3.
8. Repeat from step 1 using different dialing speeds for both the SD and HD units.

Test procedure for MOS measurements:
1. Connect a DVD player to the source SD and HD units.
2. Play, record, and have the group watch all three source clips (low*, medium**, high***) through an open network.
3. Reset the network emulator to the good level indicated in Table 3.1.
4. Play, record, and have the group watch and score all three source clips.
5. Reset the network emulator to the acceptable level indicated in Table 3.1.
6. Repeat step 4.
7. Reset the network emulator to the poor level indicated in Table 3.1.
8. Repeat step 4.
9. Repeat the sequence from step 1 for both HD and SD at the selected dialing speeds.

* Streaming Kelly is a prerecorded video sequence with typical VC activity levels, i.e., talking-head movements.
** Foreman sequence is a prerecorded video sequence with moderate VC activity levels and camera movement, i.e., rapid movements and a few scene changes.
*** Soccer sequence is a prerecorded video sequence with high VC activity levels and camera movement, i.e., rapid movements and rapid scene changes.
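The MOS procedure above amounts to a test matrix of system, dialing speed, network condition, and clip. The sketch below enumerates it in the same order as the procedure (network conditions in the outer loop, the three clips within each condition). The unscored open-network viewing in step 2 is not included, and the dialing speeds are hypothetical placeholders because the procedure only says "selected dialing speeds".

```python
from itertools import product
from typing import Iterator, NamedTuple


class Trial(NamedTuple):
    system: str
    dialing_kbps: int
    condition: str
    clip: str


CLIPS = ("low (Streaming Kelly)", "medium (Foreman)", "high (Soccer)")
CONDITIONS = ("good", "acceptable", "poor")   # emulator presets from Table 3.1
SPEEDS = {"SD": (768,), "HD": (768, 2500)}     # hypothetical selection


def mos_trials() -> Iterator[Trial]:
    """Yield one scored trial per system, speed, condition, and clip."""
    for system, speeds in SPEEDS.items():
        for speed, condition, clip in product(speeds, CONDITIONS, CLIPS):
            yield Trial(system, speed, condition, clip)


if __name__ == "__main__":
    for trial in mos_trials():
        print(trial)
```

With the placeholder speeds this yields 27 scored trials: 9 for SD and 18 for HD.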

3.1.2. Traffic Characteristics

Figures 3.2 and 3.3 show the instantaneous throughput of SD and HD, respectively, at the popular dialing speed of 768 Kbit/s. The plots show both systems hovering around the available maximum, i.e., 768 Kbit/s, when the clip exhibits high or medium motion. A noticeable difference between the two systems appears with the low motion clip: the HD system used all of the available bandwidth on the low motion clip, while the SD system did not.

Figure 3.2: Bandwidth usage of high, medium, and low motion clips through the SD unit at a dialing speed of 768 Kbit/s (instantaneous throughput in Kbit/s vs. packet #)

Figure 3.3: Bandwidth usage of high, medium, and low motion clips through the HD unit at a dialing speed of 768 Kbit/s (instantaneous throughput in Kbit/s vs. packet #)

Contrasting the two systems shows that HD lacks the separation between motion clips that was seen in the SD system. When the speed of the HD system is increased to 2500 Kbit/s, as shown in Figure 3.4, HD still shows no noticeable difference in bandwidth usage between the different motion clips.

Figure 3.4: Bandwidth usage with low, medium, and high motion clips running HD at a dialing speed of 2500 Kbit/s (instantaneous throughput in Kbit/s vs. packet #)

This is summarized by Figures 3.5 and 3.6, where the average bandwidth usage of the three clips (high, medium, and low motion) is plotted against the dialing speed for SD and HD.

The HD average stays nearly linear, with the bandwidth used by the video slightly less than the dialing speed for all clips. This shows that the HD system uses all of its available bandwidth and does not discriminate between high and low motion. In contrast, Figure 3.5 shows a noticeable difference between the average usage for low, medium, and high motion clips on the SD system: the lower the motion in the transmitted clip, the less of the available bandwidth is utilized. Figure 3.6 shows that the HD system does not make this distinction.
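The throughput curves and averages discussed above can be derived from a packet capture. The report does not state exactly how instantaneous throughput was computed, so the sketch below uses one plausible definition, a sliding window over the most recent packets. It assumes the capture has been exported (for example from Wireshark) as (timestamp, frame length) pairs for a single video stream; the window size of 20 packets is an arbitrary choice.

```python
from typing import List, Sequence, Tuple


def instantaneous_kbps(pkts: Sequence[Tuple[float, int]],
                       window: int = 20) -> List[float]:
    """Throughput in Kbit/s at each packet index i >= window: the bytes of
    the `window` packets ending at i, divided by the time elapsed since
    the packet just before them."""
    rates: List[float] = []
    for i in range(window, len(pkts)):
        t_start = pkts[i - window][0]
        t_end = pkts[i][0]
        bits = 8 * sum(size for _, size in pkts[i - window + 1:i + 1])
        rates.append(bits / (t_end - t_start) / 1000.0 if t_end > t_start else 0.0)
    return rates


def average_kbps(pkts: Sequence[Tuple[float, int]]) -> float:
    """Average throughput over the capture in Kbit/s (the first packet
    only marks the start time)."""
    duration = pkts[-1][0] - pkts[0][0]
    bits = 8 * sum(size for _, size in pkts[1:])
    return bits / duration / 1000.0
```

For a synthetic stream of 1000-byte packets every 10 ms, both functions report about 800 Kbit/s; averaging the per-clip results at each dialing speed gives points like those in Figures 3.5 and 3.6.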

Figure 3.5: Average bandwidth usage vs. dialing speed of SD system (average instantaneous throughput in Kbit/s vs. dialing speed in Kbit/s; low, medium, and high motion)

Figure 3.6: Average bandwidth usage vs. dialing speed of HD system (average instantaneous throughput in Kbit/s vs. dialing speed in Kbit/s; low, medium, and high motion)

Another difference worth noting is the manner in which the two systems manage the sizes of their video packets. Figures 3.7 and 3.8 show the two systems using different packet size strategies. The medium motion clip run through SD shows a wide variety of packet sizes, ranging from approximately 200 bytes to slightly over 1400 bytes, with an average size of 1144 bytes. There does not seem to be a noticeable pattern to the packet size distribution. Figure 3.8 shows the packet sizes of the medium motion clip played through the HD system at 768 Kbit/s. The HD system uses a range similar to the SD system's, from approximately 200 bytes to 1400 bytes. The difference lies in the average: the HD system has an average packet size of 775 bytes. Figures 3.2 and 3.3 show that the two systems use approximately the same bandwidth for the medium motion clip. Combined with the difference in packet size distribution, this indicates that SD sends fewer but larger packets than HD, with a larger inter-packet time.

Figure 3.7: Packet length variability during a medium motion clip through SD at a dialing speed of 768 Kbit/s (packet length vs. packet #)
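The inference about inter-packet time follows from simple arithmetic: at a fixed bit rate, the packet rate is inversely proportional to the average packet size. The sketch below uses the average sizes reported above (1144 bytes for SD and 775 bytes for HD) and assumes both streams run at exactly 768 Kbit/s, which Figures 3.2 and 3.3 show only approximately.

```python
def packets_per_second(rate_kbps: float, avg_packet_bytes: float) -> float:
    """Packets per second needed to carry rate_kbps with packets of this size."""
    return rate_kbps * 1000.0 / (avg_packet_bytes * 8)


def inter_packet_ms(rate_kbps: float, avg_packet_bytes: float) -> float:
    """Mean time between packets in milliseconds."""
    return 1000.0 / packets_per_second(rate_kbps, avg_packet_bytes)


for name, size in (("SD", 1144), ("HD", 775)):
    print(f"{name}: {packets_per_second(768, size):.1f} packets/s, "
          f"{inter_packet_ms(768, size):.2f} ms between packets")
```

Under that assumption SD sends about 84 packets per second (one every 11.9 ms) and HD about 124 (one every 8.1 ms).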

Figure 3.8: Packet length variability during a medium motion clip through HD at a dialing speed of 768 Kbit/s (packet length vs. packet #)

The HD system starts to resemble the SD system in terms of packet sizes when the dialing speed is increased, as shown in Figure 3.9, where the packet sizes are plotted at 2500 Kbit/s. At 2500 Kbit/s, the average packet size jumps above 1280 bytes, like the SD clip, nearing the packet size constraints of Ethernet. For both systems, increasing the dialing speed leads to an increase in packet size, with both leveling off at around 1400 bytes maximum. This trend is shown in Figures 3.10 and 3.11, which plot the average packet size against dialing speed for the low and medium motion clips. Both systems show this increasing trend, but the effect of motion differs: Figure 3.10 shows a distinct difference in packet size between the medium and low motion clips, while in Figure 3.11 no noticeable difference is evident.
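The Ethernet packet size constraint mentioned above can be quantified. Assuming IPv4 without options, UDP, and an RTP header with no contributing sources or extensions inside a standard 1500-byte Ethernet MTU, the arithmetic below gives the largest possible RTP payload and the largest frame a capture tool would typically report. Whether the packet lengths in Figures 3.7 to 3.11 include these headers is not stated in the report.

```python
ETHERNET_MTU = 1500    # bytes available to the IP packet
IPV4_HEADER = 20       # no IP options (assumed)
UDP_HEADER = 8
RTP_HEADER = 12        # no CSRCs or header extensions (assumed)
ETHERNET_HEADER = 14   # dst MAC + src MAC + EtherType; no VLAN tag

# Largest RTP payload that fits in one unfragmented Ethernet frame.
max_rtp_payload = ETHERNET_MTU - IPV4_HEADER - UDP_HEADER - RTP_HEADER

# Largest frame as typically captured (the 4-byte FCS is usually not shown).
max_captured_frame = ETHERNET_MTU + ETHERNET_HEADER

print(max_rtp_payload)     # 1460
print(max_captured_frame)  # 1514
```

The observed leveling off at around 1400 bytes therefore sits just below these limits, leaving a little headroom for headers and any additional encapsulation.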

Figure 3.9: Packet length variability during a medium motion clip through HD at a dialing speed of 2500 Kbit/s (packet length vs. packet #)

Figure 3.10: Average packet lengths of low and medium motion clips per dialing speed, SD system (average packet length vs. dialing speed in Kbit/s)

Figure 3.11: Average packet lengths of low and medium motion clips per dialing speed, HD system (average packet length vs. dialing speed in Kbit/s)

3.1.3. End-user Quality of Experience

The Mean Opinion Score (MOS) values that quantify end-user QoE in this report fall into two categories: subjective and objective. Tests were performed using both HD and SD equipment in three network conditions: good, acceptable, and poor. The reader is referred to Table 3.1 for the definitions of these network conditions, taken from [32]. The setup was identical to the one used to obtain the traffic characteristics, shown in Figure 3.1, with the network emulator shaping the traffic to the characteristics shown in Table 3.1. To obtain the subjective values, a group of test subjects was shown the original low, medium, and high motion clips and then asked to watch the same clips running through the VC units in different network conditions. Subjects rated each clip on a scale of 1 to 5, where the 1-3 range is unusable to poor with frequent faults, the 3-4 range is acceptable with few faults, and the 4-5 range is good with very few or no noticeable faults. The same network conditions and rating standards were applied when obtaining the objective measurements. The objective measurements were taken using the NTIA VQM software, which compares the original and processed clips, searches for a variety of faults, and provides an objective MOS value.
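The subjective scoring described above reduces to averaging ratings and mapping the average onto the report's bands. In the sketch below, the band edges come from the rating scale in this section; how a score of exactly 3.0 or 4.0 was classified is not stated, so this sketch assigns boundary values to the higher band (an assumption). The ratings in the example are invented.

```python
from statistics import mean
from typing import Sequence


def mos(ratings: Sequence[float]) -> float:
    """Mean Opinion Score: arithmetic mean of ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("no ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    return mean(ratings)


def quality_band(score: float) -> str:
    """Map a MOS onto the report's bands; boundaries go to the higher band."""
    if score >= 4.0:
        return "good"
    if score >= 3.0:
        return "acceptable"
    return "poor"


if __name__ == "__main__":
    subjective = [4, 5, 4, 3]  # invented ratings for one clip and condition
    score = mos(subjective)
    print(score, quality_band(score))
```

For example, ratings of 4, 5, 4, and 3 give a MOS of 4, which falls in the good band. The objective VQM score can be passed to quality_band in the same way.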

Table 3.1: Network conditions configured on the network emulator for MOS testing

Table 3.2: Network bandwidth provisioned for MOS testing (HD and SD, Kbit/s)

The results matched expectations. The low motion clip performed best under all three network conditions on both SD and HD, with the medium motion clip rated second best. Comparing the two systems using both objective and subjective MOS measurements, we conclude that HD performed better in both good and acceptable network conditions, as shown in Figures 3.12, 3.13, and 3.14.

Figure 3.12: Subjective video performance of SD at a dialing speed of 768 Kbit/s in Good, Acceptable, and Poor network conditions (subjective MOS for low, medium, and high motion clips)

HDVC Subjective MOS (Good, Acceptable, and Poor network conditions; low, medium, and high motion clips)
