
White paper
Latency in live network video surveillance

Table of contents
1. Introduction
2. What is latency?
3. How do we measure latency?
4. What affects latency?
4.1 Latency in the camera
4.1.1 Capture latency
4.1.2 Latency during image enhancement
4.1.3 Compression latency
4.1.4 Buffer latency
4.1.5 Audio latency
4.2 Latency in the network
4.2.1 The infrastructure
4.2.2 Video stream data amount
4.2.3 The transmission protocols
4.3 Latency on the client side
4.3.1 Play-out buffer
4.3.2 Audio buffer
4.3.3 Decompression
4.3.4 Display device refresh rate
5. Reducing latency
5.1 Camera side
5.2 Network
5.3 Client side
6. Conclusion

1. Introduction
In the network video surveillance context, latency is the time between the instant a frame is captured and the instant that frame is displayed. This is also called end-to-end latency or sensor-to-screen latency. The process includes a long pipeline of steps. In this white paper we dissect these steps, look into how each of them affects latency, and finally give recommendations on how to reduce latency.

2. What is latency?
The definition of latency depends on the context, and its exact meaning varies. In network technology, latency is commonly perceived as the delay between the time a piece of information is sent from the source and the time the same piece of information is received at its final destination.

This paper discusses latency in network video surveillance systems. Here we define latency as the delay from when an image is captured by a camera until it is visible on a video display. Several stages are required in this process: capture, compression, transmission, decompression and display of the image. Each stage adds its own share of delay, and together they produce the total delay, which we call end-to-end latency. This end-to-end latency can be divided into three major stages impacting the total system latency:
1. Latency introduced by the camera (image processing and encoding latency)
2. Latency introduced by the network (transmission latency)
3. Latency introduced by the receiver side (client buffer, decoder latency, and display latency)
Each of these latencies needs to be considered when designing the video solution in order to meet the latency goal of the video surveillance system.

3. How do we measure latency?
Latency is usually expressed in time units, for example seconds or milliseconds (ms). It is very hard to measure exact latency, as this would require the clocks of the camera and the display device to be exactly synchronized. One simple way (with reservation for small deviations from the exact values) is to use the timestamp overlay text feature. This method measures the end-to-end latency of a video surveillance system, that is, the time difference between the capture of one image frame by the lens and the moment the same frame is rendered on a monitoring device.

Note that this method may produce an error of up to one frame interval. The error arises because the timestamps used to calculate the latency are only collected at frame capture. We can therefore only compute the latency as a multiple of the frame interval. Hence, at a frame rate of 25 fps we can calculate the latency as a multiple of 40 ms, and at a frame rate of 1 fps only as a multiple of one second. This method is therefore not recommended for low frame rates.

To measure latency with the overlay method:
- Turn on the timestamp in the overlay by using (%T:%f).
- Place the camera at an angle so that it captures its own live stream output.
- Take snapshots of the live stream output and compare the time displayed in the original text overlay with the time displayed in the screen loop.
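The arithmetic behind this comparison can be sketched in a few lines of Python. The helper name is only illustrative; the example values restate the figures used in this paper (overlay times of 300 ms and 460 ms, 25 fps).

    # A minimal sketch of the overlay method's arithmetic (illustrative helper).
    def end_to_end_latency_ms(capture_overlay_ms, displayed_overlay_ms, fps):
        """Return the measured latency and the resolution of the method."""
        latency_ms = displayed_overlay_ms - capture_overlay_ms
        frame_interval_ms = 1000.0 / fps  # latency can only be resolved in steps of this size
        return latency_ms, frame_interval_ms

    latency, step = end_to_end_latency_ms(300, 460, 25)
    print(f"measured latency: {latency} ms (resolution: one frame interval = {step} ms)")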

From the picture above you can see that the time difference is 460 ms - 300 ms, which gives us an end-to-end latency of 160 ms.

4. What affects latency?

4.1 Latency in the camera

4.1.1 Capture latency
Let us take a look inside the video camera. Images are made from pixels captured by the camera sensor. The capture frequency of the sensor defines how many exposures the sensor delivers per time unit, i.e. how many frames it can capture per unit of time. Depending on which capture rate you choose …

4.1.2 Latency during image enhancement
…

4.1.3 Compression latency
… H.264 decoding requires the decoder to buffer at least one frame, while MJPEG decoding requires no buffer.

Effectiveness of the compression method
The most common encoding schemes used in Axis cameras are MJPEG and H.264. Both introduce latency in the camera. H.264 is a compression encoding that minimizes the throughput of a video stream to a greater extent than MJPEG. This means that using H.264 produces fewer data packets to be sent through the network, unpacked and rendered at the receiving end, which of course has a positive effect on reducing the total latency.

The choice of bitrate
Video compression reduces the video data size. However, not all frames will be the same size after compression; depending on the scene, the compressed data size varies. In other words, the compressed output is by nature a variable bit rate (VBR) stream, which results in a variable bitrate being output onto the network. One needs to take the constraints of the available network, such as bandwidth limitations, into consideration.

The bandwidth limitations of a streaming video system usually require regulation of the transmission bitrate. Some encoders let you choose between VBR and constant bit rate (CBR). By choosing CBR you guarantee that the network receives a limited amount of data, so that it will not be overloaded, which would otherwise lead to network delay and the need for a larger buffer at the receiving end further on in the system.

In Axis cameras, choosing H.264 gives you the choice between CBR and VBR. From firmware 5.60 the choice is between maximum bit rate (MBR) and VBR. However, Axis has always recommended using networked video with VBR, where the quality is adapted to the scene content in real time. It is not recommended to always use CBR as a general storage reduction tool or as a fix for weak network connections, since cameras delivering CBR video may be forced to erase important forensic details in critical situations.

When choosing a compression method, one should take all three aspects mentioned above into consideration. On the one hand, an advanced encoding algorithm takes longer to encode and decode; on the other hand, it reduces the data volume sent through the network, which in turn shortens transmission delays and reduces the size of the receiver buffer.

4.1.4 Buffer latency
Because images are handled one frame at a time, only a limited amount of data can be compressed at a time, and short-term buffers between the processing stages are sometimes needed. These buffers also contribute to the latency in the camera.
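Because frames are handled one at a time, each frame held in such a buffer delays the stream by roughly one frame interval (this is also why section 5.1 recommends a high frame rate). A minimal sketch in Python; the number of buffered frames is an assumed figure for illustration, not an Axis specification.

    # Each buffered frame holds the pipeline up by one frame interval.
    def buffer_latency_ms(frames_buffered, fps):
        return frames_buffered * 1000.0 / fps

    print(buffer_latency_ms(1, 30))  # one buffered frame at 30 fps: ~33 ms
    print(buffer_latency_ms(1, 25))  # one buffered frame at 25 fps: 40 ms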


4.1.5 Audio latency
In some cases the video stream is accompanied by audio. The audio encoder needs to wait for a certain number of samples before a block is available and the audio encoding can begin, which adds additional delay on the camera side. The sample rate and block size differ between audio encoding algorithms.

4.2 Latency in the network
After the image is captured, processed and compressed, the video data travels through a network before it reaches the client side for rendering. To understand how the network affects latency, we first need to understand some basic concepts in video networking, namely bandwidth, throughput and bitrate. As illustrated in the picture below, if we imagine the link between the camera and the monitor as a pipe, then the bandwidth is how thick that pipe is, the throughput is how much data actually comes through the pipe per time unit, and the bitrate is how much data is fed into the pipe per time unit.

Basic concepts: bandwidth, throughput and bitrate. Network latency is proportional to bitrate and inversely proportional to bandwidth.

The bandwidth is how much data the network between the camera and the monitor can potentially handle. It is the maximum capability of your link. It depends on the length and the infrastructure of the link, i.e. switches, routers, cables, proxies, etc. If we increase the capacity of the network, more data will be able to pass through, leading to lower latency.

The throughput is the actual achieved speed of your data transfer. It depends on whether you share the link with others, on electromagnetic interference on the cables in the link, and on the QoS configured on the ports, which may cap throughput.

The bitrate is the amount of data, in bits, processed per unit of time. In video surveillance, the bitrate is the amount of data the camera generates and sends through the network per unit of time. The bitrate depends on many factors: the filmed scene, the processing done in the camera, and the video stream settings. When the camera produces more data to be transmitted, you can expect higher network latency if the bandwidth is limited.

The total latency in the network depends on three major factors: the infrastructure of the link between the camera and the video viewing device, which determines the bandwidth; the amount of data produced by the camera, which determines the bitrate; and the choice of transmission protocol.

4.2.1 The infrastructure
The network is the most unpredictable source of end-to-end latency. Switches, routers, cables, proxies: everything in the network between sender and receiver affects the total end-to-end latency.

4.2.2 Video stream data amount
…

4.2.3 The transmission protocols
…: H.264 or MJPEG / RTP / Unicast / UDP
WAN / several hops where you do not have full control over the nodes: H.264 or MJPEG / RTP / Multicast / UDP

Normally it takes longer to transport a packet using TCP than using UDP, because of the extra connection setup, the acknowledgement messages, and the retransmission of packets when a loss is detected. On the other hand, with UDP the user will experience artefacts or interruptions in the video stream when packets are lost on the way. TCP yields jitter on packet loss; UDP yields artefacts and/or interruptions on packet loss.
If data loss and temporary quality degradation are acceptable, UDP can be a choice for networks with low bandwidth. If you are using TCP, more packets will be sent, and to support this you need more bandwidth. If you know there is a lot of congestion in the network, select UDP as your transmission protocol; packet loss is then accepted, but it will also result in lower image quality.
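The rule of thumb stated earlier, that network latency grows with the bitrate and shrinks with the bandwidth, can be sketched as a per-frame serialization delay. This is a deliberately simplified model: real networks add queueing, protocol overhead and retransmissions on top of it, and the frame size below is an assumed figure.

    # Time to push one compressed frame onto the link (serialization only).
    def transmission_delay_ms(frame_size_bits, available_bandwidth_bps):
        return 1000.0 * frame_size_bits / available_bandwidth_bps

    frame_bits = 50_000 * 8  # a 50 kB compressed frame, assumed for illustration
    print(transmission_delay_ms(frame_bits, 10_000_000))   # 10 Mbit/s link: 40 ms
    print(transmission_delay_ms(frame_bits, 100_000_000))  # 100 Mbit/s link: 4 ms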

4.3 Latency on the client side
After the video is received on the client side of the video system, it is unpacked, reordered and decoded, and a media player is used to render the video. Each of these steps contributes to the total latency generated on the client side. The computer itself also plays an important role: the CPU capacity, the operating system, the network card and the graphics card all affect the latency. MJPEG is usually the method with the lowest decoding and display latency, because the data can be drawn on screen as it arrives, since there are no time codes. H.264 and other video compression standards assign time codes to each picture and require the pictures to be rendered accordingly.

4.3.1 Play-out buffer
Real networks are often very large and complicated, with bursty traffic behavior and packets arriving in different order. To compensate for the variations introduced by network transport, a buffer is used on the client side. It makes sure that the packets get into the right order and buffers enough data so that the decoder does not "starve" and a uniform frame rate is displayed in the viewer. This buffer is often called the play-out buffer or jitter buffer. When used, it contributes a relatively large share of the client-side latency.

It is important to stress that different viewer applications have different play-out buffer sizes: in VLC the default play-out buffer is set to 20 ms, in QuickTime to 5 seconds. In most viewers the buffer size can be changed, but keep in mind that reducing the buffer will increase jitter. The user needs to find the balance between jitter and tolerable latency (a small numerical sketch of this trade-off is given at the end of this chapter).

4.3.2 Audio buffer
In playback, audio streaming is more sensitive to hiccups and delays than video streaming. A single delayed audio packet generates an annoying crack in the soundtrack, and the audio has to be lip-synchronized with the video. This calls for a large play-out buffer whenever the video is accompanied by audio, which of course increases the end-to-end latency.

4.3.3 Decompression
The next source of latency is the time required for the decompression process. The decoding time varies with the encoding method used and depends very much on what hardware decoder support is present in the graphics card; decoding in hardware is usually faster than in software. Generally, H.264 is harder to decode than MJPEG. For H.264, the latency also depends on the profile chosen in the encoding phase: baseline is the easiest to decode, while main and high take longer. The H.264 data stream produced by Axis video products requires the decoder to buffer at least one frame.

4.3.4 Display device refresh rate
The display device's refresh rate also plays an important role. For a TV, the refresh interval can be up to 1 second; for a computer monitor it is around 14-15 ms, whereas special gaming monitors have a refresh interval of 4-5 ms.

5. Reducing latency
It is important to keep in mind that designing a system to meet low-latency goals requires trade-offs. The user needs to decide what latency is acceptable and find the optimum balance between video quality and the cost of the surveillance system: either decrease the video quality or invest in better hardware and software. With this in mind, there are a few simple recommendations for reducing the end-to-end latency.
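Before going through the recommendations, the balance between jitter and latency described in section 4.3.1 can be made concrete with a small simulation. This is a sketch only: it assumes uniformly distributed network jitter, which real networks do not exhibit, and the numbers are illustrative rather than measured.

    import random

    def simulate_playout(buffer_ms, fps=25, n_frames=500, jitter_ms=30, seed=1):
        """Count frames that arrive after their scheduled play-out time."""
        random.seed(seed)
        interval = 1000.0 / fps
        late = 0
        for i in range(n_frames):
            capture_time = i * interval
            arrival_time = capture_time + random.uniform(0, jitter_ms)  # assumed jitter model
            playout_time = capture_time + buffer_ms                     # buffer adds a fixed latency
            if arrival_time > playout_time:
                late += 1
        return late

    for buf in (5, 20, 40):
        print(f"play-out buffer {buf} ms -> {simulate_playout(buf)} late frames of 500")

A larger buffer eliminates late (jerky) frames but adds its full size to the end-to-end latency; a smaller buffer does the opposite.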

5.1 Camera side

Resolution
Choose a lower resolution if possible. A higher resolution means more data to encode, which may lead to higher latency.

Enhancements
Image enhancements (rotation, de-interlacing, scaling, etc.) may also add latency. Reducing these enhancements will reduce latency.

Encoding
Make sure the encoder provides the level of control over latency that your system requires. There needs to be a balance between the amount of data and the capacity of the network infrastructure. If the video is sent through a network with limited bandwidth, choose H.264 as the encoding method; this leads to a lower bitrate thanks to harder compression. Choose the baseline profile if the network can manage the bitrate, as baseline is easier to encode and decode. Motion JPEG is better from a latency standpoint if the network can handle its roughly 10 times higher bitrate.

Number of streams
Limit the number of streams with different settings from the camera. Each unique combination of settings such as resolution, frame rate and compression requires its own individual encoding process, adding load to the processor and causing delay.

Frame rate
Use as high a frame rate as possible. As frames are encoded and decoded one frame at a time, the buffers delay the stream by at least one frame; with higher frame rates this per-frame delay is reduced. For a stream at 30 fps, each frame takes 1/30 of a second to capture, and we can then expect a latency of about 33 ms in the buffers. At 25 fps the delay is 40 ms.

Audio
Audio requires a larger playback buffer, and therefore longer latency, if lip-sync with the video is required. By removing audio, you remove an appreciable amount of latency.

Bitrate
To reduce the latency in the camera, we need to reduce the amount of data being generated and output onto the link for transfer to the viewing end. It all comes down to the bitrate the camera generates. Factors that affect the bitrate include (a sketch of how these settings translate into a stream request follows after this list):
- Compression level
- Number of streams
- Codec
- Resolution
- Light conditions
- Frame rate
- Scene type
- GOV length
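As an illustration of how these camera-side recommendations can be combined into one stream request, the sketch below builds an RTSP URL for a lower-resolution, H.264, audio-free stream. The RTSP path and parameter names (videocodec, resolution, fps, audio) follow Axis' VAPIX streaming interface as commonly documented, but treat them as assumptions and verify them against the VAPIX documentation for your camera and firmware.

    from urllib.parse import urlencode

    def low_latency_stream_url(camera_address: str) -> str:
        params = {
            "videocodec": "h264",      # harder compression, lower bitrate on a limited network
            "resolution": "1280x720",  # lower resolution, less data to encode
            "fps": "30",               # high frame rate, shorter per-frame buffer delay
            "audio": "0",              # no audio, no lip-sync play-out buffer needed
        }
        return f"rtsp://{camera_address}/axis-media/media.amp?{urlencode(params)}"

    print(low_latency_stream_url("192.0.2.10"))  # documentation-range address, replace with your camera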

5.2 Network
Many of the recommendations above aim at limiting the total data volume sent through the network. In most cases, a limited network is the largest contributor to the end-to-end latency. If the network has high capacity, many of the above recommendations are not needed. Make sure that your network has good quality of service and that all the hops within the network are configured to suit your video demand. Make sure that the bitrate your network can deliver is guaranteed to cover the data output from the camera.

5.3 Client side
There is much that can be done on the client side to reduce the end-to-end latency, and improvements here often have the largest impact on the total latency.

Processor and graphics card
The CPU plays a central role in the client-side latency. Make sure that you have a good processor with enough capacity to process the video stream and handle other requests at the same time. Make sure you have a good graphics card, updated with the latest firmware, with good support for decoding.

Viewer/VMS
Make sure that your viewer does not have an unnecessarily long play-out buffer; if it does, try to change it. Some viewers have up to a few seconds of buffer. The video buffer compensates for variations introduced by network transport: the buffered frames are decoded and played out at a constant time interval, achieving a steady video stream. Use a sufficiently sized play-out buffer in the media player on the receiver side to control latency, but be aware of the cost in video jitter. In AXIS Media Control you can increase or decrease this buffer to find the optimal trade-off between jitter and latency.

Display
Use a display with as short a refresh interval as possible. Another important step towards a more pleasant live view is to adjust the screen frequency to a multiple of the capture frame rate of the camera, for example 60 Hz for 30 fps mode or 50 Hz for 25 fps mode. This does not, however, affect the latency. Be sure to keep the graphics card driver updated.

6. Conclusion
Live streaming in IP video surveillance consists of capturing the video in the camera, packaging it and transporting it through the network, and unpacking it on the receiving side for display. Each of these steps can add more or less latency.

On the camera side the process is, broadly, capture, image enhancement, compression and packaging. Roughly speaking, at 30 fps each frame is exposed over a window of 1/30 s. It then takes on the order of a millisecond to scale and encode the image. The encoded image is then chopped up and packaged for the network, and an image is output onto the network every 33 ms. The time this process takes in the camera is under 50 ms. It varies slightly depending on whether the frame is an I-frame or a P-frame and on which camera it is (PTZ cameras excluded); the variation is typically around 10 microseconds. In the big picture of end-to-end latency, the camera contributes only a small fraction of the total latency.

The network latency can be very large or very small, and it is the most unpredictable factor in the end-to-end latency equation. Investing in a good network between the cameras and the client makes the network latency more predictable. Network latency depends very much on the ratio of data to bandwidth. Although many configurations in the camera can be made to reduce latency, the main goal of these configurations is to reduce the amount of data generated, and hence the number of packets on the network.

On the client side, data is received and buffered to be sorted and queued out to the graphics card and monitor. The receiving buffer in the client is the part that affects latency the most, even by up to several seconds. With a large buffer, the risk of a jerky video stream is reduced and the video plays evenly, but at the cost of added latency. A small buffer holds "fresh" pictures with short latency, but with a risk of a jerky video stream.

Reducing latency is always a question of cost: either reduce the quality of the video, or invest in a good network and good client-side hardware and software. Usually the first choice is not preferred. Focusing on improvements in the latter two gives a better return on investment in the context of latency.
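As a wrap-up, the per-stage figures discussed in this conclusion can be combined into a simple latency budget. Only the camera figure (under 50 ms) comes from this paper; the network, buffer, decode and display values below are placeholders that should be replaced with measurements from your own system.

    def end_to_end_budget_ms(camera_ms, network_ms, playout_buffer_ms, decode_ms, display_ms):
        """Sum of the per-stage latencies discussed in this paper."""
        return camera_ms + network_ms + playout_buffer_ms + decode_ms + display_ms

    total = end_to_end_budget_ms(camera_ms=50, network_ms=20, playout_buffer_ms=100,
                                 decode_ms=10, display_ms=15)
    print(f"estimated end-to-end latency: {total} ms")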

About Axis Communications
Axis offers intelligent security solutions that enable a smarter, safer world. As the global market leader in network video, Axis is driving the industry by continually launching innovative network products based on an open platform - delivering high value to its customers and carried through a global partner network. Axis has long-term relationships with partners and provides them with knowledge and ground-breaking network products in existing and new markets.
Axis has more than 1,900 dedicated employees in more than 40 countries around the world, supported by a network of over 75,000 partners across 179 countries. Founded in 1984, Axis is a Sweden-based company listed on NASDAQ Stockholm under the ticker AXIS.
For more information about Axis, please visit our website www.axis.com.

© 2015 Axis Communications AB. AXIS COMMUNICATIONS, AXIS, ETRAX, ARTPEC and VAPIX are registered trademarks or trademark applications of Axis AB in various jurisdictions. All other company names and products are trademarks or registered trademarks of their respective companies. We reserve the right to introduce modifications without notice.

63380/EN/R1/1504
