Real-Time Communication Without Plugins


WebRTC stands for Web Real-Time Communication. It is a peer-to-peer communication technology for the browser that enables video/audio calling and data sharing without additional plugins. WebRTC started as an effort by Google to build a standard real-time media engine into all the major browsers, and is now supported by Google, Mozilla and Opera. The API and underlying protocols are being developed jointly at the W3C and IETF. Similar attempts at implementing peer-to-peer communications over the web were made before by Adobe, through their acquisition of Amicima in 2006 and the subsequent Flash Player 10 (October 2008) and 10.1 (June 2010) releases, but somehow the peer-to-peer technology in Flash Player never took off.

The guiding principles of the WebRTC project are that it is a free, standardized, open-source project that enables real-time communication across different browsers, using simple JavaScript APIs.

Now you may say: “But we already have real-time communication technologies such as Flash Player (with AMS) and WebSockets, so there is no need for WebRTC.”

All three are slightly different:

  • The WebSockets technology is all about providing a reliable real-time data connection via JavaScript.
  • Flash Player uses RTMFP (Real Time Media Flow Protocol, UDP-based), developed by Adobe, and needs Adobe Media Server Extended or the Adobe Cirrus service to enable signaling and NAT traversal.
  • WebRTC provides a browser infrastructure for real-time communication but provides no server-side tools for signaling and NAT traversal.

The UDP Protocol Enabled in the Browser

WebRTC primarily uses the UDP protocol, which is a lot faster than TCP because it doesn’t deal with packet order or error correction. UDP is used in cases where only the latest piece of data matters and there is no need to wait for previous data. VoIP and multiplayer games are very good examples of applications that benefit from these characteristics of the UDP protocol. WebRTC makes UDP available in the browser without additional plugins.

What is the potential for WebRTC use?

WebRTC is primarily known for being a peer-to-peer, audio & video calling technology between browsers, similar to Skype, but WebRTC can do much more than that.

  • Collaborative activities
  • Multiplayer games in the browser
  • Peer-to-peer file sharing
  • Peer-to-peer CDN
  • Remote control of devices

So, as you can see, WebRTC has high potential across many areas of technology. But what drives all of these capabilities, what are the inner workings used to produce such web apps? The answer, my friends, is a set of JavaScript APIs that we will discuss in the next part.

The JavaScript APIs

Currently WebRTC has three APIs:

  • MediaStream (aka getUserMedia)
  • RTCPeerConnection
  • RTCDataChannel

getUserMedia, like the name suggests, gets the video and audio, if available, from an input (e.g. a webcam) and outputs it in the browser via the HTML5 <video> tag. To see it in action, take a look at this cross-browser demo.

RTCPeerConnection lets you make peer-to-peer connections and attach media streams like video and audio. The Chrome implementation of the API is prefixed with webkit; Firefox Aurora/Nightly names it mozRTCPeerConnection. When the standardization process is complete, the prefixes will be removed. Here’s a link to a demo of Chrome’s RTCPeerConnection. RTCPeerConnection is needed by video chat apps; here is an example of this API in action: video-chat application.
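Until the prefixes disappear, a small feature-detection shim keeps code working across browsers. Here is a sketch; the helper name `resolvePeerConnection` is my own, only the three constructor names are the real API:

```javascript
// Resolve the RTCPeerConnection constructor across vendor prefixes.
// Chrome ships webkitRTCPeerConnection, Firefox mozRTCPeerConnection.
function resolvePeerConnection(global) {
  return global.RTCPeerConnection ||
         global.webkitRTCPeerConnection ||
         global.mozRTCPeerConnection ||
         null; // no WebRTC support in this browser
}

// In a browser you would then write:
// var PeerConnection = resolvePeerConnection(window);
```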

RTCDataChannel lets you send arbitrary data across peer-to-peer connections.

The three APIs are supported as follows:

  1. PC/Mac
    • Google Chrome 23 (released on the 6th of November 2012)
    • Mozilla Firefox 22 (released on the 25th of June 2013)
    • Opera 18 (released on the 18th of November 2013)
    • Internet Explorer has no native support for WebRTC
    • Safari not supported
  2. Android
    • Google Chrome version 28 (Needs configuration at chrome://flags/)
    • Mozilla Firefox version 24 (also behind a flag)
    • Opera Mobile version 12 (only supports getUserMedia, no real peer-to-peer support)
  3. Google Chrome OS
  4. WebRTC is also supported by the Ericsson Bowser browser which runs on Android and iOS.

For a detailed comparison between browsers you can access this link.

MediaStream, aka getUserMedia, in Detail

The MediaStream API is the part of WebRTC describing a stream of audio or video data, the methods for working with it, the success and error callbacks used when consuming the data asynchronously, and the events that are fired during the process. Each MediaStream has an input, which can be a LocalMediaStream generated by navigator.getUserMedia(), and an output, which can be passed to an HTML5 video element or an RTCPeerConnection.

The getUserMedia() method takes three parameters:

  • A constraint object
  • A success callback function that passes the LocalMediaStream
  • A failure callback that passes an error object

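Here’s a sketch of those three parameters in use. The `requestCamera` wrapper and its callback names are mine; only `getUserMedia` and its vendor-prefixed variants are the real API:

```javascript
// Request camera + microphone with the three getUserMedia parameters:
// a constraints object, a success callback, and an error callback.
function requestCamera(nav, onStream, onFailure) {
  // Resolve the vendor-prefixed method (Chrome: webkit, Firefox: moz).
  var getUserMedia = nav.getUserMedia ||
                     nav.webkitGetUserMedia ||
                     nav.mozGetUserMedia;
  if (!getUserMedia) {
    onFailure(new Error('getUserMedia is not supported in this browser'));
    return;
  }
  getUserMedia.call(nav, { video: true, audio: true }, onStream, onFailure);
}

// In a browser, wire the LocalMediaStream to an HTML5 <video> element:
// requestCamera(navigator, function (stream) {
//   var video = document.querySelector('video');
//   video.src = window.URL.createObjectURL(stream);
//   video.play();
// }, function (err) {
//   console.log('getUserMedia error:', err);
// });
```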

RTCPeerConnection in Detail

RTCPeerConnection is the API used by WebRTC to communicate streaming data between browsers. It also needs a mechanism to coordinate communication and to send control messages, a process known as signaling. These signaling methods and protocols are not part of the RTCPeerConnection API, so developers can choose which messaging protocol to use (e.g. SIP or XMPP). Google provides the Channel API as a signaling mechanism, and WebRTC has also been proven to work using WebSockets for signaling.

Signaling is used to exchange three types of information:

  • Session control messages: to initialize or close communication and report errors.
  • Network configuration: to the outside world, what’s my computer’s IP address and port?
  • Media capabilities: what codecs and resolutions can be handled by my browser and the browser it wants to communicate with.

All of this information must be exchanged successfully before peer-to-peer streaming can be established.

Here’s a code sample from the WebRTC W3C Working Draft, which shows the signaling process in action. (The code assumes the existence of some signaling mechanism, created in the createSignalingChannel() method. Also note that on Chrome, RTCPeerConnection is currently prefixed.)
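As an illustration, the caller’s half of that process might look something like the sketch below (callback-style `createOffer`, as in current browser builds). `signalingChannel` stands in for whatever transport you chose; it is not part of WebRTC:

```javascript
// Caller-side sketch of the offer/answer signaling dance.
// signalingChannel is a placeholder for your own transport
// (WebSocket, Channel API, XMPP, ...).
function startCall(pc, signalingChannel) {
  // Each network (ICE) candidate is relayed to the peer as it is found.
  pc.onicecandidate = function (evt) {
    if (evt.candidate) {
      signalingChannel.send(JSON.stringify({ candidate: evt.candidate }));
    }
  };
  // Create an SDP offer describing our media capabilities, store it
  // locally, then ship it to the remote peer over the signaling channel.
  pc.createOffer(function (offer) {
    pc.setLocalDescription(offer);
    signalingChannel.send(JSON.stringify({ sdp: offer }));
  }, function (err) {
    console.log('createOffer failed:', err);
  });
}
```

The callee does the mirror image: it sets the received offer as its remote description, calls createAnswer, and sends the answer back over the same channel.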

The code from the link above shows a simplified version of WebRTC from a signaling perspective. In the real world, WebRTC needs servers, however simple, in order to achieve the following:

  • Users discover each other and exchange real world information.
  • WebRTC client apps (peers) exchange network information.
  • Peers exchange media capabilities such as video format and resolution.
  • WebRTC client apps traverse NAT gateways and firewalls.

The requirements for building a server, NAT traversal and peer-to-peer networking exceed the scope of this article; however, it is important to remember the following: WebRTC uses the ICE protocol, which in turn uses STUN and its extension TURN, to enable peer-to-peer communications (this is needed so that peers behind a NAT can find out their public IP address and port). Google already provides several STUN servers.
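In code, those servers are handed to the peer connection as configuration. The STUN entry below is one of Google’s public servers; the TURN entry is a placeholder you would replace with your own relay (and note that the exact key names in this dictionary have shifted between spec drafts):

```javascript
// ICE server configuration: a public Google STUN server for address
// discovery, plus an illustrative TURN relay as a fallback for peers
// that STUN alone cannot connect.
var configuration = {
  iceServers: [
    { url: 'stun:stun.l.google.com:19302' },   // Google public STUN server
    { url: 'turn:turn.example.org',            // placeholder TURN relay
      username: 'user',
      credential: 'secret' }
  ]
};

// In a browser (Chrome currently prefixed):
// var pc = new webkitRTCPeerConnection(configuration);
```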

WebRTC, as currently implemented, only supports one-to-one communication, but it can be used in more complex network scenarios: for example, with multiple peers each communicating with each other directly, peer-to-peer, or via a centralized server.

So, as we can see, WebRTC also needs a middleman (some kind of server) to handle peer-to-peer connections. Adobe provides the Adobe Cirrus beta service and Adobe Media Server Extended to handle signaling for peer-to-peer apps developed in Flash Player using RTMFP.

One good example of an application that uses peer-to-peer technology and still needs servers for user discovery and communication is Skype.

RTCDataChannel in Detail

The RTCDataChannel is a WebRTC API for high performance, low latency, peer-to-peer communication of arbitrary data. The API is simple—similar to WebSocket—but communication occurs directly between browsers, so RTCDataChannel can be much faster than WebSocket even if a relay (TURN) server is required when ‘hole punching’ to cope with firewalls and NATs fails.

Potential applications for the RTCDataChannel API are:

  • Gaming
  • File transfer
  • Real-time text chat
  • Remote desktop applications
  • Decentralized networks

The API provides several features that make the most of RTCPeerConnection:

  • Reliable and unreliable delivery semantics.
  • Built-in security (DTLS) and congestion control.
  • Multiple simultaneous channels with prioritization.
  • Ability to use with or without audio and video.
  • Leveraging of RTCPeerConnection session setup.
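The reliable/unreliable choice from the list above is made when the channel is created. A sketch, assuming `pc` is an already-created RTCPeerConnection (the function name and channel label are mine; older builds may expect the earlier `{ reliable: false }` form instead of the options shown):

```javascript
// An unordered channel with no retransmits suits games, where only the
// latest state matters (UDP-like); the defaults give reliable, ordered
// delivery, which suits file transfer or chat.
function openGameChannel(pc) {
  var channel = pc.createDataChannel('game', {
    ordered: false,      // don't stall on out-of-order packets
    maxRetransmits: 0    // never retransmit: latest data only
  });
  channel.onopen = function () { channel.send('hello'); };
  channel.onmessage = function (evt) { console.log('peer says:', evt.data); };
  return channel;
}
```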

Video and Audio Codecs

As stated on the official WebRTC FAQ the currently supported codecs are:

Audio:

  • G.711
  • G.722
  • iLBC
  • iSAC

Video:

  • VP8

The codecs included in the WebRTC project are subject to change.

The huge dilemma right now is how cross-browser communication will actually work. Browsers differ in the audio and video codecs they support. For example, if Chrome encodes video with VP8 and sends it to Firefox, and Firefox does not know how to decode VP8, then communication is not possible.

At this time there is a war going on between Google and Ericsson over which codecs should be adopted as the standard for WebRTC.

Google’s side: VP8 for video and Opus for voice along with G.711. All are royalty free and provide high quality.

Ericsson’s side: H.264 for video, with the prospect of H.265, and G.719 or AMR-NB for audio, maybe even AMR-WB and EVS. ITU standards impose these codecs.

Microsoft is also pushing a proposal of their own for WebRTC called CU-RTC-Web, but for now this only remains a proposal.

Google comes from the Internet world. In it, royalty free is an asset, making VP8 a better option than H.264. The selection of Opus, which is a royalty free audio codec, comes from the fact that it was developed and standardized by the IETF (where the Internet lives) and is a “derivative work” of Skype’s SILK codec. It is considered a good codec, but for now it is not included in the WebRTC project.

Ericsson comes from mobile and the ITU standardization work. All codecs suggested by Ericsson come either from the ITU or from mobile (3GPP), so it makes sense for them to support this angle. Ericsson also holds patents on H.264, making it a beneficiary of royalty payments from the use of this codec.

Microsoft? They’re on Ericsson’s side. They are looking for more options, power, flexibility. But by doing that, they are complicating the solution and probably running it into the ground.

If the IETF settles on multiple mandatory codecs for voice and video, it is going to be bad for the industry: the simple solution should win, as it makes it easier for companies to adopt and for disruption to appear. If we end up with four or more voice codecs and two or more video codecs, then we are in for the same hurdle we have today with other VoIP standards: not having a standardized set of codecs.

For now it seems Google has the upper hand. I hope it stays this way, but if the multiple-codecs approach is adopted, transcoding will be needed for communication between browsers, and that is not a good thing at all: transcoding adds latency, reduces quality, is expensive, and a trusted third party will always have to be involved.

Security

There are several ways a real-time communication application or plugin might compromise security. For example:

  • Unencrypted media or data might be intercepted en route between browsers, or between a browser and a server.
  • An application might record and distribute video or audio without the user knowing.
  • Malware or viruses might be installed alongside an apparently innocuous plugin or application.

WebRTC provides the following solutions to avoid these problems:

  • WebRTC implementations use secure protocols such as DTLS and SRTP.
  • Encryption is mandatory for all WebRTC components, including signaling mechanisms.
  • WebRTC is not a plugin: its components run in the browser sandbox and not in a separate process, components do not require separate installation, and are updated whenever the browser is updated.
  • Camera and microphone access must be granted explicitly and, when the camera or microphone are running, this is clearly shown by the user interface.

WebRTC vs P2P implementation in Flash Player

Adobe Cirrus (formerly known as Adobe Stratus) enables peer assisted networking using the Real Time Media Flow Protocol (RTMFP) within the Adobe Flash Platform. In order to use RTMFP, Flash Player endpoints must connect to an RTMFP-capable server, such as the Cirrus service. Cirrus is a beta, hosted rendezvous service that aids establishing communications between Flash Player endpoints. This is a free service. The second solution is the purchasing of Adobe Media Server Extended edition, the only version that supports RTMFP.

Let’s compare Adobe’s approach and WebRTC:

|                                  | WebRTC                                 | P2P implementation in Flash Player |
|----------------------------------|----------------------------------------|------------------------------------|
| Availability Date                | May 2011                               | October 2008                       |
| Maturity Date                    | Not yet                                | June 2010                          |
| Servers Needed                   | Signaling server and STUN/TURN servers | RTMFP-enabled servers (Adobe Media Server Extended, which costs around $45,000, or the free Adobe Cirrus service) |
| Encryption Technology            | DTLS and SRTP                          | 128-bit AES                        |
| NAT Traversal                    | ICE protocol and STUN/TURN protocols   | TURN protocol                      |
| Famous Apps Using the Technology | None                                   | None                               |

Conclusions

The standards and APIs of WebRTC are still in the works and the technology behind it is not yet fully developed, a fact that can be seen in it not yet being fully supported across all platforms and browsers. Adobe also spent two to four years developing peer-to-peer communication technology, but it never really caught on, and that is because technology should walk in the steps of desire and need, not the other way around.
