This document explains VoIP systems. Recent developments, such as low-cost Internet access and the integration of dedicated voice compression processors, have changed common user requirements and allowed VoIP standards to spread. This howto tries to outline the basics of VoIP architecture.
Please send suggestions and criticism to my email address.
Copyright (C) 2000,2001 Roberto Arcomano. This document is free; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This document is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You can get a copy of the GNU GPL here
You are free to translate this document; you only have to:
Warning! Do not translate the TXT or HTML files; modify the LyX file instead, so that it can be converted to all the other formats (TXT, HTML, RIFF, etc.). To do that you can use the "LyX" application, which you can download from http://www.lyx.org.
There is no need to ask my permission to translate! Just let me know (if you want) about your translation.
Thank you for your translation!
Thanks to Fatamorgana Computers for hardware equipment and experimental opportunity.
Thanks to the Linux Documentation Project for publishing and uploading my document very quickly.
Thanks to David Price for his support.
More than 30 years ago the Internet didn't exist. Interactive communication was possible only by telephone, at PSTN line cost.
Data exchange was expensive (over long distances) and no one was thinking about video interaction (there was only television, which, as we know, is not interactive).
A few years ago some interesting things appeared: PCs for the mass market, new communication technologies such as cellular phones and finally the great net, the Internet. People began to communicate with new services like email, chat, etc., and business was reborn with the web, allowing people to buy with a "click".
Today we can see a real revolution in the communication world: everybody is beginning to use PCs and the Internet, for work and in their free time, to communicate with each other, to exchange data (like images, sounds, documents) and, sometimes, to talk to each other using applications like Netmeeting or Internet Phone. In particular, a common idea is starting to spread that could be the future and that allows real-time voice communication: VoIP.
We cannot know what the future holds, but we can try to imagine it with many computers, high-speed Internet almost everywhere and people talking (audio and video) in real time. We only need to know what the means will be: UMTS, VoIP (with video extension) or something else? In any case, we can note that the Internet has grown a great deal in recent years, it is free (at least as an international medium) and could be the right communication medium for the future.
VoIP stands for 'V'oice 'o'ver 'I'nternet 'P'rotocol. As the term says, VoIP carries voice (mainly human) in IP packets and, ultimately, across the Internet. VoIP can use accelerating hardware to achieve this purpose and can also be used in a PC environment.
Many years ago we discovered that a signal could also be sent to a remote destination digitally: before sending it we have to digitize it with an ADC (analog to digital converter), transmit it, and at the end transform it back into analog format with a DAC (digital to analog converter) to use it.
VoIP works like that, digitizing voice into data packets, sending them and converting them back into voice at the destination.
A digital format can be better controlled: we can compress it, route it, convert it to a new, better format, and so on; we also saw that a digital signal is more noise tolerant than an analog one (see GSM vs TACS).
TCP/IP networks are made of IP packets containing a header (to control communication) and a payload to transport data: VoIP uses them to cross the network and reach the destination.
Voice (source) - - ADC - - - - Internet - - - DAC - - Voice (dest)
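The chain in the diagram can be imitated in a few lines of Python. This is only a sketch under stated assumptions: the 8000 Hz sampling rate, the 440 Hz test tone standing in for the voice, and the function names adc and dac are all chosen for illustration and do not come from any VoIP product.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed, telephone-like rate)
FULL_SCALE = 32767   # largest value of a signed 16-bit sample


def adc(signal, duration_s, rate=SAMPLE_RATE):
    """Sample a continuous signal (a function of time) into 16-bit integers."""
    count = int(duration_s * rate)
    return [round(signal(n / rate) * FULL_SCALE) for n in range(count)]


def dac(samples):
    """Turn 16-bit integer samples back into amplitudes in [-1.0, 1.0]."""
    return [s / FULL_SCALE for s in samples]


def tone(t):
    """A 440 Hz test tone standing in for the voice source."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


digital = adc(tone, 0.02)    # 20 ms of "voice" becomes a list of integers
restored = dac(digital)      # what the destination plays back

# Largest difference between the original and the restored amplitude.
worst = max(abs(restored[n] - tone(n / SAMPLE_RATE)) for n in range(len(digital)))
print(len(digital), "samples, worst quantization error:", worst)
```

Between adc and dac the samples are plain numbers, which is what makes it possible to compress them and carry them in packets.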
When you use a PSTN line, you typically pay the company that manages the line for the time used: the longer you stay on the phone, the more you pay. In addition, you cannot talk with more than one person at a time.
In contrast, with VoIP you can talk for as long as you want with anyone you want (the only requirement is that the other person is also connected to the Internet at the same time), at any distance (the cost does not depend on it) and, in addition, you can talk with many people at the same time.
If you're still not persuaded, consider that at the same time you can exchange data with the people you are talking with, sending images, graphs and videos.
Unfortunately we have to report some problems with the integration between the VoIP architecture and the Internet. As you can easily imagine, voice communication must be a real-time stream (you couldn't speak, wait for many seconds, then hear the other side answering): this is in contrast with the heterogeneous architecture of the Internet, where a path can cross many routers (machines that route packets), about 20-30 or more, and can have a very high round trip time (RTT), so we need to modify something to get it working properly.
In the next sections we'll try to understand how to solve this great problem. In general, it is very difficult to guarantee bandwidth on the Internet for a VoIP application.
Here we see some important information about VoIP, needed to understand it.
To set up a VoIP communication we need:
                          Base architecture

Voice )) ADC - Compression Algorithm - Assembling RTP in TCP/IP  -----
                                                                   ----> |
                                                                   <---- |
Voice (( DAC - Decompress. Algorithm - Disass. RTP from TCP/IP   -----
This is done by hardware, typically by the ADC integrated in the sound card.
Today every sound card can convert, at 16 bits, a band of 22050 Hz (sampling it requires a frequency of 44100 Hz, by the Nyquist principle), giving a throughput of 2 bytes * 44100 (samples per second) = 88200 bytes/s, or 176.4 kbytes/s for a stereo stream.
For VoIP we don't need such a throughput (176 kbytes/s) to send voice packets: next we'll see other codings used for it.
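The arithmetic above can be checked with a small helper. The function name pcm_throughput is hypothetical, and the last line (8-bit samples at 8000 Hz, the usual telephone-quality figures) is an assumption added for comparison, not a value taken from the text above.

```python
def pcm_throughput(bytes_per_sample, sample_rate_hz, channels=1):
    """Raw (uncompressed) PCM throughput in bytes per second."""
    return bytes_per_sample * sample_rate_hz * channels


cd_mono = pcm_throughput(2, 44100)         # 16-bit samples, one channel
cd_stereo = pcm_throughput(2, 44100, 2)    # the same, two channels
phone = pcm_throughput(1, 8000)            # assumed telephone-quality PCM

print(cd_mono, "bytes/s mono")
print(cd_stereo, "bytes/s stereo")
print(phone * 8 / 1000, "kbit/s telephone-quality")
```

The gap between the stereo sound card stream and the telephone-quality stream is already more than a factor of twenty before any compression is applied.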
Now that we have digital data we can convert it to a standard format that can be transmitted quickly.
PCM, Pulse Code Modulation, Standard ITU-T G.711
ADPCM, Adaptive differential PCM, Standard ITU-T G.726
It encodes only the difference between the current and the previous voice sample, requiring 32 kbps (see Standard ITU-T G.726).
LD-CELP, Standard ITU-T G.728
CS-ACELP, Standard ITU-T G.729 and G.729a
MP-MLQ, Standard ITU-T G.723.1, 6.3kbps, Truespeech
ACELP, Standard ITU-T G.723.1, 5.3kbps, Truespeech
LPC-10, able to reach 2.5 kbps!!
These last protocols are the most important because they can guarantee a very low minimum bandwidth using source coding; the G.723.1 codecs also have a very high MOS (Mean Opinion Score, used to measure voice fidelity), but pay attention to the processing performance they require, up to 26 MIPS!
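To get a feeling for these rates, the sketch below estimates the bandwidth of one voice direction once each packet also carries IP, UDP and RTP headers (20 + 8 + 12 = 40 bytes without options). Several things here are assumptions: the 20 ms packet interval (real codecs have their own frame sizes, so the results are only indicative), the nominal rates of 64, 16 and 8 kbit/s for G.711, G.728 and G.729, and the helper name on_wire_kbps. Layer I/II framing is ignored.

```python
# Codec bit rates in kbit/s. The G.726, G.723.1 and LPC-10 values are the
# ones quoted in the text; the G.711, G.728 and G.729 values are the usual
# nominal rates of those standards (an assumption of this sketch).
CODEC_KBPS = {
    "G.711 PCM": 64.0,
    "G.726 ADPCM": 32.0,
    "G.728 LD-CELP": 16.0,
    "G.729 CS-ACELP": 8.0,
    "G.723.1 MP-MLQ": 6.3,
    "G.723.1 ACELP": 5.3,
    "LPC-10": 2.5,
}

HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers, without options


def on_wire_kbps(codec_kbps, packet_ms=20):
    """Bandwidth of one voice direction, headers included, in kbit/s."""
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000


for name, rate in CODEC_KBPS.items():
    print(f"{name:15s} {rate:5.1f} kbit/s codec -> "
          f"{on_wire_kbps(rate):5.1f} kbit/s with headers")
```

With these assumptions the headers add a fixed 16 kbit/s per direction, so for the low-rate codecs most of the traffic is header rather than voice.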
Now we have the raw data and we want to encapsulate it into the TCP/IP stack. We follow this structure:
VoIP data packets
RTP
UDP
IP
I,II layers
VoIP data packets live in RTP (Real-Time Transport Protocol) packets which are inside UDP-IP packets.
First, VoIP doesn't use TCP because it is too heavy for real-time applications, so UDP (datagram) is used instead.
Second, UDP has no control over the order in which packets arrive at the destination or how long it takes them to get there (datagram concept). Both of these are very important to overall voice quality (how well you can understand what the other person is saying) and conversation quality (how easy it is to carry out a conversation). RTP solves the problem by enabling the receiver to put the packets back into the correct order and not to wait too long for packets that have either lost their way or are taking too long to arrive (we don't need every single voice packet, but we need a continuous, ordered flow of many of them).
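The reordering idea can be sketched as a tiny playout buffer. This is an illustration only, not how any particular RTP stack works: the function name playout_order and the max_wait rule (give up on a missing packet once more than max_wait later packets are waiting) are assumptions, and real receivers also use the RTP timestamp and handle sequence-number wrap-around, which this sketch ignores.

```python
def playout_order(arrivals, max_wait=2):
    """Return payloads in sequence-number order.

    arrivals: iterable of (sequence_number, payload) in arrival order.
    max_wait: how many later packets may pile up before a missing packet
              is considered lost and skipped.
    """
    pending = {}        # packets that arrived early, keyed by sequence number
    expected = None     # next sequence number to play
    played = []
    for seq, payload in arrivals:
        if expected is None:
            expected = seq               # start from the first packet seen
        if seq < expected:
            continue                     # too late: its slot has already passed
        pending[seq] = payload
        while pending:
            if expected in pending:
                played.append(pending.pop(expected))
                expected += 1
            elif len(pending) > max_wait:
                expected = min(pending)  # give up on the gap, resume at oldest held packet
            else:
                break                    # keep waiting for the missing packet
    played.extend(pending[seq] for seq in sorted(pending))   # flush at end of stream
    return played


arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (6, "f"), (7, "g"), (4, "d")]
print(playout_order(arrivals))   # packet 4 arrives too late and is dropped
```

Packet 2 arrives after packet 3 but is still played in order, while packet 4 is skipped rather than holding up the conversation.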
                   Real Time Transport Protocol

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
For a complete description of the RTP protocol and all its applications see the relevant RFCs, 1889 and 1890.
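The fixed part of the header (the first three 32-bit words, without CSRC identifiers) can be built and read back with Python's struct module. This is a minimal sketch: the function names pack_rtp_header and parse_rtp_header and the example field values are assumptions, and header extensions, padding and CSRC lists are not handled.

```python
import struct

# Two single bytes, a 16-bit sequence number, then 32-bit timestamp and SSRC,
# all in network byte order: 12 bytes in total.
RTP_FIXED_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(seq, timestamp, ssrc, payload_type=0, marker=0):
    """Build the fixed RTP header with V=2, P=0, X=0 and CC=0."""
    byte0 = 2 << 6                                   # version 2, other bits zero
    byte1 = ((marker & 1) << 7) | (payload_type & 0x7F)
    return RTP_FIXED_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                                 timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Split the first 12 bytes of an RTP packet into named fields."""
    byte0, byte1, seq, timestamp, ssrc = RTP_FIXED_HEADER.unpack(data[:12])
    return {
        "version": byte0 >> 6,
        "padding": (byte0 >> 5) & 1,
        "extension": (byte0 >> 4) & 1,
        "csrc_count": byte0 & 0x0F,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Example values only: sequence 1, timestamp 160, an arbitrary SSRC.
header = pack_rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD)
print(len(header), parse_rtp_header(header))
```

The sequence number and timestamp fields are what let the receiver reorder packets and schedule their playout.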
There are also other protocols used in VoIP, like RSVP, that can manage Quality of Service (QoS).
RSVP is a signaling protocol that requests a certain amount of bandwidth and latency in every network hop that supports it.
For detailed information about RSVP see RFC 2205.
We have said many times that VoIP applications require real-time data streaming because we expect an interactive voice exchange.
Unfortunately, TCP/IP cannot guarantee this; it just makes a "best effort" to do it. So we need to introduce tricks and policies that can manage the packet flow in EVERY router we cross.
So here they are:
For exhaustive information about QoS see Differentiated Services at IETF.
The H323 protocol is used, for example, by Microsoft Netmeeting to make VoIP calls.
This protocol allows a variety of elements to talk to each other:
H323 allows not only VoIP but also video and data communications.
Concerning VoIP, H323 can carry the audio codecs G.711, G.722, G.723, G.728 and G.729, while for video it supports H261 and H263.
More information about H323 is available at Openh323 Standards, at this H323 web site and at its standard description: ITU H-series Recommendations.
You can find it implemented in various software applications like Microsoft Netmeeting, Net2Phone, DialPad, ... and also in freeware products available at the Openh323 Web Site.
To create a small VoIP system you need the following hardware:
All of this has to be present twice to simulate a standard communication.
The tools above are the minimal requirements for a VoIP connection: next we'll see that we should (and on the Internet we must) use more hardware to do the same in a real situation.
The sound card has to be full duplex, otherwise we couldn't hear anything while speaking!
In addition you can use hardware cards (see next) able to manage the data stream in a compressed format (see Par 4.3).
We can use special cards with hardware accelerating capability. Two of them (and also the only ones directly managed by the Linux kernel at this moment) are the
Quicknet PhoneJack is a sound card that can use standard algorithms to compress the audio stream, like G723.1 (section 4.3), down to a 4.1 Kbps rate.
It can be connected directly to a phone (POTS port) or to a mic-speaker pair.
It has an ISA or PCI bus connector.
Quicknet LineJack works like PhoneJack with some additional features (see next).
VoiceTronix V4PCI is a PCI card much like Quicknet LineJack but with 4 phone ports.
VoiceTronix VPB4 is an ISA card equivalent to V4PCI.
VoiceTronix VPB8L is a logging card with 8 ports.
For more information see the Quicknet web site and the VoiceTronix web site.
Quicknet LineJack and VoiceTronix cards can be connected to a PSTN line, allowing a VoIP gateway feature.
Then you'll need software to manage it (see later).
We can choose which O.S. to use:
Under Win9x we have Microsoft Netmeeting, Internet Phone, DialPad and others, or Internet Switchboard (from the Quicknet web site) for Quicknet cards.
Warning!!: The latest Quicknet cards using Switchboard (older versions too) NEED to be connected to the Internet to work, in order to manage the Microtelco account (not free of charge), so if you plan to remain isolated from the Internet you need to install the OpenH323 software.
For VoiceTronix cards you can find software at the VoiceTronix web site.
Under Linux we have the free software GnomeMeeting, a clone of Microsoft Netmeeting, while in console mode we use applications (also free software) from the OpenH323 web site: simph323 or ohphone, which can also work with Quicknet accelerating hardware.
Attention: all Openh323 source code has to be compiled in a user directory (otherwise some environment variables need to be changed). Be warned that compile time can be very long and you may need a lot of RAM to compile in a reasonable time.
To manage the gateway feature (joining TCP/IP VoIP to PSTN lines) you need software like this:
You can choose as gatekeeper:
In addition, here is some useful H323-compliant software:
Here we see how to configure the special hardware cards in Linux and Windows environments.
As we saw, Quicknet PhoneJack is a sound card with VoIP accelerating capability. It supports:
Quicknet PhoneJack is an ISA (or PCI) card to install in your PC. It can work without an IRQ.
Under Windows you have to install:
all downloadable from the Quicknet web site.
After Switchboard has been installed, you need to register with Quicknet to obtain the full capability of your card.
When you pick up the phone, Internet Switchboard wakes up and waits for the number you are calling (entered directly from your phone); you can:
Internet Switchboard is H323 compatible, so you can use, for example, Microsoft Netmeeting at the other end to talk.
Warning!! Internet Switchboard NEEDS to be connected to the Internet when used with newer Quicknet cards.
In place of Internet Switchboard you can use the openh323 applications openphone (with a GUI) or ohphone (command line).
Under Linux you have to install:
With Internet Switchboard (and with other applications) you can:
This card is very similar to the previous one; it also supports the gateway feature.
We only note that we have to download the PSTNGw application (for Linux and Windows) or use Internet Switchboard for the gateway feature.
Follow the README file for more help.
I personally haven't tested VoiceTronix products, so please contact the VoiceTronix web site for support.
In this chapter we try to set up a VoIP system, simple at first, then more and more complex.

A (Sound card) - - - B (Sound card)
192.168.1.1    - - - 192.168.1.2

192.168.1.1 calls 192.168.1.2 and vice versa.
A and B should have
In this setup A can make an H323 call to B (if B has the server-side application active) using B's IP address. B can then answer if it wants. After the call is accepted, VoIP data packets start to flow.
Under Microsoft Windows a NetBIOS name can be used instead of an IP address.
A           - - - B
192.168.1.1 - - - 192.168.1.2
John        - - - Alice

John calls Alice.
This is possible because John's call request to Alice is converted to an IP call by the NetBIOS protocol.
The above 2 examples are very easy to implement but aren't scalable.
In a larger setting such as the Internet it is impossible to use direct calling because, usually, the callers don't know the destination IP address. Furthermore, the NetBIOS naming feature cannot work because it uses broadcast messages, which typically don't pass ISP routers.
You can also use DNS to resolve a name into an IP address: for example you can call ''box.domain.com''.
The NetBIOS name calling idea can also be implemented in an Internet environment, using a WINS server: NetBIOS clients can be configured to use a WINS server to resolve names.
PCs using the same WINS server will be able to call each other directly.
A (WINS Server is S) - - - -  I  - - - - B (WINS Server is S)
                              N
                              T
                              E  - - - - - S (WINS Server)
C (WINS Server is S) - - - -  R
                              N
                              E  - - - - D (WINS Server is S)
                              T

                    Internet communication
A, B, C and D are in different subnets, but they can call each other by NetBIOS name. The only requirement is that they all use S as their WINS server.
Note: a WINS server doesn't have very high performance because it uses NetBIOS features, and it should only be used for joining a few subnets.
ILS is a kind of server that lets you resolve names during an H323 call: when you start the VoIP application you first register with the ILS server using a name, then everyone will be able to see you under that name (if they use the same ILS server!).
The problem of having few IP addresses is commonly solved using so-called masquerading (also NAT, network address translation): there is only 1 public IP address (that the Internet can directly "see"), and the other machines are "masqueraded", all using this IP.

A - - -
B - - - Router with NAT - - - Internet
C - - -

This doesn't work

In the example A, B and C can browse, ping, and use mail and news services with people on the Internet, but they CANNOT make a VoIP call. This is because the H323 protocol sends the IP address at application level, so the answer will never arrive at the source (which is using a private IP address).
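The problem can be pictured with a toy model. Everything in it is hypothetical: the packet is a plain dictionary rather than a real H323 message, the field name payload_reply_to is invented, and 198.51.100.7 is a documentation address standing in for the router's public IP.

```python
import ipaddress

PUBLIC_IP = "198.51.100.7"   # hypothetical public address of the NAT router


def masquerade(packet, public_ip=PUBLIC_IP):
    """What a plain NAT router does: rewrite the IP header, not the payload."""
    return {**packet, "src": public_ip}


def is_private(address):
    """True for addresses such as 192.168.x.x that the Internet cannot reach."""
    return ipaddress.ip_address(address).is_private


# Host A announces its own address inside the application data.
call_setup = {"src": "192.168.1.1", "payload_reply_to": "192.168.1.1"}
seen_by_callee = masquerade(call_setup)

print("header source  :", seen_by_callee["src"])
print("reply address  :", seen_by_callee["payload_reply_to"])
print("reply reachable:", not is_private(seen_by_callee["payload_reply_to"]))
```

The header now shows the router's address, but the callee is still told to answer a private address, which is why something that understands the payload is needed on the router.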
A - - -          Router with NAT
B - - -                 +            - - - Internet
C - - -        ip_masq_h323 module

This works

A - - -
B - - - PhonePatch - - - Internet
C - - -

This works
"ohphone -l|--listen [options]"
"ohphone [options]... address"
Also, when you start ohphone, you can give commands to the interpreter directly (like decreasing AEC, Automatic Echo Cancellation).
Gnomemeeting is a GUI application for making VoIP calls. It is very simple to use and lets you use an ILS server, chat and other things.
You can also experiment with the gatekeeper feature.
                 Example

(Terminal H323) A - - -
                        \
(Terminal H323) B - - -   D (Gatekeeper)
                        /
(Terminal H323) C - - -

          Gatekeeper configuration
Note that the gatekeeper can only resolve names into IP addresses; it cannot join hosts that cannot reach each other (at IP level). In other words, it cannot act as a NAT router.
You can find the gatekeeper code here; the openh323 library is also required.
The program only has to be launched with the -d (as daemon) or -x (execute) parameter.
In addition you can use a config file (.ini), which you can find here.
As we said, a gateway is an entity that can join VoIP to PSTN lines, allowing us to make calls from the Internet to a classic telephone. So, in addition, we need a card that can manage PSTN lines: Quicknet LineJack does it.
From OpenH323 web site we download:
If the executable doesn't work you need to download the source code and the openh323 library, then install everything in a user home directory.
After that you only need to launch PSTNGw to start your H323 gateway.
First Matrix refers to:
 ____________________________________________________________________________________________________________________
|            | Netmeeting |SwitchBoard |  Simph323  |  OhPhone   |  LinPhone  |Speak-Freely|HW PhoneJACK|HW LineJACK |
|____________|____________|____________|____________|____________|____________|____________|____________|____________|
| Netmeeting |     V      |     V      |     V      |     V      |     X      |     X      |     V      |     V      |
|____________|____________|____________|____________|____________|____________|____________|____________|____________|
|SwitchBoard |     V      |     V      |     V      |     V      |     X      |     X      |     V      |     V      |
|____________|____________|____________|____________|____________|____________|____________|____________|____________|
|  Simph323  |     V      |     V      |     V      |     V      |     X      |     X      |     X      |     X      |
|____________|____________|____________|____________|____________|____________|____________|____________|____________|
|  OhPhone   |     V      |     V      |     V      |     V      |     X      |     X      |     V      |     V      |
|____________|____________|____________|____________|____________|____________|____________|____________|____________|
|  LinPhone  |     X      |     X      |     X      |     X      |     V      |     X      |     X      |     X      |
|____________|____________|____________|____________|____________|____________|____________|____________|____________|
|SpeakFreely |     X      |     X      |     X      |     X      |     X      |     V      |     X      |     X      |
|____________|____________|____________|____________|____________|____________|____________|____________|____________|
|HW PhoneJACK|     V      |     V      |     X      |     V      |     X      |     X      |     _      |     _      |
|____________|____________|____________|____________|____________|____________|____________|____________|____________|
|HW LineJACK |     V      |     V      |     X      |     V      |     X      |     X      |     _      |     _      |
|____________|____________|____________|____________|____________|____________|____________|____________|____________|
The second matrix refers to gateway software that manages the LineJACK card.
 ___________________________________________________________
|              |HW LineJACK GW| SwitchBoard  |    PSTNGW    |
|______________|______________|______________|______________|
|HW LineJACK GW|      _       |      V       |      V       |
|______________|______________|______________|______________|
| SwitchBoard  |      V       |      _       |      _       |
|______________|______________|______________|______________|
|    PSTNGW    |      V       |      _       |      _       |
|______________|______________|______________|______________|
VoIP becomes very interesting when you start to use PSTN lines to call other people around the world, directly on their home telephones.
A typical application looks like this:
Home telephone1 -- (PSTN) -- PC1 -- (Internet) -- PC2 -- (PSTN) -- Home telephone2
So your decision will be made considering PSTN line costs. In fact, what VoIP does is convert this:

Home Telephone1 --- (PSTN) --- Home Telephone2

PSTN great distance calling cost

into this:

Home Telephone1 --- (PSTN) --- PC1 +
PC2 ---- (PSTN) --- Home Telephone2 =
--------------------------------------
2 PSTN short distance calling costs
To save money you need:
2 PSTN short distance calling costs < PSTN great distance calling cost
Typically "short distance calling" refers to a "city call", while "great distance calling" can be an "intercontinental call"!
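The condition above is easy to check for your own tariffs. In this sketch the function name and the per-minute prices are made up for illustration, and the Internet leg between PC1 and PC2 is assumed to cost nothing extra, as in the reasoning above.

```python
def gateway_call_is_cheaper(short_distance_cost, great_distance_cost):
    """True when two short distance PSTN legs cost less than one great distance call."""
    return 2 * short_distance_cost < great_distance_cost


# Hypothetical prices per minute, in any currency.
city_call = 0.02
intercontinental_call = 0.50

print(gateway_call_is_cheaper(city_call, intercontinental_call))
```

If the two city calls together cost as much as or more than the long distance call, routing the call through the two PCs saves nothing.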
From all we have said so far, we still have not solved the problems of bandwidth and of how to create a real-time stream of data.
We know we cannot find a solution unless we enable a suitable real-time management protocol in each router we cross, so what can we do?
First, we try to use compression algorithms with as high a compression rate as possible (like LPC10, which only consumes 2.5 kbps of bandwidth, about 313 bytes/s).
Then we start classifying our packets, in the TOS field, with the highest priority level, so every router helps us by handling them urgently.
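On systems with BSD-style sockets an application can ask for a TOS value on its own UDP packets. The sketch below is a minimal illustration, not a recipe: it sets only the classic "minimize delay" bit (0x10) rather than the precedence bits, the function name open_voice_socket is hypothetical, and whether any router on the path honours the field is outside the sender's control. Some systems may restrict or ignore certain TOS values.

```python
import socket

IPTOS_LOWDELAY = 0x10   # "minimize delay" bit of the IPv4 TOS byte


def open_voice_socket(tos=IPTOS_LOWDELAY):
    """Create a UDP socket whose outgoing packets carry the given TOS value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock


sock = open_voice_socket()
# Read the option back to see what the operating system accepted.
print(hex(sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))
sock.close()
```

Voice packets sent through this socket are marked at the source; the routers in between still decide for themselves how to treat them.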
Important: all of this is not sufficient to guarantee that our conversation will always be OK, and without a large infrastructure managing shaping, bandwidth reservation and so on, it is not possible to guarantee it: TCP/IP is not a real-time protocol.
A possible solution could be to start with a small WAN with guaranteed bandwidth and grow larger step by step.
Finally, we have to note one thing: even so-called guaranteed services like PSTN lines cannot serve all of their clients at once: for example, a GSM cell is not able to manage more than some hundreds or thousands of clients.
In any case, as a starting service limited to a few users, VoIP can be a valid alternative to the classic PSTN service.
PSTN: Public Switched Telephone Network
VoIP: Voice over Internet Protocol
LAN: Local Area Network
WAN: Wide Area Network
TOS: Type Of Service
ISP: Internet Service Provider
RTP: Real-Time Transport Protocol
RSVP: Resource ReSerVation Protocol
QoS: Quality of Service