MP3 is a popular digital audio encoding and lossy compression format
invented and standardized in the early 1990s by a team of engineers
working within the ISO/IEC MPEG audio committee, under the chairmanship
of Professor Hans-Georg Musmann (University of Hannover, Germany). It
was designed to greatly reduce the amount of data required to represent
audio while still sounding like a faithful reproduction of the original
uncompressed audio to most listeners. In popular usage, MP3 also refers
to files of sound or music recordings stored in the MP3 format on
computers.
Overview
MP3 is a lossy compression format. It represents pulse-code modulation
(PCM) audio data in a much smaller size by discarding portions that are
considered less important to human hearing (similar to JPEG, a lossy
compression format for images).
A number of techniques are employed in MP3 to determine which portions
of the audio can be discarded, including psychoacoustic modeling. MP3
audio can be compressed at different bit rates, providing a range of
tradeoffs between data size and sound quality.
At its heart, the MP3 format uses a hybrid transformation to convert a
time-domain signal into a frequency-domain signal:
- a 32-band polyphase quadrature filter;
- a 36- or 12-tap MDCT, whose size can be selected independently for
  sub-bands 0...1 and 2...31;
- aliasing reduction postprocessing.
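The MDCT stage of this hybrid transform can be illustrated with a short sketch. It is a naive, direct O(N^2) MDCT in Python, not a fast algorithm and not the full MP3 filterbank: it omits the polyphase filter, the block-switching window shapes and the aliasing reduction step, and it assumes a plain sine window with orthonormal scaling. All function names are illustrative. It shows that 36-sample (long) or 12-sample (short) blocks, overlapped by half, yield 18 or 6 coefficients each, and that the input can be resynthesised because the time-domain aliasing of neighbouring blocks cancels.

```python
import math


def mdct(block):
    """Direct MDCT: 2N (windowed) input samples -> N coefficients.

    Uses the orthonormal scale factor sqrt(2/N) so that, together with a
    Princen-Bradley window, windowed overlap-add reconstructs the input.
    """
    two_n = len(block)
    n = two_n // 2
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(
            block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(two_n)
        )
        for k in range(n)
    ]


def imdct(coeffs):
    """Inverse transform: N coefficients -> 2N time-aliased samples."""
    n = len(coeffs)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(
            coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)
        )
        for i in range(2 * n)
    ]


def sine_window(two_n):
    """Sine window; satisfies w[i]**2 + w[i + N]**2 == 1."""
    return [math.sin(math.pi / two_n * (i + 0.5)) for i in range(two_n)]


def mdct_roundtrip(signal, block_len):
    """Analyse and resynthesise `signal` with 50%-overlapping MDCT blocks.

    block_len is 36 (long blocks) or 12 (short blocks); len(signal) must be
    a multiple of block_len // 2. The aliasing left by each block is
    cancelled by its neighbours (TDAC), so the output matches the input.
    """
    hop = block_len // 2
    if len(signal) % hop:
        raise ValueError("signal length must be a multiple of block_len // 2")
    padded = [0.0] * hop + list(signal) + [0.0] * hop
    window = sine_window(block_len)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - block_len + 1, hop):
        frame = [padded[start + i] * window[i] for i in range(block_len)]
        rebuilt = imdct(mdct(frame))
        for i in range(block_len):
            out[start + i] += rebuilt[i] * window[i]  # window again, overlap-add
    return out[hop:hop + len(signal)]
```

For example, `mdct_roundtrip([math.sin(0.3 * t) for t in range(36)], 36)` returns the input signal up to floating-point rounding error; a single block on its own would not, because its output still contains time-domain aliasing.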
MP3 Surround, a version of the format supporting 5.1 channels for
surround sound, was introduced in December 2004. MP3 Surround is
backward compatible with standard stereo MP3, and file sizes are
similar.
In terms of the MPEG specifications, AAC (Advanced Audio Coding),
defined in MPEG-2 and extended in MPEG-4, is intended to be the
successor of the MP3 format, although there has been a significant
movement to create and popularize other audio formats. Nevertheless,
MP3 is unlikely to be displaced for a long time because of its
overwhelming popularity: it enjoys extremely wide support, not just from
end users and software but also from hardware such as DVD and CD
players.
History and Development
MPEG-1 Audio Layer 2 encoding began as the Digital Audio Broadcast
(DAB) project managed by Egon Meier-Engelen of the DFVLR (later renamed
DLR, the German Aerospace Center) in Germany. This project was financed
by the European Union as part of the EUREKA research program, where it
was commonly known as EU-147. EU-147 ran from 1987 to 1994.
In 1991, two proposals were available: Musicam (which became Layer 2)
and ASPEC (Adaptive Spectral Perceptual Entropy Coding). The Musicam
technique, as proposed by Philips (the Netherlands), CCETT (France) and
IRT (Germany), was chosen for its simplicity and error robustness, as
well as for the low computational power required to encode high-quality
compressed audio. The Musicam format, based on subband coding, laid the
foundation of the MPEG Audio compression format (sampling rates, frame
structure, headers, number of samples per frame). Its technologies and
ideas were fully incorporated into the definition of ISO MPEG Audio
Layer I and Layer II, and later of the Layer III (MP3) format. Under the
chairmanship of Professor Musmann (University of Hannover), the standard
was edited by L. van de Kerkhof (Layer I) and G. Stoll (Layer II).
Later, a working group consisting of J. D. Johnston (US), Gerhard Stoll
(Germany), Yves-François Dehery (France) and Karlheinz Brandenburg
(Germany) took ideas from Musicam and ASPEC, added some of their own
ideas and created MP3, which was designed to achieve the same quality at
128 kbit/s as MP2 at 192 kbit/s.
All algorithms were finalized in 1992 as part of MPEG-1, the first
standard suite by MPEG, which resulted in the international standard
ISO/IEC 11172-3, published in 1993. Further work on MPEG audio was
finalized in 1994 as part of the second suite of MPEG standards, MPEG-2,
whose audio part is the international standard ISO/IEC 13818-3,
originally published in 1995.
The compression efficiency of encoders is typically defined by the bit
rate, because the compression ratio depends on the bit depth and
sampling rate of the input signal. Nevertheless, compression ratios are
often published using CD parameters as the reference (44.1 kHz, 2
channels at 16 bits per channel, or 2x16 bit). Sometimes the Digital
Audio Tape (DAT) SP parameters are used instead (48 kHz, 2x16 bit).
Compression ratios computed against this reference are higher for the
same encoded bit rate, which demonstrates the problem with the term
"compression ratio" for lossy encoders.
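The dependence of a quoted ratio on its reference format can be made concrete with a little arithmetic. The sketch below (Python; function names are illustrative) computes the uncompressed PCM bit rate from sample rate, channel count and bit depth, then divides it by the encoded bit rate.

```python
def pcm_bitrate(sample_rate_hz, channels=2, bits_per_sample=16):
    """Bit rate of uncompressed PCM audio, in bit/s."""
    return sample_rate_hz * channels * bits_per_sample


def compression_ratio(encoded_bps, sample_rate_hz=44_100, channels=2,
                      bits_per_sample=16):
    """Ratio of the PCM reference bit rate to the encoded bit rate."""
    return pcm_bitrate(sample_rate_hz, channels, bits_per_sample) / encoded_bps


cd = compression_ratio(128_000)                          # CD: 44.1 kHz, 2x16 bit
dat = compression_ratio(128_000, sample_rate_hz=48_000)  # DAT SP: 48 kHz, 2x16 bit
print(f"CD reference: {cd:.3f}:1, DAT reference: {dat:.3f}:1")
# CD reference: 11.025:1, DAT reference: 12.000:1
```

The same 128 kbit/s file is thus "11:1" against a CD reference and "12:1" against a DAT reference, without any change in the encoded data.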
Karlheinz Brandenburg used a CD recording of Suzanne Vega's song "Tom's
Diner" to assess the MP3 compression algorithm. This song was chosen
because of its softness and simplicity, which make imperfections in the
compression format easier to hear during playback. More demanding
critical excerpts (glockenspiel, triangle, accordion, ...) were taken
from the EBU V3/SQAM reference compact disc and have been used by
professional sound engineers to assess the subjective quality of the
MPEG Audio formats.
MP3 goes public
A reference simulation software package written in C, known as ISO
11172-5, was developed by members of the ISO MPEG Audio committee in
order to produce bit-compliant MPEG Audio files (Layer 1, Layer 2,
Layer 3). Running in non-real time on a number of operating systems, it
was used to demonstrate the first real-time, DSP-based hardware decoding
of compressed audio. Other real-time implementations of MPEG Audio
encoders were available for digital broadcasting (radio DAB, television
DVB) towards consumer receivers and set-top boxes.
Later, on July 7, 1994, the Fraunhofer Society released the first
software MP3 encoder, called l3enc. The filename extension .mp3 was
chosen by the Fraunhofer team on July 14, 1995 (previously, the files
had been named .bit). With the first real-time software MP3 player,
Winplay3 (released September 9, 1995), many people were able to encode
and play back MP3 files on their PCs. Because hard drives were
relatively small at the time (~500 MB), the technology was essential for
storing music for listening on a computer.
MP2 and MP3 and the Internet
In October 1993, MP2 (MPEG-1 Audio Layer 2) files appeared on the
Internet and were often played back using the Xing MPEG Audio Player,
and later with MAPlay, a program for Unix by Tobias Bading that was
initially released on February 22, 1994 (MAPlay was also ported to
Microsoft Windows).
Initially, the only encoder available for MP2 production was the Xing
Encoder, accompanied by the program CDDA2WAV, a CD ripper that
converted CD audio tracks into computer data files.
The Internet Underground Music Archive (IUMA) is generally recognized as
the start of the online music revolution. IUMA was the Internet's first
high-fidelity music website, hosting thousands of authorized MP2
recordings before MP3 or the web became popular. IUMA was started in
1993 by Rob Lord (who later headed the pioneering company Nullsoft) and
Jeff Patterson, both from the University of California, Santa Cruz.
Other founding members include Jon Luini, Brandee Selck, and Ahin
Savara.
From the first half of 1995 through the late 1990s, MP3 files
flourished on the Internet. MP3's popularity was closely tied to the
success of companies and software packages such as Nullsoft's Winamp
(released in 1997), mpg123, and Napster (released in 1999). These
programs made it very easy for the average user to play back, create,
share, and collect MP3s.
Controversies regarding peer-to-peer file sharing of MP3 files have
flourished in recent years, largely because high compression makes it
practical to share files that would otherwise be too large and
cumbersome. Because of the vastly increased spread of MP3s through the
Internet, some major record labels reacted by filing a lawsuit against
Napster to protect their copyrights.
Commercial online music distribution services (like the iTunes Music
Store) usually prefer other, often proprietary, music file formats that
support Digital Rights Management (DRM) to control and restrict the use
of digital music. This preference most likely reflects an attempt to
prevent piracy of copyright-protected material, although many users
with at least an intermediate understanding of computers expect that it
is only a matter of time before someone makes it easy to convert such
proprietary formats.
Quality of MP3 audio
Because MP3 is a lossy format, it offers a number of different options
for its "bit rate", that is, the number of bits of encoded data used to
represent each second of audio. Typical rates are between 128 and 256
kilobits per second. By contrast, uncompressed audio as stored on a
compact disc has a bit rate of about 1411 kbit/s.
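As a rough illustration of what these bit rates mean for storage, the audio payload size is bit rate times duration divided by eight. This sketch ignores tags, frame headers and any other overhead; the function name is illustrative.

```python
def encoded_size_bytes(bitrate_bps, duration_s):
    """Approximate audio payload size in bytes (no tags or other overhead)."""
    return bitrate_bps * duration_s // 8


four_minutes = 4 * 60
print(encoded_size_bytes(128_000, four_minutes))    # 3840000 bytes at 128 kbit/s
print(encoded_size_bytes(1_411_200, four_minutes))  # 42336000 bytes as CD-quality PCM
```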
MP3 files encoded at a lower bit rate will generally play back at a
lower quality. With too low a bit rate, "compression artifacts" (sounds
that were not present in the original recording) may appear in the
reproduction. The sound of applause is a good demonstration: because it
is noise-like and highly transient, it is hard to compress, so the
failings of the encoder are more obvious and are audible as ringing.
Besides the bit rate of the encoded file, the quality of MP3 files
depends on the quality of the encoder and the difficulty of the signal
being encoded. For average signals with good encoders, many listeners
accept an MP3 bit rate of 128 kbit/s as near enough to compact disc
quality, which corresponds to a compression ratio of approximately 11:1.
However, listening tests show that with a bit of practice many listeners
can reliably distinguish 128 kbit/s MP3s from CD originals, in many
cases to the point of considering the MP3 audio to be of unacceptably
low quality. Yet other listeners, and the same listeners in other
environments (such as in a noisy moving vehicle or at a party), will
consider the quality acceptable.
The Fraunhofer Gesellschaft (FhG) publishes the following compression
ratios and data rates for MPEG-1 Layer 1, 2 and 3 on its official
webpage, intended for comparison:
- Layer 1: 384 kbit/s, compression 4:1
- Layer 2: 192...256 kbit/s, compression 6:1...8:1
- Layer 3: 112...128 kbit/s, compression 10:1...12:1
The differences between the layers are caused by the different coding
techniques and psychoacoustic models they use; the Layer 1 algorithm is
substantially simpler, so a higher bit rate is needed for transparent
encoding. However, because different encoders use different models, it
is difficult to draw absolute comparisons of this kind.
Many people consider these quoted rates to be heavily skewed in favour
of Layer 2 and Layer 3 recordings. They would contend that more
realistic rates are as follows:
- Layer 1: excellent at 384 kbit/s
- Layer 2: excellent at 256...384 kbit/s, very good at 224...256 kbit/s,
  good at 192...224 kbit/s
- Layer 3: excellent at 224...320 kbit/s, very good at 192...224 kbit/s,
  good at 128...192 kbit/s
When comparing compression schemes, it is important to use encoders of
equivalent quality. Tests may be biased against older formats in favour
of new ones by using older encoders based on out-of-date technologies,
or even buggy encoders for the old format. Because lossy encoding
discards information, MP3 algorithms work hard to ensure that the
discarded parts cannot be detected by human listeners, by modeling the
general characteristics of human hearing (e.g., auditory masking).
Different encoders achieve this with varying degrees of success.
A few possible encoders:
- LAME, first created by Mike Cheng in early 1998. In contrast to
  others, it is a fully LGPL-licensed MP3 encoder, with excellent speed
  and quality, rivaling even MP3's technological successors.
- Fraunhofer Gesellschaft: some of its encoders are good, while others
  have bugs.
Many early encoders that are no longer widely used:
Good encoders produce acceptable quality at 128 to 160 kbit/s and
near-transparency at 160 to 192 kbit/s, while low-quality encoders may
never reach transparency, not even at 320 kbit/s. It is therefore
misleading to speak of 128 kbit/s or 192 kbit/s quality, except in the
context of a particular encoder or of the best available encoders. A
128 kbit/s MP3 produced by a good encoder might sound better than a
192 kbit/s MP3 file produced by a bad encoder.
It is important to note that the quality of an audio signal is
subjective: a given bit rate suffices for some listeners but not for
others. Individual acoustic perception varies, so no single
psychoacoustic model is guaranteed to give satisfactory results for
everyone. Merely changing the listening conditions, such as the audio
playback system or environment, can expose unwanted distortions caused
by lossy compression. The numbers given above are rough guidelines that
work for many people, but in the field of lossy audio compression the
only true measure of the quality of a compression process is to listen
to the results.
If your aim is to archive sound files with no loss of quality (or to
work on them in a studio, for example), then you should use a lossless
compression format such as Lossless Audio (LA), Apple Lossless, FLAC,
Windows Media Audio 9 Lossless (WMA) or Monkey's Audio (among others).
These are currently capable of compressing 16-bit PCM audio to around
38% of its original size while leaving the audio identical to the
original. Lossless formats are strongly preferred for material that will
be edited, mixed, or otherwise processed, because the perceptual
assumptions made by lossy encoders may not hold after processing. The
losses produced by multiple stages of coding may also compound each
other, becoming more evident when the signal is re-encoded after
processing. Lossless formats produce the best possible result, at the
expense of a lower compression ratio.
Some simple editing operations, such as cutting sections of audio, can
be performed directly on the encoded MP3 data without re-encoding. For
these operations, the concerns mentioned above do not necessarily
apply, as long as appropriate software (such as mp3DirectCut and
MP3Gain) is used to avoid extra decoding and encoding steps.
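Cutting without re-encoding works at frame granularity. The sketch below maps a cut time to the MPEG-1 Layer III frame that contains it, assuming 1152 samples per frame; the names are illustrative. It deliberately ignores Layer III's bit reservoir, which lets a frame's audio data begin in earlier frames, so a real cutting tool has to do more than this to keep the first frame after a cut decodable.

```python
import math

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def frame_index_at(time_s, sample_rate_hz):
    """Index of the frame containing the sample at time_s."""
    return math.floor(time_s * sample_rate_hz / SAMPLES_PER_FRAME)


def frame_start_time(index, sample_rate_hz):
    """Start time, in seconds, of frame `index`."""
    return index * SAMPLES_PER_FRAME / sample_rate_hz


cut = frame_index_at(60.0, 44_100)
print(cut, round(frame_start_time(cut, 44_100), 4))  # 2296 59.9771
```

A frame-accurate cut at "one minute" therefore actually lands about 23 ms earlier, at the start of frame 2296.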
Bit rate
The bit rate of an MP3 file is selectable. As a general rule, the higher
the bit rate, the more information from the original sound is retained,
and thus the higher the quality during playback. In the early days of
MP3 encoding, a fixed bit rate was used for the entire file.
The bit rates available in MPEG-1 Layer 3 are 32, 40, 48, 56, 64, 80,
96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, and the available
sample frequencies are 32, 44.1 and 48 kHz. 44.1 kHz is almost always
used (it coincides with the sampling rate of compact discs), and 128
kbit/s has become the de facto "good enough" standard, although 192
kbit/s is becoming increasingly popular on peer-to-peer file-sharing
networks. MPEG-2 and the unofficial MPEG-2.5 use a different set of bit
rates, including some additional low ones: 8, 16, 24, 32, 40, 48, 56,
64, 80, 96, 112, 128, 144 and 160 kbit/s.
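The bit rate and sample frequency together determine how large each frame is. As a sketch (Python, MPEG-1 Layer III only, names illustrative): a frame holds 1152 samples, so its length is 1152 times bit rate divided by sample rate in bits, that is 144 times bit rate divided by sample rate in bytes, rounded down, plus one byte when the header's padding bit is set. At 44.1 kHz the exact value is not an integer, which is why some frames carry a padding byte.

```python
MPEG1_LAYER3_BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112,
                              128, 160, 192, 224, 256, 320)
MPEG1_SAMPLE_RATES_HZ = (32_000, 44_100, 48_000)


def frame_length_bytes(bitrate_bps, sample_rate_hz, padding=False):
    """Length in bytes of one MPEG-1 Layer III frame, header included."""
    if bitrate_bps % 1000 or bitrate_bps // 1000 not in MPEG1_LAYER3_BITRATES_KBPS:
        raise ValueError(f"unsupported MPEG-1 Layer III bit rate: {bitrate_bps}")
    if sample_rate_hz not in MPEG1_SAMPLE_RATES_HZ:
        raise ValueError(f"unsupported MPEG-1 sample rate: {sample_rate_hz}")
    # 1152 samples per frame / 8 bits per byte = 144
    return 144 * bitrate_bps // sample_rate_hz + (1 if padding else 0)


print(frame_length_bytes(128_000, 44_100),
      frame_length_bytes(128_000, 44_100, padding=True))  # 417 418
```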
Variable bit rates (VBR) are also possible. Audio in MP3 files is
divided into frames, each with its own bit rate, so the bit rate can
change dynamically as the file is encoded (although not originally
implemented, VBR is in extensive use today). This technique makes it
possible to use more bits for parts of the sound with higher dynamics
(more sound movement) and fewer bits for parts with lower dynamics,
increasing quality while decreasing storage space. The method is
comparable to a sound-activated tape recorder that reduces tape
consumption by not recording silence. Some encoders use this technique
extensively.
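The per-frame choice behind VBR can be sketched with a toy rule. The example assumes a hypothetical per-frame bit demand (a real encoder derives this from its psychoacoustic model and may also use the bit reservoir), picks the smallest allowed MPEG-1 Layer III bit rate that meets it, and averages the result. Because every frame lasts the same 1152 samples, the file's average bit rate is the plain mean of the frame bit rates. Names and demand values are illustrative.

```python
MPEG1_LAYER3_BITRATES_BPS = [k * 1000 for k in (32, 40, 48, 56, 64, 80, 96,
                                                112, 128, 160, 192, 224,
                                                256, 320)]


def choose_frame_bitrate(required_bps):
    """Toy rule: smallest allowed bit rate that meets the frame's demand."""
    for rate in MPEG1_LAYER3_BITRATES_BPS:
        if rate >= required_bps:
            return rate
    return MPEG1_LAYER3_BITRATES_BPS[-1]  # demand above 320 kbit/s is capped


def average_bitrate(frame_bitrates):
    """Frames have equal duration, so the average is the arithmetic mean."""
    return sum(frame_bitrates) / len(frame_bitrates)


demand = [20_000, 100_000, 250_000, 500_000]  # hypothetical demand, bit/s
chosen = [choose_frame_bitrate(d) for d in demand]
print(chosen, average_bitrate(chosen))  # [32000, 112000, 256000, 320000] 180000.0
```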
Design limitations of MP3
There are several limitations inherent to the MP3 format that cannot be
overcome by using a better encoder.
Newer audio compression formats such as Vorbis and AAC no longer have
these limitations.
In technical terms, MP3 is limited in the following ways:
- The bit rate is limited to a maximum of 320 kbit/s.
- Time resolution can be too low for highly transient signals.
- There is no scale factor band for frequencies above 15.5/15.8 kHz.
- Joint stereo is done on a frame-to-frame basis.
- The overall encoder/decoder delay is not defined, so there is no
  official provision for gapless playback; gaps may be introduced
  between tracks, although this can be avoided to a degree by encoding
  with LAME.
Nevertheless, a well-tuned MP3 encoder can perform competitively even
with these restrictions.
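The gapless-playback problem comes from the fixed frame size: the encoder adds some delay at the start and pads the last frame. If the delay and the original length are known (LAME, for example, can record them in a header frame at the start of the file), a player can trim them off. The sketch assumes 1152-sample frames and a hypothetical delay of 1000 samples; real values depend on the encoder, and the names are illustrative.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def frames_and_padding(original_samples, encoder_delay):
    """Frames needed to hold delay + audio, and the trailing padding left over."""
    total = encoder_delay + original_samples
    frames = (total + SAMPLES_PER_FRAME - 1) // SAMPLES_PER_FRAME  # ceiling
    padding = frames * SAMPLES_PER_FRAME - total
    return frames, padding


def trim_for_gapless(decoded, encoder_delay, original_samples):
    """Drop the leading delay and trailing padding from decoded samples."""
    return decoded[encoder_delay:encoder_delay + original_samples]


frames, padding = frames_and_padding(44_100, encoder_delay=1_000)  # hypothetical delay
print(frames, padding)  # 40 980
```

Without the trim, one second of audio would play back as 46080 samples, with 1000 extra samples before it and 980 after it: the gap heard between tracks.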
Encoding of MP3 audio
The MPEG-1 standard does not include a precise specification for an MP3
encoder; the decoding algorithm and file format, by contrast, are well
defined. Implementers of the standard were expected to devise their own
algorithms for removing parts of the information in the raw audio (or
rather in its MDCT representation in the frequency domain). This is the
domain of psychoacoustics, which aims at understanding how human
acoustic perception works (both in the ear and in the brain).
As a result, many different MP3 encoders are available, each producing
files of differing quality. Comparisons are widely available, so it is
easy for a prospective user to research the best choice. Keep in mind
that an encoder that is proficient at higher bit rates (such as LAME,
which is in widespread use for encoding at higher bit rates) is not
necessarily as good at lower bit rates.
Decoding of MP3 audio
Decoding, on the other hand, is carefully defined in the standard. Most
decoders are "bitstream compliant", meaning that the uncompressed output
they produce from a given MP3 file will be the same (within a specified
degree of rounding tolerance) as the output specified mathematically in
the standard document. An MPEG audio file has a standard format: a
sequence of frames of 384, 576, or 1152 samples each (depending on the
MPEG version and layer), where every frame has associated header
information (32 bits) and side information (9, 17, or 32 bytes,
depending on the MPEG version and on stereo/mono). The header and side
information help the decoder correctly decode the associated
Huffman-encoded data.
Therefore, comparisons of decoders are almost exclusively based on how
computationally efficient they are (i.e., how much memory or CPU time
they use in the decoding process).
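The 32-bit frame header can be decoded with a few bit operations. This sketch (Python) handles only MPEG-1 Layer III, treats free-format and reserved indices as errors, and derives the side-information size (17 bytes mono, 32 bytes otherwise) and the frame length; the function and field names are illustrative, not from any library. The example bytes `FF FB 90 64` describe a 128 kbit/s, 44.1 kHz, joint-stereo frame without CRC.

```python
# Index 0 (free format) and 15 (invalid) are left as None.
MPEG1_LAYER3_BITRATES_KBPS = (None, 32, 40, 48, 56, 64, 80, 96, 112,
                              128, 160, 192, 224, 256, 320, None)
MPEG1_SAMPLE_RATES_HZ = (44_100, 48_000, 32_000, None)
CHANNEL_MODES = ("stereo", "joint stereo", "dual channel", "mono")


def parse_mpeg1_layer3_header(header: bytes) -> dict:
    """Decode the 4-byte header of an MPEG-1 Layer III frame."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    b0, b1, b2, b3 = header[:4]
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        raise ValueError("no frame sync (11 set bits) at start of header")
    version_bits = (b1 >> 3) & 0b11  # 0b11 means MPEG-1
    layer_bits = (b1 >> 1) & 0b11    # 0b01 means Layer III
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("not an MPEG-1 Layer III header")
    bitrate_kbps = MPEG1_LAYER3_BITRATES_KBPS[b2 >> 4]
    sample_rate = MPEG1_SAMPLE_RATES_HZ[(b2 >> 2) & 0b11]
    if bitrate_kbps is None or sample_rate is None:
        raise ValueError("free-format or reserved bit rate / sample rate index")
    padding = (b2 >> 1) & 1
    channel_mode = CHANNEL_MODES[b3 >> 6]
    return {
        "crc_protected": (b1 & 1) == 0,  # protection bit 0: a 16-bit CRC follows
        "bitrate_bps": bitrate_kbps * 1000,
        "sample_rate_hz": sample_rate,
        "padding": bool(padding),
        "channel_mode": channel_mode,
        "mode_extension": (b3 >> 4) & 0b11,
        "copyright": bool((b3 >> 3) & 1),
        "original": bool((b3 >> 2) & 1),
        "emphasis": b3 & 0b11,
        "side_info_bytes": 17 if channel_mode == "mono" else 32,
        "frame_length_bytes": 144 * bitrate_kbps * 1000 // sample_rate + padding,
    }


print(parse_mpeg1_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```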
ID3 and other tags
A "tag" is data stored in an MP3 file (as well as in files of other
formats) that contains metadata such as the title, artist, album, track
number or other information about the file. The most widespread
standard tag formats are currently the ID3 tags, ID3v1 and ID3v2, and
the more recent APEv2 tag.
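ID3v1, the simpler of the two ID3 formats, is a fixed 128-byte block at the end of the file that starts with the bytes "TAG". The sketch below reads it, including the ID3v1.1 convention of storing the track number in the last byte of the comment field; it assumes Latin-1 text, and the function name is illustrative. ID3v2, which is variable-length and usually placed at the start of the file, is not handled.

```python
def parse_id3v1(data: bytes):
    """Return the ID3v1 fields from the last 128 bytes of `data`, or None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def text(field: bytes) -> str:
        # Fixed-width fields padded with NULs (or spaces); Latin-1 assumed.
        return field.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    comment = tag[97:127]
    track = None
    if comment[28] == 0 and comment[29] != 0:  # ID3v1.1 track number
        track = comment[29]
        comment = comment[:28]
    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(comment),
        "track": track,
        "genre": tag[127],  # index into the ID3v1 genre list
    }
```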
APEv2 was originally developed for the MPC file format (see the APEv2
specification). APEv2 can coexist with ID3 tags in the same file, but it
can also be used by itself.
Volume normalization
As compact discs and various other sources are recorded and mastered at
different volumes, it is useful to store volume information about a
file in the tag so that the volume can be adjusted dynamically at
playback time.
A few standards for encoding the gain of an MP3 file have been
proposed. The idea is to normalize the volume (not the volume peaks) of
audio files, so that the volume does not change between consecutive
tracks.
The most popular and widely used solution for storing replay gain is
known simply as "Replay Gain". Typically, the average volume and
clipping (peak) information about an audio track are stored in the
metadata tag.
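Applying a stored gain is simple arithmetic: a gain of G dB corresponds to a linear factor of 10^(G/20). The sketch below also uses the stored peak (1.0 meaning full scale) to keep the loudest sample from clipping. It is an illustration with hypothetical function names, not the Replay Gain reference algorithm; real players may handle clipping differently, for example with a limiter.

```python
def replay_gain_factor(gain_db, peak=None):
    """Linear scale factor for a stored gain in dB, limited to avoid clipping."""
    factor = 10 ** (gain_db / 20)
    if peak is not None and peak > 0 and peak * factor > 1.0:
        factor = 1.0 / peak  # loudest sample lands exactly at full scale
    return factor


def apply_gain(samples, gain_db, peak=None):
    """Scale floating-point samples (full scale = 1.0) by the stored gain."""
    factor = replay_gain_factor(gain_db, peak)
    return [s * factor for s in samples]
```

For example, a track tagged with -6 dB is played at about half amplitude, while a +6 dB gain on a track whose peak is already 0.9 is reduced to roughly +0.9 dB so that it does not clip.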
Alternative technologies
Many other lossy audio codecs exist, including:
- MPEG-1/2 Audio Layer 2 (MP2), MP3's predecessor;
- Ogg Vorbis from the Xiph.org Foundation, a free software and
  patent-free codec;
- MPC, also known as Musepack (formerly MP+), a derivative of MP2;
- mp3PRO from Thomson Multimedia, combining MP3 with SBR;
- AC-3, used in Dolby Digital and DVD;
- ATRAC, used in Sony's MiniDisc;
- MPEG-4 AAC, used by Apple's iTunes Music Store and iPod;
- Windows Media Audio (WMA) from Microsoft;
- QDesign, used in QuickTime at low bit rates;
- AMR-WB+ (Extended Adaptive Multi-Rate Wideband), optimized for
  cellular and other limited-bandwidth use;
- RealAudio from RealNetworks, frequently used for streaming on
  websites;
- Speex, a free software and patent-free codec based on CELP,
  specifically designed for speech and VoIP.
mp3PRO, MP3, AAC, and MP2 are all members of the same technological
family and depend on roughly similar psychoacoustic models. The
Fraunhofer Gesellschaft owns many of the basic patents underlying these
codecs, with Dolby Labs, Sony, Thomson Consumer Electronics, and AT&T
holding other key patents.
There are also some lossless audio compression methods used on the
Internet. While they are not similar to MP3, they are good examples of
other compression schemes available. These include:
Listening tests have attempted to find the best-quality lossy audio
codecs at certain bit rates. The tests have suggested that for some
audio samples, newer audio codecs including Ogg Vorbis, mp3PRO, AC-3,
Windows Media Audio, MPC and RealAudio perform better than MP3.
Generally, these codecs achieve the equivalent of 128 kbit/s MP3 at
around 80 kbit/s. At 128 kbit/s, Ogg Vorbis and MPC performed marginally
better than other codecs. At 64 kbit/s, AAC and mp3PRO performed
marginally better than other codecs. At high bit rates (128 kbit/s and
above), most people do not hear significant differences. What is
considered "CD quality" is quite subjective; for some listeners 128
kbit/s MP3 is sufficient, while for others 192 kbit/s MP3 is necessary.
Though proponents of newer codecs such as WMA and RealAudio have
asserted that their respective algorithms can achieve CD quality at 64
kbit/s, listening tests have shown otherwise; however, the quality of
these codecs at 64 kbit/s is clearly superior to that of MP3 at the same
bit rate. The developers of the patent-free Ogg Vorbis codec claim that
their algorithm surpasses MP3, RealAudio and WMA in sound quality, and
the listening tests mentioned above support that claim. Thomson claims
that its mp3PRO codec achieves CD quality at 64 kbit/s, but listeners
have reported that a 64 kbit/s mp3PRO file is comparable in quality to a
112 kbit/s MP3 file and does not come reasonably close to CD quality
until about 80 kbit/s.
MP3, which was designed and tuned for use alongside MPEG-1/2 Video,
generally performs poorly on monaural data at less than 48 kbit/s or in
stereo at less than 80 kbit/s.
Licensing and patent issues
Thomson Consumer Electronics controls licensing of the MPEG-1/2 Layer 3
patents in countries that recognize software patents, including the
United States and Japan, but not EU countries. Thomson has been actively
enforcing these patents.
In September 1998, the Fraunhofer Institute sent a letter to several
developers of MP3 software stating that a license was required to
"distribute and/or sell decoders and/or encoders". The letter claimed
that unlicensed products "infringe the patent rights of Fraunhofer and
THOMSON. To make, sell and/or distribute products using the [MPEG
Layer-3] standard and thus our patents, you need to obtain a license
under these patents from us."
These patent issues significantly slowed the development of unlicensed
MP3 software and led to increased focus on creating and popularizing
alternatives such as WMA and Ogg Vorbis. Microsoft, the maker of the
Windows operating system, chose to move away from MP3 to its own
proprietary Windows Media formats to avoid the licensing issues
associated with the patents. Until the key patents expire, open source
and free software encoders and players appear to be illegal for
commercial use in countries that recognize software patents.
In spite of the patent restrictions, the MP3 format remains in
widespread use; the reasons for this appear to be the network effects
caused by:
- familiarity with the format and unawareness that alternatives exist,
- the fact that these alternatives do not universally provide a
  definite advantage over MP3,
- the large quantity of music now available in the MP3 format,
- the wide variety of existing software and hardware that takes
  advantage of the file format,
- the lack of DRM protection, which makes MP3 files easy to edit, copy
  and distribute over networks,
- the majority of home users not knowing or not caring about the
  software patent controversy, which is generally irrelevant to their
  choice of the MP3 format for personal use.
Sisvel S.p.A. and Audio MPEG, Inc. are suing Thomson for patent
infringement on MP3 technology. Audio MPEG has also begun licensing MP3
to vendors, so the legal status of MP3 is unclear.
Online music resources
Tools such as iRate try to make it easier to find music that matches the
listener's tastes. There are several online music stores. Apple's iTunes
Store is presently the most popular commercial online music offering.
Independent artists are able to use smaller sites for distribution. A
controversial MP3 portal is the Russian site AllOfMP3.com, which, under
its country's copyright laws, can legally distribute music by any label
or artist. The music industry has shut down many file-sharing networks,
and the public's demand for free MP3 downloads has given rise to sites
such as bestmp3links.com and Erik Brown's MP3 Links, which list links to
free, legal MP3 download sites.
There are also several online columnists who edit news sites focused on
digital music and the grassroots community it spawned. They include
Richard Menta's MP3 Newswire, an early MP3 news site started in 1998,
Jon Newton's P2Pnet, and Thomas Mennecke's Slyck.com. Other sites, like
Download.com and Vitaminic.com, allow artists to post their own music
for free download.