Utilization of High Definition Audio to Create CD-ROM-Based Multimedia for Ear Training and Microphone Technique
Doug Mitchell
Introduction
This paper serves as an introduction to the use of audio and various audio
file formats for use in integrated multimedia productions. It will also
detail the process of putting together a multimedia CD-ROM designed to examine
microphone techniques. The goal of the multimedia audio producer is to utilize
a variety of techniques for recording and formatting audio while preserving
the highest possible quality for playback in the various multimedia formats.
In many cases, this goal is compromised by the conflicting demands of data
preservation, available data real estate and conventional data throughput
on eventual end-user platforms.
The field of multimedia production and deployment on CD-ROM continues to
change rapidly with advances in technology - both for authoring as well
as deployment. The techniques, formats and considerations illustrated in
this paper, then, should be considered as a static mile marker on the winding
highway of technological progress in electronic multimedia.
Specialized CD-ROM and the Music and Professional Audio Target Market
Many CD-ROM titles available today target interests in music. Microsoft
has produced a popular title on musical instruments. Todd Rundgren, David
Bowie, The Rolling Stones, The Cranberries, Sarah McLachlan, Peter Gabriel,
The Beastie Boys, and the Residents have all produced popular music CD-ROM
titles or Enhanced CD+ titles. CD+ is a hybrid, mixed-mode CD with the first
track representing data (sometimes indicated as track one and certainly
to be avoided for playback over loudspeakers), and the balance of tracks
(2 through 99) represented as Red Book digital audio. Other approaches to
the use of Enhanced CDs include the use of a "zero" track integrated
with index and pre-gap area where data can be stored (this type of disc
has been referred to as CD-ROM-ready and is proposed for inclusion in the
Blue Book standard) and a "Multisession" approach where Red Book
audio is stored in session one on the disc and data is stored as session
two. Specialized software drivers are required by many CD-ROM players to
recognize the second session on the disc. [1], [2] The Todd Rundgren, Peter
Gabriel, and David Bowie CD titles feature various levels of interaction
with the user. Classical music is well represented with CD-ROM titles available
on Beethoven, Brahms, Stravinsky, Mozart and a host of others which feature
not only the music, but information on the scores, reviews by musicologists,
period art, and so forth. The Voyager company has released a CD-ROM version
of "A Hard Day's Night" with the complete film displayed in QuickTime
movie format with the script, photos, and a running dialogue with the director,
Richard Lester. The list of available titles in this genre grows daily.
More recently, some multimedia producers have begun to tackle various aspects
of professional audio. A good example of this type of target market approach
is the "Allen Sides Microphone Cabinet" produced by Light Rail
Communications for Cardinal Business Media, publishers of Mix Magazine.
[3] This title utilizes a mixed-mode method of CD-ROM authoring which allows
the user to access the audio for various microphone examples displayed via
Red Book audio tracks. This title has proven to be an excellent supplemental
tool for teaching classes in recording technology.
Light Rail Communications also produces a quarterly CD-ROM professional
audio magazine called "Control". [4] This CD-ROM, like the Allen
Sides' title, is also produced as a mixed-mode title. It contains both data
and Red Book audio in its articles on various products and people in the
recording industry. This has also proved to be a wonderful teaching aid.
Most recording studios utilize at least one computing platform. CD-ROM and
other multimedia titles are also proving to be good for use as professional
audio equipment sales tools and for use as on-line instruction manuals,
calibration tools, reference recordings and so forth.
Attempts to Utilize Audio in Multimedia
Since the term multimedia has been utilized in the computer world it has
always included sound as one of the principal media involved. The Apple
Macintosh computer unveiled in January, 1984 included sound capabilities.
However, until the AV Macintosh line was introduced in 1993, the Macintosh
sound recording and playback capability was limited to 8-bit sampling. The
Multimedia Personal Computer standard (MPC) implemented in 1991 also included
sound. [5] The current MPC Level 2 specification, which includes a 16-bit
audio standard, was formalized in 1993. [6], [7] Most sound recording and
playback hardware available today for either the Macintosh or the PC platform
now supports compact disc-quality audio with 44.1 kHz sampling and 16-bit
sample depth. However, processing full bandwidth audio in stereo requires
a system throughput of 176.4 kilobytes per second (44,100 samples per second
x 2 bytes per sample x 2 channels). This data rate is feasible,
but can cause conflicts with other data attempting to stream off a hard
drive or a CD-ROM simultaneously.
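As a quick check of the throughput figure, the data rate of uncompressed PCM is simply sample rate times bytes per sample times channel count. The Python sketch below is illustrative only; the function name pcm_data_rate is hypothetical and does not come from any tool mentioned in this paper.

```python
def pcm_data_rate(sample_rate_hz, bit_depth, channels):
    """Return the uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * (bit_depth // 8) * channels


cd_rate = pcm_data_rate(44_100, 16, 2)
print(cd_rate)             # 176400 bytes per second
print(cd_rate * 8 / 1000)  # 1411.2 kilobits per second
```

The same function gives 22,050 bytes per second for 8-bit mono at 22.05 kHz, one eighth of the CD-quality rate.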
Most of us are familiar with the early uses of audio in computer systems.
On PCs, the internal speaker can be made to produce sound by programming
a single bit that switches a DC voltage on and off at a controlled rate. This produces the familiar
system beeps. Some developers began to utilize this programmability to produce
simple sounds for games or other applications. However, once the sound card
was available for the PC, numerous other applications for sound on the computer
began to appear.
Low Bit Rate Audio
Perhaps the most severe compromise in working with multimedia audio is the
need to utilize low definition audio characterized by low sample rate and
truncated bit depth. Although it is possible to produce audio files which
exhibit some degree of fidelity at bit depths of 8 bit and even less, the
procedure to do so will generally require that the dynamic range of the
audio be severely limited. The noise resulting from lower bit sampling is
masked as long as the sampled sound amplitude remains high.
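The dependence of audible noise on signal level can be estimated numerically. The sketch below is an assumption-laden illustration, not the procedure used for this project: it quantizes a 440 Hz sine wave to 8 bits with plain rounding (no dither) and compares signal power with error power at two amplitudes. All function names are hypothetical.

```python
import math


def quantize(x, bits):
    """Round a sample in the range -1..1 to the nearest step of a signed grid."""
    levels = 2 ** (bits - 1) - 1
    return round(x * levels) / levels


def snr_db(amplitude, bits, n=44_100, freq_hz=440.0, rate_hz=44_100):
    """Signal-to-quantization-noise ratio of a sine wave, in decibels."""
    signal = [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
              for i in range(n)]
    signal_power = sum(s * s for s in signal) / n
    noise_power = sum((s - quantize(s, bits)) ** 2 for s in signal) / n
    return 10 * math.log10(signal_power / noise_power)


loud = snr_db(1.0, 8)    # full-scale tone
quiet = snr_db(0.01, 8)  # the same tone 40 dB lower
print(round(loud), round(quiet))
```

The full-scale tone sits roughly 50 dB above the quantization error, while the quiet tone is only about 10 dB above it, which is why low bit depth audio generally needs its dynamic range restricted.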
Reducing sampling rate also reduces data. So does conversion of stereo to
mono. This may appear to be obvious, but it can be disheartening for the
audio engineer to listen to the sound once it has been down-sampled, bit-reduced,
and converted to mono.
Fortunately there are some alternatives which are beginning to make an impact
on data-reduced audio in multimedia. Most of these are lossy compression
algorithms based upon perceptual coding. [8] Although much research is being
directed toward reduced bit rate perceptual coding schemes, there are several
encoder/decoders (codecs) which are beginning to make their way into use
for multimedia audio applications. Included among these systems are Dolby
Laboratories' AC-3 [9], Moving Picture Experts Group (MPEG) Audio Layer 1
and Layer 2 [10], [11], and the Interactive Multimedia Association (IMA)
Adaptive Differential Pulse Code Modulation (ADPCM) format for cross-platform compatibility.
[12], [13]
Although each of these codecs can be implemented in software, the processing
of the sound file is CPU intensive. Depending upon which algorithm is utilized,
the heaviest processing may take place at the encode stage allowing decoded
playback to occur fairly rapidly. Each of these codecs can achieve data
reduction rates of anywhere from 4:1 to 22:1 depending upon the dynamic
nature of the file to be coded and the algorithm employed. All of the codecs
mentioned above are also being implemented in computing hardware to expedite
the encode/decode process. As of this writing, however, hardware solutions
to the use of codecs for multimedia audio are not widely implemented. The
Intel Corporation has recently introduced a new Pentium-based processor
technology called MMX which allows the acceleration of codec operations. [14]
For any hardware solution to have an effect on the use of perceptually-coded
files for multimedia, however, users will have to add to, or replace, existing
hardware on their computing systems. A good indication of the potential
for long term success of these file formats is that the manufacturers involved
with each of the systems are continuing to do research on perceptual coding
and are maintaining an open format for forward and backward compatibility
with new development.
Problems in Aural Multimedia Utilization
Unfortunately for those who regard audio quality with high esteem, audio
is generally the least significant element in many multimedia productions.
Video, text, and graphic content usually take predominance when construction
of multimedia is underway. All too often, the producer of audio elements
for multimedia productions is left to work within severe constraints regarding
disk space and memory utilization. Therefore, the audio on many finished
multimedia projects becomes a caricature. Once imagined majestic themes
are ported to General MIDI soundcards with their oft distinguishing trait
of tinny FM synthesis. Dynamic, full bandwidth audio samples are compressed,
re-sampled, and truncated to fit into ever smaller packages. In the end,
there is sound. But the price paid besmears the quality overall. This situation
has improved recently with advancements in lossy compression technology
and parallel advancements in application of sampling technology to MIDI
synthesis in the multimedia domain. Additionally, there have been extreme
and rapid advancements in the hardware used to play multimedia applications.
File Formats
Perhaps one of the most daunting problems facing the author of multimedia
presentations is the number of different audio file formats which exist.
Needless to say, not all of these file types are compatible. As with text
and graphic documents, some sound formats are proprietary to the platform
on which they were developed. For the multimedia author, this presents a
variety of problems including cross-platform viability.
Additionally, the sample rates of some formats are incompatible with those
of other formats. For example, prior to Macintosh System 7, system sounds
were encoded at 22.254 and 11.127 kHz rather than at 22.05 and 11.025 kHz,
the rates that divide evenly into the 44.1 kHz compact disc rate. [15]
Regardless of format, however, most sound files are stored as raw pulse
code modulation (PCM) data. What differs in many of these formats is file
header information which tells the computing platform what type of data
to expect. Additionally, files are stored differently on various platforms.
For example, Macintosh platforms require a resource fork in addition to
the data. Windows and UNIX systems, on the other hand, do not require this
fork, but do require an extension in the file name.
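One way to see the split between header and PCM data is to write a small WAV file and read its header back. The sketch below uses Python's standard wave module; the helper names make_wav_bytes and describe_wav are illustrative, and other formats such as AIFF or SNDII use different header layouts that this example does not cover.

```python
import io
import struct
import wave


def make_wav_bytes(samples, sample_rate_hz=44_100, channels=1):
    """Build a 16-bit PCM WAV file in memory and return its raw bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate_hz)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def describe_wav(data):
    """Read back the parameters the header announces to the player."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bit_depth": w.getsampwidth() * 8,
            "sample_rate_hz": w.getframerate(),
            "frames": w.getnframes(),
        }


data = make_wav_bytes([0, 1000, -1000, 0])
info = describe_wav(data)
print(info)
print(len(data) - info["frames"] * 2)  # bytes that are header, not audio
```

For a minimal file written this way, everything before the sample data is header; the samples themselves are plain PCM.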
Format Conversion
It is possible to perform sound file format conversions from one file type
to another. Unfortunately many of these conversions produce audible artifacts.
Occasionally these artifacts are extraneous data once used in header information
and once converted are being played as audio. Given time and patience, these
types of artifacts can be edited out. Quite often, however, it is more realistic
to simply re-record the data on the other platform and save it in the required
format.
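The header artifact described above is easy to reproduce. In the following sketch (an illustration with hypothetical helper names, using Python's standard wave module), a WAV file containing pure silence is read two ways: once properly, and once as if the whole file were raw 16-bit PCM.

```python
import io
import struct
import wave


def wav_file_bytes(samples, rate_hz=44_100):
    """Write mono 16-bit samples to an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate_hz)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def misread_as_raw(data):
    """Treat every byte of the file, header included, as 16-bit PCM."""
    count = len(data) // 2
    return list(struct.unpack("<%dh" % count, data[: count * 2]))


def read_properly(data):
    """Let the WAV reader skip the header and return only the samples."""
    with wave.open(io.BytesIO(data), "rb") as w:
        frames = w.readframes(w.getnframes())
    return list(struct.unpack("<%dh" % (len(frames) // 2), frames))


silence = [0] * 100
data = wav_file_bytes(silence)
bad = misread_as_raw(data)
good = read_properly(data)
print(len(good), len(bad))  # the misread version has extra leading "samples"
```

The extra leading values in the misread version are header bytes, and they would play back as a short click before the silence.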
Some format conversions require down-sampling to distill the data in the
file. Unless proper low pass (anti-alias) filtering and re-dithering are included
in the transfer process, the resulting sound file will exhibit aliasing
and excessive noise.
For the multimedia author, there are a number of useful format conversion
tools available. Some products for sound format conversion such as Waves
Waveconvert [16] and Turtle Beach Wave for Windows [17] are available commercially.
Others, like Sound Hack [18] and Cool Edit [19] are shareware and are available
for downloading via anonymous file transfer protocol (ftp). Brian's Sound
Tool [20] is available for downloading as freeware. As is true with so many
aspects of multimedia, it will pay to experiment with a variety of these
products to determine which is going to give you the best results.
Batch conversion is extremely handy for sound file format conversion --
especially when all of the sounds for a multimedia title need to be converted
from one platform format to another for cross platform compatibility. Some
programs can convert any number of file formats in a single process and
rename the files with proper extensions at the same time.
Some cross-platform compatibility can be obtained by simple renaming of
files. However, in addition to renaming the file, say from an AIFF to a
WAV file, it may also be necessary to eliminate a resource fork. The process
of integrating separate channel information in graphics applications and
the process of interlacing separate data streams for audio and video in
QuickTime movie applications is termed "flattening". The flattening
process can also be used to remove superfluous resource forks when conversions
are being made for cross-platform playability. When importing files into
the Macintosh and Power PC, a resource fork may be added by utilizing
the ResEdit [21] software utility. ResEdit prompts the user for an acknowledged
data Type and data Creator for use with the data file. It helps to have
previously examined properly-formatted files in ResEdit to determine what
to place in the Type and Creator fields. If improper Types and Creators
are associated with data files, a system crash may result when the file
is accessed.
Software Incompatibility
For the author of a multimedia presentation, one of the more frustrating
problems to encounter is the proprietary use of sound files within different
authoring platforms. While one platform may make use of WAV files, another
may allow only importation of AIFF files. On the Macintosh platform some
authoring systems require the inclusion of sound files as resources (SND)
to the presentation file itself.
When designing a multimedia presentation, a primary consideration is to
determine at the outset the computing platform upon which the application
will be authored. The decision here can have manifold effects including
whether or not the application will play on other computing systems. Until
recently, most authoring systems were designed to work only on the computing
platform they were authored on. There has been a move, however, toward cross-platform
compatibility with "players" which will allow playback of the
multimedia presentation on other systems. Unfortunately, problems are still
encountered. Quite often artifacts may occur when playback is attempted
on another "non-native" platform. These artifacts may be observed
in the graphics, video and in the sound. In order to alleviate the problems
associated with format incompatibility, many projects are simply "re-authored"
rather than "ported" to be played on another platform. This re-authoring
generally requires the conversion of files used in the presentation to file
formats utilized by the other system. It is here that batch file conversion
software has made this process significantly less tedious.
Hardware Problems
Regardless of how much craft is expended in the sound design of a multimedia
project, it will eventually be subjected to playback conditions that are
far from optimal. The typical sound card installation in most PCs places
audio output in extreme hazard. The inside of a computer is a high frequency
RF factory with data busses, clocking circuits, etc. Many of these RF noises
can find their way into the audio output of the sound card -- further tarnishing
the sound even prior to amplification. Very few commercially-available sound
cards expend much effort in shielding the card from stray RF and ground
noises. The digital to analogue converters and operational amplifiers used
in the circuits of many sound cards are compromises at best. Tests have
indicated that the signal-to-noise ratio of a typical sound card may be
as low as 63 dB and that frequency response at the analogue outputs of some
sound cards may be as narrow as 300 to 15,000 Hz. [22], [23]. Sound card
electronics and signal path are not the only problem. The jacks for input
and output to the card are typically the 1/8th inch stereo "mini"
plug. Contact is never very good and intermittent signals may result. Only
a few sound card manufacturers produce a sound card that exhibits both full
bandwidth and dynamic range capabilities of the 16-bit, 44.1 kHz PCM digital
audio signal at the analogue outputs.
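To put the 63 dB figure in perspective, a common rule of thumb (an assumption brought in here, not part of the cited tests) is that an ideal N-bit converter reproducing a full-scale sine wave has a signal-to-noise ratio of about 6.02N + 1.76 dB. Inverting that gives a rough "effective bits" estimate; the function names below are hypothetical.

```python
def ideal_snr_db(bits):
    """Rule-of-thumb SNR of an ideal converter for a full-scale sine wave."""
    return 6.02 * bits + 1.76


def effective_bits(snr_db):
    """Invert the rule of thumb to estimate usable resolution from an SNR."""
    return (snr_db - 1.76) / 6.02


print(round(ideal_snr_db(16), 1))    # 98.1
print(round(effective_bits(63), 1))  # 10.2
```

By this estimate, a card measuring 63 dB delivers only about 10 bits of usable resolution from a nominally 16-bit signal.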
Audio in Multimedia -- An Example
Like most such undertakings, this project began
as an idea. A rather good idea, I thought. After teaching concepts of audio
recording and production for many years, I thought that integrating a multimedia
approach could lend some interesting classroom support. As it was, I was
already drawing upon a number of resources in various media formats to help
teach concepts in audio production. In class situations I was using video
tapes, compact discs, cassette tapes, digital audio (DAT) tapes and, in
some cases, even vinyl. Additionally I was also using a number of graphics
-- some from various textbooks, many more which I had produced myself. Of
course there was a fair amount of text. In traditional classroom situations
this text was delivered as spoken material; however, much of it was
drawn from lecture notes. My idea was that if everything could be combined
into a single platform of some type, I could be more effective in efforts
to teach the material. But, it was the essence of the material which represented
the problem -- the audio itself.
This situation was especially problematic when careful observation of timbre
was necessary on the part of the student. Examination of timbre is elemental
to the study of recording processes. Unfortunately, many of the currently
implemented procedures to incorporate audio in multimedia presentations
involve processes which rob the sound of distinguishing timbral characteristics.
If it was my intention to produce a multimedia project which would be capable
of demonstrating the quality of various microphones and various microphone
techniques, then I was certainly facing a challenge. The planning process
for this project would begin opposite where many other multimedia presentations
begin -- with the audio being considered first.
The goal of this particular project was to produce a multimedia title which
could serve as a training and educational tool based upon the sound quality
it could transmit. Therefore, it would be most important that whatever technique
was utilized to replay sound in the presentation remain as faithful
to the original recordings as possible.
This was to be a multimedia presentation which would allow students to listen
carefully to a number of different microphones and different microphone
techniques in order to evaluate their relative merit with regard to timbre.
It was also anticipated that others might use this presentation to determine
which techniques and microphones to utilize in upcoming recording sessions.
To further test the various sound qualities possible in a multimedia project,
I wanted to design the presentation so that each soundfile might be playable
in a variety of different soundfile formats and compression algorithms.
This part of the presentation would be done for my own edification. Final
versions of the project would contain only the soundfiles which I felt best
represented the timbral quality of the original and which satisfied some
minimum performance parameters.
Test Parameters
Recording Procedure
Since this would be a study of timbre realization, I determined that the
recording should be of a piano. For this recording a 5' studio grand was
utilized. The process involved a simultaneous recording of a number of microphones
placed in a variety of different microphone patterns. Keeping all limitations
of multimedia technology in mind, it was determined that the reference piano
recording should be short. Though short, it should demonstrate as much of
the subtle sonority of the instrument as possible. Therefore, I had the musician play
a strident chord, engage the sostenuto pedal and play a short descending
arpeggiated scale to a sustained tonic chord. This exercise would allow for
good imaging in the stereo spectrum and would allow the listener to compare
the timbral differences of both microphone and placement.
For this demonstration, the outputs of 12 pairs of microphones were fed
to the microphone preamp inputs of a recording console for level balance
and then bussed directly to the analogue inputs of a 24-track digital tape
machine. Included among the stereo microphone configurations utilized were
coincident pair, near coincident, mid-side, and spaced pair. The pairs were
situated for close field, mid field and far field pick up in a 1600 square
foot recording studio. In order to produce two-channel matrixed stereo outputs,
only the mid-side combinations utilized console electronics other than the
microphone preamp. Once the selection was recorded, each stereo pair of
tracks was output via digital AES/EBU bus to a DAT machine.
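The matrixing applied to the mid-side pairs was done in the console, and its exact gain structure is not documented here. As a minimal sketch, assuming the conventional sum-and-difference decode (left = mid + side, right = mid - side) with a hypothetical gain parameter, the operation looks like this:

```python
def ms_to_lr(mid, side, gain=1.0):
    """Decode mid/side sample lists into left/right by sum and difference."""
    left = [gain * (m + s) for m, s in zip(mid, side)]
    right = [gain * (m - s) for m, s in zip(mid, side)]
    return left, right


left, right = ms_to_lr([1.0, 0.5], [0.25, -0.5])
print(left)   # [1.25, 0.0]
print(right)  # [0.75, 1.0]
```

A gain below 1.0 would normally be chosen in practice so that the summed channel cannot exceed full scale.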
Test File Formats
To produce audio segments for the multimedia presentation five computing
platforms were utilized. A Macintosh IIci equipped with Digidesign Sound
Designer and ProTools software was used to create edited Sound Designer
(SNDII) format files. This platform was also used to convert the SNDII format
files to AIFF files at various bit rates and sampling frequencies. A Silicon
Graphics Indigo 2 was used to create edited 16-bit AIFF files from which
other format files could be produced including MPEG Layer 1, AU, and Audio
Interchange File Format-Compressed (AIFC) files. A Gateway PC was utilized
to create edited WAV format files and to produce conversions to ADPCM WAV
files. A Macintosh Power PC equipped with Macromedia Sound Edit 16 was used
to create Macintosh Audio Compression/Expansion (MACE) files and QuickTime/IMA
ADPCM audio files. Another Power PC equipped with a gigabyte hard disc and
a CD-ROM recorder was used to create mixed-mode CDs. All audio was transferred
digitally -- either via AES/EBU or SCSI.
This procedure may serve to indicate yet another problem when attempting
to produce audio for multimedia -- the need to have many computing resources
at hand. This particular project could have been accomplished with much
less equipment, but not with the same efficacy and speed.
All of these files were edited to length and saved or converted to the following
formats:
Format    Channels    Sample Frequency    Bit Depth    Length    Resulting File Size (bytes)
* Sample frequency and bit depth prior to compression
In order to create reduced bit files, a number of conversion steps were
required. Some of the application programs employed would perform format
conversion automatically, but whether performed automatically or manually,
the best results occurred when the following steps were followed:
- Import the full bandwidth file via AES/EBU or SCSI.
- Normalize (not compress) the file so that peak amplitudes use the highest-order
bits.
- Filter (low pass anti-alias) the file just below the Nyquist point for the destination
sample rate (e.g. files to be downsampled to 22.05 kHz were filtered with
the low pass set at 10 kHz).
- Convert sampling frequency (e.g. 44.1 kHz to 22.05 kHz).
-- And/Or --
- Re-dither (e.g. 16-bit to 8-bit).
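The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the processing performed by the applications used for this project: the filter is a 101-tap Hamming-windowed sinc, the rate conversion is a plain 2:1 decimation, the dither is triangular (TPDF), and the 8-bit output is kept as signed integers for clarity (8-bit WAV files store unsigned values). All function names are hypothetical.

```python
import math
import random


def normalize(samples, peak=1.0):
    """Scale so the largest sample reaches the chosen peak (no compression)."""
    top = max(abs(s) for s in samples)
    return [s * peak / top for s in samples] if top else list(samples)


def lowpass_fir(cutoff_hz, rate_hz, taps=101):
    """Hamming-windowed sinc low-pass coefficients with unity gain at DC."""
    fc = cutoff_hz / rate_hz
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(sinc * window)
    total = sum(h)
    return [c / total for c in h]


def convolve_same(samples, h):
    """Filter with zero padding, returning as many samples as were supplied."""
    half = len(h) // 2
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(h):
            j = i + half - k
            if 0 <= j < len(samples):
                acc += c * samples[j]
        out.append(acc)
    return out


def to_8bit_with_dither(samples, rng):
    """Add triangular dither of up to one step, then round to signed 8-bit."""
    out = []
    for s in samples:
        dither = rng.random() - rng.random()
        out.append(max(-128, min(127, round(s * 127 + dither))))
    return out


def downsample_and_requantize(samples, rate_hz=44_100, cutoff_hz=10_000, seed=0):
    """Normalize, low-pass filter, halve the sample rate, re-dither to 8-bit."""
    x = normalize(samples)
    x = convolve_same(x, lowpass_fir(cutoff_hz, rate_hz))
    x = x[::2]  # keep every second sample: 44.1 kHz becomes 22.05 kHz
    return to_8bit_with_dither(x, random.Random(seed)), rate_hz // 2
```

Fed a 1 kHz tone, this chain returns a near full-scale 8-bit signal at 22.05 kHz; fed a 15 kHz tone, which lies above the new Nyquist point, it returns little more than dither, which is the purpose of filtering before the rate conversion.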
Listening Test
Following format conversion each file was listened to in all of the final
sound formats created. Although reduction of data through compression or
bit/sample rate reduction would have been fine for many applications, none
of the compressed file formats compared favorably enough with the original
full bandwidth 16-bit samples to be considered for a critical listening
evaluation. The truncated bit files were eliminated for use in the project
due to noise problems associated with the dynamic nature of the original
sound file. Reduced sample rate files were eliminated from consideration
due to the loss of timbral characteristic resulting from low pass filtering.
The MPEG 1 layer 2 lossy-compressed files compared very favorably to the
original files, but not on all platforms. Since the performance of the same
file varied from platform to platform, the decoding algorithm (shareware)
and the software compatibility with the hardware employed was probably at
fault. Additionally, on all platforms with the exception of the Silicon
Graphics, the MPEG file launched decoding software. In one case, the file
decoding process took 20 minutes. The IMA ADPCM QuickTime and ADPCM WAV
compressed files proved to be quite good for casual listening over multimedia-style
loudspeakers. However, both formats evidenced problems with dynamic response
characteristics and exhibited noise at lower amplitude levels. These artifacts
were especially noticeable on a good quality monitoring system and with
headphones.
The No-Compromise Approach
In the end, it was decided to author the project as a mixed-mode CD. Since
the project would be authored to CD-ROM regardless, and since the disc would
be capable of storing upward of 650 Mbytes, all of the data for this project
would fit easily on a single disc. Therefore, the key would be to transfer
the original sound files in 16-bit 44.1 kHz format to the CD-ROM in such
a way that little time would be lost each time a file was accessed. To do
this, all of the files which are accessed regularly including data files
would be written to the disc first so that these files would be on the innermost
tracks of the disc. This way, the laser can read them much faster and a
much shorter wait occurs. All of the audio files followed as separate files
in Red Book audio format. It was important to realize, in the authoring
stage, that little else should be happening while audio was playing off
the drive. Since the project would play entirely from the CD-ROM, users
with little RAM (less than 8 MB) would experience interruptions as the disc
began to stream audio off the CD. In constructing the CD with Red Book Audio
tracks separated following the data, the only wait experienced would be
the normal delay as the laser moved from track to track on the disc. This
solution also proved to solve some of the problems associated with cross-platform
compatibility. In the CD recording process, all of the file names followed
the ISO 9660 convention (eight characters with an extension) so that the
disc could be read on either Macintosh or PC platform.
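File names can be checked against the eight-character-plus-extension convention before a disc is written. The sketch below is a simplified test that assumes the most restrictive form of ISO 9660 naming (upper-case letters, digits and underscore only, up to eight characters, optional extension of up to three); it ignores version suffixes and directory rules, and the function name is hypothetical.

```python
import re

# Up to 8 name characters, then an optional dot and up to 3 extension characters.
_SHORT_NAME = re.compile(r"^[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?$")


def is_short_iso_name(name):
    """Return True if the name fits the simplified 8.3 pattern above."""
    return bool(_SHORT_NAME.match(name))


for candidate in ["PIANO01.AIF", "piano close pair.aiff", "MIDSIDE_FAR.WAV"]:
    print(candidate, is_short_iso_name(candidate))
```

Of the three hypothetical names, only the first passes: the second contains lower-case letters and spaces, and the third has an eleven-character base name.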
Conclusion
The use of multimedia is rapidly expanding and the tools with which it is
constructed are continuing to improve. As is so true with the integration
of new technologies, those who are early adopters often run into frustrating
circumstances. However, it is the resolution of these frustrations which
advances the art of the technology. The use of sound in multimedia will
no doubt improve. As discussed, many new and innovative uses of technology
are being applied to digital audio recording and its potential for use in
multimedia.
References:
[1] More Than Music, Rock 'n' ROM / The Track 1 Problem, Don Menn, Multimedia
World, August, 1995, pp. 65, 67
[2] Mastering CD-ROM Technology, John Wiley & Sons, Inc., Larry Boden,
1995, pg. 65
[3] The Allen Sides Microphone Cabinet, Cardinal Business Media, Inc., 1995.,
Music and Entertainment Group, 6400 Hollis Street,
Suite 12, Emeryville, CA 94608
[4] Control, Light Rail Communications, Inc., 625 Second Street, #410, San
Francisco, CA 94017
[5] Multimedia PC Level 1 Specification, Multimedia PC Marketing Council,
1730 M Street NW, Suite 707, Washington D.C. 20036
[6] Multimedia PC Level 2 Specification, Multimedia PC Marketing Council,
1730 M Street NW, Suite 707, Washington D.C. 20036
[7] An Overview of Audio Technology for the Multimedia Personal Computer,
Jim Heckroth, presented at the 97th Convention of the Audio
Engineering Society, 1994, pre-print 3875
[8] Signal Compression Based on Models of Human Perception, Nikil Jayant,
James Johnston, Robert Safranek, Proceedings of the
IEEE October, 1993, Volume 81, Number 10, pp. 1385 - 1421
[9] AC-3 Operation, Bitstream Syntax, and Features, Mark F. Davis and Craig
C. Todd, presented at the 97th Convention of the Audio
Engineering Society, 1994, pre-print 3910
[10] Overview of MPEG Audio: Current and Future Standards for Low Bit-Rate
Audio Coding, Karlheinz Brandenburg and Marina Bosi,
presented at the 99th Convention of the Audio Engineering Society, 1995,
pre-print 4130
[11] The ISO/MPEG Audio Coding Standard, Leon M. Van de Kerkhof, Aldo G.
Cugnini, Widescreen Review, June/July, 1994, pp. 58 -
61
[12] IMA ADPCM Recommended Minimum Capabilities of a Compliant Platform,
1995, Interactive Multimedia Association,
http://www.ima.org
[13] IMA Recommended Practices for Enhancing Digital Audio Portability (Revision
3.0 - 10/21/92), 1995, Interactive Multimedia
Association, http://www.ima.org/
[14] How MMX Technology Works, John Clyman and Nick Stam, PC Magazine, February
4, 1997, p 104
[15] CD-ROMs Should Sound as Good as They Look, Bob Currier, Computer Video,
September/October, 1995, pg. 22
[16] Waves, 4302 Papermill Road, Knoxville, TN 37909 USA http://www.usit.net/waves
[17] Turtle Beach Systems, http://www.tbeach.com/products/wave.htm
[18] Sound Hack, ftp://music.calarts.edu/pub/SoundHack/SH0868.hqx
[19] Cool Edit, Syntrillium Software Corporation, http://www.netzone.com/syntrillium
[20] Brian's Sound Tools, ftp://src.doc.ic.ac.uk/packages/info-mac/gst/snd/brians-sound-tool-13.hqx
[21] ResEdit ver. 2.1.3, Sumit Bando and Samiran Basak, 1984 - 1994, Apple
Computers, Inc.
[22] An Earful of Sound Boards, John R. Quain, PC Magazine, March 28, 1995,
pp. 167 - 190
[23] Sound Cards -- Audio Performance Spec Tests, Michael Marans, Keyboard
Magazine, October, 1994, pp. 49 - 53