Utilization of High Definition Audio to Create CD-ROM-Based Multimedia for Ear Training and Microphone Technique

Doug Mitchell





Introduction
This paper serves as an introduction to the use of audio and various audio file formats in integrated multimedia productions. It will also detail the process of putting together a multimedia CD-ROM designed to examine microphone techniques. The goal of the multimedia audio producer is to utilize a variety of techniques for recording and formatting audio while preserving the highest possible quality for playback in the various multimedia formats. In many cases, this goal is compromised by the conflicting demands of data preservation, available data real estate and conventional data throughput on eventual end-user platforms.

The field of multimedia production and deployment on CD-ROM continues to change rapidly with advances in technology - both for authoring as well as deployment. The techniques, formats and considerations illustrated in this paper, then, should be considered as a static mile marker on the winding highway of technological progress in electronic multimedia.

Specialized CD-ROM and the Music and Professional Audio Target Market
Many CD-ROM titles available today target interests in music. Microsoft has produced a popular title on musical instruments. Todd Rundgren, David Bowie, The Rolling Stones, The Cranberries, Sarah McLachlan, Peter Gabriel, The Beastie Boys, and the Residents have all produced popular music CD-ROM titles or Enhanced CD+ titles. CD+ is a hybrid, mixed-mode CD in which the first track holds data (track one, which should certainly not be played back over loudspeakers) and the remaining tracks (2 through 99) hold Red Book digital audio. Other approaches to Enhanced CDs include a "zero" track integrated with the index and pre-gap area where data can be stored (this type of disc has been referred to as CD-ROM-ready and is proposed for inclusion in the Blue Book standard) and a "Multisession" approach in which Red Book audio is stored in session one on the disc and data is stored in session two. Specialized software drivers are required by many CD-ROM players to recognize the second session on the disc. [1], [2] The Todd Rundgren, Peter Gabriel, and David Bowie CD titles feature various levels of interaction with the user. Classical music is well represented with CD-ROM titles available on Beethoven, Brahms, Stravinsky, Mozart and a host of others which feature not only the music, but information on the scores, reviews by musicologists, period art, and so forth. The Voyager company has released a CD-ROM version of "A Hard Day's Night" with the complete film displayed in QuickTime movie format with the script, photos, and a running dialogue with the director, Richard Lester. The list of available titles in this genre grows daily.

More recently, some multimedia producers have begun to tackle various aspects of professional audio. A good example of this type of target market approach is the "Allen Sides Microphone Cabinet" produced by Light Rail Communications for Cardinal Business Media, publishers of Mix Magazine. [3] This title utilizes a mixed-mode method of CD-ROM authoring which allows the user to access the audio for various microphone examples displayed via Red Book audio tracks. This title has proven to be an excellent supplemental tool for teaching classes in recording technology.


Light Rail Communications also produces a quarterly CD-ROM professional audio magazine called "Control". [4] This CD-ROM, like the Allen Sides' title, is also produced as a mixed-mode title. It contains both data and Red Book audio in its articles on various products and people in the recording industry. This has also proved to be a wonderful teaching aid.

Most recording studios utilize at least one computing platform. CD-ROM and other multimedia titles are also proving to be good for use as professional audio equipment sales tools and for use as on-line instruction manuals, calibration tools, reference recordings and so forth.

Attempts to Utilize Audio in Multimedia

Ever since the term multimedia came into use in the computer world, it has included sound as one of the principal media involved. The Apple Macintosh computer unveiled in January, 1984 included sound capabilities. However, until the AV Macintosh line was introduced in 1993, the Macintosh sound recording and playback capability was limited to 8-bit sampling. The Multimedia Personal Computer standard (MPC) implemented in 1991 also included sound. [5] The current MPC level 2, which includes a 16-bit audio standard, was formalized in 1993. [6], [7] Most sound recording and playback hardware available today for either the Macintosh or the PC platform now supports compact disc-quality audio with 44.1 kHz sampling and 16-bit sample depth. However, processing full bandwidth audio in stereo requires a system throughput of 176.4 kilobytes per second. This data rate is feasible, but can cause conflicts with other data attempting to stream off a hard drive or a CD-ROM simultaneously.
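The throughput required for compact disc-quality stereo can be checked with simple arithmetic: sample rate times bytes per sample times number of channels. The short Python sketch below is only an illustration (the function name is hypothetical); it works out to 176,400 bytes per second, or roughly 1,411 kilobits per second.

```python
def pcm_data_rate(sample_rate_hz, bit_depth, channels):
    """Return the uncompressed PCM data rate in bytes per second."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels


# Compact disc-quality stereo: 44.1 kHz, 16-bit, two channels.
cd_rate = pcm_data_rate(44_100, 16, 2)
print(cd_rate, "bytes per second")
print(cd_rate * 8 / 1000, "kilobits per second")

# For comparison, an 8-bit, 22.05 kHz mono file needs one eighth of that.
low_rate = pcm_data_rate(22_050, 8, 1)
print(low_rate, "bytes per second")
```

The same function gives a quick estimate of how much throughput is left for graphics or video once an audio format has been chosen.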

Most of us are familiar with the early uses of audio in computer systems. On PCs, the internal speaker can be made to produce sound by programming a single bit that switches a DC voltage on and off at a controlled rate. This produces the familiar system beeps. Some developers began to utilize this programmability to produce simple sounds for games or other applications. However, once the sound card was available for the PC, numerous other applications for sound on the computer began to appear.


Low Bit Rate Audio
Perhaps the most severe compromise in working with multimedia audio is the need to utilize low definition audio characterized by low sample rate and truncated bit depth. Although it is possible to produce audio files which exhibit some degree of fidelity at bit depths of 8 bit and even less, the procedure to do so will generally require that the dynamic range of the audio be severely limited. The noise resulting from lower bit sampling is masked as long as the sampled sound amplitude remains high.
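The relationship between bit depth and noise can be estimated with the standard textbook approximation for an ideal quantizer driven by a full-scale sine wave, roughly 6.02 dB per bit plus 1.76 dB. The Python sketch below assumes that idealized model (uniformly distributed quantization noise, no dither); the function names are illustrative only. It also shows why quiet passages suffer: the noise floor stays fixed, so every decibel the signal drops below full scale is a decibel lost from the signal-to-noise ratio.

```python
import math


def ideal_snr_db(bits):
    """Approximate SNR of an ideal quantizer for a full-scale sine wave."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


def snr_at_level_db(bits, level_dbfs):
    """SNR when the signal sits level_dbfs (zero or negative) relative to full scale."""
    return ideal_snr_db(bits) + level_dbfs


for bits in (16, 8):
    print(bits, "bit, full scale:", round(ideal_snr_db(bits), 1), "dB")
    print(bits, "bit, 30 dB down:", round(snr_at_level_db(bits, -30), 1), "dB")
```

Under these assumptions an 8-bit file has about 50 dB between a full-scale signal and the noise floor, which is why the dynamic range of the source usually has to be restricted before bit reduction.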

Reducing sampling rate also reduces data. So does conversion of stereo to mono. This may appear to be obvious, but it can be disheartening for the audio engineer to listen to the sound once it has been down-sampled, bit-reduced, and converted to mono.

Fortunately there are some alternatives which are beginning to make an impact on data-reduced audio in multimedia. Most of these are lossy compression algorithms based upon perceptual coding. [8] Much research is being directed toward reduced bit rate perceptual coding schemes, and several encoder/decoders (codecs) are beginning to make their way into use for multimedia audio applications. Included among these systems are Dolby Laboratories' AC-3 [9], Moving Picture Experts Group (MPEG) Audio Layer 1 and Layer 2 [10], [11], and the Interactive Multimedia Association (IMA) Adaptive Differential Pulse Code Modulation (ADPCM) format for cross-platform compatibility. [12], [13]

Although each of these codecs can be implemented in software, the processing of the sound file is CPU intensive. Depending upon which algorithm is utilized, the heaviest processing may take place at the encode stage, allowing decoded playback to occur fairly rapidly. Each of these codecs can achieve data reduction ratios of anywhere from 4:1 to 22:1 depending upon the dynamic nature of the file to be coded and the algorithm employed. All of the codecs mentioned above are also being implemented in computing hardware to expedite the encode/decode process. As of this writing, however, hardware solutions to the use of codecs for multimedia audio are not widely implemented. The Intel Corporation has recently introduced a new Pentium-based processor technology called MMX which allows the acceleration of codec operations. [14] For any hardware solution to have an effect on the use of perceptually-coded files for multimedia, users will have to add to, or replace, existing hardware on their computing systems. A good indication of the potential for long term success of these file formats is that the manufacturers involved with each of the systems are continuing to do research on perceptual coding and are maintaining an open format for forward and backward compatibility with new development.

Problems in Aural Multimedia Utilization
Unfortunately for those who regard audio quality with high esteem, audio is generally the least significant element in many multimedia productions. Video, text, and graphic content usually take predominance when construction of multimedia is underway. All too often, the producer of audio elements for multimedia productions is left to work within severe constraints regarding disk space and memory utilization. Therefore, the audio on many finished multimedia projects becomes a caricature. Once imagined majestic themes are ported to General MIDI soundcards with their oft distinguishing trait of tinny FM synthesis. Dynamic, full bandwidth audio samples are compressed, re-sampled, and truncated to fit into ever smaller packages. In the end, there is sound. But the price paid besmears the quality overall. This situation has improved recently with advancements in lossy compression technology and parallel advancements in application of sampling technology to MIDI synthesis in the multimedia domain. Additionally, there have been extreme and rapid advancements in the hardware used to play multimedia applications.

File Formats
Perhaps one of the most daunting problems facing the author of multimedia presentations is the number of different audio file formats which exist. Needless to say, not all of these file types are compatible. As with text and graphic documents, some sound formats are proprietary to the platform on which they were developed. For the multimedia author, this presents a variety of problems including cross-platform viability.

Additionally, the sample rates of some formats are incompatible with those of other formats. For example, prior to Macintosh System 7, system sounds were encoded at 22.254 and 11.127 kHz rather than at 22.05 and 11.025 kHz, the rates which divide evenly into the 44.1 kHz compact disc rate. [15]
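To see why such a small difference in sample rate matters, consider what happens if a file recorded at one rate is simply played back at the other without resampling. The Python sketch below is a worked example only; it uses the rounded rates quoted above and a hypothetical function name. It computes the resulting change in speed and the pitch shift in cents (hundredths of a semitone).

```python
import math


def playback_mismatch(recorded_hz, playback_hz):
    """Return (speed ratio, pitch shift in cents) when no resampling is done."""
    ratio = playback_hz / recorded_hz
    cents = 1200 * math.log2(ratio)
    return ratio, cents


# A 22.254 kHz Macintosh sound played back as if it were a 22.05 kHz file.
speed_ratio, cents = playback_mismatch(22_254, 22_050)
print("speed ratio:", round(speed_ratio, 4))
print("pitch shift:", round(cents, 1), "cents")
```

The file plays slightly slow and about a sixth of a semitone flat, which is tolerable for a system beep but not for a critical timbre comparison, so a proper sample rate conversion is needed.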

Regardless of format, however, most sound files are stored as raw pulse code modulation (PCM) data. What differs in many of these formats is file header information which tells the computing platform what type of data to expect. Additionally, files are stored differently on various platforms. For example, Macintosh platforms require a resource fork in addition to the data. Windows and UNIX systems, on the other hand, do not require this fork, but do require an extension in the file name.
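The split between header information and raw PCM data can be seen by inspecting a file directly. The Python sketch below uses the standard library `wave` module to build a tiny WAV file in memory and then reads it back. The 44-byte header offset holds for the simple layout that this module writes; it is an assumption of the example, since WAV files from other sources may carry additional chunks.

```python
import io
import struct
import wave

# Four 16-bit samples to store as raw PCM.
samples = [0, 1000, -1000, 32767]
pcm_bytes = struct.pack("<4h", *samples)

# Write a mono, 16-bit, 22.05 kHz WAV file into an in-memory buffer.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)      # bytes per sample
    out.setframerate(22050)
    out.writeframes(pcm_bytes)
raw_file = buffer.getvalue()

# The file starts with identifying tags, followed by the format description.
print(raw_file[:4], raw_file[8:12])

# In this simple layout, everything after the 44-byte header is the PCM data.
print(raw_file[44:] == pcm_bytes)

# Reading the header tells the platform what kind of data to expect.
with wave.open(io.BytesIO(raw_file), "rb") as src:
    header = (src.getnchannels(), src.getsampwidth(),
              src.getframerate(), src.getnframes())
    payload = src.readframes(src.getnframes())
print(header)
```

If the header is stripped or misread, the same bytes are still present but the player no longer knows the channel count, sample width or rate needed to interpret them.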

Format Conversion
It is possible to perform sound file format conversions from one file type to another. Unfortunately many of these conversions produce audible artifacts. Occasionally these artifacts are extraneous header data which, once the file has been converted, are played back as audio. Given time and patience, these types of artifacts can be edited out. Quite often, however, it is more realistic to simply re-record the data on the other platform and save it in the required format.

Some format conversions require down-sampling to distill the data in the file. Unless proper low-pass (anti-alias) filtering and re-dithering are included in the transfer process, the resulting sound file will evidence extreme amounts of noise.

For the multimedia author, there are a number of useful format conversion tools available. Some products for sound format conversion such as Waves Waveconvert [16] and Turtle Beach Wave for Windows [17] are available commercially. Others, like Sound Hack [18] and Cool Edit [19] are shareware and are available for downloading via anonymous file transfer protocol (ftp). Brian's Sound Tool [20] is available for downloading as freeware. As is true with so many aspects of multimedia, it will pay to experiment with a variety of these products to determine which is going to give you the best results.

Batch conversion is extremely handy for sound file format conversion -- especially when all of the sounds for a multimedia title need to be converted from one platform format to another for cross platform compatibility. Some programs can convert any number of file formats in a single process and rename the files with proper extensions at the same time.

Some cross-platform compatibility can be obtained by simple renaming of files. However, in addition to renaming the file, say from an AIFF to a WAV file, it may also be necessary to eliminate a resource fork. The process of integrating separate channel information in graphics applications and the process of interlacing separate data streams for audio and video in QuickTime movie applications is termed "flattening". The flattening process can also be used to remove superfluous resource forks when conversions are being made for cross-platform playability. When importing files into the Macintosh and Power PC, a resource fork may be added by utilizing the ResEdit [21] software utility. ResEdit prompts the user for an acknowledged data Type and data Creator for use with the data file. It helps to have previously examined properly-formatted files in ResEdit to determine what to place in the Type and Creator fields. If improper Types and Creators are associated with data files, a system crash may result when the file is accessed.

Software Incompatibility
For the author of a multimedia presentation, one of the more frustrating problems to encounter is the proprietary use of sound files within different authoring platforms. While one platform may make use of WAV files, another may allow only importation of AIFF files. On the Macintosh platform some authoring systems require the inclusion of sound files as resources (SND) to the presentation file itself.

When designing a multimedia presentation, a primary consideration is to determine at the outset the computing platform upon which the application will be authored. The decision here can have manifold effects including whether or not the application will play on other computing systems. Until recently, most authoring systems were designed to work only on the computing platform they were authored on. There has been a move, however, toward cross-platform compatibility with "players" which will allow playback of the multimedia presentation on other systems. Unfortunately, problems are still encountered. Quite often artifacts may occur when playback is attempted on another "non-native" platform. These artifacts may be observed in the graphics, video and in the sound. In order to alleviate the problems associated with format incompatibility, many projects are simply "re-authored" rather than "ported" to be played on another platform. This re-authoring generally requires the conversion of files used in the presentation to file formats utilized by the other system. It is here that batch file conversion software has made this process significantly less tedious.


Hardware Problems
Regardless of how much craft is expended in the sound design of a multimedia project, it will eventually be subjected to playback conditions that are far from optimal. The typical sound card installation in most PCs places audio output in extreme hazard. The inside of a computer is a high frequency RF factory with data busses, clocking circuits, etc. Many of these RF noises can find their way into the audio output of the sound card -- further tarnishing the sound even prior to amplification. Very few commercially-available sound cards expend much effort in shielding the card from stray RF and ground noises. The digital to analogue converters and operational amplifiers used in the circuits of many sound cards are compromises at best. Tests have indicated that the signal-to-noise ratio of a typical sound card may be as low as 63 dB and that frequency response at the analogue outputs of some sound cards may be as narrow as 300 to 15,000 Hz. [22], [23] Sound card electronics and signal path are not the only problem. The jacks for input and output to the card are typically of the 1/8-inch stereo "mini" type. Contact is never very good and intermittent signals may result. Only a few sound card manufacturers produce a sound card that exhibits both the full bandwidth and the dynamic range capabilities of the 16-bit, 44.1 kHz PCM digital audio signal at the analogue outputs.

Audio in Multimedia -- An Example
As most of these things do, this project began as an idea. A rather good idea, I thought. After teaching concepts of audio recording and production for many years, I thought that integrating a multimedia approach could lend some interesting classroom support. As it was, I was already drawing upon a number of resources in various media formats to help teach concepts in audio production. In class situations I was using video tapes, compact discs, cassette tapes, digital audio (DAT) tapes and, in some cases, even vinyl. Additionally I was also using a number of graphics -- some from various textbooks, many more which I had produced myself. Of course there was a fair amount of text. In traditional classroom situations this text was represented as spoken material, although much of it was drawn from lecture notes. My idea was that if everything could be combined into a single platform of some type, I could be more effective in efforts to teach the material. But it was the essence of the material which represented the problem -- the audio itself.

This situation was especially problematic when careful observation of timbre was necessary on the part of the student. Examination of timbre is elemental to the study of recording processes. Unfortunately, many of the currently implemented procedures to incorporate audio in multimedia presentations involve processes which rob the sound of distinguishing timbral characteristics. If it was my intention to produce a multimedia project which would be capable of demonstrating the quality of various microphones and various microphone techniques, then I was certainly facing a challenge. The planning process for this project would begin opposite where many other multimedia presentations begin -- with the audio being considered first.

The goal of this particular project was to produce a multimedia title which could serve as a training and educational tool based upon the sound quality it could transmit. Therefore, it was most important that whatever technique was utilized to replay sound in the presentation be as faithful to the original recordings as possible.

This was to be a multimedia presentation which would allow students to listen carefully to a number of different microphones and different microphone techniques in order to evaluate their relative merit with regard to timbre. It was also anticipated that others might use this presentation to determine which techniques and microphones to utilize in upcoming recording sessions. To further test the various sound qualities possible in a multimedia project, I wanted to design the presentation so that each soundfile might be playable in a variety of different soundfile formats and compression algorithms. This part of the presentation would be done for my own edification. Final versions of the project would contain only the soundfiles which I felt best represented the timbral quality of the original and which satisfied some minimum performance parameters.

Test Parameters

Recording Procedure
Since this would be a study of timbre realization, I determined that the recording should be of a piano. For this recording a 5' studio grand was utilized. The process involved a simultaneous recording of a number of microphones placed in a variety of different microphone patterns. Keeping all limitations of multimedia technology in mind, it was determined that the reference piano recording should be short. But while short, it should demonstrate as much of the subtle sonority of the instrument as possible. Therefore, I had the musician play a strident chord, engage the sostenuto pedal and play a short descending arpeggiated scale to a sustained tonic chord. This exercise would allow for good imaging in the stereo spectrum and would allow the listener to compare the timbral differences of both microphone and placement.

For this demonstration, the outputs of 12 pairs of microphones were fed to the microphone preamp inputs of a recording console for level balance and then bussed directly to the analogue inputs of a 24-track digital tape machine. Included among the stereo microphone configurations utilized were coincident pair, near coincident, mid-side, and spaced pair. The pairs were situated for close field, mid field and far field pick up in a 1600 square foot recording studio. In order to produce two-channel matrixed stereo outputs, only the mid-side combinations utilized console electronics other than the microphone preamp. Once the selection was recorded, each stereo pair of tracks was output via digital AES/EBU bus to a DAT machine.

Test File Formats
To produce audio segments for the multimedia presentation five computing platforms were utilized. A Macintosh IIci equipped with Digidesign Sound Designer and ProTools software was used to create edited Sound Designer (SNDII) format files. This platform was also used to convert the SNDII format files to AIFF files at various bit rates and sampling frequencies. A Silicon Graphics Indigo 2 was used to create edited 16-bit AIFF files from which other format files could be produced including MPEG Layer 1, AU, and Audio Interchange File-Compressed AIFC format files. A Gateway PC was utilized to create edited WAV format files and to produce conversions to ADPCM WAV files. A Macintosh Power PC equipped with Macromedia Sound Edit 16 was used to create Macintosh Audio Compression/Expansion (MACE) files and QuickTime/IMA ADPCM audio files. Another Power PC equipped with a gigabyte hard disc and a CD-ROM recorder was used to create mixed-mode CDs. All audio was transferred digitally -- either via AES/EBU or SCSI.

This procedure may serve to indicate yet another problem when attempting to produce audio for multimedia -- the need to have many computing resources at hand. This particular project could have been accomplished with much less equipment, but not with the same efficacy and speed.

All of these files were edited to length and saved or converted to the following formats:
Format | Channels | Sample Frequency | Bit Depth | Length | Resulting File Size (bytes)

* Sample frequency and bit depth prior to compression
In order to create reduced bit files, a number of conversion steps were required. Some of the application programs employed would perform format conversion automatically, but whether performed automatically or manually, the best results occurred when the following steps were followed:

• Import the full bandwidth file via AES/EBU or SCSI.

• Normalize (not compress) the file so that amplitudes are at highest bit order.

• Filter (low pass anti-alias) the file just below the Nyquist frequency for the destination sample rate (e.g. files to be downsampled to 22.05 kHz were filtered with the low pass set at 10 kHz).

• Convert sampling frequency (e.g. 44.1 kHz to 22.05 kHz).

-- And/Or --

• Re-dither (e.g. 16-bit to 8-bit).
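The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the processing performed by the applications used in the project. The filter design (a 101-tap Hamming-windowed sinc), the triangular dither, the test signal and all function names are assumptions chosen for the example, and samples are held as floating point values between -1 and 1 rather than as 16-bit integers.

```python
import math
import random


def normalize(samples, peak=1.0):
    """Scale so the largest absolute sample reaches peak (gain only, no compression)."""
    largest = max(abs(s) for s in samples)
    return [s * peak / largest for s in samples] if largest else list(samples)


def lowpass_coefficients(cutoff_hz, sample_rate_hz, taps=101):
    """Hamming-windowed sinc low-pass filter with unity gain at DC (taps must be odd)."""
    fc = cutoff_hz / sample_rate_hz
    mid = (taps - 1) // 2
    coeffs = []
    for n in range(taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        coeffs.append(sinc * window)
    total = sum(coeffs)
    return [c / total for c in coeffs]


def apply_filter(samples, coeffs):
    """Convolve with a symmetric filter, keeping the input length and alignment."""
    half = len(coeffs) // 2
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            j = i + half - k
            if 0 <= j < len(samples):
                acc += c * samples[j]
        out.append(acc)
    return out


def decimate(samples, factor):
    """Keep every factor-th sample (only safe after low-pass filtering)."""
    return samples[::factor]


def redither_to_8bit(samples, rng):
    """Quantize to signed 8-bit values after adding triangular (TPDF) dither."""
    out = []
    for s in samples:
        dither = rng.random() - rng.random()  # triangular, within +/- 1 step
        out.append(max(-128, min(127, round(s * 127 + dither))))
    return out


rate = 44_100
# Test signal: a 1 kHz tone plus a 15 kHz tone that would alias at 22.05 kHz.
source = [0.3 * math.sin(2 * math.pi * 1000 * n / rate)
          + 0.3 * math.sin(2 * math.pi * 15000 * n / rate) for n in range(4410)]

normalized = normalize(source)                                       # normalize
filtered = apply_filter(normalized, lowpass_coefficients(10_000, rate))  # filter
downsampled = decimate(filtered, 2)                                  # 44.1 -> 22.05 kHz
eight_bit = redither_to_8bit(downsampled, random.Random(0))          # re-dither
print(len(source), len(downsampled), min(eight_bit), max(eight_bit))
```

The order matters: filtering before decimation removes the 15 kHz component that would otherwise fold back into the audible band, and dithering at the final word length replaces truncation distortion with a steadier noise floor.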

Listening Test

Following format conversion each file was listened to in all of the final sound formats created. Although reduction of data through compression or bit/sample rate reduction would have been fine for many applications, none of the compressed file formats compared favorably enough with the original full bandwidth 16-bit samples to be considered for a critical listening evaluation. The truncated bit files were eliminated for use in the project due to noise problems associated with the dynamic nature of the original sound file. Reduced sample rate files were eliminated from consideration due to the loss of timbral characteristic resulting from low pass filtering. The MPEG-1 Layer 2 lossy-compressed files compared very favorably to the original files, but not on all platforms. Since the performance of the same file varied from platform to platform, the decoding algorithm (shareware) and the software compatibility with the hardware employed were probably at fault. Additionally, on all platforms with the exception of the Silicon Graphics, the MPEG file launched decoding software. In one case, the file decoding process took 20 minutes. The IMA ADPCM QuickTime and ADPCM WAV compressed files proved to be quite good for casual listening over multimedia-style loudspeakers. However, both formats evidenced problems with dynamic response characteristics and exhibited noise at lower amplitude levels. These artifacts were especially noticeable on a good quality monitoring system and with headphones.

The No-Compromise Approach

In the end, it was decided to author the project as a mixed-mode CD. Since the project would be authored to CD-ROM regardless, and since the disc would be capable of storing upward of 650 Mbytes, all of the data for this project would fit easily on a single disc. Therefore, the key would be to transfer the original sound files in 16-bit 44.1 kHz format to the CD-ROM in such a way that little time would be lost each time a file was accessed. To do this, all of the files which are accessed regularly, including data files, would be written to the disc first so that these files would be on the innermost tracks of the disc. This way, the laser can read them much faster and a much shorter wait occurs. All of the audio files followed as separate files in Red Book audio format. It was important to realize, in the authoring stage, that little else should be happening while audio was playing off the drive. Since the project would play entirely from the CD-ROM, users with little RAM (less than 8 MB) would experience interruptions as the disc began to stream audio off the CD. In constructing the CD with Red Book audio tracks separated following the data, the only wait experienced would be the normal delay as the laser moved from track to track on the disc. This solution also proved to solve some of the problems associated with cross-platform compatibility. In the CD recording process, all of the file names followed the ISO 9660 convention (up to eight characters with a three-character extension) so that the disc could be read on either Macintosh or PC platform.
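File names can be checked against the eight-plus-three convention before a disc is written. The Python sketch below assumes the most restrictive ISO 9660 naming level (upper case letters, digits and underscore only) and ignores the version suffix that the standard appends to file identifiers. Both function names and the sample file names are hypothetical.

```python
import re

# Up to 8 characters, a dot, up to 3 characters; A-Z, 0-9 and underscore only.
_LEVEL1_NAME = re.compile(r"[A-Z0-9_]{1,8}\.[A-Z0-9_]{0,3}")


def is_iso9660_level1(name):
    """Return True if name already fits the restrictive 8.3 convention."""
    return _LEVEL1_NAME.fullmatch(name) is not None


def to_iso9660_level1(name):
    """Upper-case, replace illegal characters with underscores, truncate to 8.3."""
    if "." in name:
        stem, ext = name.rsplit(".", 1)
    else:
        stem, ext = name, ""
    stem = re.sub(r"[^A-Z0-9_]", "_", stem.upper())[:8] or "_"
    ext = re.sub(r"[^A-Z0-9_]", "_", ext.upper())[:3]
    return f"{stem}.{ext}"


for candidate in ("MIDSIDE1.AIF", "midside1.aif", "Piano close pair.aiff"):
    print(candidate, is_iso9660_level1(candidate), to_iso9660_level1(candidate))
```

Truncation can map two different source names to the same result, so a real authoring workflow would also need to check the converted names for collisions.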

Conclusion
The use of multimedia is rapidly expanding and the tools with which it is constructed are continuing to improve. As is so true with the integration of new technologies, those who are early adopters often run into frustrating circumstances. However, it is the resolution of these frustrations which advances the art of the technology. The use of sound in multimedia will no doubt improve. As discussed, many new and innovative uses of technology are being applied to digital audio recording and its potential for use in multimedia.

References:
[1] More Than Music, Rock 'n' ROM / The Track 1 Problem, Don Menn, Multimedia World, August, 1995, pp. 65, 67

[2] Mastering CD-ROM Technology, John Wiley & Sons, Inc., Larry Boden, 1995, pg. 65

[3] The Allen Sides Microphone Cabinet, Cardinal Business Media, Inc., 1995, Music and Entertainment Group, 6400 Hollis Street, Suite 12, Emeryville, CA 94608

[4] Control, Light Rail Communications, Inc., 625 Second Street, #410, San Francisco, CA 94017

[5] Multimedia PC Level 1 Specification, Multimedia PC Marketing Council, 1730 M Street NW, Suite 707, Washington D.C. 20036

[6] Multimedia PC Level 2 Specification, Multimedia PC Marketing Council, 1730 M Street NW, Suite 707, Washington D.C. 20036

[7] An Overview of Audio Technology for the Multimedia Personal Computer, Jim Heckroth, presented at the 97th Convention of the Audio Engineering Society, 1994, pre-print 3875

[8] Signal Compression Based on Models of Human Perception, Nikil Jayant, James Johnston, Robert Safranek, Proceedings of the IEEE, October, 1993, Volume 81, Number 10, pp. 1385 - 1421

[9] AC-3 Operation, Bitstream Syntax, and Features, Mark F. Davis and Craig C. Todd, presented at the 97th Convention of the Audio Engineering Society, 1994, pre-print 3910

[10] Overview of MPEG Audio: Current and Future Standards for Low Bit-Rate Audio Coding, Karlheinz Brandenburg and Marina Bosi, presented at the 99th Convention of the Audio Engineering Society, 1995, pre-print 4130

[11] The ISO/MPEG Audio Coding Standard, Leon M. Van de Kerkhof, Aldo G. Cugnini, Widescreen Review, June/July, 1994, pp. 58 - 61

[12] IMA ADPCM Recommended Minimum Capabilities of a Compliant Platform, 1995, Interactive Multimedia Association, http://www.ima.org

[13] IMA Recommended Practices for Enhancing Digital Audio Portability (Revision 3.0 - 10/21/92), 1995, Interactive Multimedia Association, http://www.ima.org/

[14] How MMX Technology Works, John Clyman and Nick Stam, PC Magazine, February 4, 1997, p 104

[15] CD-ROMs Should Sound as Good as They Look, Bob Currier, Computer Video, September/October, 1995, pg. 22

[16] Waves, 4302 Papermill Road, Knoxville, TN 37909 USA http://www.usit.net/waves

[17] Turtle Beach Systems, http://www.tbeach.com/products/wave.htm

[18] Sound Hack, ftp://music.calarts.edu/pub/SoundHack/SH0868.hqx

[19] Cool Edit, Syntrillium Software Corporation, http://www.netzone.com/syntrillium

[20] Brian's Sound Tools, ftp://src.doc.ic.ac.uk/packages/info-mac/gst/snd/brians-sound-tool-13.hqx

[21] ResEdit ver. 2.1.3, Sumit Bando and Samiran Basak, 1984 - 1994, Apple Computers, Inc.

[22] An Earful of Sound Boards, John R. Quain, PC Magazine, March 28, 1995, pp. 167 - 190

[23] Sound Cards -- Audio Performance Spec Tests, Michael Marans, Keyboard Magazine, October, 1994, pp. 49 - 53
