11

I'm writing the music for a video game. At some point, I'm going to need to work with the programmers to determine the technical specifications for the audio files I'm going to hand over to them. What kinds of details are we going to need to talk about?

For example, a few questions I think we might need to work out are:

  • What format should the audio files be in?
    • Uncompressed? Lossless? Compressed?
    • What's the balance between high-quality audio and small file sizes?
  • How loud should the music be?
    • What should be the average volume level of each track?
    • What should be the loudest volume level?
  • What should be the frequency distribution?
    • For example, should I make a point of writing music that doesn't have much bass so that low-pitched foley can be easily heard?

Are the questions I thought of important ones to ask? Are there any other questions we should discuss?

Kevin
  • 307
  • 2
  • 11
  • This seems very broad... –  Apr 19 '14 at 17:09
  • 5
    I think it's a great question, as it would form a useful resource for others. – Engineer Apr 19 '14 at 17:50
  • 2
    One point that comes to mind is priority of sounds. This is pretty apparent in a game like Diablo, where you have lots of effects spawning and bombarding your speakers at the same time. To avoid it all becoming noise when lots of action is going on, you need to turn down some of the less important sounds, like footsteps and ambience, so that important ones, like key spells the player has to react to, stay noticeable, and turn their volume back up once a fight has quieted down. How many channels you can use also depends on the hardware capabilities. – ScrambledRK Apr 20 '14 at 09:56

2 Answers

6

If I understand correctly, you have two different questions in here:

  1. What kind of technical (as in programming) considerations should be set out?

  2. What kind of technical (as in audio engineering) considerations should be set out?

For both questions, your best bet is to ask the person in charge. For the first one, it could be the lead audio programmer, while for the second one it could be the audio director.

Now, I suppose that since you're asking this in here, you are on a small team with no such roles, and your programmer probably doesn't know what you're talking about, so I'm going to answer with some general guidelines that could work for most projects. But in any case, I wouldn't recommend making decisions on your own. Discuss with your other team members to reach a conclusion.

That said, let's first talk about some technical (programming) considerations.

  • Usually, the format in which your audio files will be distributed is already determined by the platform or game engine you're using. If you're going to be doing multi-platform, it is likely that each platform will have completely independent resource directories, and the characteristics of audio files may be different for each platform.

    For example, you may be required to turn in Ogg Vorbis files for Android, MP3 or AAC for iOS and WMA for Windows.

    Sometimes, even the format is dependent on the purpose of each file. While iOS plays nice with MP3 and AAC files for any purpose, PlayStation Mobile requires MP3 for BGM, but for other sounds only supports uncompressed PCM or Microsoft ADPCM.

    Do ask your programmer what format the files must be in.

  • Most of the time, the basic specifications for the files will already be defined, so make sure you ask your programmer about this. For example, BGM files are usually distributed in stereo, while SFX and voice files are usually distributed in mono (so they can be easily positioned inside the game).

  • For bitrates and other parameters, once again, they are usually dependent on the engine or platform. If you have some freedom in here, consider that sound data is usually the biggest component of a game, and if the game is to be distributed via digital downloads, your programmer (and your customers) will thank you if you make the files as small as possible.

    If you will be distributing on physical media, then the maximum distribution size is predetermined, and you will have to do some serious negotiations with the rest of the content creation team.

  • Some games require your sounds (mostly BGM) to be loopable; other games have strict timing constraints (rhythm games, for example). Most lossy compression schemes add padding silence at the beginning and end of the track, which may break loops and strict timing. Discuss these issues with your programmer (a quick padding check is sketched just below).
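
For the looping and timing point, a quick sanity check is to decode the compressed track and measure how much near-silence ended up at the head and tail. This is only a minimal sketch, assuming pydub is installed and ffmpeg is on the PATH; the file name is a placeholder.

    # Sketch: report how much encoder padding a compressed track picked up.
    from pydub import AudioSegment
    from pydub.silence import detect_leading_silence

    def padding_report(path, threshold_dbfs=-60.0):
        """Return (leading_ms, trailing_ms) of near-silence in the decoded file."""
        sound = AudioSegment.from_file(path)
        leading = detect_leading_silence(sound, silence_threshold=threshold_dbfs)
        trailing = detect_leading_silence(sound.reverse(), silence_threshold=threshold_dbfs)
        return leading, trailing

    lead, trail = padding_report("bgm_battle_loop.ogg")  # placeholder file name
    print(f"leading silence: {lead} ms, trailing silence: {trail} ms")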

There are too many ways to compress audio, so what I recommend is that you output a master version of each sound file in uncompressed PCM at 44.1 kHz, 16-bit stereo, and then use that file to create the actual files for each platform. (Ideally, you could create a conversion script that can be applied to each file, so you don't have to reconvert everything by hand each time you make a small change to a master file.)
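
As a minimal sketch of what such a script could look like (assuming ffmpeg is on the PATH; the directory layout, target formats and quality settings are placeholders to replace with whatever your team agrees on):

    # Sketch: batch-convert master WAV files into per-platform delivery formats.
    import pathlib
    import subprocess

    MASTERS = pathlib.Path("audio/masters")  # placeholder source directory
    TARGETS = {
        # platform name -> (output extension, extra ffmpeg arguments)
        "android": (".ogg", ["-c:a", "libvorbis", "-q:a", "5"]),
        "ios":     (".m4a", ["-c:a", "aac", "-b:a", "160k"]),
    }

    for wav in MASTERS.glob("*.wav"):
        for platform, (ext, args) in TARGETS.items():
            out_dir = pathlib.Path("audio/build") / platform
            out_dir.mkdir(parents=True, exist_ok=True)
            out_file = out_dir / (wav.stem + ext)
            subprocess.run(["ffmpeg", "-y", "-i", str(wav), *args, str(out_file)],
                           check=True)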

Now, in terms of technical (as in audio engineering) considerations, your programmer will most likely not care, so you will have some more freedom in this area. There are some things you can do, though, to make your programmer's life easier:

  • All the sounds in the game should have normalized loudness. At the very least, all BGM files should have the same loudness between themselves, and all SFX files should have the same loudness between themselves. Don't make your programmer have to tweak the volumes inside the code.

  • Playback code is very limited, and can usually only do linear amplification. Because of this, between high and low loudness, high might be preferred, as reducing volume during playback usually leads to better results than amplifying it. It's difficult to say exactly how loud the files should be, but as a rule of thumb, the greater the overall peak-to-peak amplitude, the better you're using the sampling range (a minimal peak-normalization sketch follows this list).

    You might want to load other games and play their sounds at their maximum volume settings and compare your sounds with theirs.

  • In terms of equalization, I would have to say this really depends on what other sounds your game will have. Apart from discussing this with the sound director, you might want to also take into consideration the playback hardware you will be using: while desktop computers may have a huge variety of playback hardware, handheld devices usually have very small speakers, and you may want to compensate for that. Arcade machines usually have much different requirements as well.
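
For the first two points, here is a minimal peak-normalization sketch, assuming pydub is installed and ffmpeg is available; the target peak and file names are placeholders, and proper loudness normalization (for example to a LUFS target) would need a dedicated tool.

    # Sketch: bring a file's peak up (or down) to a fixed headroom below full scale.
    from pydub import AudioSegment

    TARGET_PEAK_DBFS = -1.0  # placeholder headroom below digital full scale

    def normalize_peak(in_path, out_path):
        sound = AudioSegment.from_file(in_path)
        gain = TARGET_PEAK_DBFS - sound.max_dBFS  # positive boosts, negative attenuates
        sound.apply_gain(gain).export(out_path, format="wav")

    normalize_peak("sfx_footstep_master.wav", "sfx_footstep_norm.wav")  # placeholder files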

Panda Pajama
  • 13,395
  • 4
  • 44
  • 77
  • One thing to point out: for SFX I hate stereo files. I need to combine them to mono in the engine so I can put them into the 3D sound engine properly. As a result all my original SFX, for example foley sounds, are mono. – rioki Apr 24 '14 at 07:57
2

I differentiate between two situations: what I keep around during development and for later storage, and what I put in the final delivery.

For storage I use the highest quality possible, which for audio means a lossless format. As long as the compression is lossless it does not really matter; WAV or FLAC will do. I am basically OK with a 44.1 kHz / 16-bit recording, and the volume should be as "natural" as possible, so basically the original recording with minimal gain applied if needed. (If you really want to, you can go with 96 kHz or 192 kHz recordings.) Size does not really matter; hard disks are not that expensive and you can always throw more of them at the problem. The main rationale is that you can always compress into a lossy format from a lossless one, but once you only have lossy files, you can never move back to higher quality.

Delivery is a totally different story. Here you need to make some hard choices about quality vs. space vs. speed. For small SFX I use WAV files because they load the fastest. For music, on the other hand, size does matter, so I normally use MP3 with a variable bitrate. You can get away with 64 kbps files, but that is a matter of taste. The sounds should basically be at their natural volume, though balanced against each other. The game engine will then do any adjustment as needed, especially if you are doing something like 3D sound.
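
To put the quality vs. space trade-off in rough numbers: uncompressed PCM bitrate is sampling rate times sample size times channel count, while for MP3 the figures below just use the nominal bitrates (which a VBR encoder only hits on average). A tiny back-of-the-envelope sketch:

    # Sketch: rough file size per minute of audio at a few common bitrates.
    def megabytes_per_minute(bits_per_second):
        return bits_per_second * 60 / 8 / 1_000_000

    pcm_bps = 44_100 * 16 * 2  # 44.1 kHz, 16-bit, stereo = 1,411,200 bit/s
    for label, bps in [("PCM WAV", pcm_bps), ("MP3 128 kbps", 128_000), ("MP3 64 kbps", 64_000)]:
        print(f"{label}: {megabytes_per_minute(bps):.1f} MB per minute")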

rioki
  • 2,866
  • 1
  • 21
  • 28
  • You do know that uncompressed PCM at 44.1KHz 16 bit stereo goes at roughly 1300kbps, right? 320kbps of uncompressed audio will sound hideous. – Panda Pajama Apr 22 '14 at 02:17
  • True that, I was mostly referring to compressed audio. 128kbps FLAC sounds quite fine. – rioki Apr 22 '14 at 11:56
  • "kbps" as an encoder setting makes no sense for FLAC or other lossless encoders. Size vs (sound) quality only makes sense on lossy compression. Are you sure that's FLAC what you're talking about? – Panda Pajama Apr 23 '14 at 13:22
  • You are correct, that is the programmer talking. The "bitrate" is based on the original recording, which I would normally do at 44 kHz, which is fine. You could get anal and do 96 kHz or 192 kHz recordings. I will change what I really meant... – rioki Apr 24 '14 at 07:52
  • 1
    44KHz, 96KHz and 192KHz are "sampling rates", not "bitrates". It's not a good idea to get them mixed up because sampling rate * sample size * channel count = bitrate. That's why the bitrate for 44.1KHz 16bit stereo is exactly 1411200 bits/s. 320kbps of 16 bit stereo sound would have to be sampled at 10KHz, and therefore my "will sound hideous" remark. – Panda Pajama Apr 24 '14 at 10:47
  • Yes you are correct they are sampling rates; BUT they have a direct correlation with bitrates. The higher the sampling rate and sample resolution the higher the bitrate. The lossless compression only reduces the total size, but does not break the correlation. Lossy compression either reduces the sample resolution or the sample rate or both. If you aggressively compress sound, it is like it was recorded with lower resolution. The smart thing that for example MP3 does is remove the "unimportant" bits. – rioki Apr 25 '14 at 09:32
  • rioki: No. Lossy compression doesn't achieve lower bitrates by reducing sample resolution or sample rate. It's not "like it was recorded with lower resolution". Lossy compression works by modifying the soundwave to get one that responds better to compression, but at no point is the sample resolution or sample rate modified. Please do not post inaccurate information as an answer. – Panda Pajama Apr 25 '14 at 09:42
  • I hope it was clear that for large recordings I select an appropriate sample rate and size (44.1 kHz / 16 bit) and then compress with FLAC. Small recordings remain as WAV files. When delivering, the small recordings remain WAV and the large ones are converted to MP3 at 128 kbps variable bitrate. I honestly don't care what actual bitrate the files in storage are. When writing the answer I was mixing things up, because I had recently programmed a large chunk of the audio processing engine. When streaming files, you care about how many bytes you need to read before the buffer exhausts... – rioki Apr 25 '14 at 09:42
  • @PandaPajama Do you know how MP3 compression actually works? The first place where size is reduced is by reducing the sample size, but this is done in a smart way, such that it is restricted to a window of values. This is then varied and altered so as to reduce the overall error. You can also employ downsampling to reduce the size, and then you can add pattern compression. How is "altering the soundwave" different from changing it into a shape that fits into a lower sample rate and size? Just because it is done for small time intervals does not mean it is not done. – rioki Apr 25 '14 at 09:54
  • MP3 compression has nothing to do with what you're saying. Conceptually, the soundwave is preprocessed with several filters, most commonly including a polyphase quadrature filter. It is then converted into the frequency domain with a discrete cosine transform. Then, using psychoacoustics, temporal masking and other information, some frequencies are modified, so the resulting data (already in the frequency domain) can be quantized, and then losslessly coded and compressed. Note that none of this has anything to do with reducing sampling rate or sample size. – Panda Pajama Apr 25 '14 at 10:22
  • Also note that what is lossy is not the compression itself, but the modifications performed on the frequency domain data, and the quantization step. The whole point of this is so the resulting data responds better to the coding and compression steps. – Panda Pajama Apr 25 '14 at 10:25
  • And what is that, except in layman's terms? Mostly what I described. A polyphase filter is a special band pass on multiple bands, and what is that when applied to discrete samples? Yes, fewer bits per sample, just not uniformly scaled over the amplitude range. Temporal masking is small windows of time. And if you transform a sample into the frequency domain, remove some "unimportant" frequencies as done in compression, and convert it back, what is the result? Some samples are "missing", just not uniformly. It all sounds fancy, but these are not analog sounds, they are discrete samples. – rioki Apr 25 '14 at 10:39
  • I don't know anymore if you actually believe all those things you're saying, or if you're just messing with me. Either way this conversation is not going anywhere, so this is where we part ways. Have a good day, sir. – Panda Pajama Apr 25 '14 at 11:50
  • Originally I did not want to reply, but since we mutually agree this is going nowhere, I will bid you farewell with the following: I have actually programmed a number of DSP algorithms. With quantized values, if you reduce the number of significant bits, information theory dictates that you can never fully restore that information. Using analog terminology for digital signal processing is highly misleading, since in many cases it is only a weak approximation of the analog counterpart. If ever our ways cross, I shall invite you to your beverage of choice. – rioki Apr 27 '14 at 13:51