Best file format for short audio clips in a video game

Question

I am writing a video game to teach kids how to read, and as part of that I have to play short audio clips that make basic sounds. For example, the /t/ sound (the first sound in "Tiger").

I will also play longer syllable sounds, like "strong" and "est". And finally I will play full word sounds, like "brontosaurus" (a little over 1 second in length).

I've read that latency can be an issue with shorter sounds, so it's best to use a non-compressed file format like .wav for short files. The files will be larger, but the impact on disk storage won't be so much since the audio files are short.

For longer sounds like "brontosaurus", I would be inclined to keep using the same file format (.wav) just for consistency of sound assets. But I will have over 5000 words being said in my program, and so disk space may become an issue. If so, I could use a compressed format and perhaps preload it.

I have spoken dialogue in the game as well, but plan to just use mp3 for that. A small delay there would be fine.

Does it make sense to use .wav for all shorter sounds? Is there a number of seconds at which it makes sense to switch to a compressed audio format?

The answers cover a lot of the points to consider but it's also worth noting that uncompressed audio like wav doesn't have to be uncompressed when stored on disk. You could pack/zip the wav files and decompress them into memory when your game loads. It won't be as efficient as compression tuned for audio use, but it would significantly reduce the disk footprint without compromising on using wavs [you'd either need enough ram for all the wavs, or know in advance which ones to load in the background] — Basic, Mar 04 '23 at 13:14
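As a rough illustration of the pack-and-decompress idea in the comment above, here is a minimal Python sketch using only the standard library. The function name `load_wavs_from_zip` and the archive layout are assumptions for the example, not something from the thread, and a real game would hand the PCM bytes to its own audio API.

```python
import io
import wave
import zipfile


def load_wavs_from_zip(zip_path_or_file):
    """Return {name: (params, pcm_bytes)} for every .wav in the archive.

    Hypothetical helper: all clips are decompressed into RAM at load time,
    so this assumes there is enough memory to hold every clip.
    """
    clips = {}
    with zipfile.ZipFile(zip_path_or_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".wav"):
                continue  # skip anything that is not a wav file
            # archive.read() inflates the entry; wave parses it from memory
            with wave.open(io.BytesIO(archive.read(name)), "rb") as wav:
                clips[name] = (wav.getparams(), wav.readframes(wav.getnframes()))
    return clips
```

If memory is tight, the same loop could be restricted to a list of names known to be needed soon, which is the second option the comment mentions.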

score 1 · Answer 1 · answered Mar 04 '23 at 04:12

It is ultimately your call. One notable consideration is where and how you will deploy it.

Since wav files are not compressed, it should not be hard to estimate their size from their duration and bit rate (sample rate × bit depth × channels), and to use that information to make your decision.

A quick back-of-the-envelope computation tells me that all the audio should take less than a couple of gigabytes of storage. That might be acceptable, or it might not. You decide.
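For what it's worth, the estimate can be sketched in a few lines of Python. The recording settings (44.1 kHz, 16-bit, mono) and the 1.2 second average word length are assumptions picked for illustration; plug in your own numbers.

```python
def wav_bytes(duration_s, sample_rate=44_100, channels=1, bits_per_sample=16):
    """Approximate PCM payload size in bytes (ignores the small WAV header)."""
    frames = round(duration_s * sample_rate)
    return frames * channels * (bits_per_sample // 8)


# One second of 16-bit mono audio at 44.1 kHz
one_second = wav_bytes(1.0)  # 88200 bytes

# Assumed workload: 5000 words averaging 1.2 seconds each
total = 5000 * wav_bytes(1.2)
print(f"{total / 1_000_000:.0f} MB for 5000 words")
```

Under those assumptions the word recordings come to roughly half a gigabyte; stereo or a higher sample rate scales the figure up proportionally.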

Yet, I have something to add: consider ogg. The codec is free, gratis, libre. And it gives you good quality for its compression, which you can take advantage of to have less playback latency than mp3 of similar quality.

At the same bit rate the ogg might actually be larger, but sound better to the human ear, than the mp3. So you might be able to go for a lower bit rate, and it will have less playback latency, chiefly by being lighter. Tweaking and testing required.

Does it make sense to use .wav for all shorter sounds?

Sure.

Is there a number of seconds at which it makes sense to switch to a compressed audio format?

Nope.

Instead, for music you usually don't worry about playback latency, unless you need to sync the game with the music (i.e. rhythm games). A workaround is to keep the music compressed so it takes less storage and, when you need a song, delay the gameplay until you have decompressed it in memory and it is ready for playback.

But for sound effects, particularly in action-heavy games, you do care that they have low playback latency (e.g. you want to hear an impact when you see it on the screen, and you don't want to hear an in-game gunshot late).

Thus, it is first a matter of use case. Not of length of the audio.

Your game might not be an action-heavy one, so timing may not matter as much for it. In that case, you can go ahead and use audio compression.

Or it might be that audio quality does not matter as much… If it is all uniform clear speech without much noise, you might get some good compression while still having an acceptable quality. Again, tweaking and testing required.

Similarly, if you need to deploy to a platform that is sensitive to storage (e.g. mobile), you might have to compress the audio to an acceptable size despite the latency and drop in quality.

score 1 · Answer 2 · answered Mar 04 '23 at 12:53

It's not too difficult to decompress an audio file at load time. There's obviously some CPU cost, but for short files that shouldn't be significant.

I'd therefore suggest:

1. Record all the audio as uncompressed wav files. That way you can play with compression settings to trade off file size vs. quality later on.
2. Start with the simple option of compressing all of the audio using something like Ogg Vorbis or mp3.
3. For any cases where you have performance/latency issues, consider:
- Loading latency-sensitive audio files into memory in advance instead of streaming them from disk. The file I/O will probably add more latency than the decompression.
- If need be, decompressing them at load time too. Playing already-decompressed audio also reduces the CPU cost during playback.
- If loading takes too long, then consider storing them decompressed, or loading them on a background thread.
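The preloading options above could look roughly like this in Python. This is only a sketch: `ClipCache` and its `decode` parameter are hypothetical names, and the default decoder just passes the raw bytes through, so you would substitute whatever Ogg Vorbis/mp3 decoder your engine or audio library provides.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class ClipCache:
    """Hypothetical cache: reads and decodes clips on background threads."""

    def __init__(self, decode=lambda raw: raw, workers=2):
        self._decode = decode  # assumed: bytes in, decoded audio out
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}  # path -> Future; touched from the game thread only

    def preload(self, path):
        """Start loading a clip without blocking the caller."""
        key = str(path)
        if key not in self._pending:
            self._pending[key] = self._pool.submit(self._load, key)

    def _load(self, key):
        # File I/O and decoding both happen off the game thread
        return self._decode(Path(key).read_bytes())

    def get(self, path):
        """Return the decoded clip, waiting only if it is not ready yet."""
        self.preload(path)
        return self._pending[str(path)].result()

    def close(self):
        self._pool.shutdown(wait=True)
```

The idea is to call `preload` for every clip a level will need while the loading screen is up, so that `get` returns immediately when the sound has to play.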

One thing to note with latency is that a typical game rendering process will have some latency, which can easily add up to 50ms or more at 60 FPS, so having a little bit of audio latency can actually mean that you're better synchronized with the graphics!
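To put that figure in perspective, a tiny Python sketch converts a latency into frames. The function names are made up for the example; the 50 ms and 60 FPS values come from the paragraph above.

```python
def frame_time_ms(fps=60):
    """Duration of one rendered frame in milliseconds."""
    return 1000 / fps


def latency_in_frames(latency_ms, fps=60):
    """How many frames a given latency spans at the given frame rate."""
    return latency_ms * fps / 1000


print(f"One frame at 60 FPS lasts about {frame_time_ms():.1f} ms")
print(f"50 ms of latency spans {latency_in_frames(50):.0f} frames")
```

So 50 ms corresponds to three frames at 60 FPS, which gives a sense of how much audio delay the visuals are already carrying.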
