A few ways of “watermarking” mp3 files

Recently some music labels have said that they will be releasing non-DRMed mp3 files, followed by updates saying that they are going to “watermark” these files.

I put “watermark” in quotes because this is really more tagging than watermarking. I doubt they will apply on-the-fly digital audio processing to watermark the audio and then encode a separate mp3 for each customer. Let’s assume they’re just going to modify their stock of ready-encoded mp3s.

MP3 file format layout

Before continuing, it’s important to understand how MP3 files are stored, so let’s take a closer look at how they are structured internally:

ID3v2 tag (optional)
LAME header (optional)
MPEG frame 1
MPEG frame 2
..
MPEG frame N
ID3v1/ID3v1.1 tag (optional)

ID3 tags

The ID3 tags contain various metadata about the file, which is normally either entered automatically by the encoder or edited manually by the person encoding the files. Two standards exist:

ID3v1 - which later got a slight modification and became ID3v1.1 - is a 128-byte fixed-length structure located at the end of the file.

ID3v2 is a more dynamic structure, typically located at the beginning of the file. It allows a plethora of different information to be stored, and several extensions have been added over time (e.g. the USLT frame, which allows the lyrics of a song to be embedded into the file).
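
The ID3v2 tag sits in front of the audio, so anything that wants to read or modify the MPEG frames first has to skip it. The tag starts with a 10-byte header. That header holds the bytes “ID3”, two version bytes, a flags byte and a 4-byte “syncsafe” size that uses only the lower 7 bits of each byte. A minimal Python sketch of finding where the tag ends might look like this. The function names are my own, and apart from the basic header only the ID3v2.4 footer flag is handled:

```python
def syncsafe_to_int(raw: bytes) -> int:
    """Decode a 4-byte ID3v2 syncsafe integer (7 significant bits per byte)."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def int_to_syncsafe(n: int) -> bytes:
    """Encode n (< 2**28) as a 4-byte syncsafe integer."""
    return bytes((n >> shift) & 0x7F for shift in (21, 14, 7, 0))


def id3v2_tag_length(data: bytes) -> int:
    """Bytes taken up by a leading ID3v2 tag, or 0 if the data has none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    body = syncsafe_to_int(data[6:10])    # the size excludes the 10-byte header
    footer = 10 if data[5] & 0x10 else 0  # ID3v2.4 "footer present" flag
    return 10 + body + footer


if __name__ == "__main__":
    tag = b"ID3\x03\x00\x00" + int_to_syncsafe(1000) + bytes(1000)
    print(id3v2_tag_length(tag + b"\xff\xfb\x90\x00"))  # prints 1010
```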

MPEG frames

MPEG frames are small chunks of audio data. The size of each frame depends on the bitrate (and sampling frequency) used when encoding the file, and each frame is prefixed with a 4-byte frame header.

These headers contain the information needed to interpret and make use of the encoded data (e.g. the MPEG version and layer, the bitrate the frame is encoded at, the sampling frequency, whether a CRC follows the header (the “protection bit”), etc.). They also contain non-functional data such as a “private bit”, “copyright bit”, “original bit”, etc.
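
To make the header layout a bit more concrete, here is a rough Python sketch that decodes a 4-byte header. To keep it short, it only handles MPEG-1 Layer III (the usual “mp3”), and the dictionary keys are my own naming. For Layer III, the frame length in bytes is 144 x bitrate / sampling frequency, plus one byte if the padding bit is set:

```python
from typing import Optional

# MPEG-1 Layer III bitrates in kbit/s by 4-bit index (0 = free format, 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]  # index 3 is reserved


def parse_mpeg1_l3_header(h: bytes) -> Optional[dict]:
    """Decode a 4-byte MPEG-1 Layer III frame header, or return None."""
    if len(h) < 4:
        return None
    word = int.from_bytes(h[:4], "big")
    if ((word >> 21) & 0x7FF) != 0x7FF:           # 11-bit frame sync
        return None
    if ((word >> 19) & 0b11) != 0b11 or ((word >> 17) & 0b11) != 0b01:
        return None                               # not MPEG-1, or not Layer III
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        return None                               # free format/invalid/reserved
    padding = (word >> 9) & 1
    return {
        "protection_absent": (word >> 16) & 1,    # 0 means a CRC-16 follows
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "padding": padding,
        "private": (word >> 8) & 1,
        "channel_mode": (word >> 6) & 0b11,       # 3 = single channel (mono)
        "copyright": (word >> 3) & 1,
        "original": (word >> 2) & 1,
        "emphasis": word & 0b11,
        # Layer III frame length in bytes, header included.
        "frame_length": 144 * bitrate * 1000 // sample_rate + padding,
    }
```

For a typical 128 kbit/s, 44.1 kHz file, the header bytes FF FB 90 00 give a 417-byte frame. With the padding bit set (third byte 92 instead of 90), the frame is 418 bytes.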

For a full description of the MPEG headers see e.g. this article at mp3-tech.org or search the web.

LAME header

Some encoders, such as LAME (and Xing), add a frame that appears to be a regular MPEG frame but actually contains additional metadata about the encoding parameters used, etc.
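
As a rough illustration, here is a Python sketch that checks whether the first frame carries such a tag. I’m assuming an MPEG-1 Layer III frame without a CRC. In that case the “Xing” (VBR) or “Info” (CBR, as written by LAME) identifier sits right after the 4-byte header and the side information (17 bytes for mono, 32 bytes otherwise). It is followed by a flags word and, if flag bit 0 is set, the total number of frames:

```python
from typing import Optional, Tuple


def find_xing_tag(frame: bytes) -> Optional[Tuple[str, Optional[int]]]:
    """Return (identifier, frame count or None) if the frame holds a Xing/Info tag."""
    if len(frame) < 4:
        return None
    mono = (frame[3] >> 6) == 0b11          # channel mode 3 = single channel
    offset = 4 + (17 if mono else 32)       # skip header and side information
    ident = frame[offset:offset + 4]
    if ident not in (b"Xing", b"Info"):
        return None
    flags = int.from_bytes(frame[offset + 4:offset + 8], "big")
    frame_count = None
    if flags & 0x1 and len(frame) >= offset + 12:  # bit 0: frame count present
        frame_count = int.from_bytes(frame[offset + 8:offset + 12], "big")
    return ident.decode("ascii"), frame_count
```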

SO WHERE DOES THIS LEAD US?

Let’s have a look at a few (though probably not all) ways of tagging mp3 files with some kind of watermark, in varying depth of detail.

1. Adding out-of-stream data to the MPEG stream

Every MPEG frame header starts with a sync marker, which lets a player check whether the location it is about to read is a valid MPEG frame. This means you can place out-of-stream data in between frames.

Players will just skip these bytes until they find the next valid MPEG frame header sync marker, while other applications can choose to store custom data there.
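
A rough Python sketch of both sides of this follows. It shows a tolerant scanner that walks the frames the way a player would and skips bytes until the next sync, plus a bit of junk spliced in between two frames. The scanner only understands MPEG-1 Layer III headers, and the inserted string is just a made-up example. The inserted data must not itself contain something that looks like a frame sync (0xFF followed by a byte whose top three bits are set). Otherwise the scanner could lock onto it:

```python
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def frame_length(data: bytes, pos: int):
    """Length of the MPEG-1 Layer III frame starting at pos, or None."""
    if pos + 4 > len(data):
        return None
    word = int.from_bytes(data[pos:pos + 4], "big")
    if (word >> 21) != 0x7FF or ((word >> 17) & 0xF) != 0b1101:
        return None  # no sync, or not MPEG-1 Layer III
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    rate = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate is None or rate is None:
        return None
    return 144 * bitrate * 1000 // rate + ((word >> 9) & 1)


def scan(data: bytes):
    """Return frame offsets and (start, end) ranges of skipped out-of-stream bytes."""
    frames, junk = [], []
    pos, junk_start = 0, None
    while pos < len(data):
        length = frame_length(data, pos)
        if length is None or pos + length > len(data):
            if junk_start is None:
                junk_start = pos
            pos += 1  # resync: try the next byte
            continue
        if junk_start is not None:
            junk.append((junk_start, pos))
            junk_start = None
        frames.append(pos)
        pos += length
    if junk_start is not None:
        junk.append((junk_start, len(data)))
    return frames, junk


if __name__ == "__main__":
    frame = bytes([0xFF, 0xFB, 0x90, 0x00]) + bytes(413)  # 417-byte dummy frame
    data = frame + b"ORDER-ID:12345" + frame
    print(scan(data))  # ([0, 431], [(417, 431)])
```

Real decoders are typically stricter than this. For example, they may also check that another valid header follows before trusting a sync.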

2. Using the unused MPEG frame header bits

As mentioned above, MPEG frame header bits like the “original bit”, “private bit” etc. are not of much use to players, so each frame of the MP3 file can store a few bits of information.

Spread across the entire file, this would allow for quite a lot of data (depending on the length of the song). A special program could read this data back and use it as tracking markers.
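
A minimal Python sketch of this follows. It assumes the offsets of the frame headers have already been found, for instance by walking the frames one by one as players do. It only uses the private bit, which is the lowest bit of the third header byte. The copyright and original bits (bits 3 and 2 of the fourth byte) could be used the same way. The capacity helper assumes MPEG-1 Layer III, where each frame holds 1152 samples:

```python
def embed_private_bits(data: bytearray, frame_offsets, message: bytes) -> None:
    """Store the message, MSB first, in the private bit of consecutive frame headers."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(frame_offsets):
        raise ValueError("not enough frames to hold the message")
    for offset, bit in zip(frame_offsets, bits):
        # The private bit is the lowest bit of the third header byte.
        data[offset + 2] = (data[offset + 2] & 0xFE) | bit


def extract_private_bits(data: bytes, frame_offsets, n_bytes: int) -> bytes:
    """Read n_bytes back from the private bits of consecutive frame headers."""
    bits = [data[offset + 2] & 1 for offset in frame_offsets[: n_bytes * 8]]
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def capacity_bits(seconds: float, sample_rate: int = 44100, bits_per_frame: int = 3) -> int:
    """Rough capacity if every frame carries bits_per_frame hidden bits."""
    frames = int(seconds * sample_rate / 1152)
    return frames * bits_per_frame
```

For a three-minute song at 44.1 kHz this works out to about 6,890 frames. That gives roughly 20,670 bits (around 2.5 KB) if all three bits are used.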

3. Using the ID3v1 tag

I find this unlikely, but as a method you could use the “comment” field of the ID3v1 tag to add some sort of numeric ID, etc.
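
For illustration, here is a Python sketch that writes an ID into the comment field and appends a blank tag if the file has none. It uses the ID3v1.1 layout, where the 30-byte comment is shortened to 28 bytes to make room for a zero byte and a track number. The ID string in the example is made up:

```python
def set_id3v1_comment(data: bytes, comment: bytes) -> bytes:
    """Return data with an ID3v1.1 comment set, adding a blank tag if needed."""
    if len(comment) > 28:
        raise ValueError("the ID3v1.1 comment field holds at most 28 bytes")
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        audio, tag = data[:-128], bytearray(data[-128:])
    else:
        # "TAG" + 124 zero bytes + genre 255 ("none") = 128 bytes
        audio, tag = data, bytearray(b"TAG" + bytes(124) + b"\xff")
    if tag[125] != 0:
        # ID3v1.0 tag: bytes 125-126 belong to its 30-byte comment, so clear them
        tag[125:127] = b"\x00\x00"
    tag[97:125] = comment.ljust(28, b"\x00")  # comment starts at offset 97
    return bytes(audio) + bytes(tag)


def get_id3v1_comment(data: bytes):
    """Return the comment text (without NUL padding), or None if there is no tag."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None
    return bytes(tag[97:127]).split(b"\x00", 1)[0]


if __name__ == "__main__":
    marked = set_id3v1_comment(b"...audio frames...", b"ID:0001234")
    print(get_id3v1_comment(marked))  # b'ID:0001234'
```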

4. Using the ID3v2 tag

There are primarily two methods here:

a) Using an unused or little-used ID3v2 frame, or even creating a custom one

b) Using the unused padding space of the ID3v2 tag
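
Here is a rough Python sketch of both variants for an ID3v2.3 tag. The first is a PRIV (“private”) frame, which ID3v2 provides for application-specific binary data. The second is a helper that locates the padding area by walking the frames until it hits a zero byte where the next frame ID would be. The owner identifier and payload are made up. The specification expects padding to consist of zero bytes, so data hidden there is non-standard:

```python
def syncsafe(n: int) -> bytes:
    """Encode n as a 4-byte syncsafe integer (used for the tag size)."""
    return bytes((n >> shift) & 0x7F for shift in (21, 14, 7, 0))


def priv_frame(owner: str, payload: bytes) -> bytes:
    """ID3v2.3 PRIV frame: owner identifier, a NUL byte, then binary data."""
    body = owner.encode("latin-1") + b"\x00" + payload
    # v2.3 frame header: 4-byte ID, 4-byte big-endian size (not syncsafe), 2 flag bytes
    return b"PRIV" + len(body).to_bytes(4, "big") + b"\x00\x00" + body


def id3v23_tag(frames, padding: int = 0) -> bytes:
    """Assemble an ID3v2.3 tag from ready-made frames plus zero padding."""
    body = b"".join(frames) + bytes(padding)
    return b"ID3\x03\x00\x00" + syncsafe(len(body)) + body


def padding_range(tag: bytes):
    """Return (start, end) offsets of the padding area inside an ID3v2.3 tag."""
    end = 10 + ((tag[6] << 21) | (tag[7] << 14) | (tag[8] << 7) | tag[9])
    pos = 10
    # A zero byte where a frame ID should start marks the beginning of padding.
    while pos + 10 <= end and tag[pos] != 0:
        pos += 10 + int.from_bytes(tag[pos + 4:pos + 8], "big")
    return pos, end


if __name__ == "__main__":
    tag = id3v23_tag([priv_frame("example.com/wm", b"\x00\x01\x02")], padding=64)
    print(padding_range(tag))  # (38, 102)
```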

5. Constructing special MPEG frames like the LAME header

This can also be divided into two methods:

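a) Frames that use one of the undefined or disallowed bit combinations of a frame header to mark themselves as invalid, so that the player will skip them. Custom data can then be stored in the data portion of the frame. (LAME and Xing actually take a slightly different route. Their header frame is a valid MPEG frame that decodes to silence, with the metadata stored in its data portion.)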
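
Here is a rough Python sketch of such a carrier “frame”. The header uses bitrate index 15, which the standard marks as invalid, so a decoder should reject it and resync. The “WMK1” marker and the 2-byte length field are made-up conventions of this example. As a crude guard against false frame syncs inside the payload, the body is rejected if it contains any 0xFF byte:

```python
MAGIC = b"WMK1"  # hypothetical marker identifying our custom frames
# 0xFF 0xFB: sync, MPEG-1, Layer III, no CRC. 0xF0: bitrate index 15 (invalid).
CARRIER_HEADER = bytes([0xFF, 0xFB, 0xF0, 0x00])


def header_is_valid(h: bytes) -> bool:
    """Roughly what a player checks: sync plus no reserved/invalid field values."""
    word = int.from_bytes(h[:4], "big")
    version, layer = (word >> 19) & 0b11, (word >> 17) & 0b11
    return ((word >> 21) == 0x7FF and version != 0b01 and layer != 0b00
            and ((word >> 12) & 0xF) != 0xF and ((word >> 10) & 0b11) != 0b11)


def make_carrier_frame(payload: bytes) -> bytes:
    body = MAGIC + len(payload).to_bytes(2, "big") + payload
    if b"\xff" in body:
        raise ValueError("body must not contain 0xFF (could fake a frame sync)")
    return CARRIER_HEADER + body


def find_carriers(data: bytes) -> list:
    """Return the payloads of all carrier frames found in data."""
    marker = CARRIER_HEADER + MAGIC
    found, pos = [], data.find(marker)
    while pos != -1:
        start = pos + len(marker)
        n = int.from_bytes(data[start:start + 2], "big")
        found.append(data[start + 2:start + 2 + n])
        pos = data.find(marker, start + 2 + n)
    return found
```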

b) Special, validly encoded frames containing an audio watermark (e.g. some audio wave) could be added. These would probably play back as a short click or noise, though, so the result might not sound too good.

6. Using the MPEG frame header CRC

The encoder can optionally add a 16-bit CRC checksum to each MPEG frame (indicated by the “protection bit”), so that the file can be checked for corrupted frames etc.

Since not many players actually verify this checksum even when it is present, it could be used to store 2 bytes of data per frame.

It could be a bit risky doing this, though, should any player actually choose to verify frames against the CRC.
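
Here is a rough Python sketch that illustrates both the idea and the risk. As far as I know, MPEG-1 Layer III uses a CRC-16 with polynomial 0x8005 and initial value 0xFFFF. The CRC covers the last two header bytes and the side information, and it is stored in the two bytes right after the header when the protection bit is 0. The sketch assumes MPEG-1 Layer III frames and ignores everything else:

```python
def crc16_mpeg(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16, polynomial 0x8005, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def side_info_length(frame: bytes) -> int:
    """Side information size for MPEG-1 Layer III: 17 bytes mono, 32 otherwise."""
    return 17 if (frame[3] >> 6) == 0b11 else 32


def computed_crc(frame: bytes) -> int:
    # Protected data: header bytes 2-3 plus the side information,
    # which starts after the 2-byte CRC word at offsets 4-5.
    n = side_info_length(frame)
    return crc16_mpeg(frame[2:4] + frame[6:6 + n])


def crc_is_valid(frame: bytes) -> bool:
    return int.from_bytes(frame[4:6], "big") == computed_crc(frame)


def hide_in_crc(frame: bytearray, two_bytes: bytes) -> None:
    """Overwrite the CRC word with 2 bytes of custom data."""
    if frame[1] & 1:
        raise ValueError("frame has no CRC (protection bit is 1)")
    if len(two_bytes) != 2:
        raise ValueError("exactly 2 bytes fit in the CRC field")
    frame[4:6] = two_bytes
```

After hide_in_crc(), crc_is_valid() will almost always report a mismatch. A player that checks the CRC would see exactly that, and treat the frame as corrupt.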

7. Using variations of volume in each MPEG frame

Quite honestly, this goes beyond what I know much about. I do know that certain tools (e.g. mp3gain) can normalize or change the volume of an mp3 without transcoding (re-encoding), and thus offer a non-destructive way of doing this.

As far as I understand it, this works because each granule (half a frame) of an MPEG-1 Layer III frame stores an 8-bit “global_gain” value per channel in its side information. This value acts as an exponent on the decoded samples: raising it by 1 multiplies the amplitude by 2^(1/4), or about 1.5 dB. Since these are plain integer steps, you can increment or decrement them back and forth and always get back to the original values exactly, as long as you stay within the 0 to 255 range.

Utilizing this, I guess you could introduce short changes in volume from frame to frame that would go unnoticed by the listener but could be detected through analysis. Think Morse code.

I don’t know if this would work. Maybe. It’s just an idea.
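
Here is a rough Python sketch of the mechanism for MPEG-1 Layer III. The side information follows the header (and the CRC, if present), and each granule/channel pair has its own 8-bit global_gain field at a fixed bit position within it. The sketch can shift all of a frame’s global_gain values by a number of steps. It hides one bit per frame by making that frame one step (about 1.5 dB) quieter. Detection here needs the unmarked original for comparison. Changing the side information also invalidates the frame’s CRC if one is present:

```python
def _get_bits(buf, pos: int, n: int) -> int:
    """Read n bits starting at bit offset pos (MSB first)."""
    value = 0
    for i in range(pos, pos + n):
        value = (value << 1) | ((buf[i // 8] >> (7 - i % 8)) & 1)
    return value


def _set_bits(buf: bytearray, pos: int, n: int, value: int) -> None:
    """Write an n-bit value at bit offset pos (MSB first)."""
    for k in range(n):
        i = pos + k
        mask = 1 << (7 - i % 8)
        if (value >> (n - 1 - k)) & 1:
            buf[i // 8] |= mask
        else:
            buf[i // 8] &= ~mask & 0xFF


def gain_bit_offsets(frame) -> list:
    """Bit offsets of every global_gain field in an MPEG-1 Layer III frame."""
    mono = (frame[3] >> 6) == 0b11
    nch = 1 if mono else 2
    side_start = 8 * (4 if frame[1] & 1 else 6)  # header, plus CRC word if present
    # main_data_begin (9 bits), private_bits (5 mono / 3 stereo), scfsi (4 per channel)
    base = side_start + 9 + (5 if mono else 3) + 4 * nch
    # Each granule/channel block is 59 bits; global_gain follows
    # part2_3_length (12 bits) and big_values (9 bits).
    return [base + (gr * nch + ch) * 59 + 21 for gr in range(2) for ch in range(nch)]


def read_gains(frame) -> list:
    return [_get_bits(frame, pos, 8) for pos in gain_bit_offsets(frame)]


def shift_gain(frame: bytearray, steps: int) -> None:
    """Change every global_gain in the frame by `steps` (about 1.5 dB each)."""
    new = [g + steps for g in read_gains(frame)]
    if any(not 0 <= g <= 255 for g in new):
        raise ValueError("global_gain would leave the 0..255 range")
    for pos, g in zip(gain_bit_offsets(frame), new):
        _set_bits(frame, pos, 8, g)


def embed_bits(frames, bits) -> None:
    """Mark a 1 bit by making that frame one step quieter."""
    for frame, bit in zip(frames, bits):
        if bit:
            shift_gain(frame, -1)


def detect_bits(originals, marked) -> list:
    """Compare against the unmarked originals (non-blind detection)."""
    return [int(read_gains(o) != read_gains(m)) for o, m in zip(originals, marked)]
```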

CONCLUSION

There are a lot of places you can hide information in an MP3 file. However, it takes no more than two differently watermarked copies of the same file to figure out where the data is being stored.
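
The comparison itself is trivial. Here is a minimal Python sketch that lists the byte ranges where two copies of the same song differ. The file names in the comment are hypothetical. This works directly for methods that only change bits in place. If the watermark changes the file length (extra ID3v2 data or out-of-stream bytes), everything after the first insertion will show up as different. In that case an alignment tool such as Python’s difflib.SequenceMatcher would be needed instead:

```python
def differing_ranges(a: bytes, b: bytes) -> list:
    """Half-open (start, end) byte ranges where two copies differ.

    Bytes beyond the end of the shorter copy count as different."""
    ranges, start = [], None
    for i in range(max(len(a), len(b))):
        same = i < len(a) and i < len(b) and a[i] == b[i]
        if not same and start is None:
            start = i
        elif same and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, max(len(a), len(b))))
    return ranges


if __name__ == "__main__":
    # e.g. a = open("copy_alice.mp3", "rb").read(); b = open("copy_bob.mp3", "rb").read()
    a = bytes(100)
    b = bytearray(a)
    b[10] = b[11] = 1
    b[50] = 7
    print(differing_ranges(a, b))  # [(10, 12), (50, 51)]
```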

Of course, one could combine all the methods described above, or even additional ones, but in the end all of this information could be stripped away, leaving only the MP3 audio data.

We’ll find out soon enough as these files hit the streets..

Update
Someone pointed out that on-the-fly watermarking and transcoding would be perfectly possible on smaller portions of the files, an option I never considered.

5 Responses to “A few ways of “watermarking” mp3 files”

  1. arun Says:

    The explanation given is good. Can you please elaborate more on the first method, i.e. “Adding out-of-stream data to the MPEG stream”?

  2. cygn Says:

    But all these methods can be defeated by reencoding the file…

  3. Rune Bjerke Says:

    @arun: It simply means that data can be added in between the valid MPEG frames, as long as that data does not look like valid MPEG frames with an MPEG frame sync header etc. Most players will just scan on and look for the next valid sync signature and ignore the junk in between.

    That’s why you can sometimes take a Flash file, an .avi file, or something else with an MPEG stream in it, drag it into Winamp or some other player and still get some sensible playback. Of course there are bits and pieces in between that will look like valid frames and cause blips and blops and noise.

  4. Roy Says:

    The frame sync header is identified by 11 consecutive bits set to “one”. Out-of-stream data is data appearing after the data payload but before the next frame, i.e. before the next section starting with 11 consecutive bits set to “one”.

    When I created a script to calculate the length of an mp3, I did it on a frame-by-frame basis. I found the length of the first frame from its header, skipped ahead to the next one based on that, and so on. If I didn’t find a header sync word where I expected it to be, I simply looked forward until I found the next sync word and verified that the header seemed valid.

    You could also calculate the total play time of an mp3 simply by looking at the first header. You then multiply that frame’s duration by a frame count estimated from the file size minus the ID3 and LAME headers. However, this only works on constant bitrate files, and it would certainly break if you put out-of-stream data into the mp3 stream.

    Also, such OOS data is not part of the mp3 format, so it is not defined how mp3 players behave in such situations. A player could work just fine, stop playing, play a little warbled sound or even crash where the OOS data resides.

    Also worth mentioning is that some frames are one byte longer than the others, as indicated by the “padding bit”. At 44.1 kHz, the encoder pads frames as needed so that the average bitrate comes out exact.

  5. Roy Says:

    @cygn: You don’t even need to re-encode the data. Just reformat the headers and remove the OOS data. That said, I’ve heard of watermarking methods where the watermark is actually encoded into the audio stream, although these are disputed since they alter the audio quality. The primary goal of watermarking is to identify the source of a piece of media, even when the media has been distorted somehow, for instance when you copy a DVD onto a VHS or a CD to an MC. (Why the hell would you do that?? ;) )
