13 March 2009

UUID and Byte Order

Say you have a UUID stored in memory as an array of 16 contiguous bytes, like this:

const unsigned char bytes[16] = {
    0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
    0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF
};

And say you need a function that takes a pointer to the first byte of such an array and returns the equivalent curly-bracketed string representation of that UUID:

std::string uuid_string(const unsigned char *);

You know the function should return a string something like this
    {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}

But do you know precisely what the string should be for the array given above?

UUIDs are documented in RFC 4122. Here is an extract from that RFC:
In the absence of explicit application or presentation protocol
specification to the contrary, a UUID is encoded as a 128-bit object,
as follows:

The fields are encoded as 16 octets, with the sizes and order of the
fields defined above, and with each field encoded with the Most
Significant Byte first (known as network byte order).  Note that the
field names, particularly for multiplexed fields, follow historical
practice.

0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          time_low                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       time_mid                |         time_hi_and_version   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         node (2-5)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

So, assuming our 16-byte array is encoded in the format recommended in RFC 4122, each field will be stored most-significant byte first and our function should return the string
    {00112233-4455-6677-8899-AABBCCDDEEFF}
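As a minimal sketch, assuming the RFC 4122 default encoding (every field most-significant byte first), the function declared earlier might be written as follows. The hex lookup table and the loop are my own illustration and are not taken from the RFC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Formats 16 bytes as a curly-bracketed UUID string, assuming every field
// is stored most-significant byte first (the RFC 4122 default encoding).
std::string uuid_string(const unsigned char *bytes)
{
    assert(bytes != nullptr);

    static const char hex[] = "0123456789ABCDEF";
    std::string result = "{";

    for (std::size_t i = 0; i < 16; ++i) {
        // Hyphens separate the 4-2-2-2-6 byte groups.
        if (i == 4 || i == 6 || i == 8 || i == 10)
            result += '-';
        result += hex[bytes[i] >> 4];    // high nibble
        result += hex[bytes[i] & 0x0F];  // low nibble
    }

    result += '}';
    return result;
}
```

Because the bytes are emitted strictly in memory order, this version is only correct when the array really is big-endian throughout.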

However...

“In the absence of explicit application or presentation protocol specification to the contrary...” In other words, you cannot write a universally applicable function that converts a 16-byte array to a UUID string, because the RFC recommends an encoding but does not mandate it.

RFC 4122 may be perfectly clear, but I believe the freedom it permits could cause confusion. Here is an example where the default encoding described in RFC 4122 was not adopted:
Although RFC 4122 recommends network byte order for all fields, the PC industry (including the ACPI, UEFI, and Microsoft specifications) has consistently used little-endian byte encoding for the first three fields: time_low, time_mid, time_hi_and_version. The same encoding, also known as wire format, should also be used for the SMBIOS representation of the UUID.
(System Management BIOS (SMBIOS) Specification v2.6)

But the UUID field was introduced in SMBIOS specification version 2.1 and the above quoted clarification was absent from that specification; one was just told “UUID; 16 BYTEs” (and something about what all FFs and all 00s meant) and left to get on with it. And you would have been wrong to read those 16 bytes assuming they were big-endian. Had you done so you would at some point have found that your UUID for a given computer did not match the UUID someone else obtained for that same computer via some other means, say WMI. And you might have felt both annoyed and embarrassed about it; embarrassed that you didn’t make more effort to check that your interpretation was correct in the first place and annoyed that the documentation was not more explicit - so annoyed that you’d feel the need to blog about how unfair it all was.

To summarise: if the encoding is big-endian, as described in RFC 4122, the UUID represented by the bytes
const unsigned char bytes[16] = {
    0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
    0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF
};

will be
    {00112233-4455-6677-8899-AABBCCDDEEFF}

But if the first three fields are encoded little-endian, as described in SMBIOS v2.6, the same 16 bytes will represent the UUID
    {33221100-5544-7766-8899-AABBCCDDEEFF}
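The two results differ only in the order of the first eight bytes. One way to make that explicit, sketched here on the assumption that only these two encodings need handling, is to pass the encoding to the function. The UuidEncoding enum and its enumerator names are hypothetical and come from no specification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical enum for illustration; the names come from no specification.
enum class UuidEncoding {
    BigEndian,     // RFC 4122 default: every field most-significant byte first
    LittleEndian   // SMBIOS v2.6: first three fields least-significant byte first
};

std::string uuid_string(const unsigned char *bytes, UuidEncoding encoding)
{
    assert(bytes != nullptr);

    // Each table lists which input byte is printed at each output position.
    static const std::size_t big_endian_order[16] = {
        0, 1, 2, 3,  4, 5,  6, 7,  8, 9,  10, 11, 12, 13, 14, 15
    };
    static const std::size_t little_endian_order[16] = {
        3, 2, 1, 0,  5, 4,  7, 6,  8, 9,  10, 11, 12, 13, 14, 15
    };
    const std::size_t *order = (encoding == UuidEncoding::LittleEndian)
        ? little_endian_order
        : big_endian_order;

    static const char hex[] = "0123456789ABCDEF";
    std::string result = "{";
    for (std::size_t i = 0; i < 16; ++i) {
        // Hyphens separate the 4-2-2-2-6 byte groups.
        if (i == 4 || i == 6 || i == 8 || i == 10)
            result += '-';
        const unsigned char byte = bytes[order[i]];
        result += hex[byte >> 4];
        result += hex[byte & 0x0F];
    }
    result += '}';
    return result;
}
```

Making the caller state the encoding does not tell you which one is right for a given source of bytes, but it does stop the choice from being an unstated assumption buried in the formatting code.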

Perhaps your system uses some other UUID encoding...


UPDATE June 2012

When I read the note in the SMBIOS v2.6 specification, I interpreted it not as saying the format was changing from big- to little-endian, but as saying that the UUID had always been stored little-endian and that this had simply not been stated explicitly until then. Now that it has been stated explicitly, what should you do if you've got code out in the field that was written assuming the value was stored big-endian? I think these are your choices:

a) You could leave your code as it is. But that would mean your code may get a different value from the one obtained through other means, such as WMI. This is the situation I described above. The upside is that your code stays consistent with the values it has already reported, which may or may not matter in your application.

b) You could change your code so that it always reads the UUID in little-endian byte order. This may make it more likely to be consistent with the value obtained through other means. But it does mean your code is no longer consistent with the values it reported before the change, which may or may not matter in your application.

c) You could sometimes read it little-endian and sometimes big-endian, depending on the SMBIOS version number. For example, here is what is done in dmidecode:


    from dmidecode.c:
    /*
     * As of version 2.6 of the SMBIOS specification, the first 3
     * fields of the UUID are supposed to be encoded on little-endian.
     * The specification says that this is the defacto standard,
     * however I've seen systems following RFC 4122 instead and use
     * network byte order, so I am reluctant to apply the byte-swapping
     * for older versions.
     */
    if (ver >= 0x0206)
        printf("%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X",
            p[3], p[2], p[1], p[0], p[5], p[4], p[7], p[6],
            p[8], p[9], p[10], p[11], p[12], p[13], p[14], p[15]);
    else
        printf("%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X",
            p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7],
            p[8], p[9], p[10], p[11], p[12], p[13], p[14], p[15]);


It doesn't seem right to me that the value of the SMBIOS UUID should depend on the programmer's judgement in this way.

Even after you get past this byte-order issue, there are other problems with the SMBIOS UUID you should be aware of.
