KUID: Compressed Universally Unique Identifier

KUID, a universally unique identifier, is essentially a shorter version of a UUID and is commonly used to identify records in various applications.

What’s UUID?

A UUID (Universally Unique Identifier) is a unique identifier that is widely used across various applications and systems to identify records. It is a 128-bit number typically represented as a sequence of 32 hexadecimal digits, grouped in 5 sections, separated by hyphens, as in the following format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.

UUIDs are designed to be globally unique, meaning that it is extremely unlikely for two UUIDs to be the same, even when generated independently by different systems. This makes UUIDs useful for identifying and tracking records in distributed systems, databases, and other contexts where unique identification is necessary.

There are different variants and versions of UUIDs, each with its own way of generating unique identifiers. Some commonly used variants include UUIDv1, UUIDv4, and UUIDv5. The specific variant and version of a UUID can be determined from the format and content of the identifier.

KUID is generated in two steps

  • Generate UUID (use any algorithm like V1,V2,V3,V4 etc)
  • Encode 128-bit number

KUID = Generate UUID and encode it in the higher base like base 32, 36, or 62 or even higher.

func encoder
   input: number # 128 bit number 
   output: encoded string 
   

func KUID:
  input UUID, encoder
  output KUID 
  return encode.encode( UUID )

The length of the KUID depends on the encoding scheme employed. If the base62 encoding scheme ([0–9A-Za-z]) is used, KUID can be encoded to 22 bytes. However, if you wish to store KUID in upper case, then encoding it using the base36 encoding scheme, which utilizes characters [0–9A-Z], will result in an encoded representation that can be stored in 26 bytes. For this post, we’ll base 62 encoding.

Why Should We Use KUID?

KUID generates a smaller string, which results in less storage space and faster processing of many operations.

NOTE: Following statements are only valid when KUID is treated as a string, not as 128-bit number.

  • Hashing: Using KUID with its shorter length of 22 bytes reduces the amount of data that needs to be hashed. This results in fewer instructions being required to perform the hashing operation compared to UUID, which requires a hash of 36 bytes.
  • Comparison: Due to the shorter length of KUID, only 22 bytes need to be compared instead of 36 bytes, resulting in fewer instructions needed for comparison.
  • Read/Write: Using KUID, which can be encoded in 22 bytes, results in a reduction in the memory and disk space required for read and write operations. This is in contrast to UUID, which requires 36 bytes for the same operations. With KUID, fewer instructions are needed to read or write data due to its smaller size.
  • IO Calls: Sending a UUID over a network consumes 36 bytes of data (without considering compression), while a KUID only requires 22 bytes. Therefore, KUID is a more efficient choice for network transmission due to its smaller size.
  • Database indexing: Due to the smaller size of KUID, it leads to a decrease in the size of the database index. In addition, combining KUID with another column for secondary indexing can be beneficial as databases have a limited index width (e.g., 767 bytes in MySQL) that is dependent on the encoding used for the column or table. For instance, MySQL’s UTF8 encoding may limit the index width to as little as 255 bytes.
  • URL safe: Like UUID, KUID is URL safe since it only uses [0–9A-Za-z] characters to represent the data. KUID is commonly used as a unique column, which means that URLs containing KUIDs may be used to represent specific resources, such as /api/v1/products/:productId. As the KUID is included in the URL, it must be URL safe to avoid issues such as encoding errors or unintended interpretation by the web server or browser.
  • Smaller Memory/Disk Usage: Using KUID results in lower memory/disk usage compared to UUID. Since KUID only requires 22 bytes, it uses less storage space than UUID, which uses 36 bytes. This can be beneficial for systems that require frequent read/write operations or have limited storage capacity. For instance, if we consider the storage of 200 million records, assuming each KUID is stored as a 22-byte string, the space consumed by KUIDs would be approximately 4.4 GB. In contrast, the space required for UUIDs would be around 7.2 GB. Therefore, using KUIDs can result in a storage saving of approximately 39%.

Java Implementation:

/*
 * Copyright (c) 2023 Sonu Kumar, sonunitw12@gmail.com
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * You may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and limitations under the License.
 *
 */
package sonus21.github.com;

import java.util.UUID;

/**
 * KUID, a universally unique identifier, is essentially a shorter version of a UUID due to BASE 62 encoding
 * <p>
 * This KUID implementation always generate 22 bytes string, but it can be further reduced.
 * <p>
 * For details read <a href="https://sonus21.medium.com/8aaa5d83368">KUID: Compressed Universally Unique Identifier</a>
 */

public class KUID implements java.io.Serializable, Comparable<KUID> {
    private static final long serialVersionUID = -485684636193249489L;

    private static final String BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final int BASE = BASE62.length();
    private final long msb;
    private final long lsb;


    private KUID(long msb, long lsb) {
        this.msb = msb;
        this.lsb = lsb;
    }

    // A naive method to encode number to Base62
    private static String encodeLong(long value) {
        int size = 11;
        StringBuilder sb = new StringBuilder(size);
        while (value != 0) {
            int index = (int) (value % BASE);
            value = value / BASE;
            // for negative number flip the order instead of error it out as we need to encode all 64 bits
            if (index < 0) {
                index = BASE + index;
            }
            sb.insert(0, BASE62.charAt(index));
        }
        while (sb.length() < size) {
            sb.insert(0, BASE62.charAt(0));
        }
        return sb.toString();
    }


    /**
     * Static factory to retrieve a KUID from UUID
     *
     * @param uuid the UUID object
     * @return KUID object
     */
    public static KUID fromUUID(UUID uuid) {
        if (uuid == null) {
            throw new IllegalArgumentException("uuid can not be null");
        }
        return new KUID(uuid.getMostSignificantBits(), uuid.getMostSignificantBits());
    }

    /**
     * Static factory to retrieve a pseudo randomly generated KUID
     * <p>
     * The {@code KUID} is generated using a random UUID.
     *
     * @return A randomly generated {@code KUID}
     */
    public static KUID randomKUID() {
        return fromUUID(UUID.randomUUID());
    }

    @Override
    public int compareTo(KUID o) {
        return (msb < o.msb ? -1 :
                (msb > o.msb ? 1 :
                        (Long.compare(lsb, o.lsb))));
    }


    /**
     * Returns a {@code String} object representing this {@code KUID}.
     *
     * @return A string representation of this {@code KUID}
     */
    @Override
    public String toString() {
        return encodeLong(msb) + encodeLong(lsb);
    }

    /**
     * Returns a hash code for this {@code KUID}.
     *
     * @return A hash code value for this {@code KUID}
     */
    @Override
    public int hashCode() {
        long xor = msb ^ lsb;
        return ((int) (xor >> 32)) ^ (int) xor;
    }


    /**
     * Compares this object to the specified object.  The result is {@code
     * true} if and only if the argument is not {@code null}, is a {@code KUID}
     * object, and contains the same value.
     *
     * @param obj The object to be compared
     * @return {@code true} if the objects are the same; {@code false} otherwise
     */
    public boolean equals(Object obj) {
        if (!(obj instanceof KUID)) {
            return false;
        }
        KUID id = (KUID) obj;
        return msb == id.msb && lsb == id.lsb;
    }
}

KUID Usage:

public class Main {
    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            KUID kuid = KUID.randomKUID();
            System.out.println(kuid.toString());
        }

    }
}

This will produce something like this:

LJVa90PC5N0LJVa90PC5N0
8SYCyPz9NXp8SYCyPz9NXp
8ucdhnCeYdk8ucdhnCeYdk
7QaHvqldPIL7QaHvqldPIL
8031jRPF3DS8031jRPF3DS
9oec5uzuIpg9oec5uzuIpg
4EAtrqG7CmF4EAtrqG7CmF
1vywu4ZzLz71vywu4ZzLz7
19YC4JE4W7619YC4JE4W76
8al7UWmt4Q28al7UWmt4Q2

If you found this post helpful, please share and give a thumbs up.


Source link