Introducing .toUUID()

NOTE

Since toUUID was released, it has gone through some changes that are not reflected in this post. While most of the points made here remain relevant, there have been some syntactical changes and other minor updates (including a rebranding from .toUUID() to toUUID).

For the latest information, take a look at the toUUID GitHub repository.

For a better write-up on the pros and cons of toUUID, with more examples, check out the why toUUID? file on GitHub.

If I had to choose the aspect of software development I am most passionate about, it would probably be how we make our products: the standards we follow when writing code, the tools we use, and how we get our final code into production safely. I care about the long-term sustainability of what we’re making. Therefore, to nobody’s surprise, I care a lot about:

  • Code readability
  • Automated testing

Both of which .toUUID() attempts to help with!

The problem

What should be an ID and what shouldn’t is a complicated subject, especially in a database, where rigid rules are difficult to change after the fact, and even more so in modern development, where both data and systems are distributed. Due to these complexities, developers often use UUIDs (GUIDs for you C# developers). For more information about what a UUID is, please check out this article from Baeldung.

The tl;dr is that developers tend to use UUIDs because:

  • UUIDs are good at being unique (I generally don’t have to worry about one record colliding with another in a database)
  • Easy to generate

I’m not going to go through the pros and cons of UUIDs; for that, read Jeff Atwood’s blog post on the topic. I’ve seen some strong opinions on the subject of UUIDs and GUIDs, and this post (or library) is not an argument for or against them. Instead, this is an attempt to make life more liveable for those who do use UUIDs.

Let's consider what attributes we want for the values we use in our automated tests:

  • Repeatable: Our tests must use repeatable values to have reliable and predictable assertions. Random values are hard to verify and can quickly lead to flaky tests, so developers tend to stay away from randomness when writing tests.
  • Human readable: Tests should be easy to read, which is why we should avoid constructs that break the flow of reading the code or make the code more complicated.

The way UUIDs work in Java is at odds with these two attributes. Let's take a look at how we generate a new random UUID:

UUID uuid = UUID.randomUUID();
//Generates a random UUID, like c0e8c64b-0178-4a92-9ed9-77d7d92fb322

The code above is short and easy, but not in line with what we need for our tests. After all, our tests need something repeatable (i.e. no random numbers) and something easy for us humans to read. The Java UUID class can only be instantiated in a handful of ways, and for testing purposes, the most accessible way that more or less achieves both is the “fromString” method:

UUID uuid = UUID.fromString("00000000-0000-0000-0000-000000000001");
//Generates a UUID which translates to the value of 00000000-0000-0000-0000-000000000001
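To make the repeatability point concrete, here is a minimal sketch that uses nothing but java.util.UUID. The class and method names are made up for illustration.

```java
import java.util.UUID;

final class RepeatabilityDemo {
    // True only if two freshly generated random UUIDs happen to be equal,
    // which in practice never occurs.
    static boolean randomUuidsMatch() {
        return UUID.randomUUID().equals(UUID.randomUUID());
    }

    // True when parsing the same text twice yields equal UUIDs,
    // which is what makes fromString usable in assertions.
    static boolean parsedUuidsMatch() {
        String text = "00000000-0000-0000-0000-000000000001";
        return UUID.fromString(text).equals(UUID.fromString(text));
    }

    public static void main(String[] args) {
        System.out.println("random UUIDs equal: " + randomUuidsMatch());
        System.out.println("parsed UUIDs equal: " + parsedUuidsMatch());
    }
}
```

A test can assert against the parsed value on every run, while the random value would have to be captured and passed around first.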

The string generated UUID is way more readable and predictable than the random one, so it is preferred, but it isn't good enough:

  • The code line is long and takes up a lot of space
  • All those 0s cause a lot of noise. Developers only really care about the 1, yet have to deal with all the 0s
  • The dashes must be placed in the correct place, so developers tend to copy and paste these values around in the code
  • It’s just not pretty
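The point about the dashes is easy to check. The sketch below (with a hypothetical helper named parses) shows the well-formed string being accepted, while the same digits without any dashes are rejected with an IllegalArgumentException:

```java
import java.util.UUID;

final class DashPlacementDemo {
    // Returns true if UUID.fromString accepts the text, false if it rejects it.
    static boolean parses(String text) {
        try {
            UUID.fromString(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Well-formed: five groups separated by dashes.
        System.out.println(parses("00000000-0000-0000-0000-000000000001"));
        // Same digits, but with the dashes left out.
        System.out.println(parses("00000000000000000000000000000001"));
    }
}
```

Since a typo only shows up at runtime, copying a known-good string around is the path of least resistance.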

Consider the following code:

@Test
public void VerifyThatCorrectStudentsIsListedInClass() {
    //arrange 
    List<Student> expectedResult = Arrays.asList(
        createStudent(UUID.fromString("00000000-0000-0000-0000-000000000001"), "CS"),
        createStudent(UUID.fromString("00000000-0000-0000-0000-000000000002"), "CS"));
    List<Student> studentsFromQuery = Arrays.asList(
        expectedResult.get(0),
        createStudent(UUID.fromString("00000000-0000-0000-0000-000000000003"), "History"),
        expectedResult.get(1));
    when(classDao.listStudentsBySchool("Harvard")).thenReturn(studentsFromQuery); //<- Setting up a mock

    //act
    List<Student> result = studentService.listStudentsByClass("CS", "Harvard");

    //assert
    assertThat(result) //<- Using assertJ style asserts
        .extracting("studentId") //<- AssertJ looks up the studentId property by name
        .containsExactlyInAnyOrder(
            UUID.fromString("00000000-0000-0000-0000-000000000001"),
            UUID.fromString("00000000-0000-0000-0000-000000000002")
        );
}

private Student createStudent(UUID studentId, String className) {
    return Student.builder()
        .studentId(studentId)
        .schoolClass(className)
        //Some more values
        .build();
}

The example above is based on a real test I’ve come across, but with some added comments and translated into a different context. As with most code, this test has issues other than just the UUIDs; however, let’s tackle one issue at a time and focus on the UUIDs. Human minds like seeing repeating patterns, so our eyes naturally rest on these long repeating lines of 0s. The UUIDs take focus away from the test itself.

There is currently no solution to this other than abstracting out the UUID creation itself, which is why I’ve seen some resort to making static hard-coded UUID fields:

public abstract class TestUUIDs{
    public static UUID uuid1 = UUID.fromString("00000000-0000-0000-0000-000000000001");
    public static UUID uuid2 = UUID.fromString("00000000-0000-0000-0000-000000000002");
    public static UUID uuid3 = UUID.fromString("00000000-0000-0000-0000-000000000003");
    //Etc...
}

This approach isn’t wrong, and it does clean up the tests, but it is not a very elegant pattern either. I have worked on projects doing this and, for the most part, it works just fine, but I’m a firm believer that we should aim for something that works better than “just fine”.

What if there was something dynamic, yet didn’t sacrifice any performance? What if there was something which removed the noise from our tests while maintaining the vital information for our tests?

.toUUID() to the rescue!

The core idea of .toUUID() is to simplify how we deal with UUIDs in our automated tests. So rather than doing this:

UUID.fromString("00000000-0000-0000-0000-000000000001");

We can do this in Java:

toUUID(1);

or in Kotlin we can do this:

1.toUUID() //<-- See what I did there?

All three of these solutions will generate a UUID with the same value.

How does .toUUID() work?

The easiest way to explain how .toUUID() does this is to say that it appends the integer at the end of the UUID, but that isn’t entirely true. If you take a look at the actual implementation, you will see that .toUUID() translates the int to a long and does a bitwise AND operation on it. The reason for this is that the fromString method isn’t very efficient for our use case.

When we call UUID.fromString("00000000-0000-0000-0000-000000000001"), each value (between the dashes) is eventually translated into a long. The long is then sent through some validation, and some calculations are applied. While this is a good implementation for general-purpose usage, it simply isn’t optimal for something like .toUUID(). Java’s int cannot hold a value larger than 2147483647, so the “biggest” UUID that .toUUID() can ever produce is 00000000-0000-0000-0000-002147483647, which means we would be doing a lot of unnecessary work if we used the fromString method.

Since every positive integer will always result in a valid UUID, I decided to skip all the validation and do the calculation myself.
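To illustrate what skipping the parsing can look like, here is a small sketch using the public UUID(long, long) constructor from the JDK. It is not taken from the library; the helper name fromBits is made up for this example.

```java
import java.util.UUID;

final class DirectConstructionDemo {
    // Builds the UUID straight from its two 64-bit halves, with no string parsing.
    // The upper half is left at zero; only the lower half carries a value.
    static UUID fromBits(long leastSignificantBits) {
        return new UUID(0L, leastSignificantBits);
    }

    public static void main(String[] args) {
        UUID parsed = UUID.fromString("00000000-0000-0000-0000-000000000001");
        UUID direct = fromBits(1L);
        System.out.println(parsed.equals(direct)); // prints "true"
    }
}
```

Both routes end up with the same value; the constructor simply gets there without splitting and validating a string first.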

In practical terms, it is easier to think of .toUUID() as adding the integer at the end of the UUID. So:

  • 1 becomes 00000000-0000-0000-0000-000000000001
  • 2 becomes 00000000-0000-0000-0000-000000000002
  • 12 becomes 00000000-0000-0000-0000-000000000012
  • 120 becomes 00000000-0000-0000-0000-000000000120
  • 100000 becomes 00000000-0000-0000-0000-000000100000
  • And so forth...
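One detail worth spelling out: the text form of a UUID is hexadecimal, so for 12 to show up as ...000000000012 the lower bits have to be 0x12 rather than decimal 12 (which would print as ...00000000000c). The sketch below reproduces the mapping from the list above by placing each decimal digit into its own 4-bit slot. It is an illustration under that assumption, not the library's actual implementation, and the name sketchToUUID is hypothetical.

```java
import java.util.UUID;

final class ToUuidSketch {
    // Hypothetical helper: maps a non-negative int to a UUID whose text form
    // ends with the decimal digits of the value.
    static UUID sketchToUUID(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must not be negative: " + value);
        }
        long leastSigBits = 0L;
        int shift = 0;
        int remaining = value;
        do {
            // Take the lowest decimal digit and store it in the next hex nibble.
            leastSigBits |= ((long) (remaining % 10)) << shift;
            remaining /= 10;
            shift += 4;
        } while (remaining > 0);
        // An int has at most 10 decimal digits, so at most 40 bits are used,
        // which fits inside the last 12-character group of the UUID.
        return new UUID(0L, leastSigBits);
    }

    public static void main(String[] args) {
        System.out.println(sketchToUUID(12));                // 00000000-0000-0000-0000-000000000012
        System.out.println(sketchToUUID(Integer.MAX_VALUE)); // 00000000-0000-0000-0000-002147483647
    }
}
```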

Java

If we take the first example and rewrite it to use .toUUID(), then it would look something like this:

import static io.github.atomfinger.touuid.UUIDs.*;

@Test
public void VerifyThatCorrectStudentsIsListedInClass() {
    //arrange 
    List<Student> expectedResult = Arrays.asList(
        createStudent(toUUID(1), "CS"),
        createStudent(toUUID(2), "CS"));
    List<Student> studentsFromQuery = Arrays.asList(
        expectedResult.get(0),
        createStudent(toUUID(3), "History"),
        expectedResult.get(1));
    when(classDao.listStudentsBySchool("Harvard")).thenReturn(studentsFromQuery); //<- Setting up a mock

    //act
    List<Student> result = studentService.listStudentsByClass("CS", "Harvard");

    //assert
    assertThat(result).extracting("studentId").containsExactlyInAnyOrderElementsOf(toUUIDs(1, 2));
}

private Student createStudent(UUID studentId, String className) {
    return Student.builder()
        .studentId(studentId)
        .schoolClass(className)
        //Some more values
        .build();
}

Other Java methods

Generate a list of UUIDs from a Collection of integers:

List<UUID> uuids = toUUIDs(Arrays.asList(1, 2, 3, 4, 5));
uuids.forEach((it) -> System.out.println(it.toString()));
//Output:
//00000000-0000-0000-0000-000000000001
//00000000-0000-0000-0000-000000000002
//00000000-0000-0000-0000-000000000003
//00000000-0000-0000-0000-000000000004
//00000000-0000-0000-0000-000000000005

Generate a list of UUIDs based on integer varargs:

List<UUID> uuids = toUUIDs(1, 2, 3, 4, 5);
uuids.forEach((it) -> System.out.println(it.toString()));
//Output:
//00000000-0000-0000-0000-000000000001
//00000000-0000-0000-0000-000000000002
//00000000-0000-0000-0000-000000000003
//00000000-0000-0000-0000-000000000004
//00000000-0000-0000-0000-000000000005

Generate a list of UUIDs based on a range:

List<UUID> uuids = toUUIDsFromRange(1, 5);
uuids.forEach((it) -> System.out.println(it.toString()));
//Output:
//00000000-0000-0000-0000-000000000001
//00000000-0000-0000-0000-000000000002
//00000000-0000-0000-0000-000000000003
//00000000-0000-0000-0000-000000000004
//00000000-0000-0000-0000-000000000005
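For readers curious how list helpers like these can sit on top of a single conversion, here is a rough sketch using streams. All names (BatchSketch, sketchToUUID, sketchToUUIDs, sketchRange) are hypothetical stand-ins and say nothing about how the library itself is written. The single conversion reads the decimal digits as a hexadecimal number, and the range is inclusive at both ends, matching the output shown above.

```java
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class BatchSketch {
    // Hypothetical single conversion: the decimal digits of the value are
    // parsed as hex, so 12 becomes 0x12 and prints as ...000000000012.
    static UUID sketchToUUID(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must not be negative: " + value);
        }
        return new UUID(0L, Long.parseLong(Integer.toString(value), 16));
    }

    // Varargs variant: converts each given value in order.
    static List<UUID> sketchToUUIDs(int... values) {
        return IntStream.of(values)
                .mapToObj(BatchSketch::sketchToUUID)
                .collect(Collectors.toList());
    }

    // Range variant: includes both the first and the last value.
    static List<UUID> sketchRange(int from, int to) {
        return IntStream.rangeClosed(from, to)
                .mapToObj(BatchSketch::sketchToUUID)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        sketchRange(1, 5).forEach(System.out::println);
    }
}
```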

Kotlin

.toUUID() was initially conceived as a Kotlin library. Kotlin has some extra bells and whistles, most notably extension functions, which give the Kotlin API an additional set of functions. In its current form, the Kotlin API works as a wrapper around the Java code in an attempt to “kotlinfy” the code and make it cleaner.

NOTE
Do note that Kotlin can use the same functions as the Java code (but not the other way around). When doing this project, I didn’t want people to have to deal with extra dependencies, which is why I deliberately left the Kotlin libraries out of the Java implementation. The result is that Kotlin code can use the Kotlin API, since the Kotlin libraries are already provided, but Java code calling it will get an exception due to missing dependencies. How Kotlin compiles extension functions, and how Java interprets them, is a bit funky anyway, so I think a Java developer would be more comfortable with the Java API regardless.

Rewriting the first example in Kotlin we get the following:

import io.github.atomfinger.touuid.toUUID
import io.github.atomfinger.touuid.toUUIDs

@Test
fun VerifyThatCorrectStudentsIsListedInClass() {
    //arrange 
    val expectedResult = listOf(
        createStudent(1.toUUID(), "CS"), 
        createStudent(2.toUUID(), "CS")
    )
    val historyStudent = createStudent(3.toUUID(), "History")
    val studentQueryResult = expectedResult.toMutableList().apply{ add(1, historyStudent) }
    every { classDao.listStudentsBySchool("Harvard") } returns studentQueryResult

    //act
    val result = studentService.listStudentsByClass("CS", "Harvard")

    //assert
    assertThat(result).extracting("studentId").containsExactlyInAnyOrderElementsOf(listOf(1, 2).toUUIDs())
}

private fun createStudent(studentId: UUID, className: String) =
    Student(
        studentId,
        className
        //Some more values
    )

Other Kotlin functions

Generate a list of UUIDs based on a collection:

val uuids = listOf(1, 2, 3, 4, 5).toUUIDs()
uuids.forEach { println(it.toString()) }
//Output:
//00000000-0000-0000-0000-000000000001
//00000000-0000-0000-0000-000000000002
//00000000-0000-0000-0000-000000000003
//00000000-0000-0000-0000-000000000004
//00000000-0000-0000-0000-000000000005

Generate a list of UUIDs based on a range:

val uuids = (1..5).toUUIDs()
uuids.forEach { println(it.toString()) }
//Output:
//00000000-0000-0000-0000-000000000001
//00000000-0000-0000-0000-000000000002
//00000000-0000-0000-0000-000000000003
//00000000-0000-0000-0000-000000000004
//00000000-0000-0000-0000-000000000005

Generate a list of UUIDs based on a sequence:

val uuids = uuids().take(5)
uuids.forEach { println(it.toString()) }
//Output:
//00000000-0000-0000-0000-000000000001
//00000000-0000-0000-0000-000000000002
//00000000-0000-0000-0000-000000000003
//00000000-0000-0000-0000-000000000004
//00000000-0000-0000-0000-000000000005

How to get .toUUID()

.toUUID() can be found on Maven Central, so for Maven projects you only need to include it as a dependency:

<dependency>
    <groupId>io.github.atomfinger</groupId>
    <artifactId>atomfinger-touuid</artifactId>
    <version>1.0.0</version>
    <scope>test</scope>
</dependency>

Or .toUUID() can be imported through Gradle:

testImplementation group: 'io.github.atomfinger', name: 'atomfinger-touuid', version: '1.0.0'

Do make sure to limit .toUUID() to the test scope. While .toUUID() won't, and should not, cause any havoc in production code, it is not meant to be used there. It is meant to ease automated testing, not to serve as a legitimate way of generating UUIDs.

You may also check out .toUUID() on GitHub.

How do you pronounce .toUUID()?

I don’t know. I attempted to find something catchy on a text-to-speech nightmare fuel website, but that sounded more like “toad”. So, don’t ask, and if you have to pronounce it, say it like “to-u-id” or more like “to-you-id”, or even “two-you-id”. Your guess is as good as mine.

.toUUID() is meant to be parsed, not said.
