Don’t use transient fields

Most ORM frameworks allow developers to add values to their models which are not actually part of the database model. Java calls these transient fields, which is the term I will use throughout this post as well. I aim to explain why transient fields are toxic, and how to avoid them.

To make the definition clearer, assume that we have some sort of entity class, be it in C#’s Entity Framework, Java’s JPA/Hibernate or Python’s SQLAlchemy. When we request this object from the database it has a set of values from the database. Some might be lazy loaded, some might be aggregated values and some might be mapped directly from the table’s columns. When this object is returned to us, every value which our application must set itself before it exists is considered a transient value. Lazy loaded values are not transient, because the ORM framework will make sure to load these values when we request them. Other automated ways of loading data might also be available, but again, these are automatic. This post is only talking about values which must be set by the application itself after fetching the object.

Note: Throughout this post I will be using Java and JPA as an example. We will also mix in some Lombok just to remove some of Java’s bloat. The points brought up apply to other languages and frameworks as well.

The Problem

Before we can discuss solutions we must acknowledge the problem of transient fields. Transient fields are not inherently bad, but as we will see throughout this post, they are also not useful. At best they cause confusion about the state of an object, at worst they can cause bugs. 

Let’s go through an example. Pretend that we are making a system for a school. It is likely that we would have a table in our database for the students. It would also make sense that we have a model linked to that table called “Student”. The requirements for this system are that we must store some data about the students, such as name, date of birth and so forth. The initial implementation might look like this:

@Entity
@Getter //Lombok - Generates getters
@Setter //Lombok - Generates setters
public class Student {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long studentId;

    private String firstName;

    private String secondName;

    private LocalDate dateOfBirth;

    private String ssn;

}

Let’s look at the implementation of the StudentServiceImpl:

public class StudentServiceImpl implements StudentService {

    //A DAO (Data Access Object) is responsible for getting data from a database
    private StudentDao studentDao;

    public StudentServiceImpl(StudentDao studentDao) {
        this.studentDao = studentDao;
    }

    @Override
    public Optional<Student> findStudent(Long studentId) {
        return studentDao.find(studentId);
    }

    @Override
    public List<Student> listStudents() {
        return studentDao.list();
    }
}

Neither of these two classes is especially complicated. The service has two methods, one to find a specific student by an ID, and the other to simply list out all the students.

Let’s say that we need to interface with a third party system which requires a special ID. This ID is built up by combining the student’s SSN, date of birth and student ID number. Obviously this is a completely arbitrary example, but adding transient fields often comes as a result of changed requirements, or additional requirements.

The developers, upon seeing this new requirement, have two options.

The first is to add a new column to the database. This operation will include:

  • Making a new column

  • Adding data to the column and verifying that the data is correct

  • Changing the save logic to make sure that the field is populated correctly

  • Adding the list call

The other option is to make the field calculated. That is, since the key is composed of data already found in the table, we can quickly build it as we list the students out:

  • Adding a transient field

  • Adding the list call and making sure the transient field is set

In this scenario many developers might do the latter, as the former is more invasive and more prone to error. After all, developers should fear making changes to the database.

When the changes have been made, the model and service look like this:

@Entity
@Getter //Lombok - Generates getters
@Setter //Lombok - Generates setters
public class Student {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long studentId;

    private String firstName;

    private String secondName;

    private LocalDate dateOfBirth;

    private String ssn;

    @Transient
    private String studentIdentityKey;

}

public class StudentServiceImpl implements StudentService {

    //A DAO (Data Access Object) is responsible for getting data from a database
    private StudentDao studentDao;

    public StudentServiceImpl(StudentDao studentDao) {
        this.studentDao = studentDao;
    }

    @Override
    public Optional<Student> findStudent(Long studentId) {
        return studentDao.find(studentId);
    }

    @Override
    public List<Student> listStudents() {
        List<Student> students = studentDao.list();
        for (Student student : students)
            setIdentityKey(student);
        return students;
    }

    //This is where we are setting our transient field
    private void setIdentityKey(Student student) {
        student.setStudentIdentityKey(
                student.getStudentId() + "-" + student.getSsn() + "-" + student.getDateOfBirth()
        );
    }
}

This example might seem a bit arbitrary, but believe me it happens. If it wasn’t a thing I’ve seen or experienced I wouldn’t be writing this post right now.

The most obvious issue is probably mutability. We’re taking an object and we’re changing it. This might not be the worst offender when it comes to mutability, but I am a big proponent of immutability and believe the benefits it provides are valuable. By adding transient fields we are forcing mutability on our models.

The second, and bigger, issue comes later when the application scales. Having one transient field is one thing, but as with many other things, when we do a quick fix once, we probably accept doing the same quick fix again. While it might seem like a slippery slope argument, I have seen it happen on multiple occasions in different teams with different technologies. When the number of transient fields grows it quickly becomes difficult to know whether a value is set or not. In our example the list call returns with the value, but the find call does not. Both methods return the same type of object. That means we cannot trust our model or our interfaces to tell us what is actually going to happen; we are forced to look at the implementation.

As the application grows we might introduce some logic that relies on the transient fields being set, and other logic that relies on them not being set. This might seem like logic which should never exist, but I find that any situation which allows for such traps in the first place should be reconsidered.

Especially in Java we often work with interfaces. This makes it easier to write unit tests and mocks, and it is the general way we work with dependency injection. Here is the “StudentService” interface:

public interface StudentService {
    Optional<Student> findStudent(Long studentId);

    List<Student> listStudents();
}

By looking at the service interface it is impossible to tell which of these two methods includes “studentIdentityKey”. By looking at the “Student” class we cannot see when the “studentIdentityKey” field is set. We can only determine when a field is set by reading the service implementation. What we have achieved with transient fields is that the values within the “Student” class are not a pure representation of what is in the database. Instead, parts are defined by the database, and other parts are defined by the service class. This pattern quickly becomes difficult to track when we get multiple transient fields.
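
To make the inconsistency concrete, here is a minimal sketch that leaves out JPA and Lombok entirely so it can run on its own. The names DemoStudent, DemoStudentService and TransientFieldDemo are stand-ins invented for this illustration, and the hard-coded row replaces a real database. Only the shape of the two service methods follows the example above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Stand-in for the Student entity (an assumption for this sketch, not the real model).
class DemoStudent {
    final long studentId;
    final String ssn;
    final LocalDate dateOfBirth;
    String studentIdentityKey; // plays the role of the @Transient field

    DemoStudent(long studentId, String ssn, LocalDate dateOfBirth) {
        this.studentId = studentId;
        this.ssn = ssn;
        this.dateOfBirth = dateOfBirth;
    }
}

// Hypothetical in-memory service with the same two methods as StudentServiceImpl.
class DemoStudentService {

    // Pretends to be the DAO: every call returns freshly built objects.
    private List<DemoStudent> load() {
        List<DemoStudent> rows = new ArrayList<>();
        rows.add(new DemoStudent(1L, "12345", LocalDate.of(2010, 5, 17)));
        return rows;
    }

    Optional<DemoStudent> findStudent(long studentId) {
        // The transient field is never touched on this path.
        return load().stream().filter(s -> s.studentId == studentId).findFirst();
    }

    List<DemoStudent> listStudents() {
        List<DemoStudent> students = load();
        for (DemoStudent s : students) {
            // Only the list path fills in the transient field.
            s.studentIdentityKey = s.studentId + "-" + s.ssn + "-" + s.dateOfBirth;
        }
        return students;
    }
}

class TransientFieldDemo {
    public static void main(String[] args) {
        DemoStudentService service = new DemoStudentService();
        System.out.println(service.findStudent(1L).get().studentIdentityKey);  // prints "null"
        System.out.println(service.listStudents().get(0).studentIdentityKey);  // prints "1-12345-2010-05-17"
    }
}
```

Both calls hand back the same type, yet only one of them carries the key, and nothing in the method signatures warns the caller about it.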

The Solution(s)

Use entity relationships

Sometimes these transient fields are put into the code as a way to link objects together while keeping complete control over when something is loaded, rather than trusting the ORM framework. I get the thought, but this is not the way. I am myself very skeptical of connecting objects together and letting the ORM framework do its thing. Experience has shown me that it leads to “load one object, get the whole database” type of situations. (Note to self: write a post about why I am skeptical of in-object relational mapping.)

Use built in features in the ORM framework

ORM tools, especially the big ones, have to handle a vast number of scenarios. It is very likely that someone has had the same issue and that the ORM framework has added a feature to solve it. For our example we could probably have gotten away with using @Embedded and @Embeddable and avoided the whole issue.

Hibernate, for example, has the @Formula annotation, which can do a bunch of things for us, such as filling a read-only property from an SQL expression.

Use wrapper objects

Sometimes we simply cannot get away from having to calculate some values. Business logic can be tricky and some scenarios might be messy to solve on the database side, or through the ORM framework. This is the time where transient fields might be appropriate, but I still think we can get away from them and remain immutable.

Such an implementation would look something like this:

@Entity
@Getter //Lombok - Generates getters
@Setter //Lombok - Generates setters
public class Student {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long studentId;

    private String firstName;

    private String secondName;

    private LocalDate dateOfBirth;

    private String ssn;

}

@Getter
public class StudentListElement {

    //Final fields: the wrapper cannot be changed after construction
    private final Student student;
    private final String studentIdentityKey;

    public StudentListElement(Student student, String studentIdentityKey) {
        this.student = student;
        this.studentIdentityKey = studentIdentityKey;
    }
}

public class StudentServiceImpl implements StudentService {

    //A DAO (Data Access Object) is responsible for getting data from a database
    private StudentDao studentDao;

    public StudentServiceImpl(StudentDao studentDao) {
        this.studentDao = studentDao;
    }

    @Override
    public Optional<Student> findStudent(Long studentId) {
        return studentDao.find(studentId);
    }

    @Override
    public List<StudentListElement> listStudents() {
        return studentDao.list()
                .stream()
                .map(student -> new StudentListElement(student, getIdentityKey(student)))
                .collect(Collectors.toList());
    }

    private String getIdentityKey(Student student) {
        return student.getStudentId() + "-" + student.getSsn() + "-" + student.getDateOfBirth();
    }
}

In general I am not a fan of using wrapper objects, but they are better than transient fields (IMHO). Using a wrapper object in this scenario allows us to:

  • Explicitly declare that certain extra values will be provided

  • Not pollute our model with values which are not always required

  • Remove confusion about when a value is set or not

  • Return an object which accurately reflects the content of that object. We can easily spot the difference between “findStudent” and “listStudents”

  • Remain immutable, as we do not change the student after fetching it

The downside is that we are making a new object just for this purpose. I admit that this is not a perfect solution, but I believe it is better than the downsides transient fields bring.

Move the transient operation to where it is actually needed

Sometimes transient fields exist simply due to laziness. Rather than including the logic of figuring out “studentIdentityKey” where it is supposed to be used, the logic is instead pushed closer to the database layer.

In our example we were supposed to interface with a third party system which needed the ID. This means that the logic which figures out the ID probably belongs with the code which is talking with this other system.
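
A minimal sketch of that idea could look like the following. ThirdPartyStudentExporter is a hypothetical class assumed to own the communication with the other system, and ExportStudent is a plain stand-in for the “Student” entity so that the sketch runs without JPA or Lombok. Neither name comes from a real library.

```java
import java.time.LocalDate;

// Plain stand-in for the Student entity (an assumption for this sketch).
class ExportStudent {
    private final long studentId;
    private final String ssn;
    private final LocalDate dateOfBirth;

    ExportStudent(long studentId, String ssn, LocalDate dateOfBirth) {
        this.studentId = studentId;
        this.ssn = ssn;
        this.dateOfBirth = dateOfBirth;
    }

    long getStudentId() { return studentId; }
    String getSsn() { return ssn; }
    LocalDate getDateOfBirth() { return dateOfBirth; }
}

// Hypothetical adapter for the third party system.
// The key is built here because this is the only place that needs it.
class ThirdPartyStudentExporter {

    String identityKeyFor(ExportStudent student) {
        return student.getStudentId() + "-" + student.getSsn() + "-" + student.getDateOfBirth();
    }
}

class ExporterDemo {
    public static void main(String[] args) {
        ExportStudent student = new ExportStudent(7L, "98765", LocalDate.of(2011, 1, 9));
        // The entity itself is never modified; the key only exists at the point of use.
        System.out.println(new ThirdPartyStudentExporter().identityKeyFor(student)); // prints "7-98765-2011-01-09"
    }
}
```

With this layout the model and the student service stay exactly as they were before the new requirement arrived.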

Conclusion

I can’t think of a reason why we ever actually need to use transient fields, nor can I think of a time where transient fields give more value than the other solutions provided. Maybe there are cases where transient is the only possible choice where the suggested solutions above cannot work or would be the worse option. I’d love to see a case where transient fields are the better solution.

If one has to use transient fields, one should make sure that they are always set no matter the call. Allowing transient fields to be set in some calls and not in others will only cause confusion.
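
If transient fields really cannot be avoided, one way to keep them consistent is to send every fetch through the same enrichment step. The sketch below is only an illustration under that assumption: KeyedStudent and ConsistentStudentService are made-up names, and the in-memory load() method stands in for a real DAO.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Stand-in for an entity that carries a transient field (an assumption for this sketch).
class KeyedStudent {
    final long studentId;
    final String ssn;
    final LocalDate dateOfBirth;
    String studentIdentityKey; // plays the role of the @Transient field

    KeyedStudent(long studentId, String ssn, LocalDate dateOfBirth) {
        this.studentId = studentId;
        this.ssn = ssn;
        this.dateOfBirth = dateOfBirth;
    }
}

class ConsistentStudentService {

    // Pretends to be the DAO: every call returns freshly built objects.
    private List<KeyedStudent> load() {
        List<KeyedStudent> rows = new ArrayList<>();
        rows.add(new KeyedStudent(1L, "12345", LocalDate.of(2010, 5, 17)));
        rows.add(new KeyedStudent(2L, "67890", LocalDate.of(2011, 3, 2)));
        return rows;
    }

    Optional<KeyedStudent> findStudent(long studentId) {
        return load().stream()
                .filter(s -> s.studentId == studentId)
                .findFirst()
                .map(this::withIdentityKey);
    }

    List<KeyedStudent> listStudents() {
        return load().stream()
                .map(this::withIdentityKey)
                .collect(Collectors.toList());
    }

    // The single place where the transient field is set; every public method goes through it.
    private KeyedStudent withIdentityKey(KeyedStudent s) {
        s.studentIdentityKey = s.studentId + "-" + s.ssn + "-" + s.dateOfBirth;
        return s;
    }
}
```

This still mutates the model, so it only addresses the consistency problem and not the mutability one.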

This post was inspired by working with and seeing multiple projects in my career which abuse transient functionality. I’ve seen instances where transient has been used because one fears that doing it any other way might cause issues in other places, often related to performance. I have seen instances where one has used transient where other approaches would be way more suitable. When transient fields are used they become very difficult to change afterwards, and in my opinion it is best to avoid them whenever possible.

Just because something exists in a library doesn’t automatically mean one should use it. Consider how any feature can impact the application in multiple different ways. By functional standards transient fields work perfectly fine, but from a readability and maintainability point of view they are terrible. My advice when it comes to transient fields is simply not to use them.
