Designing for Evolution
The first step to designing an evolutionary architecture is to design our software so that it can evolve easily. Yes, it’s easier said than done.
What does it mean to design a system that can easily evolve with the continuous requests from the client? Let’s start with the classic legacy code. Often, I hear the term “legacy code” associated with an old application that needs to be rewritten using a new framework, new technologies, and current languages to adapt to new requirements. Nothing could be further from the truth. Code can be considered legacy even if it was written last week. What characterizes legacy code is that the application in question is important to the company and profitable, but its maintenance or evolution is complicated due to the high dependency of objects within it, to the point that modifying one requires changing the entire project, which, of course, scares everyone.
As always, Domain-Driven Design comes to our rescue. The best way to make an application easy to maintain is to divide it into many small modules, not necessarily microservices, each associated with a specific business area. For this, we’ve seen how strategic patterns, particularly Ubiquitous Language and Bounded Context, are highly supportive. But are they sufficient? The Bounded Context still represents a fairly large area of our business model; therefore, the risk of handling something complex remains. Indeed, E. Evans has presented a series of tactical patterns, much more oriented towards the development of the application than its understanding and division. We won’t list all the tactical patterns, which you can find in Evans’ book, but we’ll focus on those that interest us most in creating an application that can last over time and be maintained without too much anxiety.
Aggregate
Among all the tactical patterns, the one we are interested in analyzing is the aggregate. The aggregate is a simplified representation of the business model behind the problem we aim to solve. It is a complex and fundamental pattern within Domain-Driven Design. It consists of one or more entities and one or more value objects.
Described like this, it seems almost simple to design an aggregate, but in reality, it is one of the most complex phases of the development process. Being a simplified representation of the business model, it must stay as close as possible to the real business model. Without delving too much into philosophical discussions, our software will solve the business problems in question as long as the two models do not diverge too much. That is, as long as any request from our client can be implemented in our code.
So, are we back to square one? Absolutely not. The aggregate is not just a container of properties that describe reality but implements all the methods necessary to describe its behavior. Everything related to the particular problem it solves passes through it and no other point within our application. At this point, it is easy to understand that any changes made to this aspect of the business within our code only involve this aggregate, without affecting other parts of the code. Each aggregate is a small isolated world that must certainly communicate with others, and we will soon see how, but it does not depend on others to make its decisions and validate them according to the implemented business rules, or invariants.
Communication within the Aggregate
To ensure this isolation, the only way we can communicate with objects within an aggregate is through a very particular entity called the Aggregate Root or Entity Root. In the classic example of a sales order, composed of a header and lines, we cannot freely access the lines to modify them. We must pass through the order header, designated as the Aggregate Root, which will call the methods exposed by the lines to make the requested changes. Upon completion, the Aggregate Root, i.e., the order header, will verify that the business rules, or invariants, have been respected: for example, that the new order total does not exceed the customer’s budget.
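The sales order example can be sketched in a few lines of Python. This is only an illustrative sketch: the names SalesOrder, OrderLine and BudgetExceeded, and the choice to undo a change when the budget check fails, are assumptions made for the example, not something prescribed by the pattern.

```python
from dataclasses import dataclass
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a change would push the order total over the customer's budget."""


@dataclass
class OrderLine:
    """Entity inside the aggregate; callers never touch it directly."""
    sku: str
    quantity: int
    unit_price: Decimal

    def __post_init__(self) -> None:
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")

    def change_quantity(self, quantity: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.quantity = quantity

    @property
    def amount(self) -> Decimal:
        return self.unit_price * self.quantity


class SalesOrder:
    """Aggregate Root (the order header): the only entry point to the lines."""

    def __init__(self, order_number: str, customer_budget: Decimal) -> None:
        self.order_number = order_number
        self._customer_budget = customer_budget
        self._lines = {}  # sku -> OrderLine, private to the aggregate

    @property
    def total(self) -> Decimal:
        return sum((line.amount for line in self._lines.values()), Decimal("0"))

    def add_line(self, sku: str, quantity: int, unit_price: Decimal) -> None:
        if sku in self._lines:
            raise ValueError(f"line {sku} already exists")
        self._lines[sku] = OrderLine(sku, quantity, unit_price)
        try:
            self._check_invariants()
        except BudgetExceeded:
            del self._lines[sku]  # undo the change so the aggregate stays consistent
            raise

    def change_line_quantity(self, sku: str, quantity: int) -> None:
        line = self._lines[sku]
        previous = line.quantity
        line.change_quantity(quantity)  # the root calls the method exposed by the line
        try:
            self._check_invariants()
        except BudgetExceeded:
            line.quantity = previous  # undo the change so the aggregate stays consistent
            raise

    def _check_invariants(self) -> None:
        # Business rule from the text: the total must not exceed the customer's budget.
        if self.total > self._customer_budget:
            raise BudgetExceeded(
                f"total {self.total} exceeds budget {self._customer_budget}"
            )
```

Callers only ever see SalesOrder; the lines are reachable exclusively through its methods, so every change passes the invariant check.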
Facilitating Maintenance and Data Consistency
Concentrating all behaviors in one part of the code is strategic for its maintenance. Reducing the aggregate’s intervention area helps us maintain uniformity and consistency around the problem it addresses and reduces the complexity of our code to the problem in question. And what about data consistency? The fact that the Aggregate Root checks that the business rules are respected with each state change ensures consistent data. Not by chance, the aggregate must always be in a consistent state from its first instance. Technically speaking, an aggregate with a parameterless factory is not acceptable in a reputable domain! And here the first sizing problems begin.
Sizing the Aggregate
If we make an aggregate too large, we introduce more complexity and more contention between concurrent requests. More requests will end up being handled by this one aggregate, which will have to manage many invariants, the business rules. The resulting code will be hard to maintain because it is necessarily complex. However, we will likely gain in terms of consistency: by expanding the intervention area, much more data will remain consistent with each state change.
Conversely, if our aggregate is small, we will lose some “consistency,” but the code becomes simpler and concurrent requests collide less often.
A practical example? Think about booking a seat in a multiplex cinema. If our aggregate is the daily schedule, we will have strong data consistency because we will have everything necessary within the aggregate itself, but we will also have a lot of concurrency. The aggregate will have to manage the rooms with scheduled titles, available seats, times, etc.
If, instead, our aggregate is the room, then everything will be much simpler. The only set of commands we will have to manage will be those related to seat reservations for a screening slot. Could we go even further? Certainly, we could promote the row to an aggregate.
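To make the sizing trade-off concrete, here is a minimal Python sketch of the smaller option, where the aggregate is a single room for a single screening slot. The names Screening and SeatUnavailable, and the seat labels, are hypothetical and chosen only for illustration.

```python
from typing import Iterable


class SeatUnavailable(Exception):
    """Raised when the requested seat does not exist or is already taken."""


class Screening:
    """Small aggregate: one room and one screening slot, nothing else."""

    def __init__(self, room: str, slot: str, seats: Iterable[str]) -> None:
        self.room = room
        self.slot = slot
        self._free = set(seats)
        if not self._free:
            raise ValueError("a screening needs at least one seat")
        self._reserved = {}  # seat -> customer

    def reserve(self, seat: str, customer: str) -> None:
        # The only invariant this aggregate guards: a seat is sold at most once.
        if seat not in self._free:
            raise SeatUnavailable(
                f"{seat} is not available in {self.room} at {self.slot}"
            )
        self._free.remove(seat)
        self._reserved[seat] = customer

    @property
    def free_seats(self) -> frozenset:
        return frozenset(self._free)


room_1 = Screening("Room 1", "20:30", ["A1", "A2"])
room_2 = Screening("Room 2", "20:30", ["A1", "A2"])
room_1.reserve("A1", "alice")
room_2.reserve("A1", "bob")  # same seat label, different aggregate: no conflict
```

Reservations for different rooms touch different aggregates, so they never compete with each other. The price is that a rule spanning several rooms, such as a cap on tickets per customer per day, can no longer be checked inside a single aggregate.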
What is advisable to do? What is the best choice? Obviously, it depends! While no aggregate is too small in principle, we must not overdo it and end up with an unnecessary workload.
Domain Event
Now that we understand how to isolate business behaviors to create more maintainable applications, how do we make individual aggregates communicate with each other?
If the aggregates all belong to the same Bounded Context, it’s pretty simple. Each aggregate has its Aggregate Root, responsible for maintaining relationships with the outside world, so a Domain Service that facilitates their communication will suffice. But when an aggregate needs to inform other aggregates outside its Bounded Context that its state has changed, how does it do it?
At this point, another pattern comes into play: the Domain Event. Before delving into this pattern, let’s briefly understand what it is. A Domain Event has two main characteristics. First, it expresses a fact that has already happened, something that occurred in the past, and for this reason, the name of a domain event is always expressed using the past participle. The second characteristic is that a domain event is immutable! It cannot be modified under any circumstances, precisely because it expresses something that happened in the past, and like in real life, what happened cannot be changed. At most, we can compensate with some corrective action.
Domain events belong to the domain model, and here we begin to get to the heart of the matter. They cannot be distributed to just anyone. A question to determine if an event is a domain event could be: “Is this event of interest to domain experts?” If the answer is “yes,” then it is a domain event.
Now that we have clarified the characteristics of domain events, we might wonder why they were not present in Evans’ book. I don’t know the real answer, of course! Certainly, events become important when transitioning from monolithic to distributed systems, that is, when entering the world of event-driven architectures. In a distributed system, where the various parts of our system must not be mutually coupled, message exchange becomes the solution. The advent of microservices and this type of architecture has decreed the success of Domain Events. When discussing events within Domain-Driven Design, one cannot help but think of the CQRS+ES pattern introduced by Greg Young.
CQRS
CQRS (Command Query Responsibility Segregation) is the evolution of the CQS (Command Query Separation) pattern by Bertrand Meyer, the author of the Eiffel language and one of the fathers of OOP, who theorized the clear separation between an action that modifies data within the database (Command) and one that only reads the data present in the database (Query).
The first is authorized to modify the state of a record, but does not necessarily have to return a result, except to indicate whether the operation was successful or not. The second, on the contrary, must not make any changes to the data, but must simply return the data that corresponds to the query. If no modifications are made, the query should always return the same result (idempotent).
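As a small illustration of the separation between the two kinds of operation, consider the following Python sketch. The Inventory class and its method names are hypothetical; the point is only that the command changes state and returns nothing, while the query returns data and changes nothing.

```python
class Inventory:
    def __init__(self) -> None:
        self._stock = {}  # sku -> quantity on hand

    def receive(self, sku: str, quantity: int) -> None:
        """Command: modifies state and returns no data (it raises on failure)."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._stock[sku] = self._stock.get(sku, 0) + quantity

    def available(self, sku: str) -> int:
        """Query: returns data and never modifies state."""
        return self._stock.get(sku, 0)


inventory = Inventory()
inventory.receive("SKU-1", 5)
first = inventory.available("SKU-1")
second = inventory.available("SKU-1")  # no command in between: same answer
```

Because the query has no side effects, it can be called any number of times, cached, or moved to a different storage model without affecting the command side.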
Greg Young’s innovation to this pattern was the clear separation of the database into two sides: one for writing data and one for reading it. The starting point is that if a database is optimized for writing, then it will not be optimized for reading, and vice versa. As often happens, a picture is worth a thousand words.
To ensure we’re aligned on the basic principles of this pattern, let’s explicitly clarify the meaning of the figure above before proceeding.
- A Command is sent from the Client to request a state change in our Aggregate. An example might be “CreatePurchaseOrderFromPortal.” The command’s name should clearly state the business intent.
- The Command is handled by a CommandHandler. This component is responsible for invoking the methods described within our Domain Model (essentially our Aggregate). Remember, the Domain Model contains not only the properties of our object (otherwise we would have an Anemic Domain) but also the implementation of business behaviors, specifically the methods to manage its state changes.
- The Domain Model takes charge of the Command and has the authority to reject it if the requested action violates business rules, for example, if the order total exceeds the Customer’s budget. Upon completing the action associated with the Command, the Domain Model emits a Domain Event. Note that if your system is fully asynchronous, the Domain Model must emit a Domain Event whether the action succeeds or fails; otherwise, your UI will never be notified of the outcome.
- The emitted Domain Event is persisted in the EventStore. It is not strictly necessary to save the event in the EventStore, but if we want to work with event-based aggregates, it becomes essential; otherwise, we wouldn’t be able to reconstruct its state with each state change request. Similarly, it’s not necessary to have two separate databases for events and the Read Model. To keep things simple, a single database can be used for both purposes.
- The same Domain Event is published so that all services within the Bounded Context that are interested in receiving notifications of this Domain Model’s state change can subscribe to update the ReadModel, the model used for queries from the Client.
- Any query from the Client will act on the ReadModel, which will contain different projections based on the needs of our UI.
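The steps above can be sketched end to end in Python. Everything here is an in-memory simplification made for illustration: the class names (apart from the CreatePurchaseOrderFromPortal command mentioned above), the budgets dictionary and the synchronous EventBus are assumptions, and a real system would use a durable event store and a message broker.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CreatePurchaseOrderFromPortal:  # Command: states the business intent
    order_id: str
    customer_id: str
    total: Decimal


@dataclass(frozen=True)
class PurchaseOrderCreated:  # Domain Event: the command was accepted
    order_id: str
    customer_id: str
    total: Decimal


@dataclass(frozen=True)
class PurchaseOrderRejected:  # Domain Event: the command violated a rule
    order_id: str
    reason: str


class PurchaseOrder:
    """Domain Model: decides whether the command respects the business rules."""

    @staticmethod
    def create(command: CreatePurchaseOrderFromPortal, customer_budget: Decimal):
        if command.total > customer_budget:
            return PurchaseOrderRejected(command.order_id, "budget exceeded")
        return PurchaseOrderCreated(
            command.order_id, command.customer_id, command.total
        )


class EventStore:
    def __init__(self) -> None:
        self._streams = {}  # aggregate id -> list of events

    def append(self, aggregate_id: str, event) -> None:
        self._streams.setdefault(aggregate_id, []).append(event)

    def load(self, aggregate_id: str) -> list:
        return list(self._streams.get(aggregate_id, []))


class EventBus:
    def __init__(self) -> None:
        self._subscribers = []

    def subscribe(self, handler) -> None:
        self._subscribers.append(handler)

    def publish(self, event) -> None:
        for handler in self._subscribers:
            handler(event)


class OrderSummaryReadModel:
    """Projection queried by the client; updated only through events."""

    def __init__(self) -> None:
        self.rows = {}

    def project(self, event) -> None:
        if isinstance(event, PurchaseOrderCreated):
            self.rows[event.order_id] = {"status": "created", "total": event.total}
        elif isinstance(event, PurchaseOrderRejected):
            self.rows[event.order_id] = {"status": "rejected", "reason": event.reason}


class CreatePurchaseOrderHandler:
    """CommandHandler: invokes the Domain Model, persists and publishes the event."""

    def __init__(self, store: EventStore, bus: EventBus, budgets: dict) -> None:
        self._store = store
        self._bus = bus
        self._budgets = budgets  # customer id -> budget (assumed lookup)

    def handle(self, command: CreatePurchaseOrderFromPortal) -> None:
        event = PurchaseOrder.create(command, self._budgets[command.customer_id])
        self._store.append(command.order_id, event)
        self._bus.publish(event)
```

In this sketch the Domain Model emits an event for both outcomes, which matches the fully asynchronous case described above: the read model learns about a rejection in the same way it learns about a success.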
Domain Event
The Domain Event is, in all respects, part of the domain model and represents what has happened within the domain itself. Practically, instead of simply saving the state of our aggregate, we save the individual state changes, expressed through the Domain Event. Each time we need to interact with the aggregate to apply a new command or decide to reject it because it violates business rules, we read all the events related to this aggregate and apply them sequentially until we arrive at its current version. As I often like to say, it’s like moving from observing a photograph of our aggregate – the current state – to watching its movie, which tells us how it reached this state.
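Rebuilding an aggregate from its events can be sketched as follows. The event names, the Order class and its from_history method are hypothetical and serve only to illustrate the replay; a real implementation would read the events from the EventStore.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OrderCreated:
    order_id: str


@dataclass(frozen=True)
class OrderLineAdded:
    order_id: str
    sku: str
    amount: Decimal


@dataclass(frozen=True)
class OrderLineRemoved:
    order_id: str
    sku: str


class Order:
    """Event-based aggregate: its state is derived only from past events."""

    def __init__(self) -> None:
        self.order_id = None
        self.lines = {}  # sku -> amount
        self.version = 0  # number of events applied so far

    @classmethod
    def from_history(cls, events) -> "Order":
        order = cls()
        for event in events:  # replay in the order the events were stored
            order._apply(event)
        return order

    def _apply(self, event) -> None:
        if isinstance(event, OrderCreated):
            self.order_id = event.order_id
        elif isinstance(event, OrderLineAdded):
            self.lines[event.sku] = event.amount
        elif isinstance(event, OrderLineRemoved):
            self.lines.pop(event.sku, None)
        else:
            raise TypeError(f"unknown event: {type(event).__name__}")
        self.version += 1

    @property
    def total(self) -> Decimal:
        return sum(self.lines.values(), Decimal("0"))


history = [
    OrderCreated("SO-1"),
    OrderLineAdded("SO-1", "A", Decimal("10")),
    OrderLineAdded("SO-1", "B", Decimal("5")),
    OrderLineRemoved("SO-1", "A"),
]
order = Order.from_history(history)
```

The history is the movie; the order object obtained at the end is the photograph, and it also knows (through version) how many frames led to it.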
Not surprisingly, Domain Events, as we’ve already mentioned, are identified with past-tense names because what they express has already happened! No other element within our system can question them. This behavior guarantees us flexibility and peace of mind when implementing new behaviors or modifying existing ones within our code. All business logic is encapsulated within our Aggregate and not distributed across various layers or services scattered throughout our codebase. Therefore, once the modifications are made and our tests continue to pass, we can confidently release the new version, knowing that we won’t break anything outside our Bounded Context.
Anyone who subscribes to a Domain Event accepts it as is, without incorporating decision-making clauses in the code.
What a Domain Event is Like
But what is a Domain Event like? It is a DTO (Data Transfer Object) that contains the properties that have changed within our aggregate, and an implementation in C# could look like the one in this figure.
Let’s analyze this code. Firstly, it’s a sealed class, which means that no one can inherit from it and extend it within the code. All its properties are readonly. The reason is straightforward: to be sent to interested parties, our Domain Event is serialized when sent and deserialized upon reception, and during these phases no one must be able to modify its content. We must ensure that what leaves our domain arrives intact at its destination.
The keen-eyed may have noticed another important characteristic: the semantics used within the Domain Event. The properties are not expressed with primitive types, but with custom types that reflect the language of the business in question, namely the Ubiquitous Language. If I define the order number as a string or a UUID, only the technical people will be able to understand my Domain Event and judge whether it is valid. If, however, I use the custom type Order Number, all team members will understand what we are talking about. This breaks down communication barriers between business and technical people and minimizes the risk of misunderstandings, and therefore of producing code that is not in line with the specifications.
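The characteristics just described (no inheritance, read-only properties, business types instead of primitives) can also be sketched in Python. This is not the author’s C# code; it is a rough equivalent under stated assumptions. The names SalesOrderCreated, OrderNumber, CustomerId and Money are hypothetical, and note that typing.final is only checked by static type checkers, it does not block subclassing at runtime the way a sealed C# class does.

```python
from dataclasses import FrozenInstanceError, dataclass
from decimal import Decimal
from typing import final


@final
@dataclass(frozen=True)
class OrderNumber:
    """Value type taken from the Ubiquitous Language instead of a bare string."""
    value: str

    def __post_init__(self) -> None:
        if not self.value.strip():
            raise ValueError("an order number cannot be empty")


@final
@dataclass(frozen=True)
class CustomerId:
    value: str


@final
@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str


@final  # static hint only: plays the role of "sealed" for type checkers
@dataclass(frozen=True)  # frozen: every field is read-only after construction
class SalesOrderCreated:
    order_number: OrderNumber
    customer_id: CustomerId
    total: Money


event = SalesOrderCreated(
    order_number=OrderNumber("SO-2024-001"),
    customer_id=CustomerId("C-42"),
    total=Money(Decimal("99.90"), "EUR"),
)

try:
    event.total = Money(Decimal("1.00"), "EUR")  # any attempt to modify fails
    modified = True
except FrozenInstanceError:
    modified = False
```

Reading the field list aloud (“order number, customer id, total”) is something a domain expert can follow, which is the point of avoiding primitive types.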
Modeling the Aggregate in Relation to Reality
When we model our Aggregate, the business model we’re about to address, we want to stay as faithful as possible to reality, and having a common language, as we’ve already stated and repeated, is crucial. Of course, we’ll need to simplify concepts because replicating a real model within our code is impossible. To avoid diverging too much, it’s essential to create aggregates as small as possible to remain as close to reality as possible. It’s a bit like considering the entire business process as a complex geometric figure, of which we’re unable to calculate the surface unless we divide it into infinite parts that we integrate to obtain a final result that’s not exactly precise (we’ll never be able to replicate reality), but certainly very, very similar. Similar enough to schematize and solve the problem itself.
Last but not least, every Domain Event contains the ID of the aggregate to which it belongs and, optionally, the IDs of the operations associated with it. Since these properties are repeated for every Domain Event, it’s good practice to group them into a Domain Event base class. This helps us both simplify the writing of all domain events and understand, just by looking at the code, whether the object we’re observing is a Domain Event!
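A minimal Python sketch of such a base class follows. The field names aggregate_id and correlation_id, and the SalesOrderLineAdded event, are hypothetical. The operation ID is typed as optional but passed explicitly, a choice made here only to keep the dataclass field ordering simple.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DomainEvent:
    """Properties shared by every Domain Event."""
    aggregate_id: str  # the aggregate this event belongs to
    correlation_id: Optional[str]  # ID of the operation that caused it, if any


@dataclass(frozen=True)
class SalesOrderLineAdded(DomainEvent):
    sku: str
    quantity: int


event = SalesOrderLineAdded(
    aggregate_id="SO-1",
    correlation_id=None,
    sku="SKU-1",
    quantity=2,
)

# The base class also makes events recognizable at a glance and in code.
is_domain_event = isinstance(event, DomainEvent)
```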
Who can subscribe to a Domain Event?
At this point, we need to ask ourselves who is interested in subscribing to a Domain Event and who is authorized to do so. Since it’s an element of the Domain itself, it cannot, and should not, leave our Bounded Context! In the image related to the CQRS/ES pattern, we saw that the Domain Event is used to update our Read Model, which is the data portion that users of our application will query to make decisions. In a distributed system, this information can be redundant and replicated in different Read Models, each in different Bounded Contexts, and potentially in different microservices, i.e., different databases. What tool do we use to update all these Read Models that do not belong to our Bounded Context? The lazy programmer’s answer is: “The Domain Event,” and it’s here that our system, in the blink of an eye, transforms from Distributed to a Big Ball of Mud.
Sharing a Domain Event with other Bounded Contexts constitutes a semantic error because it involves sharing information expressed in a language typical of one Bounded Context with another that may not share the same language. Additionally, there may be information within the Domain Event that should not leave this context to be shared.
However, there’s also an error from an implementation standpoint. Referring back to what was mentioned in part 4 of this series about Evolutionary Architecture, when two pieces of code share something, we encounter coupling, and therefore, we lose the ability to evolve independently. As mentioned before, aggregates are a representation of reality, a simplified schema of the business problem to solve.
Business problems, being human-related, are inherently mutable, and as such, sooner or later, they will evolve to a point where they can no longer be represented by our aggregate, which we will need to evolve to meet the new model. It’s not an error in initial modeling or premature optimization. We simply have to accept that things change! I know many of you are seeking references within Taleb’s concept of the antifragility of systems, and indeed, it’s precisely that.
We talked about it when we introduced Evolutionary Architectures; the challenge isn’t to build a perfect system but a system capable of evolving, adapting to continuous changes, modifying even its small part to meet new needs. Perhaps it becomes much clearer why sharing our Domain Event goes exactly against this direction.
Before being considered a semantic error, it should be analyzed as an architectural error. If the contract with which I exchange information with other models is the same one I use internally to maintain consistency between Domain Model and Read Model, then I cannot afford the luxury of modifying my aggregate at will, because doing so would change the communication contract with other Bounded Contexts. I would be forced to notify them, and, in the case of a microservices architecture, they would need to be updated and published at the same time as the modified Bounded Context; otherwise, they could no longer communicate because the shared Domain Event has changed. Suddenly, we have brought the complexity of distributed systems into our house, together with the limitations of the coupled monolith. In summary, we have created the perfect Big Ball of Mud.
Integration Event
So how do I notify the “rest of the world” that the state of my Bounded Context has changed? By using an Integration Event: an event that, from a technical point of view, is absolutely identical to the Domain Event but contains only information that can be shared, expressed in a language common to all Bounded Contexts of my system.
But what if the data of the Domain Event and the Integration Event are exactly the same? It doesn’t matter; I still publish an Integration Event because surely, sooner or later, the Domain Event will change. We can afford to change it because we are sure that it is consumed only within our Bounded Context, and we can continue to emit the same Integration Event externally, without interrupting the sharing of information with the rest of the system.
In this way, we will have guaranteed the independence of the parts of our system, which can evolve freely without waiting for the approval of others.
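One way to picture the boundary is a small translation function that turns the internal Domain Event into the Integration Event published to other Bounded Contexts. The sketch below is only an illustration: both event classes, the applied_discount_rule field and the to_integration_event function are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class SalesOrderCreated:
    """Domain Event: internal contract, free to change with the aggregate."""
    order_number: str
    customer_id: str
    total: Decimal
    applied_discount_rule: str  # internal detail that must not leave the context


@dataclass(frozen=True)
class SalesOrderCreatedIntegrationEvent:
    """Integration Event: public contract shared with other Bounded Contexts."""
    order_number: str
    customer_id: str
    total_amount: str  # plain, serialization-friendly representation


def to_integration_event(
    event: SalesOrderCreated,
) -> SalesOrderCreatedIntegrationEvent:
    # Only this function needs to change when the Domain Event evolves.
    return SalesOrderCreatedIntegrationEvent(
        order_number=event.order_number,
        customer_id=event.customer_id,
        total_amount=str(event.total),
    )


internal = SalesOrderCreated("SO-1", "C-42", Decimal("120.50"), "SPRING-10")
public = to_integration_event(internal)
```

If the Domain Event later gains, loses or renames a field, the translation absorbs the change and subscribers in other contexts keep receiving the same Integration Event.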
Versioning
What happens if, due to a new request, our Domain Event needs to change? For example, if we need to add or remove one of the properties?
First of all, we need to understand whether it is truly a variation: if we are simply adding properties while the meaning expressed by our Domain Event remains unchanged, we are facing a modification of it, a new version. If the changes are so invasive that they alter the semantic meaning of our Domain Event, then we are faced with a new Domain Event, probably with a different name, capable of communicating the new business intent.
With that said, it must be emphasized that a Domain Event should never be modified! Once again, the reasons are both semantic and technical. Semantically, as we’ve learned, the Domain Event expresses a business concept, and we want this concept to remain unchanged in our EventStore and Domain Model. Technically, we’ve mentioned that the Domain Event is first serialized and then deserialized; therefore, if we modify its structure after saving it, we risk not being able to retrieve events from the EventStore, and thus reconstruct the state of our Aggregate when we need to deserialize the saved events.
How do we resolve this situation? By versioning the events. We will have versions like V1, V2, … Vn of the Domain Event that has changed over time. Remember, this applies only as long as its meaning, from a business perspective, remains unchanged; otherwise, we’ll have to rename it. If we have multiple versions of the same event, we’ll need to provide in our code either a handler for each version or a single handler capable of handling them all, populating, where necessary, missing properties with default values. Handling versioning is a complex issue, and Greg Young has written an excellent book [2] on the topic, which is certainly worth reading.
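The “single handler capable of handling all versions” option can be sketched in Python as a function that converts older stored events to the latest version while reading them. The JSON envelope with type, version and data keys, the added currency property and the default value are all assumptions made for this example; they are not a format required by any particular event store.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class SalesOrderCreatedV1:
    order_id: str
    total: Decimal


@dataclass(frozen=True)
class SalesOrderCreatedV2:
    order_id: str
    total: Decimal
    currency: str  # property added in V2


DEFAULT_CURRENCY = "EUR"  # assumed default for events stored before V2 existed


def upcast(event: SalesOrderCreatedV1) -> SalesOrderCreatedV2:
    """Fill the property missing from V1 with a default value."""
    return SalesOrderCreatedV2(event.order_id, event.total, DEFAULT_CURRENCY)


def deserialize(raw: str) -> SalesOrderCreatedV2:
    """Read a stored event of any known version and return the latest one."""
    record = json.loads(raw)
    if record["type"] != "SalesOrderCreated":
        raise ValueError(f"unexpected event type: {record['type']}")
    data = record["data"]
    if record["version"] == 1:
        return upcast(SalesOrderCreatedV1(data["order_id"], Decimal(data["total"])))
    if record["version"] == 2:
        return SalesOrderCreatedV2(
            data["order_id"], Decimal(data["total"]), data["currency"]
        )
    raise ValueError(f"unsupported version: {record['version']}")


stored_v1 = (
    '{"type": "SalesOrderCreated", "version": 1,'
    ' "data": {"order_id": "SO-1", "total": "80.00"}}'
)
stored_v2 = (
    '{"type": "SalesOrderCreated", "version": 2,'
    ' "data": {"order_id": "SO-2", "total": "15.00", "currency": "USD"}}'
)
old_event = deserialize(stored_v1)
new_event = deserialize(stored_v2)
```

The stored V1 record is never rewritten; it is only translated on the way in, so the rest of the code deals with a single, current shape of the event.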
Conclusions
As a lazy programmer, if I have to share information among different Bounded Contexts, belonging to different microservices, I’m very tempted to do it through something that already exists in my code, in this case a Domain Event. I just have to add another event handler to the same topic, and that’s it.
Unfortunately, I’m not just making a semantic error, meaning I’m sharing information written in a language typical of one Bounded Context with another Bounded Context that may not necessarily know this language… otherwise, we wouldn’t talk about the Ubiquitous Language! Not to mention the fact that in a Domain Event, being an object that belongs to the Domain Model, there may be sensitive information that must not leave the boundary of the Bounded Context itself. Why is it not just a semantic error? Because the moment two functions share an object, any object, they become coupled to each other, and when I have to modify one, I’ll be forced to modify the other as well, despite the independence of the microservices hosting them. Try now to publish your microservice, following the implementation of a new feature that has modified the Domain Event, without notifying all the respective subscribers to it!
Even if the information to be exchanged, at first, is exactly the same as that contained in the Domain Event, create an Integration Event and share that. With a little effort, you will have gained the freedom to intervene on your Domain Model as you wish, without having to be accountable to anyone, and as they say, freedom is priceless!
References
[1] Vaughn Vernon, Implementing Domain-Driven Design. Addison-Wesley Professional, 2013
[2] Greg Young, Versioning in an Event Sourced System. https://leanpub.com/esversioning/read