The issue with a monolithic architecture

In my previous article we saw my first architecture for the FyS MMORPG. It is easy to see that most of the work was done in a single type of server, called the World Server. Despite the existence of other server types (Authentication / Fight Server / Database Server), all the game logic resided in the World Server implementation, which makes it a so-called monolith.

Idea of Warhammer monolith reference shamelessly taken from Francois Teychene

 

The pros of this approach:

  • One of the good things about the monolithic approach, if architected correctly using known design patterns, is that the project as a whole is easy to understand.
  • A new feature requiring multiple modules is easier to put in place, as the modules all live in the same application (we will see in the cons that this is not only a good point). A good example would be adding an item-trading feature: if everything is in the same application, you have access to the positions of the players (movement module) and their inventories (inventory module), so adding the feature shouldn’t be too hard.
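To make this concrete, here is a minimal C++ sketch of such a monolithic trade feature (all module and function names are hypothetical, not taken from the FyS codebase). Because both modules live in the same process, the new feature can reach into both directly:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical monolith modules: everything lives in one process.
struct MovementModule {
    std::unordered_map<std::string, std::pair<int, int>> positions;
    int distance(const std::string& a, const std::string& b) const {
        auto pa = positions.at(a), pb = positions.at(b);
        return std::abs(pa.first - pb.first) + std::abs(pa.second - pb.second);
    }
};

struct InventoryModule {
    std::unordered_map<std::string, std::vector<std::string>> items;
};

// The trade feature is easy to add: it can query positions and move
// items around because both modules are directly accessible in-memory.
bool trade(MovementModule& mv, InventoryModule& inv,
           const std::string& from, const std::string& to,
           const std::string& item) {
    if (mv.distance(from, to) > 2) return false;  // players must be close enough
    auto& src = inv.items[from];
    auto it = std::find(src.begin(), src.end(), item);
    if (it == src.end()) return false;            // seller doesn't own the item
    src.erase(it);
    inv.items[to].push_back(item);
    return true;
}
```

Note that `trade` depends on the internals of both modules at once; this convenience is exactly the coupling risk discussed in the cons.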

The cons of this approach:

  • Let’s return to the example of trading items between players; as previously said, it would be quite easy to implement. But it would increase the complexity of your code, and if you don’t pay attention, you could mix data owned by the inventory module (players’ inventories) into your movement module (players’ positions). By doing so, you make your code tightly coupled and harder to maintain for the next developer who comes after you.
    We can say that a component designed to be independent in the first place easily ends up with links to other components in a monolithic application.
  • A monolithic application is usually stateful, and all the game logic lives inside the same application, which makes it hard (or even impossible) to scale.
    Scaling horizontally (adding new instances of your server to bear an increasing load of requests) wouldn’t work, as each server would have to manage and share state. A synchronization mechanism could be put in place to keep those states consistent, but it would greatly increase the complexity of the code and could even badly hurt performance, because a single action would need to be propagated over the network to every member of the cluster.

Why did I end up in this pitfall?

When I started my MMORPG server, I was obsessed with the idea of performance. I was freshly coming back to the world of C++ and wanted to play again with threads and the modern C++ features.
And this was my first mistake: I set out to make the fastest server I could, and to do so I directly told myself “I shouldn’t have too much communication through the network”, because it’s slower than doing everything in-memory.
This is not wrong from an objective point of view, but by focusing on it I lost the big picture of my MMORPG. What really matters? Is it that the player gets an answer in 10ms instead of 50-100ms? Or is it that a potentially unlimited number of players could play the game?
The answer should be obvious, and it is even truer when looking at the gameplay planned for FyS (a turn-based fighting system, and a world where collision between players isn’t a requirement).

In my opinion, what truly matters is being able to handle as many players as possible. Basically, if the server is not the fastest but is playable, it’s enough. I was hooked by my thirst for learning how to write efficient code, which is clearly not a bad thing. I learned how lock-free programming works, for example, which is why I don’t regret “losing” my time on the previous architecture. But overall, I don’t think it was the way to go for an MMORPG server.

A more modern solution

What are microservices?

Here I am going to give a quick and naive introduction to the microservice principle, as has been done everywhere for the last few years, with the help of the following picture.

In the ’90s we were doing spaghetti code: copying code around and building a hardly understandable maze of function calls.
Then we began taking care of the architecture of our software and ended up with layers, using design patterns such as the famous MVC (Model View Controller) for graphical interfaces. This is lasagna code: a monolithic architecture with a virtual split of responsibilities thanks to layers. But in practice, features usually use data or logic from multiple layers, which makes the code globally coupled and hard to maintain or improve.

With the amount of work handled by applications increasing exponentially, another flaw of the monolithic lasagna code appeared: it hardly scales. In other words, it is hard to increase the number of requests a monolithic application can deal with, because of how coupled the features are. And this is how we came up with another type of architecture, the ravioli one: microservices.
The goal of such an architecture is to split features into small stateless applications that usually have a single responsibility. By doing so, it becomes possible to scale indefinitely (theoretically) by adding new instances of an application. For example, if thousands of people are using a specific feature of your application, it is possible to duplicate the microservice responsible for this feature in order to handle the load.

The known issues when implementing microservices

The theory behind the microservice approach is very good, but it implies some mindset changes. During a migration from monolith to microservices (lasagna to ravioli), it often happens that we take our monolithic application and, thinking “it’s already well split”, turn the existing layers into microservices with barely any modification. This makes things worse and keeps strong dependencies between the newly created services.

Believe me, when I make a dish of ravioli, I never make lasagna and then cut it into pieces to stuff the ravioli.

Any cook out there

In the same way a cook doesn’t begin by making lasagna to stuff the ravioli, we shouldn’t make a monolithic application and split it afterwards.
I don’t have the pretension in this post to say how a microservice should absolutely be done. But this is the kind of issue that has to be kept in mind when designing an application with a microservice architecture. It requires a change of mindset when making a feature, which is not easy… But it’s fun, and we are going to do our best on the FyS MMORPG.

Another “issue” is that microservices usually require a very big technology stack to work with (Docker, Kubernetes, Istio, and literally dozens of others). Part of this stack is used for logging, tracing and monitoring a call from its source through every service it passes. Those things are very simple in a monolithic application but become very hard with microservices running everywhere.

I changed the architecture of FyS with this in mind, and it is a very interesting subject. I will now present the solution I ended up with. It is certainly not going to be the final one (as I will learn from my mistakes while building it), but it is closer to the scalable result I expect from the FyS project.

You can find the code of FyS (this is the current repository, and the one that will evolve with the FyS project) HERE.

Splitting the lasagna into ravioli

For this project, the split seems pretty easy to do, as I had already thought carefully about the concurrency issues I could have, which pushed me toward a strict separation of responsibilities between my threads. Basically, as a first step, every thread I had before is going to become a different process (a microservice).

I will then be able to split those services even further into smaller ones. But one thing I need to be very careful about is not creating dependencies between my components, to avoid the issue explained above in case of modifications to tightly coupled microservices.

Network library

Before speaking about how I split the services for the MMORPG, it is mandatory to talk about which network library I am planning to use. I planned on using Boost::Asio, as I did in my previous architecture, and it worked fine. One issue with this choice is that connections have to be managed by hand, which is not a big issue for an architecture with a limited set of servers interacting with each other.

But even with a simple architecture, I encountered major issues managing network errors. For example, when a server crashes and restarts, the re-connection of the other servers to it has to be handled manually.

That is basically why I tried other libraries that could easily manage connections, synchronization and error handling between servers. The library I decided to use is ZeroMQ. I won’t go into further detail about ZMQ in this blog post, as I think it deserves its own article.

What you need to know about ZeroMQ is that it manages asynchronous connections (a client can connect to a server that is not up yet) and that a single ZMQ socket can represent multiple connections, with very powerful built-in features. Thanks to that, we can concentrate on “how do I want my servers to communicate with each other” instead of “how do I handle a server crashing”.

The different services

In any professional environment, when working with microservices, it is normal to use multiple technologies (Kubernetes, Docker, Istio, Grafana and literally dozens of others) that help you manage your services, which are often encapsulated into what we call containers. But I wanted to face the problems (logging, load balancing…) that such a system encounters, without hiding them behind ready-to-use tools.

The concept of dispatcher

Because the number of services can change dynamically, the client has to connect to a specific server called a dispatcher. This server dispatches the requests of the client to the service; in other words, it is a proxy between the client and the service they want to use. The dispatcher also acts as a reverse proxy in order to reply to the clients.

The advantage of developing your own tooling is that you can specialize it easily. Of course, as it is a proxy-like server, a dispatcher shouldn’t implement complex and/or time-consuming tasks, because it is going to be a pass-through for all the traffic of its service. But for some specific services, it could be required to route a client to a specific server for a given time (see the Fight Service for an example), and this logic has to be implemented at the dispatcher level.

When a player connects to the authentication server, they retrieve a set of dispatchers they can connect to (to access the different services), and the player gets a connection token.
Each dispatcher then checks with the Auth server, via this token, that the player is authenticated.
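As a rough sketch of this flow (with hypothetical names; this is not the actual FyS dispatcher code), a dispatcher boils down to a token check followed by a forwarding decision, here a simple round-robin over the known service instances:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal dispatcher sketch: validate the player's connection token,
// then forward to one of the known service instances.
struct Dispatcher {
    std::unordered_map<std::string, std::string> auth_tokens;  // token -> player id
    std::vector<std::string> instances;                        // service endpoints
    std::size_t next = 0;                                      // round-robin cursor

    // Returns the endpoint to forward to, or an empty string if the
    // token is unknown (i.e. the player is not authenticated).
    std::string route(const std::string& token) {
        if (auth_tokens.find(token) == auth_tokens.end()) return {};
        const std::string endpoint = instances[next % instances.size()];
        ++next;
        return endpoint;
    }
};
```

In the real architecture the token would be verified against the Auth server rather than a local map, and the forwarding would go over a ZMQ socket; the sketch only shows the shape of the decision.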

World Service

This service has nearly the same architecture as the previous version of the FyS MMORPG, which is explained in detail in my previous blog post; for instance, the way it manages a smooth transition from one server to another has been kept. It is stateful and manages the players’ movements for a specific portion of the overall world (which I call the universe).

 

The dispatcher of the World Servers communicates via a publisher/subscriber mechanism: each WorldService handles a specific zone described by a code, and this code (ID1, ID2, etc. in the picture above) is the channel the WorldService subscribes to.
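As a toy in-process model of these zone channels (hypothetical names; in the real architecture this role would be played by ZeroMQ PUB/SUB sockets), the dispatcher publishes a player’s message on the zone code, and only the WorldService subscribed to that code receives it:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Toy publish/subscribe bus modelling the zone channels in-process.
struct ZoneBus {
    // channel (zone code) -> subscribed handlers (WorldService instances)
    std::unordered_map<std::string,
        std::vector<std::function<void(const std::string&)>>> subs;

    void subscribe(const std::string& zone,
                   std::function<void(const std::string&)> handler) {
        subs[zone].push_back(std::move(handler));
    }

    // The dispatcher publishes a player's message on its zone channel only;
    // services subscribed to other zones never see it.
    void publish(const std::string& zone, const std::string& msg) {
        auto it = subs.find(zone);
        if (it == subs.end()) return;
        for (auto& handler : it->second) handler(msg);
    }
};
```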

A stateful microservice?
It’s a big contradiction of the principle itself, I know, and I don’t like it either. The way the WorldService is designed is closer to a distributed system (responsibilities are split in a stateful manner). Because of that, horizontal scalability is not possible: if thousands of players stand in the same zone and overload the server, it won’t be possible to increase the number of instances to share the load.

To be honest, I really didn’t want to have to fetch the position of a character every time it has to move. Just fetching this data from a database (in-memory or not) would take milliseconds, on top of processing collisions and replying to all nearby players to update the moving player’s position… Doing so would make database access an important point of contention.
Maybe I overestimate the time all that would take (and underestimate the database-access option); it is possibly a premature optimization that should not have been made.
It is certainly something I will revisit in the future, in order to measure how much database access would actually slow down the server.

Quest Service

A service responsible for quest management: validating quests, accepting a quest from an NPC (Non-Playable Character), checking the next quests available, and so on.
In order to make this service stateless, it needs access to the player’s position, the NPC positions (to prevent cheating when taking a quest) and the player’s quest data (to check that a finished quest is no longer available, and to know whether the next quest in the chain is available, for example). This data is accessed directly from the database.

Chat Service

A stateless service that manages the channels each player is connected to and broadcasts messages on those channels. It only has to get the data concerning the channels from the database.

Fight Service

A specific service whose gameplay is still “to be defined” (it will be the subject of upcoming articles). It is going to be an instance-based stateful service, which means that any Fight Service server can host a “battle” coming from any World Service server (without having to care about the zone that server is responsible for). The dispatcher for the Fight Service generates a specific token for a conversation with a specific server: the conversation created on one Fight Service server (the beginning of a fight encounter) is kept on the same server until the end of the conversation (the end of the fight).

Doing things this way makes it possible to have this service horizontally scalable (adding service instances reduces the impact of an increasing number of clients). It also keeps the efficiency of a stateful server, as each instance of a fight keeps its own state: the life of the opponents, the order of turns, the timers, the item drops and so on.
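A minimal sketch of this sticky conversation routing could look like the following (again, the names are hypothetical and do not come from the FyS repository): the first message of a fight picks any instance, and every following message carrying the same token is routed to that same instance until the fight ends.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the "conversation token" idea for the Fight Service dispatcher.
struct FightDispatcher {
    std::vector<std::string> instances;                   // fight servers
    std::unordered_map<std::string, std::string> sticky;  // token -> server
    std::size_t next = 0, counter = 0;

    // Starts a conversation: picks a server (round-robin) and returns
    // the token that binds the whole fight to that server.
    std::string open_conversation() {
        std::string token = "conv-" + std::to_string(counter++);
        sticky[token] = instances[next++ % instances.size()];
        return token;
    }

    // Every message with this token goes to the same server.
    std::string route(const std::string& token) const {
        return sticky.at(token);
    }

    // End of the fight: the binding is dropped.
    void close_conversation(const std::string& token) { sticky.erase(token); }
};
```

Because no state is shared across conversations, adding more fight servers to `instances` directly increases capacity, which is the horizontal scalability described above.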

Inventory Services

This split follows the “state of the art” more closely. Where the World Service kept all the stateful parts of the monolithic architecture (with its scalability issues), the inventory services work with a cache database (I chose Redis); most of them are stateless, and those that aren’t follow the example of the Fight Service.

  • Item Loot: ItemLootService handles players retrieving objects dropped on the map, or a player wanting to drop an item on the map. To do so, it has to retrieve data related to the map: the player’s position and the items around them.
  • Item NPC: ItemNPCService handles buying or selling an object to an NPC.
    To do so, it has to retrieve the player’s position and the NPC’s position (to prevent a player from trading with an NPC at the other side of the world) and the items this NPC can sell.
  • Item Trade: ItemTradeService handles trading items between players. It has some similarities with the Fight Service: like it, it is based on a conversation that prepares the trade and acknowledges both agreements before doing the exchange of items. And as with the FightService, any ItemTradeService can be requested for a trade (it doesn’t matter which one), but once one is selected, it is kept for the whole transaction.
  • Item Use: ItemUseService handles the usage of items by the player. It may require different data on top of the player’s inventory, depending on the effects those items can have (which is not defined yet).

Conclusion on the type of services

As we saw, we can split the FyS MMORPG services into three categories:

  • Stateless Service: The “purest” one in my opinion; these services don’t store state of any sort and get the information they need directly from a database. This makes them inherently slower, which is why not all the services follow this pattern.
    It would be interesting to have more non-performance-critical services in this category. For the ItemTradeService, for example, the “state” of the current transaction could be stored in a database, but again, benchmarking would be required to know whether this is a good idea.
  • Stateful Service: The only service in this category is the WorldService, which manages the movements of the characters. We could say it is a distributed system on the side, responsible for the movement of players within a specific zone of the universe.
  • Instance-based Service: We could say these services are “partially stateful”, as they store the state of a specific conversation and drop it when the conversation is finished. The stateful part of the service begins when a conversation begins, which means that any server can handle a conversation for any client, as there is no “cross-conversation” state.

The overview given by this article is quite general (particularly the service explanations); more details will come in upcoming articles.

Conclusion

Maybe something is wrong with the current implementation of my ravioli dish, but it seems to me that this is the way to go for an MMORPG. In my opinion, the service that may require drastic changes is the “movement module” (World Service), as it is stateful and cannot scale horizontally on the same portion of the map. But maybe it would be interesting to keep this architecture and have multiple servers managing the same area for different sets of players; this would make two players in the same zone but on different servers unable to see each other… It is a possibility that we will explore in the future.

The microservice approach is really appealing, as it forces me to re-think the project in a very compartmentalized way. For instance, if I want to re-implement the World Service differently, it shouldn’t affect any other service; and if it does, it means I did something wrong, and I will need to improve the architecture until every service is independent.