By Jeremy Hillpot • June 24, 2021
A significant part of breaking a monolithic application into microservices involves a redesign of how the application manages data — usually through a data management strategy called “information hiding.” In this respect, breaking a monolith into microservices isn’t just about establishing clear separations between the code for different services. It’s also about creating clear separations between the data that each microservice interacts with.
Why information hiding? Because the massive centralized databases behind monolithic applications hinder the most important benefits of a microservices architecture, such as independent development and deployment, stronger data security, and easier regulatory compliance.
Below, we’ll help you understand why centralized databases undermine the benefits of microservices, how data hiding helps, and how the DreamFactory platform helps developers achieve information hiding with ease.
Sign up for our free 14-day hosted trial to learn how.
As useful as relational databases are — for transactions, queries, joins across data sets, and so on — a large centralized relational database can also be problematic for microservices. The biggest problem is the habit of collecting more and more data in centralized databases until they become — in the words of Sam Newman — “too big to fail.”
While it’s true that monolithic app developers are accustomed to wrestling with large, centralized databases, microservices developers find centralized databases particularly challenging for the following reasons:
When a database is “too big to fail,” system data becomes highly centralized, and developers need to put more and more energy into working with it. That’s because a single change to the database can impact multiple systems. In time, working with these massive databases becomes so complex that developers avoid making changes because of the significant risks involved.
This is particularly troublesome when a microservices-based system depends on a centralized database that numerous microservices connect and interact with. It creates interdependencies. Changing one part of the database to accommodate a single microservice can negatively impact other services across the system. Now, microservices developers need to consider carefully how the smallest database change will affect other systems — and collaborate with other microservices teams to ensure that everyone agrees on a common strategy for managing the shared database. In this respect, true developer team independence can no longer be achieved.
Ultimately, a centralized database pulls too much development energy towards it, thereby invalidating the greatest benefits that come from a microservices-based system. As Newman puts it: “The data should be working for us, but all too frequently, we find that we are working for the data.”
Another challenge associated with a large database is the safety and security of the information it contains. With everything concentrated in the same database, it doesn’t matter how many precautions you take: there is always the risk that a simple mistake leaves a backdoor open through which attackers can access all of the information.
According to Newman, it doesn’t make sense to concentrate all of our sensitive data in the same place like this because it represents a security risk:
“It’s tantamount to saying, ‘We’ve got all of our valuables here, and we put a big sign on it that says, STEAL ME.’ If anyone breaches that database and gets a hold of your crown jewels, everything is gone! A large amount of the data breaches we see are about one database being accessed. If all of your most sensitive information is in one location, and that one location is breached, all of your data walks out the front door — and that is a real issue.”
Finally, large, centralized databases make it more difficult to satisfy data compliance standards — like HIPAA, GDPR, and PCI. This is especially true when we save sensitive data in the same database as nonsensitive data. Consider an organization that needs to satisfy PCI Level I compliance standards. The organization has a central operational database that mostly stores nonsensitive data; however, the database also stores a small amount of consumer credit card data. Whenever this system issues a read/write request — even if it doesn’t touch any sensitive data — the request will trigger a PCI auditable event (simply because the sensitive data is lumped together with the non-sensitive data).
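The scoping problem above can be sketched in a few lines. This is an illustrative toy, not a real payment system: the store names (`orders_db`, `payments_db`) and functions are assumptions invented for the example. The point is that once the card data lives behind its own narrow path, routine reads never touch it, so they fall outside the audit scope.

```python
# Hypothetical sketch: splitting sensitive card data out of the main
# operational store so routine reads never touch PCI-scoped data.
# All names here are illustrative, not from any real system.

orders_db = {  # nonsensitive operational data -> outside PCI audit scope
    1001: {"customer": "alice", "total": 49.99},
}

payments_db = {  # sensitive card data, isolated behind its own service
    1001: {"card_number": "4111111111111111"},
}

def get_order(order_id):
    """Reads only the nonsensitive store; no auditable event triggered."""
    return orders_db[order_id]

def get_payment_token(order_id):
    """Only this narrow path touches card data; it returns a token,
    never the raw card number."""
    card = payments_db[order_id]["card_number"]
    return "tok_" + card[-4:]

print(get_order(1001))          # routine read, card store untouched
print(get_payment_token(1001))  # tok_1111
```

With the monolithic layout the article describes, both dictionaries would be one database, and even `get_order` would sit inside the PCI boundary.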
Considering the above challenges, breaking the centralized database and establishing clear separation of data across microservices — on a need-to-use/access basis — starts to sound like an excellent idea that could support the most important benefits of microservices, right? Thus, we come to “data hiding.”
A single microservice contains all of the code and data it requires to run autonomously. This code and data is usually packaged into a container (such as a Docker container image). In this respect, the microservice “hides” its most important data and libraries within itself (hence the term “data hiding”). From there, the microservice communicates with the other independent processes that make up the microservices-based system over the network — for example, through a REST API or a Kafka message broker.
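A minimal sketch of this idea, using only the Python standard library: the service’s state lives inside the process (the “hidden” part), and the only way in is a single REST endpoint. The service name, endpoint path, and data are all assumptions made up for illustration.

```python
# Minimal self-contained microservice sketch: internal state is hidden
# inside the process; consumers reach it only through one REST endpoint.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

_INVENTORY = {"widget": 12}  # hidden internal state (illustrative)

class InventoryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The one exposed contract: GET /stock/widget
        if self.path == "/stock/widget":
            body = json.dumps(
                {"item": "widget", "qty": _INVENTORY["widget"]}
            ).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # silence per-request logging for the example

def serve(port=8080):
    """Run the service (blocking); the port is illustrative."""
    HTTPServer(("127.0.0.1", port), InventoryHandler).serve_forever()
```

Nothing outside the process can see `_INVENTORY` directly; a change to how the inventory is stored never ripples beyond this file, which is exactly the independence the article is describing.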
To better understand the concept of data hiding, let’s consider the relationship between an upstream microservice and a downstream microservice — where the downstream microservice contains specific data that the upstream microservice wants to interact with.
There are two ways you can allow this interaction:
The first way is to give the upstream microservice access to go inside the downstream microservice and interact with its data and objects directly. This is dangerous because there is the significant risk that the upstream microservice will make changes to the downstream microservice — and those changes will cause problems for the other services that need to interact with the downstream microservice. It will also expose data to other services and users that you may need to keep secret for security and compliance reasons.
These risks are particularly evident when multiple services interact with the same centralized database. In these circumstances, it’s difficult to police clear boundaries that determine what data is safe to change — and what data isn’t safe to change — within the database.
The second way is to use a data hiding strategy, where you establish a clear separation between two categories of data: data that consumers can safely change, and data that other services depend on — which must therefore stay hidden.
By hiding the data that other services depend on, you can give the upstream microservice access to the first category — the data that is safe to change — while keeping everything else secure and inaccessible. With this strategy, you want to “hide” as much data as possible behind the boundary of the microservice. By separating data like this, consumers of the downstream microservice cannot make changes that break external microservices.
With data hiding, it’s important to initially expose as little as possible. You can always expose this data at a later time, but once you let it out of the box, it’s a part of the contracts in place with upstream/downstream services — and re-hiding the data could break those services. So, be conservative whenever it comes to the data that a microservice exposes.
By clearly defining the API endpoints that a microservice exposes, developers get to control the data consumers can access (and what they can do with that data). By establishing a clear separation between changeable and unchangeable data like this, the entire microservices system is dramatically easier to manage.
For example, imagine you’re a developer working on an update to the Customer Service in a microservices-based system. You know that you can make changes to the “hidden” parts of the Customer Service without breaking any upstream/downstream components that interact with it. At the same time, when the Customer Service exposes data or functionality through an API endpoint, that exposed surface needs to stay consistent — or you risk breaking other system components. So when it comes to exposed data, you need to be more careful and collaborate with others before changing it.
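The Customer Service boundary above might look like this in miniature. The class, its internal row layout, and the field names are all hypothetical — the sketch only shows which side of the line each piece sits on: the `_rows` storage detail is hidden and free to change, while the shape of `get_customer()`’s return value is the contract that other services rely on.

```python
# Illustrative boundary sketch: internals may change freely;
# only the exposed return shape is a contract with consumers.

class CustomerService:
    def __init__(self):
        # Hidden storage detail -- this could switch to a different
        # schema or database without breaking any consumer.
        self._rows = {42: ("Ada", "Lovelace", "ada@example.com")}

    def get_customer(self, customer_id):
        """The exposed contract: consumers rely on exactly these keys."""
        first, last, email = self._rows[customer_id]
        return {"id": customer_id, "name": f"{first} {last}", "email": email}

svc = CustomerService()
print(svc.get_customer(42))
```

Refactoring `_rows` into, say, separate first/last-name tables is a “hidden” change you can make alone; renaming the `email` key in the returned dictionary is an “exposed” change that needs coordination.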
As a developer, hiding data supports the most important advantages of a microservices-based system:
Data hiding frees you to become as creative as you want within the confines of the hidden data and code — without collaborating with the teams that own other microservices. You can develop and deploy updates more independently with much less chance of negatively impacting other parts of the system. As a result, teams can work independently of each other and much more efficiently when they focus on the hidden elements of their systems.
Data hiding prevents certain data from ever leaving the service boundary of a particular microservice. By clearly defining what data an API endpoint exposes, developers can keep the most sensitive data in their systems absolutely secure to reduce the threat of data breaches. This makes it easier to adhere to government and industry data compliance rules and standards as well.
As you can see, it becomes extremely advantageous to break apart large centralized databases — into their respective service boundaries — when refactoring a monolith into microservices. The more data you can hide within the microservice, the more agile, resilient, and secure a microservices-based system becomes.
An advanced API gateway like DreamFactory can help developers realize the benefits of data hiding. DreamFactory supports data hiding in two important ways:
DreamFactory’s RBAC (role-based access control) manager supports row-level filtering. This lets you attach filters to a role that limit a client’s access to only the rows associated with its customer ID — similar to a multi-tenant database.
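The behavior of row-level filtering can be sketched as follows. This is a hedged mock of the idea, not DreamFactory’s actual implementation: the table, the role structure, and the `query_orders` function are all invented for illustration. The essential move is that the role carries a filter, and every query is constrained by it before any data is returned.

```python
# Mock of row-level filtering: a role carries a filter, and queries
# only ever see the rows that match it. Names are illustrative.

ORDERS = [
    {"id": 1, "customer_id": "c-100", "total": 20},
    {"id": 2, "customer_id": "c-200", "total": 35},
    {"id": 3, "customer_id": "c-100", "total": 15},
]

def query_orders(role):
    """Apply the role's row filter before returning anything."""
    return [row for row in ORDERS
            if row["customer_id"] == role["customer_id"]]

tenant_role = {"customer_id": "c-100"}   # illustrative role definition
print(query_orders(tenant_role))         # only c-100's rows come back
```

A client holding `tenant_role` can never see another tenant’s rows, even though all tenants’ data sits in the same table — the multi-tenant effect the article describes.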
DreamFactory’s scripting manager can intercept data en route from the data source to the client, and transform the data in any conceivable manner. In doing so, the scripting manager can obfuscate or entirely remove sensitive data from the response.
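A response-transform script of the kind described above might look like this sketch. The record and its field names (`ssn`, `internal_notes`) are assumptions chosen for illustration: one sensitive field is obfuscated in place, and another is stripped from the response entirely before it reaches the client.

```python
# Sketch of a response transform: intercept the outgoing record and
# mask or drop sensitive fields. Field names are illustrative.

def transform_response(record):
    out = dict(record)                            # leave the source untouched
    out["ssn"] = "***-**-" + record["ssn"][-4:]   # obfuscate in place
    out.pop("internal_notes", None)               # remove entirely
    return out

raw = {"name": "Ada", "ssn": "123-45-6789", "internal_notes": "vip"}
print(transform_response(raw))
```

Because the transform runs between the data source and the client, the sensitive values never cross the service boundary at all — they stay hidden regardless of what the consumer requests.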
At DreamFactory, we’re passionate about removing the cost, time, and skill barriers from the process of developing microservices-based applications. With DreamFactory’s automatic API generator and ready-to-go integrations, you can instantly create and publish fully Swagger-documented REST APIs for virtually any database or service, and integrate new features into your projects in minutes.
Want to learn more about how DreamFactory can dramatically reduce the time to market and development costs for your projects? Schedule a call with our team and start a free hosted trial of the DreamFactory platform now.
Fascinated by emerging technologies, Jeremy Hillpot uses his backgrounds in legal writing and technology to provide a unique perspective on a vast array of topics including enterprise technology, SQL, data science, SaaS applications, investment fraud, and the law. Contact Jeremy at [email protected].