This interview was done for our Microservices for Startups ebook. Be sure to check it out for practical advice on microservices. Thanks to Isaac for his time and input!
For context, how big is your engineering team? Are you using microservices and can you provide a general overview of how you’re using them?
Our current team is quite small -- only 5 engineers -- but we work on Spinnaker, an open source deployment tool that Netflix released in 2015. Netflix uses it to deploy over 2,000 microservices roughly 4,000 times a day. We help enterprise customers achieve that velocity with microservices and Spinnaker.
We see deployments as a critical component to obtaining the value of microservices. If the overhead of deployments is high, breaking up a monolith into smaller components only increases overall complexity. Having the right strategy and approach to tooling, including deployments, is critical to success.
Did you start with a monolith and later adopt microservices? If so, what was the motivation to adopt microservices?
We see many Fortune 500 customers already on the path to microservices, breaking up existing monoliths. The value to many of these organizations is the flexibility to add a feature to one component without affecting another; the assumption is that the software development life cycle per feature shrinks. At this point the argument is hardly whether to move to microservices, but which services will be migrated and when.
How did you approach the topic and definition of microservices as a team/engineering organization?
We see constant discussions about microservices and what they mean within an organization. Because every organization is moving toward this type of architecture, each company has a different understanding of microservices and the best practices around them. And because any microservice architecture is constantly changing, we're often brought into organizations to help them deploy their microservices with best practices in mind.
Did you change the way your team(s) were organized or operated in response to adopting microservices?
This is almost always the case and becomes the inverse of Conway's Law, "organizations which design systems...are constrained to produce designs which are copies of the communication structures of these organizations." Instead of communication driving software design, now software architecture drives communication. Not only do teams operate and organize differently, but the architecture requires a new set of tooling and processes to support it, i.e. DevOps. As monoliths get broken into smaller components, smaller teams and their roles become much more apparent. Engineers best skilled for, or with a strong desire to work on, a distinct type of problem can now be assigned to those microservices.
How much freedom is there on technology choices? Did you all agree on sticking with one stack or is there flexibility to try new?
Have you broken a monolithic application into smaller microservices? If so, can you take us through that process?
The first step is to include the people (or the team leads) who contribute to the application and start discussing the groups of functionality. Over time this becomes obvious and less controversial, though there is always tension when re-architecting an application. After discussing with influential people who work with the code, we begin discussing how the different components will be split.
The next step is to create a plan. Breaking up a monolith into microservices doesn't happen overnight, and the easiest, least controversial components are a great place to start.
Then we execute the plan. We start deploying the new service as quickly and iteratively as possible to show success. Demonstrating success to the non-engineering organization is critical, because support for this change comes at the cost of trading product features for additional stability and flexibility. We commonly measure steps to deploy, SLA/SLO, and number of deployments to production as KPIs to demonstrate value outside of the engineering organization.
What were some unforeseen issues and lessons learned when breaking a monolith into microservices?
Some functionality and code becomes redundant within microservices. For example, services that communicate with the user-registration service will need to deserialize the user object. While this might be duplicated in a few code bases, that’s okay if the logic is simple enough.
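That kind of acceptable duplication might look like this minimal sketch, where each consuming service keeps its own small copy of the deserializer (the `User` shape and `user_from_json` helper are illustrative, not from the interview):

```python
import json
from dataclasses import dataclass

# Hypothetical sketch: each service that talks to the user-registration
# service carries its own copy of this tiny deserializer. While the
# object stays this simple, the duplication costs less than a shared library.
@dataclass
class User:
    id: str
    email: str

def user_from_json(payload: str) -> User:
    data = json.loads(payload)
    return User(id=data["id"], email=data["email"])
```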
A particular issue we've seen is two microservices being responsible for the same schema in a database. It was deceptive at first, since one service handled reads and the other writes. This kind of problem really slows you down: a deployment of one service requires a deployment of the other, which defeats the purpose of microservices.
How do you determine service boundaries? What is that discussion like within your team?
At first, conversations about where groupings of functionality would live seemed obvious. But as time goes on, deciding where new business logic will live becomes more difficult. Should it go in its own service, or should some logic be duplicated across two services? For example, we had a service that maintained user events and shipped them to a log. We then introduced functionality that processed those events and made runtime decisions. Should that decision-making live with the event engine itself, or should a whole new service handle this business logic? Ultimately we decided to include it in the same service, with the option to move it out later if the product team requested additional functionality over time.
How you want data to be accessed is critical to determining these boundaries. One customer maintained significant user data, but some data is more sensitive than other data by law. For instance, email, address, and real name had to be split out into a different service so the data could be managed by a single source.
What lessons have you learned around sizing services?
The smaller the better, but it takes a disciplined approach to reuse much of the tooling. Containerizing everything helps because it internalizes application-specific details and keeps the server environment consistent across devices. Build tooling, monitoring, alerting, etc. should be put into a common set of tools that reduces the overhead of developing, and more importantly of operationalizing, a microservice in production. If your microservices are still quite large, it is likely due to a lack of tooling. Investing in tooling over "more features" is a common change we see within these organizations.
How have microservices impacted your development process or your ops and deployment processes? What were some challenges that came up and how did you solve them?
We had some challenges in the following areas: dependency management, deployments, monitoring, alerting, SLA/SLO, security, inter-team communication, logging, and debugging practices.
The way we solved many of these with our customers was to turn operations into a true DevOps shop. This is critical because microservices bring overhead in the form of duplication in many of the areas listed above. Building great tooling is the only way to make microservices work.
Secondly, it's important to treat the application developers, the users of the tool, like customers. Doing so will not only improve the tooling for the internal developers, it will also motivate your team toward a success metric. For one customer, we tracked tooling/service adoption as a measure of success. Simply asking "are they using the internal tool?" was a good enough metric to drive better internal tooling.
Auditing data sources has become significantly easier since the footprint of code that needs access to a datasource has become a lot smaller.
How have microservices impacted the way you approach testing?
Integration testing is very error prone and costly in terms of man-hours, and the ROI just isn't there. Each individual integration test adds only a small marginal coverage of use cases, and if you consider all the code path combinations of your application coupled with another application's code paths, the number of tests that need to be written explodes to an unachievable number.
Instead we instruct our customers to focus on unit test coverage, a handful of integration tests that will demonstrate total failure to key areas of the application, additional metrics such as SLA/SLO, additional alerting, canary deployments and one-click rollbacks.
Inter-service integration tests defeat the purpose of microservices. If you're testing how the system works as a whole, then conceptually you've done something wrong, since these services should be loosely coupled. Instead, building a "contingency plan" into the code for when a sub-service fails is a better path. For example, Netflix uses a service to predict what you will see in your feed, but when that service fails it falls back on a cached version of the feed so that the user sees no failure.
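A minimal sketch of that kind of fallback, assuming a hypothetical `fetch_recommendations` call and a simple in-memory cache (not Netflix's actual implementation):

```python
# Hypothetical sketch: serve a cached feed when the recommendation
# service fails, so the caller never sees the failure.
_cached_feed = ["popular-1", "popular-2"]  # refreshed on each success

def fetch_recommendations(user_id):
    # Stand-in for a network call to the recommendation service.
    raise ConnectionError("recommendation service unavailable")

def get_feed(user_id):
    global _cached_feed
    try:
        feed = fetch_recommendations(user_id)
        _cached_feed = feed          # cache the last good response
        return feed
    except ConnectionError:
        return _cached_feed          # contingency: degrade gracefully
```

In a real system the same shape is often provided by a circuit breaker, but the principle is identical: the caller owns its fallback rather than depending on an integration test to prove the pair works together.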
How have microservices impacted security and controlling access to data?
Security of data has improved significantly, because teams only have to grant access to more discrete datasets. Data can be segregated by service, so that a password isn't stored with other sensitive user data like PII.
Depending on the size of the organization this differs greatly, as the other service requesting access to your service’s data may be in another part of the country. We’ve found that there are three ways to solve this problem:
- Mutual TLS: this is very secure as it depends on certificates being granted to the service accessing the data. The downside is managing certificates for both parties. Netflix created a service called Lemur to help with certificate management.
- Token-Based Authentication: this has the advantage of being simple to implement.
- Security Groups: these are more difficult to implement, since rules become harder to maintain across VPC and account resources in a security group.
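As an illustration of the token-based option, here is a minimal HMAC-signed token check; the shared secret and the `name:signature` token format are assumptions for the sketch, not a production design:

```python
import base64
import hashlib
import hmac

# Hypothetical shared secret; in practice this would come from a
# secrets manager, not source code.
SECRET = b"shared-secret"

def issue_token(service_name: str) -> str:
    # Sign the caller's identity so the receiving service can verify it.
    sig = hmac.new(SECRET, service_name.encode(), hashlib.sha256).hexdigest()
    return f"{service_name}:{sig}"

def verify_token(token: str) -> bool:
    name, _, sig = token.partition(":")
    expected = hmac.new(SECRET, name.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(sig, expected)
```

This shows why the option is simple: both sides need only the shared secret, with no certificate lifecycle to manage, at the cost of weaker guarantees than mutual TLS.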
Have you run into issues managing data consistency in a microservice architecture?
The important thing is that you don’t have multiple services managing a data model. We had this issue where we had one microservice control the schema version and writing to the tables and another microservice reading directly from the database. This meant that whenever we deployed one service it required us to deploy the other service, therefore coupling these two “microservices” together, which defeats the purpose of the architecture.
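One way to avoid that coupling is to give the schema a single owner and have every other service go through its API rather than the database. A minimal in-process sketch, with illustrative class names:

```python
# Hypothetical sketch: only UserService touches its own storage; the
# reporting service asks UserService instead of querying the tables.
class UserService:
    def __init__(self):
        self._table = {}          # stands in for the owned schema

    def create(self, user_id, name):
        self._table[user_id] = {"name": name}

    def get(self, user_id):
        # The read API is the contract; the schema behind it can
        # change without redeploying any consumer.
        return self._table.get(user_id)

class ReportingService:
    def __init__(self, users: UserService):
        self._users = users       # depends on the API, not the database

    def report(self, user_id):
        user = self._users.get(user_id)
        return f"report for {user['name']}" if user else "unknown user"
```

With this shape, a schema migration is a deployment of `UserService` alone, instead of the lockstep deployments described above.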
Thanks again to Isaac for his time and input! This interview was done for our Microservices for Startups ebook. Be sure to check it out for practical advice on microservices.