This interview was done for our Microservices for Startups ebook. Be sure to check it out for practical advice on microservices. Thanks to Nick for his time and input!
Nick Zadrozny is the founder and CEO of One More Cloud, a company that provides specialized hosted cloud services. Nick is the creator of OMC's two hosted search as a service products, Bonsai and Websolr.
For context, how big is your engineering team?
We are kind of a small team. We're eight people full time and, I don't know, three to five contractors part time at any given point depending on the projects.
Are you using microservices and can you give a general overview of how you’re using them?
I think I would say we started off as a monolith, but that's not entirely true. We've always kind of had microservices from day one, and day one for us dates back to 2009.
The main app is mostly a user-facing dashboard. It's an API endpoint for Heroku to provision resources and accounts and then for users to kind of get a single sign-on into their dashboard. And then if you go to Bonsai.io or websolr.com, it's the main home page and marketing presence and create your account and log in and manage your resources and all that kind of stuff. So just the usual SaaS account and resource dashboard stuff.
Websolr is managed Apache Solr as a service and Bonsai is managed ElasticSearch. We were kind of first to market with both of those. In fact, when we founded Websolr we did that by invitation of folks at Heroku as they were starting up this add-on program. Their customers wanted search engine hosting. That didn't really exist at the time, which was really interesting. And so my original co-founder had some connections there with the folks at Heroku and some background in search. I also had background with web development and search and operations. So they kind of persuaded us: you guys should start your own business doing search hosting. It will be fun. And one thing leads to another and eight years later here we are. We have just a Rails app -- that is what our customers interface with. This app is itself an API for Heroku who talks to our app to provision things for customers, and then that's kind of the middle point for us where then we're going to make calls out to other services to provision backend server resources and make updates to kind of middle routing layer middleware services and things like that. So that's kind of a super high level sketch of our architecture.
Did you start with a monolith and later adopt microservices?
I would say we started out with microservices from day one. We've always had services that would run as a kind of coordinator of server resources with Amazon, independent from our main app. In the early days it was definitely much more of a monolith, and I think in general, being a small team, there are definitely advantages to monolithic development.
If so, what was the motivation to adopt microservices? How did you evaluate the tradeoffs?
The microservices that we do use are done really out of a strong technical necessity. We can't just run our Rails app on every single Solr server that we run for example. It just doesn't make sense. And then if you were talking about that then you're talking about all of those talking to the same central database that our app is using. So that would be a serious central point of failure for our app if we were to architect that way, because if our main database ever goes down then all of our customer instances go down. So just out of necessity we have to strongly decouple. Our Solr and our ElasticSearch clusters have to stay running no matter what's going on with our main application. So that was the clearest separation from day one out of the technical necessity. From there, our centralized app is still kind of a monolith. We are gradually extracting very clearly defined pieces of functionality out into services.
How did you approach the topic of microservices as a team/engineering organization? Was there discussion on aligning around what a microservice is? Did you change the way your team(s) were organized or operated in response to adopting microservices?
I think it hasn't been as much of an issue or question with us. We are a very engineering-heavy team. Six out of our eight full-time people are engineers, kind of product and platform and operations engineers, and so I think we're all pretty well up to speed on the concept of microservices, and certainly we're small and agile enough that we're up to speed on modern application development and deployment practices. We were a very early adopter of cloud deployments and things like Heroku's platform model and the Twelve-Factor app model. Outsourcing to various different APIs is something we've always considered kind of a microservice approach. Whether that's transactional email delivery or subscription and billing management, we've always made pretty decent use of them. I think the question for us with microservices or monolith ends up being case by case; we sort of evaluate that on a feature-by-feature basis. So when we talk about developing some new functionality, for us the question always comes up: do we build this into the Rails app, or do we extract some sort of microservice and have the apps communicate with it over an API? For us it's largely a question of repetition. Are we going to repeat ourselves and write this code twice, which is essentially what building it into the Rails app directly means, or are we going to write the bulk of the code once and write a smaller layer of integration in each app? Because each line of business for us does have its own primary monolith. They are separate apps with separate databases. So that's really the main kind of thing that we would discuss at this point in terms of philosophically [whether] this one feature or area of functionality should be moved into a microservice or not. But I would say as far as what is a microservice and what role does it play in our architecture, I think it's probably more clear for us as a team of mostly engineers. I would say we're pretty mature in terms of our use of microservices.
Have you broken a monolithic application into smaller microservices? If so, can you take us through that process?
I would say we started out with Microservices from day one.
How do you determine service boundaries? What was that discussion like within your team? Can you give some examples?
So we have some unique requirements, I think. We are a 24/7/365 company. We are an operations provider, and because of that our first responsibility is managing excellent uptime for our customers. Our customers use our service because we're able to manage higher uptime and higher performance than they would be able to, certainly at the cost of our service. So we definitely have a strong design philosophy of isolating any possible component that would be in the hot path of a customer's traffic to their search engine. So that's kind of rule number one in a separation boundary: trying to clearly define when and why we have to interact with customer resources on the backend. Most things that are going to have to interact with customer resources, either directly or indirectly, we're going to want to separate from just normal kind of day-to-day product development. And so the benefits of that kind of isolation: the backend microservice has its own API, so we can decouple the release cycles and the release management from whether we need to tweak some CSS on the front end or tweak some marketing copy or something. That can all get deployed on a way different schedule than when we need to ship an update that's going to change how we manage customer resources. That would be rolled out a lot more carefully and have its own code base that has maybe even a different kind of code review policy and things like that. I think that's probably the first major rubric that we use for defining those service boundaries.
What lessons have you learned around sizing services?
[W]hen we're looking at microservices or services internally that manage a lot of data -- we store really detailed logs and analytics for all of our customer traffic. We have a proxy layer that logs all the metadata of search requests, for example. And this is like what kind of action is it, and how long did it take to complete? What was the backend server that handled it? A lot of these kinds of things, so we can look at trends on operational metrics. And so this particular service was at one point responsible for collecting all the logs. In the very early days the logs on the servers themselves grew to very unwieldy sizes or were impractical to search. And so then we started exporting all those logs into another ElasticSearch cluster so we could search within those logs. And then after a while we started hitting some definite size and scaling challenges there, and you can vertically scale for a while and add more capacity to that service. But in time we found it more practical to rebuild the metrics collection one more time, and these days we actually do a lot of pre-computation and aggregation, so now this service is not just indexing into ElasticSearch for searchability. It's also doing a lot of aggregating, and we store metrics in Cassandra. So the flexibility to kind of recompose how things are persisted, starting with on-disk log files, then the ELK Stack, then ELK plus Cassandra, was one really nice benefit of having some of that metadata just shipped directly off the server and onto some other service within our backend. And we were able to do all of that -- and again this is directly adjacent to the customer hot path -- we were able to manage releases in a way that was effective and that let us make some major capability and capacity improvements without risking any kind of impact to actual customer traffic.
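To make the pre-computation and aggregation step concrete, here is a minimal Python sketch. It rolls raw proxy log entries (action, backend server, duration) up into per-minute summaries that are far smaller to store than the raw logs. The field names, the one-minute bucket size, and the sample data are assumptions for illustration; the interview does not describe the actual schema or how results are written to Cassandra.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestLog:
    """One proxy log entry. Field names are hypothetical, not the real schema."""
    timestamp: int      # Unix seconds
    action: str         # e.g. "search" or "index"
    backend: str        # server that handled the request
    duration_ms: float  # time the request took to complete


def aggregate_by_minute(logs):
    """Roll raw entries up into per-minute, per-action summaries."""
    buckets = defaultdict(lambda: {"count": 0, "total_ms": 0.0, "max_ms": 0.0})
    for entry in logs:
        minute = entry.timestamp - entry.timestamp % 60  # start of the minute
        bucket = buckets[(minute, entry.action)]
        bucket["count"] += 1
        bucket["total_ms"] += entry.duration_ms
        bucket["max_ms"] = max(bucket["max_ms"], entry.duration_ms)
    return dict(buckets)


# Made-up sample data: two searches in one minute, one index call in the next.
logs = [
    RequestLog(120, "search", "node-a", 12.0),
    RequestLog(150, "search", "node-b", 30.0),
    RequestLog(185, "index", "node-a", 8.0),
]
summary = aggregate_by_minute(logs)
```

Each summary row could then be written to a metrics store on its own schedule, while the raw entries still go to a searchable index.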
How have Microservices impacted your development process? Your ops and deployment processes? What were some challenges that came up and how did you solve them? Can you give some examples?
Deployment has definitely sped up every time we've introduced an effective microservice. We've always seen an increase in performance because we have a lot more confidence when we clearly define these boundaries: when you're making a change on one side or another, you understand the scope of what's being impacted when you're reviewing code and planning for deployment. So we now have a deployment process where a lot of times, if we're going to add new functionality that depends on a microservice, we can dark deploy the functionality on the backend first and get that working correctly. And then only later, once it's done and working well, we can ship the interface for it in the front end. So decoupling those two means you're able to make smaller changes more frequently, and that just really increases our confidence in the changes that we're making. It improves our ability to conceptualize what has been changed and not have to worry about code that's all entangled and creating sort of unnecessary and unclear points where your software is coupled to itself. That's just incredibly valuable in operations as well. If you aren't constantly vigilant in maintaining an ability to deploy frequently and deploy very small changes and to validate them, then you kind of get stuck in this pattern where you're releasing once a quarter or, God forbid, once a year. And it's just like pulling teeth to get all the resources to make it happen, and then by the time it's actually done, now you're planning for your next one.
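The dark-deploy pattern described here can be sketched in a few lines of Python. The backend capability ships and can be exercised directly, while the front end only exposes it once a flag is turned on. The flag store, the flag name, and the `backend_suggestions` function are all hypothetical stand-ins; the interview does not say how One More Cloud actually gates its releases.

```python
class FeatureFlags:
    """In-memory flag store (an assumption; a real one would be persisted)."""

    def __init__(self):
        self._enabled = set()

    def enable(self, name):
        self._enabled.add(name)

    def is_enabled(self, name):
        return name in self._enabled


def backend_suggestions(query):
    """Stand-in for new backend functionality that is already deployed."""
    return [query + " tutorial", query + " examples"]


def render_dashboard(query, flags):
    """Front end: only surfaces the new feature once its flag is on."""
    page = {"query": query}
    if flags.is_enabled("suggestions_ui"):
        page["suggestions"] = backend_suggestions(query)
    return page


flags = FeatureFlags()
dark = render_dashboard("solr", flags)   # backend is live, UI hides it
flags.enable("suggestions_ui")
live = render_dashboard("solr", flags)   # UI now exposes the feature
```

The backend can be validated on its own while `dark` is what users see, and enabling the interface later is a separate, small change.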
Have microservices impacted the way you approach testing? What are lessons learned or advice around this? Can you give some examples?
Yeah, it has. And we're still working through some stuff on this actually. I find it pretty interesting. I mean, it is nice from a testing perspective. You're able to have much more focused unit tests because, again, you have the service boundary, right? On one side of the service boundary you have the microservice; on the other side of the boundary you have the client that is consuming it. And so you're able to write tests with both in mind. The unit tests that are testing the microservice itself -- that's a little bit more of an implementation detail. You have a little more freedom and flexibility as the owner of a microservice to kind of change around your implementation of things, and that goes with changing your tests. And so by clearly defining the boundary, maybe there does not need to be a change to the API itself. I think you just promote a lot more productivity. I think tests are more clear, more productive, they're a little more disposable. A test is a tool that serves a purpose for a time in the context that it needs to exist in. But from the perspective of the app that's consuming the microservice, it really doesn't care about that stuff. I find that that's a really clear benefit when it comes to testing. We have a lot fewer tests in our primary monolith that test the behavior of the microservice because we're able to trust at a higher level that it is doing the job it needs to do. And so that keeps the monolith a lot more focused. It prevents coupling back to what may be implementation details in the microservice. And then it just allows a lot more flexibility within the microservice itself to kind of change and adapt. It's kind of more of a development practice thing than specifically testing. But yeah, I think that's kind of the main thing we get: the flexibility to use the right tool for the job and to focus on the code base that you're working in directly, focusing on what it is responsible for.
So changing the way we approach testing, we would then -- I think in our microservices, most of them bundle a client as part of the project. We may have a Java Dropwizard microservice in the back end, and it would have a Ruby gem client bundled in that repo that would then be consumed by the Rails app. And so we would actually drive the integration tests of the microservice itself using its own client. That's one of the nice things about that kind of testing. We would still have Java unit tests as well. But ultimately the Ruby client is where we would kind of put all the integration tests and put examples of how this thing expects to be used. And if a contract was ever violated on the API level, then we have a place to go where we would create a regression test.
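A rough Python sketch of that arrangement follows. The interview describes a Java service with a Ruby gem client; here both sides are collapsed into one language and one process purely for brevity. The endpoint paths, class names, and payload shapes are invented for illustration. The point is that the integration test only talks to the service through the bundled client, so it doubles as a usage example and as the place to add regression tests when the API contract is broken.

```python
class ClusterService:
    """Stand-in for the backend microservice (in-process, no real HTTP)."""

    def __init__(self):
        self._clusters = {}

    def handle(self, method, path, body=None):
        """Return (status_code, json_payload) for a request."""
        if method == "POST" and path == "/clusters":
            name = body["name"]
            self._clusters[name] = {"name": name, "status": "provisioning"}
            return 201, self._clusters[name]
        if method == "GET" and path.startswith("/clusters/"):
            name = path[len("/clusters/"):]
            if name in self._clusters:
                return 200, self._clusters[name]
            return 404, {"error": "not found"}
        return 405, {"error": "unsupported"}


class ClusterClient:
    """Client bundled with the service; the consuming app only sees this."""

    def __init__(self, transport):
        # transport: callable(method, path, body) -> (status, payload)
        self._transport = transport

    def create(self, name):
        status, payload = self._transport("POST", "/clusters", {"name": name})
        if status != 201:
            raise RuntimeError(f"create failed with status {status}")
        return payload

    def find(self, name):
        status, payload = self._transport("GET", f"/clusters/{name}", None)
        return payload if status == 200 else None


def test_create_then_find():
    """Integration test driven entirely through the bundled client."""
    client = ClusterClient(ClusterService().handle)
    assert client.create("docs")["status"] == "provisioning"
    assert client.find("docs") == {"name": "docs", "status": "provisioning"}
    assert client.find("missing") is None


test_create_then_find()
```

In a real setup the transport would make HTTP calls to a running service; injecting it as a callable keeps the sketch self-contained.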
Have you run into issues managing data consistency in a microservice architecture? If so, describe those issues and how you went about solving them.
We help our customers with this question all the time. I mean, for us data consistency is not as much of a problem because we do have that monolith up front, and because it's talking to PostgreSQL, PostgreSQL transactions basically give us all of the consistency we need. The way our models are defined, mostly as an event-based design in our Rails app, gives us pretty much everything we need in terms of resolving any potential data race or concurrency or conflict situation. So our challenges on that stuff are less than what our customers might have. One of our microservices does have a distributed data store that's built into it. But fortunately it is kind of a last-write-wins design.
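For readers unfamiliar with the term, a last-write-wins design resolves conflicting updates by keeping whichever write carries the newest timestamp. The Python sketch below shows the general idea with a single replicated value. It is not a description of the data store mentioned in the interview, whose internals are not covered; the node-id tie-breaker is an added assumption so that replicas agree when two timestamps are equal.

```python
class LastWriteWinsRegister:
    """A single replicated value where the newest write always wins."""

    def __init__(self):
        self.value = None
        self.stamp = (0, "")  # (timestamp, node id); node id breaks ties

    def write(self, value, timestamp, node_id):
        """Accept the write only if it is newer than what we hold."""
        stamp = (timestamp, node_id)
        if stamp > self.stamp:
            self.value, self.stamp = value, stamp

    def merge(self, other):
        """Adopt another replica's state if it is newer than ours."""
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp


# Two replicas receive different writes, then exchange state.
replica_a = LastWriteWinsRegister()
replica_b = LastWriteWinsRegister()
replica_a.write("green", timestamp=1, node_id="a")
replica_b.write("blue", timestamp=2, node_id="b")
replica_a.merge(replica_b)
replica_b.merge(replica_a)
```

After the merges both replicas hold the later write. The trade-off is that the earlier write is silently discarded, which is acceptable only when losing an overwritten value does no harm.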
There are three layers of this for me. Our business is the management of distributed data services; ElasticSearch is a big one, and it was designed from day one to be distributed. So over the years, just being engineers staying up with cloud architecture, we've had to remain really educated on distributed systems engineering subjects. Our main customer base and accounts and cluster management and resource management, that's not necessarily a distributed problem because it goes through a central database. But when it comes to coordinating some of our data services, that can become a distributed data problem, and so this is where we have at times had to rely on databases that are specifically built for that environment. So we do make use of ZooKeeper, which is designed for consistency in a distributed environment. We also have a different data service that is designed for high availability, so it may not always be 100 percent consistent, but it is always 100 percent available. And so when we design microservices in the backend, there's always going to be a question of what is the fundamental backing data storage, what are the tradeoffs with respect to data consistency and availability, and how does that impact our decisions? So we have all three. We have central, we have decentralized but optimized for consistency, and we have decentralized but optimized for availability. That said, managing consistency is really interesting when you look at, say, one of our customers integrating with a search engine service like ours, because now they have to make changes in their primary data store, and those changes have to get out to their search engine and be searchable. And sometimes they need to make updates to resources and coordinate the consistency of those updates, especially if there are many updates happening in quick succession to a single document, for example. So we do a lot of education with our customers through our support channels.
We have half a dozen different kinds of best practices that we can recommend to people when it comes to managing data consistency.
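One widely used technique for the "many updates in quick succession to a single document" problem, offered here as an illustration and not necessarily one of the practices Nick has in mind, is to attach a version number from the primary data store to every index request so the search side can discard updates that arrive out of order. Elasticsearch supports this style through external versioning. The Python sketch below only simulates the idea with a toy in-memory index; the class and method names are hypothetical and it makes no calls to a real search engine.

```python
class SearchIndex:
    """Toy in-memory index that rejects stale updates by version number."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, body)

    def index(self, doc_id, version, body):
        """Store the document unless an equal or newer version exists."""
        current = self._docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False  # stale or duplicate update, ignored
        self._docs[doc_id] = (version, body)
        return True

    def get(self, doc_id):
        entry = self._docs.get(doc_id)
        return entry[1] if entry else None


# Two updates to the same record reach the index out of order.
index = SearchIndex()
applied_new = index.index("post-1", version=2, body={"title": "Second edit"})
applied_old = index.index("post-1", version=1, body={"title": "First edit"})
```

Here the late-arriving first edit is rejected, so the index keeps the second edit. The version would typically come from a counter or update timestamp maintained in the primary database.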
I think really the best thing for an engineer in any sort of microservices world, in a distributed world, I think the best thing to consider is don't create data consistency problems if you don't have to. Just use PostgreSQL or some other reliable central data store. That's not always the right answer because that becomes a single point of failure. I think the bottom line is engineers really owe it to themselves to become educated on the subject of data consistency and designing data persistence and the tradeoffs of consistency and availability and just pure centralization.