This interview was done for our Microservices for Startups ebook. Be sure to check it out for practical advice on microservices. Thanks to Zachary for his time and input!
For context, how big is your engineering team?
Our engineering team is 15 people. Particle is a full stack Internet of Things device platform, which is to say that we go all the way. We give you hardware and firmware and we've made protocol decisions and encryption decisions. We have services that devices talk to. We have an API for all your mobile apps and servers and things like this. We have a bunch of front end interfaces for both developing code for those devices as well as for managing devices at scale. We interact with telephony partners, cellular carriers. We are an MVNO. We are a cell carrier. We also have automated SaaS self-service billing. We do enterprise contracts for big companies who are rolling out Internet of Things products. There is a great deal that goes on at Particle. I give people -- even non-technical folks -- who join Particle a sort of technical on-boarding to say, "Here's a high level of what the Particle cloud is." I break it down into 11 different services that cover some of the things that go on at Particle. And for the technical folks we dive into more details there.
Are you using microservices, and can you give a general overview of how you're using them?
Some of those services are more micro than others. The API is one of the biggest. As often happens at developer-facing companies like us, [we have] an API you add to over time. When you make hardware you often have a firmware library that you use to interact with some off-the-shelf sensor. There's a temperature sensor that you want to add to your device. You could write the low level code to change the voltages on the wire to talk to that chip, or somebody else has probably made that a lot easier. We have hundreds, maybe thousands (I haven't checked lately), of community-contributed firmware libraries. Last year we overhauled the library system as it exists on our back end to play better into our roadmap. The whole library system was a part of our front end, our web IDE, but it was not a part of our API, and so part of the overhaul was bringing the library system into the API. And we did that as its own microservice that sits behind the API. So those particular routes go through the main API to the microservice and then back. There's a bit of routing there. And then there's a service that handles all interactions with carriers. It's about the principle of encapsulation, right? If we need to talk to one of our carrier partners, all the apps in the cloud should not know anything about that. They should just ask one internal service that has all the business logic for talking to cellular carriers. Similarly for all our automated SaaS billing, which includes things like cellular data usage: we have the monthly subscription, but then calculating overages gets very complex. And that's similarly encapsulated in its own service. Talking to devices -- there's one service that talks to them over TCP. There's another service that talks to them over UDP. They do similar functions, but UDP is more appropriate for cellular.
So in any case, that's some sense of how at Particle we use "microservices."
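The overage billing mentioned above can be sketched in a few lines. This is a hypothetical illustration, not Particle's actual pricing logic: assume a plan with a base monthly fee, an included data allowance, and a per-megabyte overage rate (all the numbers and field names here are made up).

```javascript
// Hypothetical sketch of metered-overage billing: a plan includes a
// data allowance; usage beyond it is billed per megabyte, rounded up.

function monthlyCharge(plan, usedMB) {
  // The base subscription always applies.
  let total = plan.baseCents;
  const overageMB = Math.max(0, usedMB - plan.includedMB);
  // Round partial megabytes up, as carriers typically bill whole units.
  total += Math.ceil(overageMB) * plan.overageCentsPerMB;
  return total;
}

// Example plan: $2.99/month with 1 MB included, $0.99 per extra MB.
const plan = { baseCents: 299, includedMB: 1, overageCentsPerMB: 99 };
console.log(monthlyCharge(plan, 0.4)); // within allowance -> 299
console.log(monthlyCharge(plan, 3.2)); // 2.2 MB over -> 299 + 3 * 99 = 596
```

In practice this gets far hairier (proration, pooled allowances across fleets, carrier-specific rounding), which is exactly why encapsulating it in one billing service pays off.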
Did you start with a monolith and later adopt microservices? If so, what was the motivation to adopt microservices? How did you evaluate the tradeoffs?
We built the company from the beginning as more of a service-oriented architecture, a term that has a bit longer history than microservices. So there are a handful of services. They do one thing; they do it well. Then over time they have grown, and there have been occasions where we say, "This is too big. We now want to add this new feature, but it's starting to feel like it's too big to keep adding to something that's getting more monolithic." One of our services becoming too big is a smell for the team, so we want to add to it in a more microservices-like way. As one example, we have a public-facing REST API. And that has grown and grown.
I would say one of the biggest things is: don't just do microservices because it's a hot thing. Do it for a specific feature, for a specific reason, to add specific customer value. Or sometimes just internal-facing, like speeding up the developer workflow. In some cases that can be enough reason in itself if things have gotten too bad already. I don't mean to make that sound too negative, because there are lots of reasons if you're just getting started. You're bootstrapping. You don't know if your company is going to be around in six months. There are lots of good reasons to say focusing on the maintainability of this in the long term is not the right way to spend your time. If you are a startup just trying to execute super fast, the monolith is often the best way to do it. Now when you start feeling the pain of [it being] too hard to understand the giant app -- maybe the test suite is too slow, maybe the interfaces are not defined well, there's a bunch of different ways you get access to certain data -- those are all good opportunities for breaking something out into a specific service, and it's often more of a thought exercise than a coding exercise. You have to think carefully about where you want the barriers between services to be and what the specific interfaces should look like, and think about security, think about privacy, think about data access. You have to spec those things clearly and design the system so that it will work for you in the long term: the deployment of it, how those things are going to scale and talk to each other -- and discoverability is another problem to solve. So the mental model of where the interfaces fall, what they look like, is a hard thinking problem, and the ops problem, the deployment problem, is also sometimes going to be hard. But often writing the code is easier. That's one of the main advantages here: you want to write a tiny app that just does one thing. That's super easy.
That's the trade off that you're intentionally making there, but you're going to make it most effectively if you think very hard about the service boundaries and their interfaces and exactly how you're going to deploy them if you're first moving into this world.
How did you approach the topic of microservices as a team/engineering organization? Was there discussion on aligning around what a microservice is?
Every Wednesday we have a meeting that is for longer form discussions of this type. So somebody might make a presentation, make some slides, and add something to the agenda for the next team discussion meeting. And in that meeting [they will] present: here's the place where we are, here's the feature we want to build, here's the pros and cons of adding it to these different services where you might argue it could fit. Here is a definition of a new service we could add. Do we want to maintain that? Is that something that we actually think deserves the overhead of spinning up a new service, which is non-zero, as much as proponents of microservices would like to say, "Just make it all! Zero overhead, just build up a brand new service." That's not realistic. There is overhead, so we try and make it as easy as possible, but there is some overhead to creating a new service on our infrastructure, adding it to the system, maintaining it, making it globally scalable, and we hash that out. We have the debate. I would say that's mainly the way that we approach it. When there is something that doesn't quite fit and it might need its own service, we have a presentation, a discussion, talk about it. Maybe we can come to a conclusion in one meeting. Sometimes it might stretch out for two or three. Maybe we need to change some of our services to gather some metrics that we're not gathering right now. We [may] need more data to make this decision. That's sometimes a way that we make those decisions.
Did you change the way your team(s) were organized or operated in response to adopting microservices?
We have more services than engineers. We try and make it so that multiple people know each service. Some of the key services we actually mandate there have to be at least two first class owners of the service, like the device service and the worker service in particular.
There are other services that are either very unique to Particle, or at least if not unique, more complicated and nonstandard in terms of what they do, how they do it, and why they do it that way. And those services have in the past been in danger of not having a clear owner, and when problems come up we don't know who to go to and say, "So and so, what's going on there? Help us understand it." And so we have now mandated that there have to be at least two owners for each of those services. And that is well understood and well documented in our team. So for the most important services, to directly answer your question, we do define clear owners. There is enough diversity in our services that it is incredibly rare that any one individual could actually be fluent in very many of them, more than four or five of them.
How much freedom is there on technology choices? Did you all agree on sticking with one stack or is there flexibility to try new? How did you arrive at that decision?
We want people to feel free to experiment. But that said, there is a set of skills, capabilities on the team and if, [for example], Rita is really into Haskell and wants to spin up a new Haskell service and then Rita quits, and nobody can maintain that because nobody else knows Haskell -- we have to avoid that. So the tension is between giving people the autonomy to experiment, to try things. We are very much biased toward experimentation and learning. But then when we come to a point where an experiment is going to turn into something we're going to maintain long term we have to make the call of: is it okay to do it in this new language, this new environment, this new framework, whatever it is?
For most of the history of Particle...almost everything at Particle is built in Node.js or Ruby on Rails. There are a couple of Rails apps. Most things are in Node. A few things are in Ruby. Similarly, almost all the team is comfortable in Node, and half the team is comfortable with Rails. We have a handful of things in Go. All the firmware is obviously C++. And so again, about half the team is comfortable writing C and C++. There's a very small number of people on the team who are fluent in Go and a couple more who are comfortable experimenting with it, interested in learning it. The Go stuff I suppose is an interesting case study here. Those services were generally written as experiments by an engineer who is no longer here. So we had one person who was really gung ho about Go, wrote them, and is no longer with the company. And so we have interesting discussions as we move forward. It was an experiment. It's not that big. Should we rewrite it in something the whole team is more comfortable maintaining? Should we leave it in Go? There are good profiling tools in Go, which is one of the reasons that this engineer wrote it in Go, and we were all impressed with that. That was a good thing. So again, we approach it as a discussion, but there is a certain set of skill sets on the team. And we are conscious of the need to maintain these things and keep developing them and keep improving them. And if something's in a language that only one person knows, there's always the bus factor.
Definitely people have the ability to experiment here. But if we're going to do something in a new language that's not Ruby or Node, we have to come up with a plan for that, basically. It's not totally shut down but you can't just do it with no plan about how we're going to maintain it.
Have you broken a monolithic application into smaller microservices? If so, can you take us through that process?
Our only real experience with that is adding the libraries endpoints to the API. And that's actually from either side. The API was big and we didn't just want to add a whole bunch more stuff to it because it was feeling big. But separate from that, the Web IDE is the closest thing to the monolith that we started with that we have. And we're moving away from it. We're gradually moving all functionality out of that app and into our API and then just making it a front-end client of the API. That was one of the first things that we built when the company was just bootstrapping and first getting started. It never got that big, but there was some functionality, and still is some functionality, that only exists in that app and has not been pulled into the API. We're on the path to move the last big chunk of functionality out of that. So what's the process there? It's long. It's long and slow. It's not like one initiative to just break it up. It tends to be user-experience focused. We're very focused on providing value to our customers, so we don't just break it up for the sake of breaking it up. And we don't just break up a big app for the sake of making the lives of engineers better. There also has to be some customer-facing value for the effort.
How did you approach the task? What were some unforeseen issues and lessons learned?
For that libraries transition, the library system was only in the Web IDE. We wanted to break it out of that, make the Web IDE for libraries just a front-end client to the API, which would be much, much thinner. Similarly we wanted to add it to the API, but the API also was feeling big, so we did it as this sort of microservice behind the API that handles just the libraries routes. It's a separate app that engineers [worked on]. Part of the reason for that was that the engineers who were working on that libraries transition had not previously worked on the API a bunch. There was going to be a lot of context for them to grok if they were going to contribute to the whole API and fit within the norms that are in this app. As it gets bigger it's hard to understand all of that context, and they were struggling with this as newcomers to that app. And one of the ways that we wanted to unblock them, make the development faster, was to say: just spin up a tiny service that all it does is respond to the endpoints that you're dealing with here. Don't worry about all the context of the API. Here's what you can assume about requests and here's what you should provide in responses; just work on that and don't worry about the context of this other pretty enormous app. And so the impetus there, one of the main things, was helping developers who didn't have context for a big app. Helping them focus on the problem at hand. Ship a thing with the tool chain that worked for them, not needing to do everything in exactly the same way as all the other hundreds of API routes. So there's that. How do we approach it, what's the process? It's a long, slow process, so we didn't just break up the Web IDE and make it an API client. This wasn't one big effort. The library system was the specific thing, and in moving libraries out of that app and into the API, there were specific customer-facing value additions that came out of that.
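The routing arrangement described here -- a main API that forwards certain routes to a microservice behind it and handles everything else itself -- can be sketched as a simple path-dispatch function. The route prefix and the internal upstream URL below are illustrative assumptions, not Particle's real topology.

```javascript
// Hedged sketch: the main API decides per-path whether to proxy to the
// libraries microservice or handle the request itself.

const LIBRARY_UPSTREAM = 'http://libraries.internal:4000'; // hypothetical

function routeFor(path) {
  // Library routes are owned by the microservice behind the API.
  if (path === '/v1/libraries' || path.startsWith('/v1/libraries/')) {
    return { target: 'libraries-service', url: LIBRARY_UPSTREAM + path };
  }
  // Everything else stays in the main API.
  return { target: 'main-api', url: path };
}

console.log(routeFor('/v1/libraries/neopixel').target); // libraries-service
console.log(routeFor('/v1/devices').target);            // main-api
```

In a real Node API this dispatch would sit in front of an HTTP proxy (e.g. the built-in `http` module forwarding the request and streaming the response back), which is the "bit of routing" mentioned earlier.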
Once we deployed it to our community -- we have a huge developer community -- the folks who create libraries were all super happy to see the new system. They were like, "Oh this makes so much sense; it's going to be so much easier. Thank you all so much." So in addition to making the apps on our side easier to understand and easier to onboard engineers into -- one smaller repo for an engineer to pull down, work on, commit to, make a pull request on; making it smaller so it's easier to wrap your head around -- all of that is internal value, and valuable because it speeds us up. But that by itself is usually not sufficient; the customer-facing value of the features that we were actually going to deliver, that were going to make our customers happy, was also a requirement there.
How do you determine service boundaries?
It mostly comes down to just good principles of encapsulation: where would you draw the boundaries around any functionality? When is it right to put in one app? When is it right to put in one file or module? When is it right to put in one model, one controller, whatever it is. As we gain experience as software engineers, we start to develop rules of thumb and a spidey sense for where those boundaries lie but it's always evolving. Sometimes the answer is not clear and you have to hash it out with a group of people. And especially the group of people who are going to have to maintain it is often the key group to involve.
How have microservices impacted your development process? Your ops and deployment processes? What were some challenges that came up and how did you solve them? Can you give some examples?
I'll approach it from the difficulty of getting a new developer up and running. At Particle this is genuinely hard. And we are always striving to make that process easier, but it is rough. I'll just say that. Because we need to give you hardware, and then we have to teach you how to develop on that hardware. Even if you're a front-end engineer, all the front-end things are going to be interacting with IoT devices, and so you need to have a device in hand. You can't just make a front-end app unless it's only using functionality that already exists and is well tested, so that you know exactly the behaviors you should expect. A front end at Particle can't just be developed in isolation. You often actually have to have hardware in your hand and you have to write firmware that's going to go on that hardware to make that hardware behave in a certain way. You have to know which kind of protocol and which service it's talking to. One of the engineers created some documentation internally the other day on how to locally run this set of things. When you want to test this functionality, how do you run that locally? And it was pretty concise. It was three or four pages that described nine terminals to open up, the commands to run in all of those nine terminals, the environment settings you needed to make all of these things talk to each other, and how to make your device talk to your local machine over wi-fi or cellular. And then you could test it all locally without actually having a server up. And that was eye-opening. It's rare that somebody needs to do that, so here's one of the ways we mitigate that.
We have a good staging environment and good, easy deployment pipelines for how people get stuff on staging. And while staging does become a contentious resource sometimes -- like we're working on two big features at once and they haven't been merged yet and they touch some of the same files in the API or the device service, something like this -- we have a bot in Slack for taking the staging conch. It's sort of like Lord of the Flies: "I've got the conch, so I have the staging." And you use the bot in Slack to put the conch back. Most of the time it's fairly uncontentious and there's just good communication on the team, like, "Hey, we're going to be on staging to test these features for the next couple hours," and then you put it back and other people take it to test their set of features. That's only when we're actually working on a couple big things at once. When there really is more of a single focus for the engineering team, then there's often not even that conflict. Staging represents the thing that we're going to deploy soon to prod.
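The "staging conch" is essentially a single mutex over a shared environment. The Slack plumbing aside, the claim/release state logic might look like the following sketch (class and method names are made up for illustration).

```javascript
// Illustrative sketch of a staging-conch bot's core state: one token,
// one holder at a time, released only by whoever holds it.

class StagingConch {
  constructor() {
    this.holder = null; // null means staging is free
  }

  claim(user) {
    if (this.holder && this.holder !== user) {
      return { ok: false, message: `${this.holder} has the conch` };
    }
    this.holder = user; // re-claiming your own conch is a no-op
    return { ok: true, message: `${user} has the conch` };
  }

  release(user) {
    if (this.holder !== user) {
      return { ok: false, message: `only ${this.holder} can put it back` };
    }
    this.holder = null;
    return { ok: true, message: 'the conch is free' };
  }
}

const conch = new StagingConch();
console.log(conch.claim('alice').ok); // true
console.log(conch.claim('bob').ok);   // false: alice holds it
conch.release('alice');
console.log(conch.claim('bob').ok);   // true
```

A real bot would persist the holder (so a restart doesn't lose the lock) and wire `claim`/`release` to chat commands, but the mutual-exclusion core is this small.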
Local development at Particle with this deep of a stack is hard, and that's exacerbated by the microservices nature of it. The fact that we've got, let's say, nine microservices that we have to run to test this one set of functionality end to end on your local machine -- that's hard. So one of the ways that we have mitigated that is by making a staging environment that is really quite seamless for everybody to use. If you commit in a certain way and run a certain command in chat ops, the things just go on staging. The fact that we use so many small services has driven us to adopt production use of Docker earlier than many companies. We use Docker widely in our ops systems. Not for everything, because it is sometimes hard to make Docker do some of the low level things that we need to do in our services, but in general we adopted it pretty early and felt some of the early pain of it not quite being production ready, or the tooling not quite being there, some of that stuff. But because we had so many services, the ability to use a Docker Compose YAML file to say, "Spin up this service, and this service, and this service, and have them talk over these ports" -- to specify that kind of thing in one file that you can run locally or on staging or on prod -- is very helpful. It automates some of that work.
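A Compose file in the spirit described above might look like this. The service names, ports, and build paths are invented for illustration, not Particle's real topology.

```yaml
# Illustrative docker-compose.yml: one public API, an internal-only
# libraries microservice behind it, and a UDP-facing device service.
version: "3"
services:
  api:
    build: ./api
    ports:
      - "8080:8080"            # public REST API
    environment:
      LIBRARIES_URL: http://libraries:4000
    depends_on:
      - libraries
  libraries:
    build: ./libraries
    expose:
      - "4000"                 # reachable only by other services
  device-service:
    build: ./device-service
    ports:
      - "5683:5683/udp"        # devices talk over UDP (e.g. cellular)
```

The same file works with `docker-compose up` locally, on staging, or in production, which is the "one file" benefit mentioned above.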
How have microservices impacted the way you approach testing?
I sort of want to say it doesn't. If we had a monolith, we would be writing a bunch of unit tests for the individual functionality and then end to end tests to make sure the outside-in behavior was what we wanted. And then there would be some functional or integration tests somewhere in the middle. But with microservices we essentially do the same thing. All of the individual services have their own unit tests, and we have integration tests for the whole system: for example, here is how we expect the entire system to behave. And those tests don't know that it's a bunch of microservices. It's just different levels of inputs and outputs that you expect.
What are lessons learned or advice around this? Can you give some examples?
So one, I suppose: we hired people who had a solid testing mindset early, so from the beginning this has just been how we roll. But you are right that there are some extra challenges. And some of the ways that we solve them: when you want to have a test that verifies that when you hit this API endpoint you turn an LED on, we have systems that test that whole thing end to end. And they generally run on a Raspberry Pi or some custom circuit board that we make, which sits in a room and is just running this test all the time on every deploy -- maybe there's some code that gets pushed out to this test rig. And again, because we're a company that manufactures hardware, and we write open source firmware, and we're testing that firmware on the manufacturing line, we have to develop all those capabilities along the way anyway, and then we have to combine them. And it is something that we do fairly uniquely well. But I have a hard time seeing how anybody could do it any other way. To do it any other way is just not to test things, I don't know. Just all manual testing.
How have microservices impacted security and controlling access to data? What are lessons learned or advice around this? Can you give some examples?
Particle has been very security conscious from the beginning. And we had a service-oriented architecture from the beginning. So from the beginning, basically, we've had pretty good encapsulation of data access. We know ways that we could make that even better and we get that stuff on the roadmap more and more. One of the big ways that people talk about this, the sort of cutting edge here, at the far tiny end of the microservices spectrum, is to have every service have its own database. That's the ideal, [that] the only way to get that data is through that app. We have not gone all the way there, and there are some places where we do want to push access into a more restricted separate service. And right now we have a single digit number of databases that are covered by a handful of different apps out there. We are pretty good at the encryption and safety and encapsulation of data access, and there's always more we could do. But we've been very security conscious from the beginning, so every one of these services was spun up from the beginning with: "Here is the port that inputs come in on. Here is the port that outputs go out on. Here is what we have access to data-wise. Here is how the encryption happens at each stage for each input and output." We designed each of these services as a secure entity by itself, and that makes us feel much more comfortable as we combine them. The interfaces between the services are extremely clear. So when you spin up some new service -- we debate something, we say okay, we need a new service to handle functionality X that doesn't fit well with something we already have -- the available interfaces to get other types of data or interact with devices or interact with carriers, whatever it might be, are all extremely clear.
These are the interfaces that the other services expose and you have to talk to them, use an encrypted proto bus on this port, whatever it is. We have these interfaces specifically defined and that's the only way to talk to them.