I have really been struggling with finishing off this 3-part series because the last part should be the “big finish”: it will drive huge amounts of traffic to this blog and folks will be raising their lighters in the air asking for a Part 4! Had I sat down and written all of this a week or two after we presented it back in June, I think that would have been easier. The reality, however, is that I’m finding myself struggling with the idea of moving to PaaS as the “best of breed” solution these days. Through this blog I’ll explain what our original thoughts were about PaaS and why it is represented as the “best” model, and throughout I am going to sprinkle in my more cynical viewpoints on it as a migration path.
Why did we think PaaS was the “best” way to host a cloud solution?
If we take a relatively shallow view of this I think the PaaS model is actually very easy to sell as a migration target for your application. Let’s look at 5 extremely obvious benefits and how they are true on the surface but are way more complex when you peek under the hood.
1. Automation is good, manual intervention destroys elasticity
There is no way to deny that automation is a great thing. In fact if you look at any of the leading PaaS models today (Amazon Web Services w/ Elastic Beanstalk, Windows Azure with Web/Worker Roles, and Google App Engine with customized runtimes) they all cover this feature by providing a way to upload a “packaged” application and all of the environment setup (OS, App Server, etc…). Other setup aspects can be easily scripted out or in the case of Amazon simply baked into a template and reused. Being able to automate the installation of an application is paramount to being able to survive things like hardware outages or being able to handle spikes in demand that require quick responses by adding N instances of your application. Without automation there is no way cloud computing can deliver on half of the promises of elasticity and resiliency.
That’s the surface argument for moving to PaaS but in reality it’s not anything new. We have all sorts of demand for automation in setting up and rolling out applications in on premise web farms today and I can remember being involved in projects to handle this well over 10 years ago (the deployment processes at the time were typically 10+ page manual setup scripts that the ops team had to try and execute without error … repeatedly … sometimes after hours … FAIL). Private cloud options from all the key vendors are also taking this problem head on to simplify that task and when you start to move into the public cloud you have all sorts of automation APIs from the IaaS vendors as well. I guess what I’m saying is that there is nothing wrong with this as a necessary element for cloud computing but in reality we’re not talking about a PaaS specific aspect here.
The reason this often gets broken out as a PaaS specific thing is because it does in theory make your life a bit easier by providing you with a super simplified model for doing this. If you look at things like the PowerShell cmdlets for IaaS or the CloudFormation templates to tailor an Amazon Machine Image (AMI) you’ll find some relatively labor intensive things you have to do (albeit reusable if you have some canned architecture templates). With PaaS, you do in fact have all of this stuff super simplified. Role templates (Web/Worker/Cache) and Startup tasks for Windows Azure provide a simplified way to lay down dependencies, and Elastic Beanstalk is effectively dedicated to providing all sorts of different pre-baked configurations for your web apps written in .NET, Java, PHP, or any other customized setup you believe you can reuse.
So what then do you have to give up? That depends which platform you’re looking to push your code to. For the case of Azure I can tell you there is a lot of control that you will relinquish to plug into this model. The startup and underlying fabric that lives between your application and the host VM you’re running on can be a blessing … and it can be a curse. I have spent a solid 2 years working with folks that wanted to seamlessly migrate applications into those containers and it simply is not friction free.
For Amazon, you end up seeing a model that looks a whole lot more like, deploy it and then get out of the way and reuse the IaaS model under the hood. You won’t find yourself trying to understand which parts of the underlying cloud specific app container provide obstacles … but you will not get the benefits of higher security and greater isolation that you get with a platform like Azure. This is true of Google App Engine as well by the way, they have customized Python runtimes very similar to the app containers you deal with in Azure. These things will get in between your VM and your user code … often for the betterment of the overall health of the cloud infrastructure and not necessarily your specific application!
2. Reducing developers’ need to focus on horizontal application elements is good, a retailer’s dev team building complex caching frameworks or complex identity infrastructure is bad
I don’t think anyone that has done enterprise development for a significant amount of their career will argue this. I’ve spent most of my career in these roles as an enterprise developer looking for ways to add as much value as possible. The best way to do that is not to get caught in the weeds of horizontal application frameworks and infrastructure. There are a number of ways this happens when building line of business applications but the most common is poor technical leadership and a desire to do “gold plating” for your application. A good friend of mine who had spent many years in the enterprise dev space loved to say “great is the enemy of good”. I have to say I agree and live by this credo when working on designing enterprise solutions.
So when you look at something like PaaS in the cloud you will find that a huge benefit is the building blocks available for you to reduce the amount of code you have to write. There are elements that all vendors have, like NoSQL style storage repositories, and then there are ways to store block and page level data in a highly durable/scalable storage infrastructure. Then you’ll see identity and access infrastructure, messaging capabilities, relational databases, and caching or content distribution features. The list goes on and on and will probably change before the ink is even dry on this post. All of these have nominal charges considering the amount of code the app developer ultimately avoids having to write.
Now I have to say these features are absolutely wonderful if you’re envisioning a brand new application or looking at rewriting your application. That said, there is rarely any simple migration path for an existing on premise application. For example, if you want to look at something like Azure queues or Amazon’s Simple Queue Service because your current application has been using an on premise queue between tiers then how do you do that … well … you rewrite your code if you want to move to PaaS. With IaaS you’ll have other options but if you want to start using a PaaS feature it is very hard to do that from an existing application dependency like this.
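To make the size of that rewrite a little more concrete, here is a minimal sketch of one way to contain it: hide the queue behind a small interface so the tier logic never talks to a vendor SDK directly. This is my own illustration, not something any of the platforms hand you. The names MessageQueue, InMemoryQueue, and process_orders are all hypothetical, and the in-memory class is just a stand-in for whatever on premise queue you have today. A cloud-backed class would have to implement the same two methods.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import List, Optional


class MessageQueue(ABC):
    """Hypothetical interface the application tiers code against."""

    @abstractmethod
    def send(self, body: str) -> None:
        """Put one message on the queue."""

    @abstractmethod
    def receive(self) -> Optional[str]:
        """Return the next message, or None when the queue is empty."""


class InMemoryQueue(MessageQueue):
    """Stand-in for an existing on premise queue (illustration only)."""

    def __init__(self) -> None:
        self._items = deque()

    def send(self, body: str) -> None:
        self._items.append(body)

    def receive(self) -> Optional[str]:
        return self._items.popleft() if self._items else None


def process_orders(queue: MessageQueue) -> List[str]:
    """Tier logic that only knows about the interface, not the vendor."""
    handled = []
    while True:
        message = queue.receive()
        if message is None:
            break
        handled.append(message.upper())
    return handled
```

The catch is that an existing application rarely has a seam like this already, so introducing it is itself part of the rewrite.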
The other big challenge is the multi-tenant nature of the PaaS features you’re going to consume. The most common one I hear folks struggle with is the relational database that they choose to use. In order for vendors to achieve better economies of scale it is typical to see the isolation boundaries blur somewhat, leaving applications subject to transient problems. Kind of like having your neighbor suck up all the internet bandwidth on your street by hosting a really popular Justin Bieber fan site (no, this has not actually happened to me, I just thought it was a good illustration). The big difference here though is you can’t walk over to that neighbor and say … hey … cut that crap out or go put that freaking site up in the cloud man! In a PaaS scenario you have to hope the superintendent or home owners association rules come to the rescue. This type of issue is something you can design for if you’re building a new solution, but if you’re simply migrating an existing application, well, that is really hard and often causes developers to stumble right out of the gate when moving to the cloud.
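When I say you can design for this, the usual shape is a retry wrapper around calls to the shared service. Here is a minimal sketch, assuming you can tell transient failures apart from real ones. TransientError and call_with_retry are hypothetical names I made up for the illustration, and the attempt count and delays are arbitrary. Real SDKs surface throttling and dropped connections in their own ways.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (throttling, dropped connections)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on TransientError with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller deal with it
            # Double the wait on each attempt and add random jitter so that
            # many clients do not all retry at the same instant.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

Wrapping a new data access layer this way is easy. Finding every database call in a ten year old application and deciding which ones are safe to repeat is the hard part.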
3. Config and app version consistency is good, environment drift is bad
PaaS is awesome for this. When you consider the basic model here you are almost forced to stay fairly clean with the footprint of your application on the VM itself. You are building an application “package” and scripting or building templates for any specific infrastructure dependencies. The promise of elasticity and resiliency means your application needs to be able to stand back up from a bare bones VM quickly, easily, and consistently. A typical PaaS developer has to consider the environment evil and untrustworthy.
There are some specifics you’ll need to understand though. First of all, environment drift is still a potential risk in a PaaS environment because remote access to the VM is available from Azure and AWS (but not Google, the Google App Engine really is a pure black box … you don’t touch that environment at all). You will also find different rules about when the environment is blown away and when it isn’t. Is the environment state lost when we version our application? Is it lost when a hardware failure occurs? Is it lost when the OS image is updated? The answer is actually not consistent here and you need to explore the platform you’re using to fully determine how the state of your application might be cleaned up.
Now let’s address the topic of consistency. This is always a goal for application developers that are working with multi-node configurations. When I roll out a new version of my application I want to ensure that it gets done consistently and without any downtime. The best way to do this is by leveraging load balancers and virtual IPs with replica staging environments. Seems simple enough: in the Windows Azure space you get an option to execute a VIP swap from a staged environment, and Amazon provides a similar option to control the LB routing from their portal with tagging options for staging VMs. If that were the end of the story then I probably would just say this is a no brainer … but it isn’t.
There are still certain types of solution changes and certain architectures that don’t work well with a direct staging to production swap. For those you’ll see varying solutions. In the case of Windows Azure you have a rolling upgrade option but this has a lot of complexity. It is baked into the PaaS model but you have to deal with a downgraded capacity and in any scenario with more than 2 instances you could have to handle running 2 versions of your application side by side! This is because the way upgrade domains cascade through your solution they shut down a percentage of your application for upgrade and then bring it back online while continuing to “walk the upgrade domains”. If you can stick with VIP/LB management you’re fine and I would still consider that part of the platform. It gets a bit more sticky as you look at smart fabric options.
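Here is a toy simulation of what “walking the upgrade domains” means for capacity and version mix. It is only a sketch: walk_upgrade_domains is a hypothetical helper of mine, and assigning instances to domains round-robin is an assumption for the illustration, not a statement of how the Azure fabric actually places them.

```python
def walk_upgrade_domains(instance_count, domain_count, old="v1", new="v2"):
    """Simulate a rolling upgrade and report what is serving traffic at each step."""
    # Assumption for this sketch: instances are spread round-robin across domains.
    domains = [i % domain_count for i in range(instance_count)]
    versions = [old] * instance_count
    steps = []
    for domain in range(domain_count):
        # Every instance in the current domain goes offline together.
        offline = {i for i, d in enumerate(domains) if d == domain}
        serving = [versions[i] for i in range(instance_count) if i not in offline]
        steps.append({
            "domain": domain,
            "online": len(serving),
            "versions_serving": sorted(set(serving)),
        })
        # The domain comes back on the new version before the next one goes down.
        for i in offline:
            versions[i] = new
    return steps
```

With 6 instances across 3 domains, each step leaves 4 instances online, and while the middle domain is down the remaining instances are a mix of old and new. That is the side by side scenario your application has to tolerate.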
Last but not least, the concept of high availability is baked into your cloud platform in a few different ways. Options like clustering on top of availability zones in AWS or availability sets in Azure can get very confusing. You have to understand how to dissect your application into tiers that cannot be taken down at the same time. You may need to look at what internal dependencies exist between those tiers, and you have to realize that your nodes may be taken down or fail over at any time due to hardware failures or host infrastructure updates. High availability is not something PaaS gives you for free but it is something to consider when you’re looking at versioning your application. You have some work to do here if you want to provide for continuous service availability.
4. Sticking with a dev platform you know and using tools you’re comfortable with provides for efficient cloud development and is therefore good
I have found this to be true for me but I’ve been a pure Windows, Visual Studio, .NET guy for the past 8-10 years. The tools we have for doing development are great, the tools we have for doing cloud development are good and continue to get better. They do provide for a lot of efficiency gain because you have local emulators and you have numerous SDKs that simplify the work against some of the more confusing APIs. The same can be said of the tools for Eclipse working with plug-ins to deploy to AWS (or even Azure from Eclipse). The Google tools work well too and have full blown local emulators. Amazon is the only one where I don’t see any local emulators today. Yes, they have the Eucalyptus bits that can do a form of private cloud and thereby give you an on-premise configuration but I think we’ve stepped out of the realm of dev emulators at that point.
The reason I focus on the emulators is because this whole post is about PaaS. How are you expected to write any application and test it locally without emulators to verify that it works? Moving the code back and forth to an actual public cloud set of instances is time consuming and ridiculously inefficient. Therein lies one of the first ways the tools story starts to break down some. The emulators are good … they are not great. I’ve used all of them for a long time and you’re always going to find that they are not 100% consistent with the cloud platform you’re deploying to. They also have their own overhead associated with them so if you’re planning to run tests in those local emulators just plan for some extra coffee breaks.
I would also say that while tools are important for development they don’t necessarily make or break the platform. I’ve seen incredibly efficient developers that use Notepad++ and incredibly inefficient developers that use the most advanced dev tools ever seen. The real efficiency gains are in the approach and knowledge of the platform. So much of the gains you might see from the tool are vaporized when developers are not building good unit tests or not aware of the latest API wrappers that could have saved them 100’s of lines of code. If you’re basing your success on a tool alone when moving to the cloud then I’d say you’ve already fed the Mogwai after midnight.
5. Moving your application from a shared infrastructure to a dedicated one with horizontal and vertical scaling options is good, being locked in to a constrained set of shared infrastructure on premise is bad
You may have noticed I said “shared infrastructure” twice in that line. That was on purpose. What you often see in an on premise application that is trying to move to the cloud is an application that has no idea how to run in an isolated set of infrastructure. To be cost effective you want to run with the smallest set of VMs that can satisfy your user demand. Now, admittedly, this isn’t just a PaaS issue, it really applies to any move into a hosted cloud infrastructure. Most on premise operations teams will run applications together on a VM. Sometimes I’ve seen this divided by business unit, sometimes I’ve seen it divided by some sort of costing model. Whatever it is, there are usually a lot of applications that run side by side with other applications. In addition, these servers are often loaded up with a set of enterprise standard software that was chosen 5 – 10 years ago.
I mention all of this because the goal of running in a cost effective manner means you have to determine your existing capacity requirements. These requirements are often skewed by what my good friend the Samurai Programmer likes to call “Shmutz” (I can’t take claim to using Yiddish … he’s taught me all the words I know … thanks Greg 🙂 ). That “Shmutz” is the stuff distracting us from finding out what the real capacity requirements are. I might have a bunch of CPU getting burned up by a virus scanner, I might have another application that gets a spike in demand every morning at 9 am and forces my application to throttle down, I might have background jobs eating up a bunch of memory, etc.
There is no arguing that getting to an isolated infrastructure is more flexible but is it the most cost effective model for your solution? What else could you consider? There are shared hosting models now in Windows Azure and with Amazon you’d have no problem setting up multiple sites and routing different DNS requests based on the incoming URL. High density hosting and pushing again for the right economies of scale can be a difficult decision depending on the type of application. At this point it does become more of a PaaS topic but really an even finer grained one than you may have originally thought. Are you going to be happy with PaaS shared or do you need isolated, and how can you switch when you need to?
I’m very impressed with what we came up with for Azure Web Sites when you’re looking at this scenario but it is limited to web applications. I’m also impressed with what Amazon does for their “Spot Instances” pricing option. This isn’t really a model for shared hosting per se but it is a way for you to drive your costs down by bidding for compute space and if there is some available then Amazon will give it to you. Imagine if you have some flexibility in how long that computation job takes but you only have $X to get it done. You bid and wait and you likely get it done at some point. Priceline.com for cloud computing … where’s the Shatner commercial?
PaaS and Vendor Lock In
I can’t complete this topic of migrating to PaaS without being fair to the topic of vendor lock-in. After all, if you’re going to write code that targets a specific platform you have to wonder if the platforms are locking you into their model. In some cases the answer is yes but you can insulate yourself in a number of ways. If you’re planning to write code that is purely PaaS and leverages all of the tools and SDKs that are most mature for each vendor then yes, you’re likely to end up locked in. In fact some of the programming languages themselves are going to bind you to a specific platform (the Go language for Google App Engine or .NET for Windows).
What you can do however is look at options that allow for better portability. Languages like Python are being built up on Amazon, Google, and Microsoft’s clouds complete with SDKs for their platform. You could look to those runtimes and frameworks if you want to avoid getting locked in. This is an area I continue to explore in more depth and you can almost guarantee there will be more posts coming from me on this in the near future.
The other interesting area here is how to leverage the various private cloud data center management tools to provide for seamless back and forth work in the public and private cloud … also without too hard of a vendor lock-in. Again, more to come on this as I’m focusing in this area extensively over the next 12 months. As it relates to pure portability I know Azure has a distinct advantage because they are the only cloud vendor with a developing private cloud solution. VMware could certainly poke their head up in this area at some point but they are way behind with nothing but Cloud Foundry (in Beta) in the public cloud space today.
So have I completely changed my mind about PaaS being the “best” location for migrating your application? Honestly, I’d say I have changed my mind, mainly because I can’t see how it is feasible as a migration target given all the friction. That is not an indictment of the Google, Amazon, or Microsoft platforms. They are all fantastic in their own right, but as someone who has written a ton of on premise applications and attempted to help migrate them to the cloud for the past 2 years I can tell you the reality is that a new application or a rewrite is likely going to be the only workable option to get to PaaS.
This is why IaaS continues to be an important bridge to the cloud. Many solutions may use that bridge to let the air out of the balloon on their datacenter and start investing in new development in a platform of their choosing. The issues of compliance and cost have been vetted out. They are not necessarily all consistent across the vendors but they will be at some point in the near future. I personally believe that the 2 biggest questions should now be:
- Can I do this cloud app development on premise and have some flexibility in and out of the public cloud? Will it be seamless?
- Does the vendor provide me a platform that will lock me in and how do I avoid that?
I hope you found some of these three parts interesting and maybe they made you ask some new questions you weren’t thinking about. PaaS does, at its core, provide for a simplification of some aspect(s) of the development of your cloud solution. The question for developers now is, how and when do I start to look at that, because straight migrations are often a complete dead end.