Bad development drives bad infrastructure decisions

The relationship between Development and IT Operations is a mutually dependent one: a decision made on one side can affect the other, and not always for the better.

With so much of the focus in the DevOps world seemingly on the Operations and infrastructure side, I would like to take a moment to discuss some of the fundamental issues that can affect how your DevOps process is rolled out if attention isn't also paid to Development.

Moving an organisation away from Waterfall to Agile is no mean feat.  It entails a dramatic shift in mindset and organisational culture, and many will find the adoption of new ways of working overwhelming if they are thrown straight into the deep end.  Fortunately, the saving grace of DevOps is that it not only helps teams move towards incremental changes to their products, it can itself be rolled out in small, managed, incremental changes.  Lead by example.

Yet even managing this transition of your infrastructure in bite-sized, digestible chunks can be thrown off track if fundamental issues in Development aren't addressed first.

Development first

You have your new greenfield project and the go-ahead to “do the DevOps thing”.  It's time to plan your infrastructure, right?

Not so fast, and here’s why.

Your Development team may already be working in an Agile fashion, holding scrums and running code reviews and automated tests within their own controlled environments, but that does not always mean what Development produces is in a state that can be readily absorbed by your engineering teams and deployed to live.  This can be the root cause of many poor infrastructure design decisions.

The usual Waterfall environment map may have anything from five to seven environments to validate the build, stability and acceptability of release candidates. In that world you have a lot of quality gates to catch poor releases, and as release cycles can stretch into months there is always time to find defects and get them fixed before they hit production.

Under Agile the goal is to reduce that lead time dramatically and shorten the development cycle, producing features on demand as and when the customer requires them.  All good in theory, but it also means removing many of the quality gates which until this point have been stopping poor release candidates getting through the door.  In this scenario the natural response for an architect is to add extra quality-gate environments to the map to catch those bad candidates.  That is not the ideal response when you are trying to create lean, Agile infrastructure: instead of saving money, you end up spending more on additional environments to trap bad release candidates.  In effect, Infrastructure is still over-engineering to protect itself from poor development practices.

If this sounds familiar to you then your work with Development is not quite over.  In my experience I've identified the following key areas to sanity-check in order to help drive better release candidates:

Automated testing

There is no question as to the importance of testing your code.  BDD and TDD will help developers create more efficient, lean code, but are they testing correctly?

There is a world of difference between Unit and Functional testing, and they should be used in tandem.  While Functional testing can give you a good idea of how the code is behaving as a whole, without Unit tests it becomes harder to validate the individual stability of each component, and you could be missing important information that would help diagnose bugs and defects.  For example, a method could be changed in a controller in a way that causes methods in a model to behave differently. Without Unit tests to protect the individual functionality of each method, it is harder to pinpoint the fault if the developer who implemented the change is unavailable.
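To make that distinction concrete, here is a minimal pytest-style sketch; the model, controller and figures are purely illustrative and not taken from any real project.

    # Hypothetical model whose behaviour the unit test pins down.
    class PriceModel:
        def net_price(self, gross, vat_rate=0.20):
            return round(gross / (1 + vat_rate), 2)

    # Hypothetical "controller" that composes calls to the model.
    def checkout_total(model, gross_prices):
        return sum(model.net_price(p) for p in gross_prices)

    # Unit test: protects the individual behaviour of the model method, so a
    # change elsewhere cannot silently redefine what "net price" means.
    def test_net_price_strips_vat():
        assert PriceModel().net_price(120.0) == 100.0

    # Functional test: exercises the code as a whole; valuable, but on its own
    # it cannot tell you which component broke when the total comes out wrong.
    def test_checkout_total():
        assert checkout_total(PriceModel(), [120.0, 60.0]) == 150.0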

Code review

I really can't stress enough how important the role of the code review is to the release process.  While automated testing can catch issues with the functionality of code at a high level, it is difficult for automated tools to definitively assess whether the written code is of a high enough standard to pass through to production.  A more senior developer should always review code to ensure that what has been written meets coding standards, does not contain hard-coded passwords and is of sufficient quality to merge into the next Release Candidate.
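The human review stays central, but one of the points above, hard-coded passwords, can also be backed up by a simple mechanical check run before the review.  Below is a minimal, purely illustrative sketch; the patterns and file handling are deliberately naive and no substitute for a reviewer or a proper secrets scanner.

    import re
    import sys
    from pathlib import Path

    # Deliberately simple pattern for obvious hard-coded credentials.
    SUSPICIOUS = re.compile(
        r"(password|passwd|secret|api[_-]?key)\s*=\s*['\"][^'\"]+['\"]",
        re.IGNORECASE,
    )

    def scan(paths):
        findings = []
        for path in paths:
            text = Path(path).read_text(errors="ignore")
            for lineno, line in enumerate(text.splitlines(), 1):
                if SUSPICIOUS.search(line):
                    findings.append(f"{path}:{lineno}: possible hard-coded credential")
        return findings

    if __name__ == "__main__":
        problems = scan(sys.argv[1:])
        print("\n".join(problems) or "No obvious hard-coded credentials found.")
        sys.exit(1 if problems else 0)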

Post-release feedback

Let's say that a release candidate has gone through the testing phases without incident but is then deployed into Production and something goes wrong.  A fix is found and the release is finalised.  Is that information being passed back to Development?

This sounds like the most basic of concepts, yet in too many organisations I've worked with this fundamental and crucial stage in the release process is missing.  It's an area that underlines the importance of a proper implementation of DevOps methodologies.  In the case we looked at earlier, where Infrastructure is over-engineering environments, it could simply be that the Developers have not been told what went wrong and therefore continue to make the same mistakes.  Putting that feedback loop in place allows the Developers to raise these issues in code reviews and ensure that the defects encountered can be avoided in future.

The onus is now back on the Development team to fix their process, but in a manner compatible with shared ownership, as the Ops/Infrastructure team is actively engaged with Development to resolve release issues at their root cause.  That dialogue then allows architects to simplify the environment map to the Minimum Viable Product, so that extra environments aren't required to try and catch development mistakes.

Summary

Effective communication is vital at all stages of the release process, and attempting to fix fundamental development issues by over-complicating infrastructure is most definitely not the way to go.  DevOps should always be used as an aligning tool to bring both sides of the product cycle into balance and ensure successful growth and stability.

Agile Waterfalls

There are many businesses today that still use Waterfall to manage software development projects.  While Agile has been around for a long time, Waterfall precedes it by a large margin and is still the mainstay for larger organisations.  There are many reasons for this, such as the fact that it fits nicely with ITIL or PRINCE2 project strategies, but more often it is simply the approach that was used for legacy products which, due to project constraints, customer requirements or technical debt, cannot easily be migrated to a fully Agile approach.

Having said that, all is not lost: aspects of Agile software development can still be brought into the Waterfall structure, building the foundations for eventually moving towards a fully Agile solution.

The usual Route To Live for projects controlled via Waterfall consists of multiple environments, usually labelled Dev, Test, UAT, PreProd, Production and Training.  The general flow consists of sprint cycles on Dev, with changes then promoted to Test, where defects are identified and fixed; then to UAT after the test cycle; and so on until all the Pre-Production criteria are met and the business gives the go-ahead to deploy to Production.  In some cases Training may be deployed to before Production in order to bring staff up to speed on the new features about to go live.

When viewing the whole software delivery cycle from this high vantage point it is difficult to see how Agile can make any beneficial change without breaking the whole delivery process wide open.  If your purpose is to completely replace Waterfall then yes, this will inevitably happen.  The good news, however, is that Agile can be what you make of it.  Like ITIL, it is not necessarily a full list of dos and don'ts, but more a framework on which you can build your business logic and processes.  With this in mind it is possible to be more Agile within a Waterfall delivery system.

Redefining customer

From the business point of view, the customer is the end user or client who will be using your delivered product or service.  When working with Waterfall this external entity is generally the focus and the process is to ensure that the customer gets what they asked for or are expecting.

To begin implementing Agile it helps to fully understand, and then redefine, what we mean by the customer.

Taking on that definition, we can define a customer as anyone who is expecting something from the work being undertaken.  In the case of the development team, they are producing features and implementing changes according to the functional specifications provided; from their point of view, the customer is the Testing team.  Testing feeds into QA, and QA feeds into Operations.  By defining these relationships we can begin to see how Agile can be used to aid the delivery process as a whole by narrowing down the focus of “customer” at each stage.

Agile in Development

Now that the focus of the customer is on the needs of the testing team, it is possible to start breaking down the requirements of Agile development into deliverable and achievable goals.  This can take the form of redefining how sprint cycles are managed, introducing Continuous Integration and build servers, automating unit testing and, finally, changing how changes are promoted to downstream environments.  Rather than trying to create a solution that encompasses the whole Waterfall structure, we are creating focused, per-environment solutions where we can start bringing about tangible, measurable changes that won't affect the overarching business logic and project plan.  To some this may seem a bit of a cop-out; however, when you consider what we are trying to achieve with Agile, it soon becomes apparent that this is a very Agile way of working within a predefined project structure.

Source control branching strategies

Probably the biggest issue that can be encountered with Waterfall and continuous integration strategies is how branches are used as part of environment promotion.  I lean strongly towards the Git-Flow branching methodology, in which there are two main permanent branches, usually labelled master/production and development.  Development is where all the mainline evolving features are implemented, and master/production should reflect the current state of the live production system.

I have seen situations where companies have adopted git as their source control and then proceeded to create a branch per environment.  While this may seem a simple and logical step to take, it is in my opinion one of the hardest branching strategies to maintain.

Consider the flow of changes through that type of implementation.  Development finalise a release cycle and merge it into Test.  During the test cycle defects may be uncovered and fixed, and those changes are then merged into the UAT branch and hopefully back to Dev.  In UAT more defects are uncovered and fixed before being passed to PreProd, but at this stage those UAT changes have to go back not only to Test but also to Dev.  The further down the fall we go, the more merges are required to back-port the fixes to previous branches, and it can quickly escalate out of control when multiple releases are going down the Waterfall behind each other.  In the worst-case scenario a defect may be fixed in Dev which was also fixed in an earlier feature cycle in a downstream environment, and then we end up with merge conflicts.  The whole process can get very messy and difficult to manage.

By far the easiest means of managing feature promotion through multiple environments is through Git-Flow's use of Release branches.  As mentioned, there are only two main permanent branches in existence, but there are also a number of transient branches called Feature, Release and Hotfix branches.  A good definition of the Git-Flow strategy can be found in Vincent Driessen's original “A successful Git branching model” article.

It is then this release branch that makes the journey down the waterfall, rather than attempting to move the change itself through multiple merges to environment branches.  These release branches can be tested, have defects resolved and be passed to downstream environments far more efficiently than constantly merging changes across branches, and the end result is that any defects found in the branch only have to be back-merged to a single location: the development branch.  This won't resolve every merge conflict, such as the one mentioned before, so effective communication between the Dev and Test teams is a must, but there is no reason why, at the end of each environment cycle, changes can't be merged back to development before moving forward.

The same can be said in relation to hotfixing issues in Production.  The hotfix is cut from the master branch and can then be passed through the environments to be checked, tested and accepted far more easily than trying to revert each environment branch back to production's current state (which I have seen in the past) and then reapplying weeks of commits after the fix to prepare the environments for the next cycle of feature releases.
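As an illustration of that promotion model, here is a small Python sketch of the git commands a release manager, or a lightweight automation script, might run; the branch names and version number are assumptions for the example, not a prescription.

    import subprocess

    def git(*args):
        # Run a git command and fail loudly if it does not succeed.
        subprocess.run(["git", *args], check=True)

    def cut_release(version):
        # One release branch is cut from develop; this single branch is what
        # travels down the waterfall (Test -> UAT -> PreProd -> Production).
        git("checkout", "develop")
        git("checkout", "-b", f"release/{version}")

    def close_environment_cycle(version):
        # Defect fixes made on the release branch are back-merged to one
        # place only: develop.
        git("checkout", "develop")
        git("merge", "--no-ff", f"release/{version}")

    def cut_hotfix(version):
        # Hotfixes are cut from master (the state of production), promoted
        # through the environments, then merged to master and develop.
        git("checkout", "master")
        git("checkout", "-b", f"hotfix/{version}")

    if __name__ == "__main__":
        cut_release("1.4.0")             # start of a release cycle
        # ...fix defects on release/1.4.0 and deploy it to each environment...
        close_environment_cycle("1.4.0")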

Testing in Dev vs Test environment

The importance of automated Unit and Functional testing is apparent to everyone working within software development and is now becoming a normal aspect of all development environments.  Even User Acceptance Testing can now be automated to a certain extent, allowing more of the application's features to be tested at the point of commit.

There are so many Continuous Integration servers to pick from, and so many ways of testing, that the hardest part of creating an automated test platform is picking the right tools for your software development requirements.

But a very serious question has to be asked here.  If it is now possible to test almost all aspects of the product with CI servers and automated testing, is there still a need for a separate Testing environment and team?

The answer to this question comes down to how comprehensive and thorough the automated testing solution you're implementing into Dev is going to be.

Effectively, the more of the functional and non-functional testing that is adopted into the Dev environment as part of the development process through automation, the lower the requirement for an extra testing team to handle those tasks in a separate environment.  In this scenario it starts to become a very feasible option to deprecate the Test, UAT and Pre-Production environments and employ the Test team to write the functional and non-functional tests against the CI server where all this testing is carried out.  The by-product of this approach is that an organisation can then merge Test and QA into a single function, and the role of the tester becomes ensuring the quality of the tests being run and verifying that the UAT function meets project requirements.
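As a flavour of what such tests might look like, here is a small, purely illustrative non-functional check of the kind a merged Test/QA function could add to the Dev CI pipeline; the operation and the half-second budget are assumptions made up for the example.

    import time

    def build_report(rows):
        # Stand-in for a performance-sensitive operation in the application.
        return [{"id": i, "total": i * 2} for i in range(rows)]

    def test_report_generation_meets_time_budget():
        # Non-functional requirement expressed as an ordinary CI test:
        # generating the report must stay within an agreed time budget.
        start = time.perf_counter()
        build_report(100_000)
        elapsed = time.perf_counter() - start
        assert elapsed < 0.5, f"report generation took {elapsed:.3f}s, budget is 0.5s"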

Summary

While we can't implement Agile wholesale into Waterfall development cycles, we can bring aspects of Agile to each environment in such a way that the cumulative effect of the changes is felt throughout the whole Route To Live.

It may not be possible to deprecate the use of multiple environments for development and testing immediately, but we can bring elements of testing into the Development environment in a staged and controlled manner and, through this process, have a very real possibility of removing a large section of the Waterfall structure, effectively adopting a more recognisable Agile development environment.

With continuous automated testing carried out within the development environment and the Test function migrated to a QA role, we can reduce the number of environments used in the delivery of the product down to three: Dev, Production and Training.

Year in review – DevOps in Danger

Let me tell you a story.  There was a man.  He had a great idea.  He told his friends who listened and agreed it was a great idea.  He began to tell other people his idea who also thought it was a great idea.  One day he held a conference to tell the world, and he gave it a jazzy title.  The people came and the man tried to tell the people his idea, but the people had stopped listening.  All they were talking about was the jazzy title and what a great title it was.  The idea became lost and all the people remembered was the title.

Welcome to the state of DevOps at the end of 2016.

I was asked an interesting question recently: what was my view on where DevOps is heading?  Was it progressing or would it stagnate?  My answer was neither.  It's a bubble that's ready to burst.  The issue is that the original intention, and the problem DevOps was seeking to solve, is being forgotten, and the focus has shifted to new tools jumping on the bandwagon in the hope of claiming to be the DevOps tool of the future.

What was the problem originally?  Software development!

There was a time when Agile was not as commonplace as it is today.  Most people used Waterfall to manage their projects.  Development cycles consisted of long periods of development time to add all the features, which were then passed to the test cycle, then the UAT cycle, then the NFT cycle, then pre-prod, and finally, after months of cascading down the rock face of QA, to production.  This cycle could take months and meant companies could at best produce bi-annual updates, if they worked hard and didn't hit any issues on the way.

As the world changed, with customers expecting new features yesterday, market demand driven by trends and new startups popping up to threaten larger companies' market share, it became clear that something had to be done to help developers produce features in smaller chunks, and so Agile began to gain acceptance.  But that wasn't the only issue.

Many organisations faced issues releasing those updates, so new methods of managing software releases had to be analysed and defined.  Sometimes the issues weren't so much technological as cultural, so finding ways of getting Dev teams and Ops teams to work together as equals became the key to making it all work.  And thus DevOps was brought to the world.

So how has the focus of DevOps shifted from being about developing software to managing Infrastructure?

There are many factors, but probably the main one is the explosion of the internet and connected devices.  At its conception, the internet was not what it is today, and many of the tools created to support DevOps were geared towards desktop application development and server environments.  But then new devices came about.  Smartphones and other mobile devices became commonplace. The internet became something you could carry around in your pocket, so web development began to gain a larger share of application development.

As web traffic grew, it became clear that single dedicated physical servers could not cope on their own, so load-balanced clusters of web, application and database servers began appearing.  But these took up physical space and cost a lot of money to run, maintain, upgrade or replace, so those physical machines began to be migrated to hypervisors, larger servers capable of hosting many virtual machines at the same time.  Organisations began to cluster their hypervisors and ended up back at square one, where they had been with the dedicated servers, so cloud companies formed, offering warehouses of clustered hypervisors for customers to host their workloads on without having to worry about upgrades.  Want more RAM? No problem.  CPU cores?  Here you go.

Then some bright people looked at this and thought: why are we wasting so much overhead on the host emulating hardware and devices for virtual machines?  Can't we just have a single big machine and run processes and applications directly on the host in simple, shippable containers?  Thus tools like Docker came about.

While these are all great tools for Ops and managing infrastructure, they don't help a great deal with developing and delivering software, which is what DevOps should be focused on.  How do containers aid Continuous Integration and continuous testing processes?  How does hosting in the Cloud help to ensure the integrity of the code base and that customer requirements are being met?  Configuration Management may help to streamline deployment, but how does it help to ensure that the code being produced is always of high quality and as bug-free as possible?

These are the questions that DevOps should be seeking to answer.

For a DevOps specialist, whether the client hosts on physical boxes, cloud VMs or containers should be irrelevant.  These are questions for the client to answer depending on their business need.  They are entirely in the realm of Ops, and I certainly don't get overly excited when a new container system or method of hosting VMs comes about.  What we need to focus on in our role as DevOps specialists is ensuring that the organisation is creating software that is driven by market demand, meets customer requirements and can be quickly shipped out to the customer when QA deem the software fit for purpose.

We're not super Ops people or sysadmins.  Infrastructure is a concern only when it becomes an obstacle to delivering software; it should never be our main focus.

When we stop focusing on the Dev, we become Ops and this is why I see the bubble bursting very soon, when the world wakes up and asks the question, “What happened to the Dev?”

LEAN, mean DevOps machine

With all the noise and excitement over new tools it's easy to overlook the fact that DevOps is not just a technical role.  There are many aspects that set being a DevOps specialist apart from being another form of systems administrator, and it is one of these areas that I'm going to talk about today.

Lean is a methodology usually found in marketing and manufacturing.  Toyota is noted for its Just In Time (JIT) manufacturing methods, which built on ideas pioneered in Ford's early production lines.  But what is it, and why is it so important for someone like myself?

The shortest explanation is that Lean helps you look at the processes that make up how a function is performed and lets you identify waste: wasted time, effort, resources, money and so on.  To me it is a brilliant framework for diagnosing what is wrong with a company's delivery cycle and starting to implement the right tools, methods and strategies to bring about a robust and stable Continuous Integration and Delivery solution.  Knowing how to automate a process, I feel, is only half the battle.  Knowing what to automate is where the biggest gains can be made, and Lean allows you to identify the areas that need attention most.

Lean also forms a foundation for me to Measure.  At some point in the DevOps process you will be asked to identify improvements and justify your place in the organisation.  When I identify waste through Lean, I take that opportunity to also identify measurable metrics.  There may be a process in the deployment cycle that requires two or three people and takes five hours to complete.  This is an easy metric, as you can put an actual cost on that process from the number of person-hours dedicated to it.  Time, as they say, is money, and here you can clearly calculate a cost.  There may be many such processes in the organisation, and Lean coupled with Measure allows you to identify the greatest wastes and the most valuable low-hanging fruit to change first.
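As a back-of-the-envelope illustration of how such a metric is built, take that five-hour, three-person deployment step; the hourly rate and release cadence below are assumptions made purely for the example.

    PEOPLE = 3
    HOURS_PER_DEPLOY = 5
    HOURLY_RATE = 50           # assumed fully loaded cost per person-hour
    DEPLOYS_PER_MONTH = 4      # assumed release cadence

    person_hours = PEOPLE * HOURS_PER_DEPLOY                    # 15 person-hours per deploy
    cost_per_deploy = person_hours * HOURLY_RATE                # 750 per deploy
    cost_per_year = cost_per_deploy * DEPLOYS_PER_MONTH * 12    # 36,000 per year

    print(f"{person_hours} person-hours and {cost_per_deploy} per deploy, {cost_per_year} per year")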

The future of computing is Collaborative

Once, there were two camps in the world of computing.  Let's call them the Softies and the Nixies.

On the one side you had the Nixies in their carpet slippers, pointing their pipes and scowling at the Softies on the other side, who in turn, wearing their polo shirts and denim jeans, shook their heads back at the Nixies, and never the twain should meet.  That was the world of IT 20 years ago when I first cut my teeth.

The world has revolved, time has passed and today things have changed dramatically.  Once, Microsoft was a monopolistic megalith seeking world domination over your desktops, your servers and even the very web pages you hosted.  They succeeded in the first of those, and certainly in the enterprise they rule on servers, but they never could make a decent dent in the internet.  Today the internet is everywhere, on everything, and our lives are dominated by trends, fads, the need for information and the need to stay connected to our ever-growing list of peers.

How has this affected the two camps?  I can only speak from my own experience, but certainly the Nixies are no longer so scowly and the Softies are not so head-shaky, and there are even the first tentative blooms of respect between them.  But what caused this change of paradigm?

Microsoft caused it.  What?  Did I really say that?  I did.  You heard that right out of my own, well, fingers.  I genuinely believe that Microsoft has played a great part in helping to bring IT together.

This is all my own opinion, so please don't take what I say as gospel.  I think what we have seen, certainly over the last 10 years, is the power of culture and how changing culture can lead to dramatic and long-lasting effects.  Once, the culture at Microsoft was very much that they were top dog, did the right things and would take over the world, and those attitudes filtered down through the organisation to the customers; to a certain extent that cultural attitude is still prevalent in a lot of people today.  Likewise, Linux culture viewed open source as far superior, with no hidden agendas, and believed it would lead the way to taking over the world.  Clearly neither side is ruling the world.  Each dominates its own sphere of computing influence, but there is no clear winner on either side.  So how exactly did Microsoft lead the change?

Leading on from my previous point on culture: CEOs and CTOs have retired, been replaced or moved on to better places.  Those coming up behind them had alternative goals and started to change the culture from the top, and that change has already filtered down through the organisation substantially.  It is already permeating further, out to Microsoft's customers, and people are becoming more open-minded about what devices they use.  Some may say that the reason for the change is the failure of Windows 8 and Windows devices with the unified interface.  This may be partially true, I don't know, but I am certain that a change in direction from the top was instigated and will continue to ripple through Microsoft's sphere of influence for some time to come.

Remember the Microsoft Loves Linux presentation, followed closely by the release of .NET Core?  Earlier this year we heard about SQL Server coming to Linux, and the recent open release of PowerShell had me all excited yesterday.  These are all signs of the growing cultural change in Microsoft towards a more collaborative stance in working with customers.  Gone are the days when they would say “All this is mine, and all that you have over there will be mine”.  Today the clear message from Microsoft is “We respect your choice, so let us help you have more of a choice”, and they certainly are offering many good alternative solutions for those who don't want to lock themselves into a Wintel-only or Linux-only infrastructure.

Likewise, companies like Red Hat are striving to improve tools such as Ansible, long considered Linux-only, by adding Windows support.  Times have changed, and it is clearer than ever that by working with customers rather than against them, Microsoft is altering direction.  What is next?  Who knows! But I can't wait to find out!

These are indeed interesting and exciting times we’re living in!