Bad development drives bad infrastructure decisions

The relationship between Development and IT Operations is a mutually dependent one: a decision made on one side can affect the other, and not always for the better.

With so much of the focus in the world of DevOps seemingly on the Operations and infrastructure side, I would like to take a moment to discuss some of the fundamental issues that can affect how your DevOps process is rolled out if attention isn’t also paid to Development.

Moving an organisation away from Waterfall to Agile is no mean feat.  It entails a dramatic shift in mindsets and organisational culture, and the adoption of new ways of working will overwhelm many people if they are thrown straight into the deep end.  Fortunately, the saving grace of DevOps is that it not only helps teams move towards incremental changes to their products, but can itself be introduced in small, managed, incremental changes.  Lead by example.

Yet even managing this transition of your infrastructure in bite-sized, digestible chunks can be thrown off track if fundamental issues in Development aren’t addressed first.

Development first

You have your new greenfield project and the go-ahead to “do the DevOps thing”.  It’s time to plan your infrastructure, right?

Not so fast, and here’s why.

While your Development team may already be functioning in an Agile fashion, running scrums, code reviews and automated testing within their own controlled environments, that does not always mean that what Development produces is in a state that can be readily absorbed by your engineering teams and deployed to live.  And this can be the root cause of many poor infrastructure design decisions.

The usual Waterfall environment map may have anywhere between five and seven environments to validate the build, stability and acceptability of release candidates.  In that world you have a lot of quality gates to catch poor releases, and as release cycles can stretch into months there is always time to capture defects and get them fixed before they hit production.

Under Agile our goal is to reduce that lead time dramatically and shorten the development cycle, producing features on demand as and when they are required by the customer.  All good in theory, but it also means removing a lot of the quality gates which until this point have been stopping poor release candidates from getting through the door.  In this scenario the natural response for the architect is to add additional quality-gate environments to the map to catch those cases.  That is not the ideal response when trying to create lean, Agile infrastructure: instead of saving money, you will end up spending more on those additional environments to try to catch bad release candidates.  In effect, Infrastructure is still having to over-engineer to protect itself from bad development practices.

If this sounds familiar to you, then your work with Development is not quite over.  In my experience the following key areas are worth sanity checking to help drive better release candidates.

Automated testing

There is no question as to the importance of testing your code.  BDD and TDD will help developers create more efficient, lean code, but are they testing correctly?

There is a world of difference between Unit and Functional testing, and they should be used in tandem.  While Functional testing can give you a good idea of how the code is functioning as a whole, without Unit tests it becomes harder to validate the individual stability of each component, and you could be missing important information to help diagnose bugs and defects.  For example, a method could be changed in a controller which causes methods in a model to behave differently.  Without Unit tests to protect the individual functionality of each method, it is harder to diagnose where the fault is if the developer who implemented the change is unavailable.
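
As a minimal sketch of that scenario (the model, method and figures here are invented for illustration, not taken from any real project), a unit test that pins down a single model method will fail the moment a change elsewhere alters that method’s behaviour, pointing straight at the broken component rather than at a vague end-to-end failure:

```python
import unittest


class PriceModel:
    """Hypothetical model: calculates an order total including VAT."""

    VAT_RATE = 0.20

    def total_with_vat(self, net_amount):
        # A controller change that started passing gross amounts here would
        # silently double-charge VAT; the unit tests below would catch it.
        return round(net_amount * (1 + self.VAT_RATE), 2)


class TestPriceModel(unittest.TestCase):
    def test_total_with_vat_adds_twenty_percent(self):
        self.assertEqual(PriceModel().total_with_vat(100.00), 120.00)

    def test_total_with_vat_rounds_to_pennies(self):
        self.assertEqual(PriceModel().total_with_vat(10.99), 13.19)


if __name__ == "__main__":
    unittest.main()
```

A functional test that drives the whole checkout flow might still pass, or fail in a way that gives no hint of which component regressed; the unit test localises the fault to the model.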

Code review

I really can’t stress enough how important the role of the code review is to the release process.  While automated testing can capture issues with the functionality of code at a high level, it is difficult for automated tools to definitively assess whether the written code is of a high enough standard to pass through to production.  A more senior developer should always review code to ensure that what has been written meets coding standards, does not contain hard-coded passwords and is of sufficient quality to merge into the next Release Candidate.
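
As a contrived illustration of the sort of thing a reviewer should catch but an automated functional test will sail straight past (the helper and variable names below are hypothetical):

```python
import os


def database_settings():
    """Hypothetical helper, shown only to illustrate a review finding."""
    # Rejected in review: a credential baked into the source tree.  The code
    # still works, so the automated tests pass, but the secret now lives in
    # version control and in every environment the code is promoted to.
    # password = "Sup3rS3cret!"

    # Accepted in review: the credential is supplied by each environment, so
    # the same code can move through the Route To Live unchanged.
    return {
        "host": os.environ.get("DB_HOST", "localhost"),
        "password": os.environ["DB_PASSWORD"],
    }
```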

Post-release feedback

Let’s say that a release candidate has gone through the testing phases without incident but is then deployed into Production and something goes wrong.  A fix is found and the release is finalised.  Is that information being passed back to Development?

This sounds like the most basic of concepts, yet in too many organisations I’ve worked with this fundamental and crucial stage in the release process is missing.  It’s an area that underlines the importance of a proper implementation of DevOps methodologies.  In the case we looked at earlier, where Infrastructure are over-engineering environments, it could simply be that the Developers have not been told what went wrong and therefore continue to make the same mistakes.  Putting that feedback loop in place allows the Developers to raise these issues in code reviews and ensure that the defects encountered can be alleviated in future.

The onus is now back on the Development team to fix their process, but in a manner compatible with shared ownership, as the Ops/Infrastructure team are actively engaged with Development to resolve release issues at their root cause.  That dialogue then allows Architects to simplify the environment map to the Minimum Viable Product, so that extra environments aren’t required to try to catch development mistakes.

Summary

Effective communication is vital at all stages of the release process, and attempting to fix fundamental development issues by over-complicating infrastructure is most definitely not the way to go.  DevOps should always be used as an aligning tool to bring both sides of the product cycle into balance and ensure successful growth and stability.

Agile Waterfalls

There are many businesses today that still use Waterfall for managing software development projects.  While Agile has been around for a long time, Waterfall precedes it by a large margin and is still the mainstay for larger organisations.  There are many reasons for this, such as the fact that it fits in nicely with ITIL or PRINCE2 project strategies, but more often it was simply the strategy employed with legacy products which, due to project constraints, customer requirements or technical debt, can’t easily be migrated to a fully Agile approach.

Having said that, all is not lost: aspects of Agile software development can still be brought into the Waterfall structure, building the foundations for an eventual move to a fully Agile solution.

The usual Route To Live layout for projects controlled via Waterfall consists of multiple environments, usually labelled Dev, Test, UAT, PreProd, Production and Training.  The general flow consists of sprint cycles on Dev, with the changes then promoted to Test where defects are identified and fixed, then to UAT after the test cycle, and so on until all the Pre-Production criteria are met and the business gives the go-ahead to deploy to production.  In some cases Training may be deployed to before Production in order to bring staff up to speed on the new features about to hit production.

When viewing the whole software delivery cycle from this high vantage point it is difficult to see how Agile can make any beneficial change without breaking the whole delivery process wide open.  If your purpose is to completely replace Waterfall then yes, this will inevitably happen.  The good news, however, is that Agile can be what you make of it.  Like ITIL, it is not necessarily a full list of dos and don’ts, but more a framework on which you can build your business logic and processes.  With this in mind it is possible to be more Agile within a Waterfall delivery system.

Redefining customer

From the business point of view, the customer is the end user or client who will be using your delivered product or service.  When working with Waterfall this external entity is generally the focus and the process is to ensure that the customer gets what they asked for or are expecting.

To begin implementing Agile it helps to fully understand, and redefine, what we mean by “the customer”.

Taking on the aforementioned definition, we can define a customer as anyone who is expecting something from the work being undertaken.  In the case of the development team, they are producing features and implementing changes according to the functional specifications provided; from their point of view, the customer is the Testing team.  Testing feeds into QA, and QA feeds into Operations.  By defining these relationships we can begin to see how Agile can aid the delivery process at a smaller scale, by allowing the focus of “customer” to be narrowed down.

Agile in Development

Now that the focus of the customer is on the needs of the testing team, it is possible to start breaking down the requirements of Agile development into deliverable and achievable goals.  This can take the form of redefining how sprint cycles are managed, Continuous Integration and build servers, automated unit testing and, finally, how changes are promoted to downstream environments.  Rather than trying to create a solution that encompasses the whole Waterfall structure, we are creating smaller solutions per environment, where we can start bringing about tangible, measurable changes that won’t affect the overarching business logic and project plan.  To some this may seem a bit of a cop-out; however, when considering what it is we’re trying to achieve with Agile, it soon becomes apparent that this is a very Agile way of working within a predefined project structure.

Source control branching strategies

Probably the biggest issue encountered with Waterfall and continuous integration strategies is how branches are used as part of environment promotion.  I lean strongly towards the Git-Flow branching methodology, where there are two main permanent branches, usually labelled master/production and development.  Development is where all the mainline evolving features are implemented, and master/production should reflect the current state of the live production system.

I have seen situations where companies have adopted Git as their source control and then proceeded to create a branch per environment.  While this may seem a simple and logical step to take, it is in my opinion one of the hardest branching strategies to maintain.

Consider the flow of changes through that type of implementation.  Development finalise a release cycle and merge it into Test.  During the test cycle defects may be uncovered and fixed, and those changes are then merged into the UAT branch and, hopefully, back to Dev.  In UAT more defects are uncovered and fixed before being passed to PreProd, but at this stage those UAT changes have to go back not only to Test but also to Dev.  The further down the waterfall we go, the more merges are required to back-port the fixes to the earlier branches, and it can quickly escalate out of control when we have multiple releases going down the Waterfall behind each other.  In the worst case, a defect may be fixed in Dev which was also fixed, as part of an earlier feature cycle, in a downstream environment, and then we end up with merge conflicts.  The whole process can get very messy and difficult to manage.

By far the easiest means of managing feature promotion through multiple environments is Git-Flow’s use of Release branches.  As mentioned, there are only two main permanent branches in existence, but there are also a number of transient branches: Feature, Release and Hotfix branches.  A good definition of the Git-Flow strategy can be found here.

It is then this release branch that makes the journey down the waterfall, rather than attempting to move the change itself through multiple merges to environment branches.  These release branches can be tested, defects resolved and changes passed to downstream environments more efficiently than constantly merging changes across branches, and the end result is that any defects found in the branch only have to be merged back to a single location, the Dev branch.  This won’t resolve all merge conflict issues, such as the one mentioned before, so effective communication between the Dev and Test teams is a must, but there is no reason why, at the end of each environment cycle, changes can’t be merged back to Dev before moving forward.
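
To make the difference concrete, here is a rough sketch, using the illustrative environment names from earlier and deliberately ignoring conflicts and overlapping releases, of how many back-merges each strategy demands as a defect is discovered further down the waterfall:

```python
# Simplified model of the two strategies; environment names are illustrative.
ENVIRONMENTS = ["Dev", "Test", "UAT", "PreProd", "Production"]


def back_merges_with_environment_branches(found_in):
    """Branch per environment: a fix must be merged back into every branch
    upstream of the environment where the defect was found."""
    return ENVIRONMENTS[: ENVIRONMENTS.index(found_in)]


def back_merges_with_release_branch(found_in):
    """Git-Flow: the fix lands on the release branch travelling down the
    waterfall and is merged back to a single place, the Dev(elop) branch."""
    return ["Dev"]


for env in ENVIRONMENTS[1:]:
    per_env = len(back_merges_with_environment_branches(env))
    release = len(back_merges_with_release_branch(env))
    print(f"defect found in {env}: {per_env} back-merge(s) vs {release}")
```

With five environments the branch-per-environment model already needs four back-merges for a defect found just before go-live, and that is before conflicts between releases travelling down the waterfall behind each other are factored in.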

The same can be said in relation to hotfixing issues in Production.  The hotfix is cut from the master branch and can then be passed through the environments to be checked, tested and accepted far more easily than trying to revert each environment branch back to production’s current state (which I have seen in the past) and then reapplying weeks of commits after the fix to prepare the environments for the next cycle of feature releases.

Testing in Dev vs Test environment

The importance of automated Unit and Functional testing is apparent to everyone working within software development and is now becoming a normal aspect of all development environments.  Even User Acceptance Testing can now be automated to a certain extent, allowing more of the application’s features to be tested at the point of commit.

There are so many Continuous Integration servers to pick from, and so many ways of testing, that the hardest part of creating an automated test platform is picking the right tools for your software development requirements.

But a very serious question has to be asked here.  If it is now possible to test almost all aspects of the product with CI servers and automated testing, is there still a need for the Testing environment and team?

The answer to this question comes down to how comprehensive and thorough the automated testing solution you’re implementing in Dev is going to be.

Effectively, the more of the functional and non-functional testing that is adopted into the Dev environment as an automated part of the development process, the lower the requirement for a separate testing team to handle those tasks in a separate environment.  In this scenario it becomes a very feasible option to deprecate the Test, UAT and Pre-Production environments and employ the Test team to write the functional and non-functional tests against the CI server where all this testing is carried out.  The by-product of this approach is that an organisation can then merge Test and QA into a single function, and the role of the tester becomes ensuring the quality of the tests being run and verifying that the UAT criteria of the project are met.
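
As a sketch of what that might look like in practice (the URL variable, endpoint and acceptance criterion below are invented for illustration), a tester-authored check that the CI server runs against every build could be as simple as:

```python
import os

import requests  # assumes the CI job can reach the deployed build under test

# The CI server would inject the address of the environment being verified.
BASE_URL = os.environ.get("APP_UNDER_TEST_URL", "http://localhost:8000")


def test_order_search_meets_acceptance_criteria():
    """Hypothetical UAT criterion: order search responds within two seconds
    and returns at least one result for a known product."""
    response = requests.get(f"{BASE_URL}/orders", params={"q": "widget"}, timeout=2)
    assert response.status_code == 200
    assert response.json()["results"], "expected at least one matching order"
```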

Summary

While we can’t blanket-implement Agile into Waterfall development cycles, we can bring aspects of Agile to each environment in such a way that the cumulative effect of the changes is felt throughout the whole Route To Live.

It may not be possible to completely deprecate the use of multiple environments for development and testing immediately, but we can bring elements of testing into the Development environment in a staged and controlled manner, and through this process have a very real possibility of removing a large section of the Waterfall structure, effectively adopting a more recognisable Agile development environment.

With continuous automated testing carried out within the development environment and the migration of the Test function to a QA role, we can reduce the number of environments used in the delivery of the product to three: Dev, Production and Training.