Tuesday, November 26, 2013

2013-11-26 Tuesday - Deployment Optimization

A nice little nugget of a problem was handed to me today: identify ways to help an operations team shrink their system maintenance / deployment window for production system updates, which has somehow grown to xx hours, and achieve zero downtime (or as close to it as possible).

The environment is complicated in the extreme: highly regulated industry, compliance requirements, clustered servers, high availability, PCI security zones, third-party software/service providers, cloud service providers/integrations (SaaS and PaaS), frequent commercial software upgrades/patches, vendor constraints on database schema changes, disaster recovery dependencies, and a legion of upstream and downstream data integration dependencies.

For the last year I've been carefully planting seeds of certain ideas in various conversations with key stakeholders within an organization - to begin the gradual introduction of concepts and practices such as DevOps, Continuous Deployment, and Continuous Operations. Now that a sufficient level of pain has been experienced, there is a broad consensus and acceptance that there needs to be change.

"He was not in a hurry, 'hurry' being one human concept he had failed to grok at all. He was sensitively aware of the key importance of correct timing in all acts — but with the Martian approach: correct timing was accomplished by waiting."
Stranger in a Strange Land, by Robert A. Heinlein

I have some ideas, but as a good researcher, my first order of business is to review current directions, trends, and peer articles. This posting will be a place for me to share some of the information that may be of interest to others:

Zero Downtime, Instant Deployment and Rollback

Jevgeni Kabanov (ZeroTurnaround)
Pragmatic Continuous Delivery, at W-JAX 2012

Continuous Operations for Zero Downtime Deployments

The Virtualization Practice

Deploying the Netflix API

Cloud Architecture Tutorial
Constructing Cloud Architecture the Netflix Way
Gluecon May 23rd, 2012, by Adrian Cockcroft

Cassandra in the Netflix Architecture, Denis Sheahan
CassandraEU London March 28th, 2012

Patterns for Continuous Delivery, Reactive, High Availability, DevOps and Cloud Native Open Source with Netflix OSS
Adrian Cockcroft + Ben Christensen, YOW! Workshop Dec'2013

Best Practices for Zero Risk, Zero Downtime Database Maintenance

VMware vSphere High Availability 5.0 Deployment Best Practices

Free Ebook: Continuous Delivery — What It Is and How to Get Started

The Phoenix Project, A Novel About IT, DevOps & Helping Your Business Win

How Draw Something Scaled to 50 million New Users, in 50 Days, with Zero Downtime

I Ain't Afraid of No Downtime: Scaling Continuous Deployment, by Cody Powell

Mandi Walls' free ebook, Building a DevOps Culture [Kindle]

Daily Dose of DevOps: 27 People to Follow on Twitter

Selected QCon 2013 San Francisco presentations:

Adopting Continuous Delivery, Adjusting your Architecture
Rachel Laycock, ThoughtWorks
Build Your Own PaaS the Netflix Way
Sudhir Tonse, Manager, Cloud Platform Infrastructure, Netflix
Facebook Infrastructure
Pedro Canahuati, Director, Infrastructure Operations

Liquibase:

  • Improved checksum performance
  • CORE-1509: Significantly decreased memory usage, especially with large sql files
  • CORE-1533: Performance improvements in dropAll

ZeroTurnaround's LiveRebel:

Log4j 2:

  • "Log4j 2 can automatically reload its configuration upon modification"
  • "Log4j 2 contains next-generation Asynchronous Loggers based on the LMAX Disruptor library. In multi-threaded scenarios Asynchronous Loggers have 10 times higher throughput and orders of magnitude lower latency than Log4j 1.x"

  • Note the performance benchmark results recently posted on takipiblog.com

Puppet Labs:
