Job Scheduler

All Roads Lead to the Cloud - Cloud Automation

Imported from http://consultingblogs.emc.com/ published Apr 25 2010

 

There has been and continues to be a huge amount of material generated regarding the 'Cloud'. There are many definitions (http://en.wikipedia.org/wiki/Cloud_computing#Private_cloud) and I am not going to repeat them here. As part of EMC Consulting, we see many different approaches from clients on how to reach the characteristic functions of the cloud. There tends to be a lot of focus currently on the theme of 'control' within the cloud.

 

As I mentioned in my last blog, the folks that have been certified in virtualization products (in the mainstream VMware, Citrix or Microsoft certifications, and there are others of course) tend to start using the main management consoles provided with the virtualization solution, or a third-party product that has integrated API functionality used for management of all or part of the computing real estate. This is a bottom-up approach and works reasonably well. There are other more programmatic approaches, in the sense of an overlay to the computing resources being created. A portal-style application is used to capture service instructions and process them. This is a middle up-down approach focused on the existing IT landscape.

 

However, what strikes me as significant is that there are many roads leading to the cloud, and that it is perhaps necessary to understand that many of those journeys are mandated by the current contingencies operating within different organizations. The unique mix of market needs, skills and configuration within an organization tends to lend shape to the transformation approach leading to the cloud. Interestingly, these initial forays into this domain provide significant learning experiences for organizations, ultimately allowing them to determine which cloud configuration will best support their business ambitions. This is also true of organizations operating within the same market - they all take a different internal approach to building out their clouds.

 

However, when an organization has been tasked with creating the necessary capabilities allowing cloud transformations to take place on very large scales, then some upfront thought is definitely going to pay off. In the race for product sets providing panaceas to cloud control, some of the good 'ole fashioned computing management lessons learned over the last 30-40 years tend to be pushed out of scope, although they may well still be relevant.

 

One of the areas that I was reminded of the other day in discussion with some vendors and clients was how to control the various activities within the cloud - once it is actually there. I was struck by the incredible complexity and simplicity of this statement. Back in the early 80s I used to work on IBM mainframes, and many of the characteristics of the cloud that we see today had their early and, from a GUI perspective, arguably primitive beginnings there. I recall that job scheduling was a big thing at that time! There were literally thousands of activities taking place in the background that nobody was aware of, and they kept the business running.

 

In a cloud, once the infrastructure levels are instantiated, and the virtual compute resources have been apportioned to specific guest operating systems within a virtual machine container (yes - I know there are other ways of giving resources in the cloud - just taking this one as an example as most organizations are familiar with this) - the fun really starts. So let's take this further. Suddenly we have 10,000 virtual machines running server operating systems, and another, say, 100,000 virtual desktops running in our cloud. Great stuff - well done folks!

 

Well, as most administrators and IT shops know, the work is just starting. There are all the activities regarding data backup, replication of data, servicing restores, rolling out anti-virus updates, controlling the flow of agents within each of those machines (e.g. update programs running on desktops offering to update the Adobe Acrobats of this world, and indeed the operating system itself all directed at a limited number of source machines), patching and the list goes on and on.

 

There are many ways to deal with these types of activities, but ultimately they come back to some form of console where these unique events are scheduled. For example, typically backups are grouped, scheduled and hopefully executed. Reporting on an exception basis focuses the administrator on potentially re-running some of the failed backups. This could be partially automated using semi-automatic event-driven intelligence - where specific alerts generate specific actions - that are then triggered and managed - much like a scheduled job.
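To make the "specific alerts generate specific actions" idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `BackupResult` record, the status codes in `RETRYABLE` and the `MAX_ATTEMPTS` limit are hypothetical and do not come from any particular backup or scheduling product. It simply sorts failed backups into those that can be re-run automatically and those that need an administrator.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class BackupResult:
    """Hypothetical outcome record for one backup client."""
    client: str
    status: str        # "ok" or an illustrative error code
    attempts: int = 1  # how many times this backup has been tried


# Assumed policy: which failures are worth an automatic re-run.
RETRYABLE = {"timeout", "target_busy"}
MAX_ATTEMPTS = 3


def triage(results: Iterable[BackupResult]) -> Tuple[List[str], List[str]]:
    """Split failed backups into automatic re-runs and escalations."""
    rerun: List[str] = []
    escalate: List[str] = []
    for result in results:
        if result.status == "ok":
            continue  # exception-based reporting: successes are ignored
        if result.status in RETRYABLE and result.attempts < MAX_ATTEMPTS:
            rerun.append(result.client)
        else:
            escalate.append(result.client)
    return rerun, escalate
```

The `rerun` list would be handed back to the scheduler as a new job, while only the `escalate` list reaches the administrator's exception report.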

 

As you can see, some of the typical stuff that IT shops have been doing over the years is still relevant. Don't get me wrong here; there are other ways of doing things. Indeed the paradigm of data protection through backup has seen substantial revision in recent years with the widespread use of disk media technologies. However, the reality at IT shops is still the need for control and accountability of the backup process. Control is a very important part of IT Service Delivery disciplines, in the sense of reporting to your business service clients (internal or external) that you are doing what they are perhaps purchasing as a service, and that the service is running 'just fine!'

 

The point here is that the need for massively scalable job scheduling in the cloud providing event/schedule driven activity intelligence is definitely still there. IT operations would have a very difficult job of actually being able to control the potentially millions of operational activities that need to take place daily. Ensuring for example that all virtual machines are backed up, and providing the reporting data to management with a breakdown per business unit, utilizing cost and performance dimensions is potentially a 'job' that would need to be run at a certain time. This stuff does not just happen on its own automagically!

 

I was speaking about this theme with a particular vendor, UC4 (you can find these folks at http://www.uc4.com/home.html and there are others in the market of course - but the beer was very good in Belgium - thanks Lennaert De Jong), and we were discussing the backup 'job' when there are potentially hundreds of thousands or millions of clients. Never mind that the technological way of realizing this would probably differ vastly from the traditional backup program approach of streaming to a storage medium. The task itself was still there. In such a large cloud environment, I realized that all the tricks of the datacenters, ICT shops and service providers still apply - with some significant modifications needed.

 

However, the sheer scale under discussion requires an effective means of control - this is absolutely essential. Think about it - patching a million virtual machines in the cloud that require a critical patch may not allow the luxury of rolling out the patch (hopefully regression tested first please) in small groups of machines, verifying that all is ok, and then rolling out to larger and larger groups.
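For reference, the "small groups first, then larger and larger groups" rollout can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wave sizes (10 machines first, growing tenfold each time) and the `apply_patch` and `verify` callables are hypothetical placeholders for whatever the scheduling or patching tool actually provides.

```python
from typing import Callable, Iterator, Sequence


def rollout_waves(machines: Sequence[str], first: int = 10,
                  factor: int = 10) -> Iterator[Sequence[str]]:
    """Yield successive groups of machines, each `factor` times larger."""
    if first < 1 or factor < 2:
        raise ValueError("first must be >= 1 and factor must be >= 2")
    start, size = 0, first
    while start < len(machines):
        yield machines[start:start + size]
        start += size
        size *= factor


def staged_rollout(machines: Sequence[str],
                   apply_patch: Callable[[str], None],
                   verify: Callable[[str], bool]) -> int:
    """Patch wave by wave; stop at the first wave that fails verification.

    Returns the number of machines in fully verified waves.
    """
    verified = 0
    for wave in rollout_waves(machines):
        for machine in wave:
            apply_patch(machine)
        if not all(verify(machine) for machine in wave):
            break  # halt the rollout before touching a larger group
        verified += len(wave)
    return verified
```

With these assumed defaults, a million machines would be covered in six waves, the last of which is by far the largest. That is exactly why the verification step between waves costs time that a critical patch may not have.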

 

The patch in question may be against a particularly virulent viral infection. There may well be twists and turns in the logic, such as 'patch - if ok, reboot - if not ok, bring back a previous image of the machine - patch again - if still failing, power off the virtual machine and call your nearest IT Virus Buster through an alerting mechanism'. The poor IT administrator may potentially get thousands or millions of alerts in this way. Basically, the IT operation could be swamped.
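The patch logic described above can be written down as a small decision flow. The Python sketch below is an assumption-laden illustration rather than any vendor's workflow: the `VMActions` hooks are hypothetical stand-ins for real hypervisor or scheduler calls, and rebooting after a successful second patch attempt is my own assumption. To avoid swamping the administrator, it counts outcomes per job and collects only the powered-off machines for a single summarized alert.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class VMActions:
    """Hypothetical hooks into the virtualization layer."""
    patch: Callable[[str], bool]          # returns True if the patch applied
    reboot: Callable[[str], None]
    restore_image: Callable[[str], None]  # roll back to a previous image
    power_off: Callable[[str], None]


def remediate(vm: str, actions: VMActions) -> str:
    """Patch; on failure restore the image and retry; else power off."""
    if actions.patch(vm):
        actions.reboot(vm)
        return "patched"
    actions.restore_image(vm)
    if actions.patch(vm):
        actions.reboot(vm)  # assumed: reboot after the second attempt too
        return "patched_after_restore"
    actions.power_off(vm)
    return "powered_off"


def run_patch_job(vms: Iterable[str],
                  actions: VMActions) -> Tuple[Counter, List[str]]:
    """Run the flow for every VM and aggregate instead of alerting per VM."""
    outcomes: Counter = Counter()
    escalated: List[str] = []
    for vm in vms:
        outcome = remediate(vm, actions)
        outcomes[outcome] += 1
        if outcome == "powered_off":
            escalated.append(vm)  # one summarized alert covers these
    return outcomes, escalated
```

The design choice worth noting is the aggregation: one alert carrying the `escalated` list and the outcome counts, rather than one alert per virtual machine.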

 

It definitely pays dividends for organizations embarking on the cloud transformation to ensure their IT house has been put in order to handle massive numbers of parallel events. Even simple activities that currently take place in organizations, such as file transfers, can become a seriously complex issue on this scale when things start to go wrong.

 

So thinking about scale, preparing for things going wrong, and mapping both to some of those traditional ICT management skills will certainly help to move further on the cloud journey. Go on, don't be afraid to dust off some of that 'old' knowledge and get it working again for the cloud.

 

 

 

Disclaimer

The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.