Friday, June 10, 2011

Ensuring only one WorkerRole instance performs a task at a time

As a developer in a cloud environment, you have to deal with several issues. Instances can be shut down and moved to other machines, and deployments are scaled up and down, so it is impossible to single out one instance and rely on it being up and running. The cloud environment only ensures that the required number of instances is available at all times.

Consider a scenario where special tasks have to be performed regularly, or where a task may only be performed by a single machine at a time. In that case it is essential to designate a “master” that executes the tasks. So how do we deal with this?

This example shows how to solve the problem with a file lock approach. Each instance wakes up after a given period of time and checks whether it can become the “master” instance that performs the necessary tasks. If it can, it puts a lock file into the cloud storage, performs the tasks and deletes the lock file again.
If an instance finds an existing lock file when checking whether it can become the “master”, it does not perform any task.
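
Before wiring this up against blob storage, the protocol itself can be tried out in memory. The following is only a minimal sketch: `InMemoryLockStore` and `SingleTaskRunner` are hypothetical names made up for illustration and are not part of the Azure SDK. A dictionary stands in for the storage container. Unlike the step-by-step implementation below, the sketch folds "check" and "create" into a single atomic call, and it releases the lock in a `finally` block.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory stand-in for the blob container that holds the lock file.
public class InMemoryLockStore
{
    // Maps a lock name to the ID of the instance that created the lock.
    private readonly ConcurrentDictionary<string, string> locks =
        new ConcurrentDictionary<string, string>();

    // Creates the lock only if it does not exist yet.
    // TryAdd performs the existence check and the creation as one atomic step.
    public bool TryAcquire(string lockName, string instanceId)
    {
        return locks.TryAdd(lockName, instanceId);
    }

    // Removes the lock only if the given instance created it.
    public bool Release(string lockName, string instanceId)
    {
        string owner;
        if (locks.TryGetValue(lockName, out owner) && owner == instanceId)
        {
            return locks.TryRemove(lockName, out owner);
        }
        return false;
    }
}

public static class SingleTaskRunner
{
    // Runs the task only if this instance obtains the lock.
    // Returns true if the task was run, false if another instance holds the lock.
    public static bool TryRun(InMemoryLockStore store, string lockName,
                              string instanceId, Action task)
    {
        if (!store.TryAcquire(lockName, instanceId))
        {
            return false;
        }

        try
        {
            task();
        }
        finally
        {
            // Release the lock even if the task throws.
            store.Release(lockName, instanceId);
        }
        return true;
    }
}
```

With two callers "A" and "B" sharing one store, `TryAcquire` succeeds for "A" and fails for "B" until "A" calls `Release`. A `Release` attempted by "B" leaves the lock of "A" untouched.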

Here is how you can implement this behavior step by step:

  1. Create a new Windows Azure Project in Visual Studio 2010 and add a single WorkerRole to this project.
  2. In WorkerRole.cs, prepare the connection to the cloud storage by declaring three static members that define where the lock file is placed and what it is called:
      public class WorkerRole : RoleEntryPoint
      {
          /// <summary>
          /// Determines the container where the block file will be placed
          /// in the cloud storage
          /// </summary>
          private static string blockFileContainer = "tasksample";
          /// <summary>
          /// Determines the name for the block file
          /// </summary>
          private static string blockFile = "block.ext";
          /// <summary>
          /// Represents the full path to the block file
          /// </summary>
          private static string blockFilePath = blockFileContainer + "/" + blockFile;

          ...
      }

  3. In the same file, create a CloudBlobClient to access the storage.
      public class WorkerRole : RoleEntryPoint
      {
          ...
          /// <summary>
          /// Client to access the blob storage
          /// </summary>
          private CloudBlobClient blobClient = CloudStorageAccount.DevelopmentStorageAccount.CreateCloudBlobClient();

          ...
      }

  4. Next, define how often each instance checks whether it can become the “master” and perform the tasks. Because the development fabric is quite fast and all instances start up at nearly the same time, seeding a random number generator with the current time won’t work here: the instances could end up with the same seed and therefore the same interval. In the real cloud it might be different.
    As a workaround, we seed the random number generator with the numeric suffix of the ID of the instance it belongs to. A typical instance ID in the development fabric is, for example, “deployment(19).MyProject.WorkerRole.0”, where the trailing 0 indicates that this instance is the first one in the deployment. So the generator for this instance is seeded with 0.
    Then we choose a random wait time between 10 and 30 seconds.
    The code looks as follows:
      public class WorkerRole : RoleEntryPoint
      {
          ...

          /// <summary>
          /// Numeric ID of this instance (might only work in development fabric)
          /// </summary>
          private static int instanceID = int.Parse(RoleEnvironment.CurrentRoleInstance.Id.Substring(RoleEnvironment.CurrentRoleInstance.Id.LastIndexOf(".") + 1));
          /// <summary>
          /// Milliseconds this instance needs to wait until trying to perform task
          /// </summary>
          private static int millisecondsToWait = new Random(instanceID).Next(10000, 30000);

          ...
      }

  5. Visual Studio prepares the WorkerRole.cs file so that the “OnStart” method is already overridden in the template. In this method we need to make sure that the container where we want to store the lock file exists.
      public class WorkerRole : RoleEntryPoint
      {
          ...

          public override bool OnStart()
          {
              ...

              // make sure that the file lock container exists!
              blobClient.GetContainerReference(blockFileContainer).CreateIfNotExist();

              return base.OnStart();
          }

          ...
      }

  6. The “Run” method is also already present in the template. This is where our logic goes: first the instance sleeps for the determined amount of time. After waking up, it checks whether it can perform the tasks. If so, it blocks the other instances, performs the tasks and then deletes the lock again.
      public class WorkerRole : RoleEntryPoint
      {
          ...
          public override void Run()
          {
              // This is a sample worker implementation. Replace with your logic.
              Trace.WriteLine("ProcessWorker entry point called", "Information");

              while (true)
              {
                  // wait
                  Trace.WriteLine("Waiting for " + millisecondsToWait + "ms", "Information");
                  Thread.Sleep(millisecondsToWait);

                  // check if this instance should perform the task
                  // by looking for the lock file in the cloud storage
                  if (CanPerformTask())
                  {
                      // block other instances from performing the task
                      BlockOtherInstances();

                      try
                      {
                          // perform the task
                          PerformTask();
                      }
                      finally
                      {
                          // release the block, even if the task throws,
                          // to allow other instances to perform the task
                          ReleaseBlock();
                      }
                  }
                  else
                  {
                      // we are not allowed to perform this task
                      Trace.WriteLine("May not perform task!", "Information");
                  }
              }
          }
          ...
      }

  7. The function “CanPerformTask” checks whether the lock file exists in the blob storage by trying to fetch its attributes. If the attributes can be retrieved, the file exists; otherwise an exception is thrown.
      /// <summary>
      /// This function determines if this instance can perform the task
      /// by checking if any other instance currently holds the lock file.
      /// </summary>
      /// <returns>true if no lock file was found, otherwise false</returns>
      private bool CanPerformTask()
      {
          Trace.WriteLine("Checking...", "Information");
          // check if the locking file exists
          try
          {
              // try to get the attributes from the lock file
              // to check if the file exists
              blobClient.GetPageBlobReference(blockFilePath).FetchAttributes();
          }
          catch
          {
              // the lock file does not exist -> this instance may perform the task
              // (note: this catch treats any other failure the same way)
              return true;
          }

          // the blob exists -> this instance may not perform the task at the moment
          return false;
      }

  8. In the method “BlockOtherInstances” we create a new page blob with a size of 0 bytes and record in a metadata attribute which instance created it. That way we make sure that only the instance that created a lock file can delete it again.
      /// <summary>
      /// This method blocks other instances from performing the task
      /// by creating the file lock.
      /// </summary>
      private void BlockOtherInstances()
      {
          Trace.WriteLine("Blocking other instances", "Information");

          // create a new blob at the lock file url with size 0
          // and note in the metadata that this instance created the lock file
          var block = blobClient.GetPageBlobReference(blockFilePath);
          block.Create(0, new BlobRequestOptions() { BlobListingDetails = BlobListingDetails.All });
          block.Metadata["CreatingInstance"] = RoleEnvironment.CurrentRoleInstance.Id;
          block.SetMetadata();
      }

  9. When releasing the lock again in the “ReleaseBlock” method, we check whether the instance that intends to delete the lock is the one that created it. If the instances match, we delete the page blob again.
      /// <summary>
      /// This method releases the lock file so that other instances
      /// can perform the task.
      /// </summary>
      private void ReleaseBlock()
      {
          // get the block file and its attributes
          var block = blobClient.GetPageBlobReference(blockFilePath);
          block.FetchAttributes();

          // check if this instance created the block
          if (block.Metadata["CreatingInstance"] == RoleEnvironment.CurrentRoleInstance.Id)
          {
              Trace.WriteLine("Deleting block file", "Information");
              // this instance created the block > delete it
              block.Delete();
          }
      }

  10. Last but not least: performing a task in this example means writing a message to the trace and waiting for 5 seconds.
      /// <summary>
      /// This method represents the task that may only be performed by a single
      /// instance at a time.
      /// </summary>
      private void PerformTask()
      {
          // for demonstration purposes the task
          // only writes a message to the trace.
          Trace.WriteLine(String.Format("Performing the task at {0}", DateTime.Now.ToString()), "Information");
          Thread.Sleep(5000);
      }
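
Step 4 notes that parsing the numeric suffix of the instance ID might only work in the development fabric. A more defensive variant could look like the sketch below. `InstanceTiming` and its methods are hypothetical helpers made up for this illustration, and the fallback to a hash of the ID is an assumption of mine, not part of the original sample. The helper takes the ID as a plain string, so it can be tried without the Azure SDK.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; not part of the Azure SDK or the original sample.
public static class InstanceTiming
{
    // Extracts the number after the last '.' of an ID such as
    // "deployment(19).MyProject.WorkerRole.0".
    // Returns false if the ID is empty or the suffix is not made of digits only.
    public static bool TryGetInstanceIndex(string instanceId, out int index)
    {
        index = 0;
        if (string.IsNullOrEmpty(instanceId))
        {
            return false;
        }

        // LastIndexOf returns -1 if there is no '.', so the whole ID is used as suffix.
        string suffix = instanceId.Substring(instanceId.LastIndexOf('.') + 1);
        return int.TryParse(suffix, NumberStyles.None, CultureInfo.InvariantCulture, out index);
    }

    // Chooses a wait time of at least 10000 and less than 30000 milliseconds,
    // seeded per instance.
    public static int GetMillisecondsToWait(string instanceId)
    {
        int seed;
        if (!TryGetInstanceIndex(instanceId, out seed))
        {
            // Assumption: if the ID has another shape, any value that differs
            // between instances is good enough as a seed, so hash the ID.
            seed = instanceId == null ? 0 : instanceId.GetHashCode();
        }
        return new Random(seed).Next(10000, 30000);
    }
}
```

In the worker role, the value passed in would be `RoleEnvironment.CurrentRoleInstance.Id`. For “deployment(19).MyProject.WorkerRole.0” the helper yields index 0, just like the `int.Parse` call in step 4, but an ID without a numeric suffix no longer throws a `FormatException`.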

Now feel free to set up as many instances as you wish and see how this example works. Here is a screenshot of a run with two instances. The first one has an interval of 24.5 seconds and the second one waits 14.9 seconds before trying to perform the task. Both instances perform the task until the first one finds an existing lock file from the other instance, which is already performing the task at 13:27:22. So it goes back to sleep…


[Screenshot: blocking]


You can download the source code of this example project here: PerformingSingleTask.zip
