Monday, January 24, 2011

PageBlobs with arbitrary size

PageBlobs have several useful advantages. For example, uploads can be suspended and resumed, and you can read parts of a blob without having to download the whole blob.
However, PageBlobs also come with some restrictions. In particular, their size has to be a multiple of 512 bytes. In this sample, I would like to show you how to use PageBlobs for files of any size.

First, we need a function that rounds a file size up to the next multiple of 512 bytes.

private long GetPageBlobSize(long size)
{
    // determine how many pages are needed
    var numberOfPages = size / 512;
    // if there are any bytes left, we need one more page
    if (size % 512 != 0)
        numberOfPages += 1;

    return numberOfPages * 512;
}
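
To make the rounding concrete, here is a small stand-alone sketch that prints the page-aligned size for a few sample sizes. The class name PageSizeDemo and the method RoundUpToPage are hypothetical names used only for this illustration; the method computes the same result as GetPageBlobSize for non-negative sizes, just written as a single expression.

```csharp
using System;

static class PageSizeDemo
{
    const long PageSize = 512;

    // Rounds a non-negative size up to the next multiple of 512.
    // Adding 511 before the integer division bumps every partially
    // filled page up to a full one.
    public static long RoundUpToPage(long size)
    {
        return (size + PageSize - 1) / PageSize * PageSize;
    }

    static void Main()
    {
        foreach (var size in new long[] { 0, 1, 512, 513, 1300 })
            Console.WriteLine("{0} -> {1}", size, RoundUpToPage(size));
        // prints:
        // 0 -> 0
        // 1 -> 512
        // 512 -> 512
        // 513 -> 1024
        // 1300 -> 1536
    }
}
```

A file that is already aligned (512 bytes) keeps its size, while a single extra byte (513) costs a whole additional page.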



Next, we need a method that creates a PageBlob in the cloud, stores the actual file size in the metadata of the PageBlob, pads the data with zero bytes up to a multiple of 512 bytes, and uploads it.

// client initialization set aside
private CloudStorageAccount _storageAccount;
private CloudBlobClient _blobClient;

public void UploadFile(Stream fileStream, string blobUrl)
{
    // create the blob with proper blob size
    var blob = _blobClient.GetPageBlobReference(blobUrl);
    var blobPageSize = GetPageBlobSize(fileStream.Length);
    blob.Create(blobPageSize);

    // note in the blob what the real file size is
    var realFileSize = fileStream.Length;
    blob.Metadata.Add("contentlength",
        realFileSize.ToString());
    blob.SetMetadata();

    // store how many bytes need to be "filled up"
    // to complete the last page
    var bytesToFillUp = blobPageSize - realFileSize;

    long offset = 0;
    var buffer = new byte[512];
    // note: this loop assumes that Read returns fewer than 512 bytes
    // only when the end of the stream has been reached
    while (fileStream.Read(buffer, 0, buffer.Length) == 512)
    {
        // a "full" page has been read, write it to the blob
        using (var pageContent = new MemoryStream(buffer))
        {
            blob.WritePages(pageContent, offset);
        }

        offset += buffer.Length;
    }

    // now there is either no page to read (file was 512 aligned)
    // or the rest of the file has been loaded
    if (bytesToFillUp > 0)
    {
        // there is still data left
        using (var restContent = new MemoryStream(buffer))
        {
            // ! there are bytes left from the last "full" page read
            // ! in the buffer. these should be replaced by zeroes
            // ! that will be removed on reading the blob
            restContent.Seek(-bytesToFillUp, SeekOrigin.End);
            while (restContent.Position != restContent.Length)
                restContent.WriteByte(0);

            // rewind and write the rest to the blob
            restContent.Seek(0, SeekOrigin.Begin);
            blob.WritePages(restContent, offset);
        }
    }
}



Finally, we need a method that downloads the blob, removes the zero bytes again, and returns the original content of the file.

public Stream DownloadFile(string blobUrl)
{
    // get the blob reference and its metadata from the storage
    var blob = _blobClient.GetPageBlobReference(blobUrl);
    blob.FetchAttributes();

    // get proper file size
    var realFileSize = long.Parse(blob.Metadata["contentlength"]);

    // create a memory stream to be returned
    var returnStream = new MemoryStream((int)realFileSize);

    using (var blobStream = blob.OpenRead())
    {
        var blobPageSize = blobStream.Length;
        var pages = blobPageSize / 512;

        // padding bytes at the end of the last page
        var bytesToIgnore = blobPageSize - realFileSize;

        // read all pages but the last one
        // note: Read may return fewer bytes than requested; a more
        // robust version would loop until each page is complete
        var buffer = new byte[512];
        while (pages > 1)
        {
            blobStream.Read(buffer, 0, buffer.Length);
            returnStream.Write(buffer, 0, buffer.Length);

            pages--;
        }

        // now read the last page and write only necessary bytes
        if (pages > 0)
        {
            blobStream.Read(buffer, 0, buffer.Length);
            returnStream.Write(buffer, 0, (int)(512 - bytesToIgnore));
        }
    }

    // rewind so that the caller can read from the beginning
    returnStream.Position = 0;
    return returnStream;
}
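
The padding and trimming steps of UploadFile and DownloadFile can also be tried out without a storage account. The following is only a minimal sketch under that assumption: a plain byte array stands in for the page blob, and the names PaddingRoundTrip, Pad and Trim are made up for this illustration and are not part of the storage client library.

```csharp
using System;
using System.Linq;

static class PaddingRoundTrip
{
    const int PageSize = 512;

    // Pads the data with zero bytes up to the next multiple of 512,
    // which is what ends up stored in the page blob.
    public static byte[] Pad(byte[] data)
    {
        long paddedSize = (data.LongLength + PageSize - 1) / PageSize * PageSize;
        var result = new byte[paddedSize];   // new arrays are zero-initialized
        Array.Copy(data, result, data.Length);
        return result;
    }

    // Cuts the padded content back to the real file size that was
    // recorded (in the post: in the "contentlength" metadata entry).
    public static byte[] Trim(byte[] padded, long realFileSize)
    {
        var result = new byte[realFileSize];
        Array.Copy(padded, result, realFileSize);
        return result;
    }

    static void Main()
    {
        var original = new byte[1300];
        new Random(42).NextBytes(original);

        var stored = Pad(original);                    // stand-in for the blob content
        var restored = Trim(stored, original.Length);  // stand-in for the download

        Console.WriteLine(stored.Length);                    // 1536
        Console.WriteLine(restored.Length);                  // 1300
        Console.WriteLine(original.SequenceEqual(restored)); // True
    }
}
```

Because the real size is stored separately, trailing zero bytes that belong to the file itself are preserved; only the padding is cut off.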




 

1 comment:

  1. This was a very helpful post! I would like to point out that, if you don't mind modifying the stream, you can just call the SetLength function in the stream, which will truncate a stream or zero-fill it out to the end.

    During upload you can replace your two while loops with:
    stream.SetLength(GetPageBlobSize(stream.Length));
    blob.WritePages(stream, 0);

    And during download you can replace most of the end of your function with:
    blob.DownloadToStream(stream);
    stream.SetLength(realFileSize);

    Of course, my method only works if your stream supports modifying its length which, according to http://msdn.microsoft.com/en-us/library/system.io.memorystream.setlength.aspx, not all stream classes do, so your mileage may vary.
