Blobs
- Two kinds of storage in Azure: SQL Data Services and Windows Azure storage. This is about the base storage offering
- Three kinds of Windows Azure storage: blobs, tables and queues
- All accessible via a REST API
- Access secured via a 256-bit secret key (requests are signed using SHA256)
- Two separate data centres in US (Northwest, Southwest)
- Affinity for storage and computation to reduce latency (available April)
- Blobs – named objects, accounts, containers and blobs. (containers are like S3 buckets)
- Tables – structured storage (like SimpleDB)
- Queues
- Sharing policies are set on a container basis
- 8kb of name/value pairs can be associated with each container
- Listing abstractions for blobs in a container
- Blob name space http://<Account Name>.blob.core.windows.net/<Container>/<BlobName>
- Blobs can be up to 50GB in size
- PutBlob, GetBlob, DeleteBlob
- 8kb of metadata per blob
- Support for MD5 checksum native to Storage API
- Can use a range get to retrieve part of the blob
- Support for block level upload to allow interruptible uploads (S3 doesn’t do this)
- PutBlock 1-N, then commit with PutBlockList
- Blocks can be uploaded out of order or in parallel
- Blocks can be uploaded twice, newer overwrites older
- PutBlockList will delete unused blocks
- Blocks can be up to 4MB
- Blocks can vary in size
- Each block has a 64 byte ID, scoped by blob name
- Overlapping get and put? A get will always see a single version of the blob, so while a put is in progress the old blob is all that is seen
- First PutBlockList wins in the case where multiple PutBlockLists occur.
- Conditional Put/Get operations to support optimistic concurrency
- Use a hash of the block as the block ID
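
The block upload rules above (PutBlock in any order, re-upload overwrites, PutBlockList commits and discards unused blocks, readers only ever see the committed version) can be sketched with a toy in-memory model. This is a minimal illustration, not the real storage API: `BlobModel`, `block_id` and `upload` are hypothetical names, and using a SHA-256 hex digest as the block ID is an assumption that simply follows the "hash of the block" suggestion.

```python
import hashlib

MAX_BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB block limit from the notes


def block_id(data):
    """Derive a block ID from a hash of the block content (assumed scheme)."""
    return hashlib.sha256(data).hexdigest()


class BlobModel:
    """Toy in-memory stand-in for one blob; not the real storage service."""

    def __init__(self):
        self.committed = b""  # the only version a get can see
        self.staged = {}      # uncommitted blocks, keyed by block ID

    def put_block(self, bid, data):
        if len(data) > MAX_BLOCK_SIZE:
            raise ValueError("block exceeds 4 MB")
        self.staged[bid] = data  # re-uploading an ID overwrites the older copy

    def put_block_list(self, ids):
        # Assemble the new blob in the listed order, then swap it in at once.
        self.committed = b"".join(self.staged[i] for i in ids)
        self.staged.clear()  # blocks not named in the list are discarded

    def get(self, start=0, end=None):
        # Range get: a slice of the committed version only.
        return self.committed[start:end]


def upload(blob, payload, block_size):
    """Split payload into blocks, stage them out of order, then commit."""
    chunks = [payload[i:i + block_size] for i in range(0, len(payload), block_size)]
    ids = [block_id(c) for c in chunks]
    # Stage in reverse to show that order does not matter before the commit.
    for bid, chunk in reversed(list(zip(ids, chunks))):
        blob.put_block(bid, chunk)
    blob.put_block_list(ids)
    return ids
```

An interrupted upload would resume by re-sending only the blocks that were never staged and then issuing the PutBlockList; hashing the content means a retried block gets the same ID and simply overwrites itself.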
Tables
- Billions of entities, TB of data
- Highly available, durable
- Account, table, entity are the key concepts
- Table names are scoped by storage account name
- A table is a set of entities (rows)
- An entity is a set of properties (columns)
- Every table has a partition key column
- Table partition: all entities in a table with the same partition key
- Application controls granularity of partition key
- A heavily partitioned table makes it easier to load balance
- Entities in the same partition will be stored together
- Multiple operations over multiple entities can be handled atomically in the future
- Partition key and row key gives primary index
- If the partition key is part of the query it's fast; if it isn't, the query ends up scanning
- Each entity can have up to 255 properties, mandatory properties are partition key and row key
- All entities have a system maintained version
- No fixed schema, just name/value pairs
- Access via ADO.NET Data Services (supports REST API)
- Default number of connections is 2
- 100-continue is default. Turn this off to save round trips.
- Turn tracking off for read only queries
- Bug in ADO.NET relating to deserialisation; the fix is to name the entity class the same as the table name
- Be prepared for partial results from your queries
- A query is limited to 60 seconds. After this, the results found so far are returned and you must continue the query to get the rest
- Not a relational database, no joins, foreign keys
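
To make the partitioning and partial-results points concrete, here is a small in-memory sketch. It assumes nothing about the real client library: `TableModel`, `query_page` and `fetch_all` are hypothetical names, and the integer continuation token is invented purely for illustration. It shows a primary-index lookup by partition key and row key, a query restricted to one partition versus a full scan, and a loop that keeps asking for more until no continuation is returned.

```python
class TableModel:
    """Toy in-memory table: partition key -> row key -> entity."""

    def __init__(self):
        self.partitions = {}

    def insert(self, partition_key, row_key, **properties):
        # No fixed schema: an entity is just name/value pairs plus its two keys.
        entity = {"PartitionKey": partition_key, "RowKey": row_key, **properties}
        self.partitions.setdefault(partition_key, {})[row_key] = entity

    def get(self, partition_key, row_key):
        # Primary-index lookup: both keys known, nothing is scanned.
        return self.partitions[partition_key][row_key]

    def query(self, predicate, partition_key=None):
        if partition_key is not None:
            # Only the named partition is examined.
            candidates = list(self.partitions.get(partition_key, {}).values())
        else:
            # No partition key: every partition has to be scanned.
            candidates = [e for p in self.partitions.values() for e in p.values()]
        return [e for e in candidates if predicate(e)]

    def query_page(self, predicate, page_size, continuation=0):
        # Hypothetical paging: returns (results, next_token); None means done.
        matches = self.query(predicate)
        page = matches[continuation:continuation + page_size]
        next_token = continuation + page_size
        return page, (next_token if next_token < len(matches) else None)


def fetch_all(table, predicate, page_size=2):
    """Keep following continuations until the service says there is no more."""
    results, token = [], 0
    while token is not None:
        page, token = table.query_page(predicate, page_size, token)
        results.extend(page)
    return results
```

The caller never assumes that one response holds the full result set, which is the habit the partial-results bullets are asking for.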
Queues
- Web Roles, Worker roles
- Reliable message delivery
- Access via REST
- Account, Queue, Message
- No limit on messages in queue
- A message is stored for at most a week
- Messages <= 8kb
- http://<Account>.queue.core.windows.net/<QueueName>
- Create/Delete/Clear Queues
- Enqueue/Dequeue/Delete
- Dequeue makes the message invisible. You delete it after processing. If delete doesn’t get called, the message becomes visible again once the invisibility timeout expires.
- Message processing should be designed to be idempotent. Each message is processed at least once and may be processed more than once.
- No fixed ordering for dequeue of messages, but approximates to FIFO
- Use queue length to scale your worker tasks
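
The dequeue, process, delete cycle can be sketched as follows. This is a toy model under stated assumptions, not the real queue service: `QueueModel` and `workers_needed` are hypothetical names, the clock is passed in explicitly so the behaviour is easy to follow, and the scaling rule is just one possible policy for turning queue length into a worker count.

```python
import itertools
import math

MAX_MESSAGE_SIZE = 8 * 1024  # 8 KB message limit from the notes


class QueueModel:
    """Toy in-memory queue with an invisibility timeout."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self.messages = {}  # message id -> [body, time it becomes visible]
        self._ids = itertools.count(1)

    def enqueue(self, body, now):
        if len(body) > MAX_MESSAGE_SIZE:
            raise ValueError("message exceeds 8 KB")
        message_id = next(self._ids)
        self.messages[message_id] = [body, now]
        return message_id

    def dequeue(self, now):
        # Return the oldest visible message and hide it for the timeout period.
        for message_id, entry in self.messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return message_id, entry[0]
        return None

    def delete(self, message_id):
        # Called by the worker only after processing has succeeded.
        self.messages.pop(message_id, None)

    def length(self):
        return len(self.messages)


def workers_needed(queue_length, messages_per_worker, max_workers):
    """Assumed scaling policy: one worker per batch of backlog, within limits."""
    wanted = math.ceil(queue_length / messages_per_worker)
    return min(max_workers, max(1, wanted))
```

If a worker crashes between dequeue and delete, the message reappears after the timeout and another worker picks it up, which is why the processing step has to tolerate seeing the same message twice.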