5    Overview of Cloud Storage

5.1   Introduction

When discussing cloud storage and standards, it is important to distinguish the various resources that are being offered as services. These resources are exposed to clients as functional interfaces (i.e., data paths) and are managed by management interfaces (i.e., control paths). This international standard explores the various types of interfaces that are part of offerings today and shows how they are related. This international standard defines a model for the interfaces that may be mapped to the various offerings and a model that forms the basis for cloud storage interfaces into the future.

Another important concept in this international standard is that of metadata. When managing large amounts of data with differing requirements, metadata is a convenient mechanism to express those requirements in such a way that underlying data services may differentiate their treatment of the data to meet those requirements.

The appeal of cloud storage is due to some of the same attributes that define other cloud services: pay as you go, the illusion of infinite capacity (elasticity), and the simplicity of use/management. It is therefore important that any interface for cloud storage support these attributes, while allowing for a multitude of business use cases.

5.2   What is Cloud Storage?

The use of the term "cloud" in describing these new models arose from architecture drawings that typically used a cloud as the icon for a network. The cloud represents any-to-any network connectivity in an abstract way. In this abstraction, the network connectivity in the cloud is represented without concern for how it is made to happen.

The cloud abstraction of complexity produces a simple base on which other features can be built. The general cloud model extends this base by adding a pool of resources that is drawn from, on demand, in small increments. A relatively recent innovation that has made this possible is virtualization.

Thus, cloud storage is simply the delivery of virtualized storage on demand. The formal term that is used for this is Data Storage as a Service (DaaS).

5.3   Data Storage as a Service

By abstracting data storage behind a set of service interfaces and delivering it on demand, a wide range of actual offerings and implementations are possible. The only type of storage that is excluded from this definition is that which is delivered in fixed-capacity increments instead of based on demand.

An important part of any DaaS offering is the support of legacy clients. Support is accommodated with existing standard protocols such as iSCSI (and others) for block and CIFS/NFS or WebDAV for file network storage, as shown in Figure 1.


Figure 1 - Existing Data Storage Interface Standards

The difference between the purchase of a dedicated appliance and that of cloud storage is not the functional interface, but the fact that the storage is delivered on demand. The customer pays for either what they actually use or what they have allocated for use. In the case of block storage, a Logical Unit Number (LUN), or virtual volume, is the granularity of allocation. For file protocols, a file system is the unit of granularity. In either case, the actual storage space may be thin provisioned and billed for based on actual usage. Data services, such as compression and deduplication, may be used to further reduce the actual space consumed.

Managing this storage is typically done out of band for these standard data storage interfaces, either through an API, or more commonly, through an administrative browser-based user interface. This out-of-band interface may be used to invoke other data services as well (e.g., snapshot and cloning).

In this model, the underlying storage space exposed by the out-of-band interfaces is abstracted and exposed using the notion of a container. A container is not only a useful abstraction for storage space, but also serves as a grouping of the data stored in it and a point of control for applying data services in the aggregate.

Each data object is created, retrieved, updated, and deleted as a separate resource. In this type of interface, a container, if used, is a simple grouping of data objects for convenience. Nothing prevents the concept of containers from being hierarchical, although any given implementation might support only a single level (see Figure 2).


Figure 2 - Storage Interfaces for Object Storage Client Data

5.4   Data Management for Cloud Storage

Many of the initial offerings of cloud storage focused on a kind of "best effort" quality of storage service and ignored most other types of data services. To address the needs of enterprise applications with cloud storage, however, there is an increasing need to offer better quality of service and to deploy additional data services.

Cloud storage may lose its abstraction and simplicity benefits if new data services that require complex management are added. Cloud storage customers are likely to resist new demands on their time (e.g., setting up backup schedules through dedicated interfaces, deploying data services individually for data elements).

By supporting metadata in a cloud storage interface and prescribing how the storage system and data system metadata are interpreted to meet the requirements of the data, the simplicity required by the cloud storage model may be maintained while still addressing the requirements of enterprise applications and their data.

User metadata is retained by the cloud and may be used to find the data objects and containers by performing a query for specific metadata values. The schema for this metadata may be determined by each application, domain, or user. For more information on support for user metadata, see 16.2.

Storage system metadata is produced/interpreted by the cloud offering and basic storage functions (e.g., modification and access statistics, access control). For more information on support for storage system metadata, see 16.3.

Data system metadata is interpreted by the cloud offering as data requirements that control the operation of underlying data services for that data. It may apply to an aggregation of data objects in a container or to individual data objects, if the offering supports this level of granularity. For more information on support for data system metadata, see 16.4.

5.5   Data and Container Management

There is no reason that managing data and managing containers should involve different interfaces. Therefore, the use of metadata is extended from applying to individual data elements to applying to containers of data as well. Thus, any data placed into a container inherits the data system metadata of the container into which it was placed. When creating a new container within an existing container, the new container would similarly inherit the metadata settings of its parent's data system metadata. After a data element is created, the data system metadata may be overridden at the container or individual data element level, as desired.
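The inheritance and override behaviour described above can be illustrated with a short, informative Python sketch. The class name `Node`, the method `effective_metadata`, and the metadata names `redundancy` and `retention` are hypothetical and chosen only for illustration; none of them is defined by this international standard.

```python
from typing import Dict, Optional


class Node:
    """A container or data element in a hypothetical in-memory model."""

    def __init__(self, name: str, parent: Optional["Node"] = None,
                 overrides: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.parent = parent
        # Only the values set explicitly on this node are stored here.
        self.overrides: Dict[str, str] = dict(overrides or {})

    def effective_metadata(self) -> Dict[str, str]:
        """Merge data system metadata from the root down; closer settings win."""
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.overrides}


# A root container, a nested container, and a data element that overrides one value.
root = Node("root", overrides={"redundancy": "2", "retention": "30d"})
projects = Node("projects", parent=root)
report = Node("report.txt", parent=projects, overrides={"redundancy": "3"})
```

In this sketch, `projects.effective_metadata()` yields the settings of `root`, and `report.effective_metadata()` differs only in the overridden `redundancy` value. The sketch resolves inherited values at read time; whether an offering does that or copies them when an element is created is not settled here.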

Even if the provided interface does not support setting metadata on individual data elements, metadata may still be applied to the containers. In such a case, the interface does not provide a mechanism to override metadata that an individual data element inherits from its parent container. For file-based interfaces that support extended attributes (e.g., CIFS, NFSv4), these extended attributes may be used to specify the data system metadata to override that specified for the container.

5.6   Reference Model for Cloud Storage Interfaces

The Cloud Storage Reference Model is shown in Figure 3.


Figure 3 - Cloud Storage Reference Model

This model shows multiple types of cloud data storage interfaces that are able to support both legacy and new applications. All of the interfaces allow storage to be provided on demand, drawn from a pool of resources. The storage capacity is drawn from a pool of storage capacity provided by storage services. The data services are applied to individual data elements, as determined by the data system metadata. Metadata specifies the data requirements on the basis of individual data elements or on groups of data elements (containers).

5.7   Cloud Data Management Interface

The Cloud Data Management Interface (CDMI™) shown in Figure 3 may be used to create, retrieve, update, and delete objects in a cloud. The features of the CDMI include functions that

   allow clients to discover the capabilities available in the cloud storage offering;

   manage containers and the data that is placed in them; and

   allow metadata to be associated with containers and the objects they contain.

This international standard divides operations into two types: those that use a CDMI content type in the HTTP body and those that do not. While much of the same data is available via both types, providing both allows for CDMI-aware clients and non-CDMI-aware clients to interact with a CDMI provider.

CDMI may also be used by administrative and management applications to manage containers, domains, security access, and monitoring/billing information, even for storage that is functionally accessible by legacy or proprietary protocols. The capabilities of the underlying storage and data services are exposed so that clients may understand the offering.

Conformant cloud offerings may support a subset of the CDMI, as long as they expose the limitations in the capabilities reported via the interface.

This international standard uses RESTful principles in the interface design where possible (see REST).

CDMI defines both a means to manage the data and a means to store and retrieve the data. The means by which the storage and retrieval of data is achieved is termed a data path. The means by which the data is managed is termed the control path. CDMI specifies both a data path and control path interface.

CDMI does not need to be used as the only data path and is able to manage cloud storage properties for any data path interface (e.g., standardized or vendor specific).

Container metadata is used to configure the data requirements of the storage provided through the exported protocol (e.g., block protocol or file protocol) that the container exposes. When an implementation is based on an underlying file system to store data for a block protocol (e.g., iSCSI), the CDMI container provides a useful abstraction for representing the data system metadata for the data and the structures that govern the exported protocols.

A cloud offering may also support domains that allow administrative ownership to be associated with stored objects. Domains allow the standard to (among other things):

   determine how user credentials are mapped to principals used in an Access Control List (ACL),

   allow granting of special cloud-related privileges, and

   allow delegation to external user authorization systems (e.g., LDAP or Active Directory).

Domains may also be hierarchical, allowing for corporate domains with multiple children domains for departments or individuals. The domain concept is also used to aggregate usage data that is used to bill, meter, and monitor cloud use.

Finally, capabilities allow a client to discover the capabilities of a CDMI implementation. Requirements throughout this international standard shall be understood in the context of CDMI capabilities. Mandatory requirements on functionality that is conditioned on a CDMI capability shall not be interpreted to require implementation of that capability, but rather shall be interpreted to apply only to implementations that support the functionality required by that capability.

For example, in 5.10, this international standard states, "Every cloud storage system shall allow object ID-based access to stored objects". This requirement shall be understood in the context that access by object ID is predicated on the presence of the cdmi_object_access_by_ID capability.
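As an informative illustration of how a client might condition its behaviour on a capability, the following Python sketch checks for cdmi_object_access_by_ID before relying on object ID-based access. The function names, the exception class, and the assumption that capabilities arrive as a mapping whose values are booleans or the string "true" are hypothetical; how capabilities are actually retrieved and represented is specified elsewhere in this international standard.

```python
from typing import Any, Mapping


class CapabilityNotSupported(Exception):
    """Raised when a client needs functionality the offering does not report."""


def has_capability(capabilities: Mapping[str, Any], name: str) -> bool:
    """Return True when the named capability is reported as present."""
    value = capabilities.get(name, False)
    # Assumption: a value may be a boolean or the string "true" (any case).
    return value is True or str(value).lower() == "true"


def require_capability(capabilities: Mapping[str, Any], name: str) -> None:
    """Fail early instead of issuing a request the offering cannot serve."""
    if not has_capability(capabilities, name):
        raise CapabilityNotSupported(name)
```

A client holding the reported capabilities could call `require_capability(capabilities, "cdmi_object_access_by_ID")` before constructing any object ID-based URI.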

5.8   Object Model for CDMI

The model for CDMI is shown in Figure 4.


Figure 4 - CDMI Object Model

The five types of resources defined are shown in Table 3. The content type in any given operation is specific to each type of resource.

Table 3 - Types of Resources in the Model

| Resource Type | Description | Reference |
|---|---|---|
| Data objects | Data objects are used to store values and provide functionality similar to files in a file system. | See Clause 8. |
| Container objects | Container objects have zero or more children, but do not store values. They provide functionality similar to directories in a file system. | See Clause 9. |
| Domain objects | Domain objects represent administrative groupings for user authentication and accounting purposes. | See Clause 10. |
| Queue objects | Queue objects store zero or more values and are accessed in a first-in-first-out manner. | See Clause 11. |
| Capability objects | Capability objects describe the functionality implemented by a CDMI server and are used by a client to discover supported functionality. | See Clause 12. |

For data storage operations, the client of the interface only needs to know about container objects and data objects. All data path implementations are required to support at least one level of containers (see 5.5). Using the CDMI object model (see Figure 4), the client may send a PUT via CDMI (see 5.6) to the new container URI and create a new container with the specified name. Container metadata are optional and are expressed as a series of name-value pairs. After a container is created, a client may send a PUT to create a data object within the newly created container. A subsequent GET will fetch the data object, including the value field.
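The sequence just described (create a container, create a data object in it, then fetch the object) can be sketched as follows. This informative Python example only builds request descriptions and performs no network I/O. The `Request` type and the function names are hypothetical; the media type application/cdmi-object is assumed from RFC 6208, and the request bodies are reduced to the `metadata` and `value` fields mentioned in this clause rather than the full bodies defined in Clauses 8 and 9.

```python
import json
from typing import Dict, List, NamedTuple, Optional

SPEC_VERSION = "1.0.2"  # version value used in this document's examples


class Request(NamedTuple):
    method: str
    path: str
    headers: Dict[str, str]
    body: Optional[str]


def _headers(media_type: str, with_body: bool) -> Dict[str, str]:
    headers = {"Accept": media_type,
               "X-CDMI-Specification-Version": SPEC_VERSION}
    if with_body:
        # A Content-Type header accompanies every request that has a body.
        headers["Content-Type"] = media_type
    return headers


def create_container(path: str,
                     metadata: Optional[Dict[str, str]] = None) -> Request:
    if not path.endswith("/"):
        raise ValueError("URIs to containers end in a trailing slash")
    body = json.dumps({"metadata": metadata or {}})
    return Request("PUT", path, _headers("application/cdmi-container", True), body)


def create_data_object(path: str, value: str) -> Request:
    body = json.dumps({"value": value})
    return Request("PUT", path, _headers("application/cdmi-object", True), body)


def fetch_data_object(path: str) -> Request:
    return Request("GET", path, _headers("application/cdmi-object", False), None)


def store_and_fetch(container: str, name: str, value: str) -> List[Request]:
    """Return the three requests in the order a client would send them."""
    return [create_container(container),
            create_data_object(container + name, value),
            fetch_data_object(container + name)]
```

For example, `store_and_fetch("/MyContainer/", "MyDataObject.txt", "hello")` yields a PUT to `/MyContainer/`, a PUT to `/MyContainer/MyDataObject.txt`, and a GET of the same path.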

Queue objects are also defined (see Figure 4) and have special properties for in-order, first-in, first-out creation and fetching of queue values. More information on queues may be found in Clause 11.

CDMI defines two namespaces that can be used to access stored objects, a flat object ID namespace and a hierarchical path-based namespace. Support for objects accessed by object ID is indicated by the system-wide capability cdmi_object_access_by_ID, and support for objects accessed by hierarchical path is indicated by the container capability cdmi_create_dataobject found on the root container (and any subcontainers).

Objects are created by ID by performing an HTTP POST against a special URI, designated as /cdmi_objectid/ (see 9.8). Subsequent to creation, objects are modified by performing PUTs using the object ID assigned by the CDMI server, using the /cdmi_objectid/ URI (see 8.6). The same URI is used to retrieve and delete objects by ID.

Objects are created by name by performing an HTTP PUT to the desired path URI (see 8.2). Subsequent to creation, objects are modified by performing PUTs using the object path specified by the client (see 8.6). The same URI is used to retrieve and delete objects by path.

CDMI defines mechanisms so that objects having only an object ID can be assigned a path location within the hierarchical namespace, and so that objects having both an object ID and path can have their path dropped, such that the object only has an object ID. This function is accomplished by using a move modifier to a PUT or POST operation, as shown in Figure 5.


Figure 5 - Object Transitions between Named and ID-only

5.9   CDMI Metadata

CDMI uses many different types of metadata, including HTTP metadata, data system metadata, user metadata, and storage system metadata.

HTTP metadata is metadata that is related to the use of the HTTP protocol (e.g., Content-Length, Content-Type, etc.). HTTP metadata is not specifically related to this international standard but needs to be discussed to explain how CDMI uses the HTTP standard.

CDMI data system metadata, user metadata, and storage system metadata are defined in the form of name-value pairs. Vendor-defined data system metadata and storage system metadata names shall begin with the reverse domain name of the vendor.

Data system metadata is metadata that is specified by a CDMI client and is a component of objects. Data system metadata abstractly specifies the data requirements associated with data services that are deployed in the cloud storage system.

User metadata consists of client-defined JSON strings, arrays, and objects that are stored in the metadata field. The namespace used for user metadata names is self-administered (e.g., using the reverse domain name), and user metadata names shall not begin with the prefix "cdmi_".

Storage system metadata is metadata that is generated by the storage services in the system (e.g., creation time, size) to provide useful information to a CDMI client.
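The naming rules stated above can be checked mechanically. The following informative Python sketch is one possible reading: the function names are hypothetical, and the dotted form of a reverse domain name (for example, "com.example" for "example.com") is an assumption, since this clause does not spell out the exact notation.

```python
def reverse_domain(domain: str) -> str:
    """Reverse the labels of a DNS name, e.g. "example.com" -> "com.example"."""
    return ".".join(reversed(domain.split(".")))


def is_valid_user_metadata_name(name: str) -> bool:
    """User metadata names shall not begin with the prefix "cdmi_"."""
    return bool(name) and not name.startswith("cdmi_")


def is_vendor_metadata_name(name: str, vendor_domain: str) -> bool:
    """Vendor-defined names begin with the vendor's reverse domain name."""
    return name.startswith(reverse_domain(vendor_domain))
```

Under these assumptions, a name such as "com.example.project" is acceptable as user metadata and also carries the reverse domain name of "example.com", whereas "cdmi_size" is rejected as a user metadata name.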

The matrix of the creation and consumption of storage system metadata is shown in Table 4.

Table 4 - Creation/Consumption of Storage System Metadata

|  | Created by User | Created by System |
|---|---|---|
| Consumed by User | User metadata | Storage system metadata |
| Consumed by System | Data system metadata | N/A |

5.10   Object ID

Every object stored within a CDMI-compliant system shall have a globally unique object identifier (ID) assigned at creation time. The CDMI object ID is a string with requirements for how it is generated and how it obtains its uniqueness. Each offering that implements CDMI is able to produce these identifiers without conflicting with other offerings.

Every cloud storage system shall allow object ID-based access to stored objects by allowing the object's ID to be appended to the root container URI. If the data object "MyDataObject.txt" has an object ID of "00006FFD001001CCE3B2B4F602032653", the following pair of URIs access the same data object:

http://cloud.example.com/root/MyDataObject.txt

http://cloud.example.com/root/cdmi_objectid/00006FFD001001CCE3B2B4F602032653

If containers are supported, they shall also be accessible by object ID. If the container "MyContainer" has an object ID of "00006FFD0010AA33D8CEF9711E0835CA", the URIs in each of the following pairs access the same object (the container itself in the first pair, and a data object within it in the second pair):

http://cloud.example.com/MyContainer/

http://cloud.example.com/cdmi_objectid/00006FFD0010AA33D8CEF9711E0835CA/

http://cloud.example.com/MyContainer/MyDataObject.txt

http://cloud.example.com/cdmi_objectid/00006FFD0010AA33D8CEF9711E0835CA/MyDataObject.txt
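The URI forms shown in the examples above can be produced with simple string construction. The following informative Python sketch uses hypothetical function names and follows only the patterns visible in those examples.

```python
def data_object_uri_by_id(root_uri: str, object_id: str) -> str:
    """Append cdmi_objectid/<ID> to a root container URI (which ends in "/")."""
    return f"{root_uri}cdmi_objectid/{object_id}"


def container_uri_by_id(root_uri: str, object_id: str, child: str = "") -> str:
    """Address a container by ID, or a child of that container by name."""
    return f"{root_uri}cdmi_objectid/{object_id}/{child}"
```

With the root URI `http://cloud.example.com/` and the container ID given above, `container_uri_by_id` returns the container form when `child` is empty and the data object form when `child` is "MyDataObject.txt".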

5.11   CDMI Object ID Format

The offering shall create the object ID, which identifies an object. The object ID shall be globally unique and shall conform to the format defined in Figure 6. The native format of an object ID is a variable-length byte sequence and shall be a maximum length of 40 bytes. An application should treat object IDs as opaque byte strings. However, the object ID format is defined such that its integrity may be validated, and independent offerings may assign unique object ID values independently.

| Byte offset | Field |
|---|---|
| 0 | Reserved (zero) |
| 1 to 3 | Enterprise Number |
| 4 | Reserved (zero) |
| 5 | Length |
| 6 to 7 | CRC |
| 8 to 39 | Opaque Data |

Figure 6 - Object ID Format

The fields shown in Figure 6 are defined as follows:

   The reserved bytes shall be set to zero.

   The Enterprise Number field shall be the SNMP enterprise number of the offering organization that created the object ID, in network byte order. See RFC 2578 and http://www.iana.org/assignments/enterprise-numbers. 0 is a reserved value.

   The byte at offset 5 shall contain the full length of the object ID, in bytes.

   The CRC field shall contain a 2-byte (16-bit) CRC in network byte order. The CRC field enables the object ID to be validated for integrity. The CRC field shall be generated by running the algorithm (see CRC) across all bytes of the object ID, as defined by the Length field, with the CRC field set to zero. The CRC function shall have the following fields:

   Name    : "CRC-16",

   Width    : 16,

   Poly    : 0x8005,

   Init    : 0x0000,

   RefIn    : True,

   RefOut    : True,

   XorOut    : 0x0000, and

   Check    : 0xBB3D.

This function defines a 16-bit CRC with polynomial 0x8005, reflected input, and reflected output. This CRC-16 is specified in CRC.

   Opaque data in each object ID shall be unique for a given Enterprise Number.

The native format for an object ID is binary. When necessary, such as when included in URIs and JSON strings, the object ID textual representation shall be encoded using base 16 encoding rules described in RFC 4648 and shall be case insensitive.
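The field definitions above can be exercised with a short, informative Python sketch that builds an object ID, validates it, and converts it to and from its base 16 text form. The function names are hypothetical. The field offsets follow Figure 6, and the enterprise number 28669 used in the usage note is simply the value found in bytes 1 to 3 (00 6F FD) of the sample object ID in 5.10.

```python
def crc16(data: bytes) -> int:
    """CRC-16 with Poly 0x8005, Init 0x0000, RefIn/RefOut True, XorOut 0x0000."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0xA001 is the bit-reflected form of polynomial 0x8005.
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def build_object_id(enterprise_number: int, opaque: bytes) -> bytes:
    """Assemble reserved, enterprise number, reserved, length, CRC, opaque data."""
    length = 8 + len(opaque)
    if not 0 < enterprise_number < 2 ** 24:
        raise ValueError("enterprise number must fit in 3 bytes; 0 is reserved")
    if length > 40:
        raise ValueError("an object ID is at most 40 bytes long")
    header = (b"\x00" + enterprise_number.to_bytes(3, "big")
              + b"\x00" + bytes([length]))
    # The CRC is computed over the whole ID with the CRC field set to zero.
    crc = crc16(header + b"\x00\x00" + opaque)
    return header + crc.to_bytes(2, "big") + opaque


def is_valid_object_id(raw: bytes) -> bool:
    """Check the reserved bytes, the Length field, and the CRC field."""
    if not 8 <= len(raw) <= 40 or raw[5] != len(raw):
        return False
    if raw[0] != 0 or raw[4] != 0:
        return False
    zeroed = raw[:6] + b"\x00\x00" + raw[8:]
    return crc16(zeroed) == int.from_bytes(raw[6:8], "big")


def to_text(raw: bytes) -> str:
    """Base 16 text form, as used in URIs and JSON strings."""
    return raw.hex().upper()


def from_text(text: str) -> bytes:
    """Decode the text form; upper-case and lower-case digits are both accepted."""
    return bytes.fromhex(text)
```

For instance, `build_object_id(28669, bytes.fromhex("E3B2B4F602032653"))` produces a 16-byte ID whose first six bytes read 00006FFD0010 in text form, and `is_valid_object_id` rejects the result if any single bit of it is altered. `crc16(b"123456789")` returns 0xBB3D, matching the Check value listed above.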

5.12   Security

Security, in the context of CDMI, refers to the protective measures employed in managing and accessing data and storage. The specific objectives to be addressed by security include:

   provide a mechanism that assures that the communications between a CDMI client and server may not be read or modified by a third party;

   provide a mechanism that allows CDMI clients and servers to provide an assurance of their identity;

   provide a mechanism that allows control of the actions a CDMI client is permitted to perform on a CDMI server;

   provide a mechanism for records to be generated for actions performed by a CDMI client on a CDMI server;

   provide mechanisms to protect data at rest;

   provide a mechanism to eliminate data in a controlled manner; and

   provide mechanisms to discover the security capabilities of a particular implementation.

Security measures within CDMI may be summarized as

   transport security,

   user and entity authentication,

   authorization and access controls,

   data integrity,

   data and media sanitization,

   data retention,

   protections against malware,

   data at-rest encryption, and

   security capabilities.

With the exception of both the transport security and the security capabilities, which are mandatory to implement, the security measures may vary significantly from implementation to implementation.

When security is a concern, the CDMI client should begin with a series of security capability lookups (see 12.1.1) to determine the exact nature of the security features that are available. Based on the values of these capabilities, a risk-based decision should be made as to whether the CDMI server should be used. This is particularly true when the data to be stored in the cloud storage is sensitive or regulated in a way that requires stored data to be protected (e.g., encrypted) or handled in a particular manner (e.g., full accountability and traceability of management and access).

HTTP is the mandatory transport mechanism, and HTTP over TLS (i.e., HTTPS) is the mechanism used to secure the communications between CDMI clients and servers. To ensure both security and interoperability, all CDMI implementations shall implement the Transport Layer Security (TLS) protocol as described in Annex A, but its use by CDMI clients and servers is optional.

5.13   Required HTTP Support

5.13.1   RFC 2616 Support Requirements

A conformant implementation of CDMI shall also be a conformant implementation of RFC 2616 (see RFC 2616) (i.e., HTTP/1.1). The subclauses below list the sections of RFC 2616 that shall be supported; however, this list is not comprehensive.

5.13.2   Content-Type Negotiation

For CDMI operations, media types for CDMI objects are used, as defined in RFC 6208.

A client may optionally supply an HTTP Accept header, as per section 14.1 of RFC 2616. If a client is restricting the response to a specific CDMI media type, the corresponding media type shall be specified in the Accept header. Otherwise, the Accept header may contain "*/*" or a list of media types, or it may be omitted.

If a request message body is present, the client shall include a Content-Type header, as per section 14.17 of RFC 2616. If the client does not provide a Content-Type header when required or provides a media type in the Content-Type header that does not match with the existing resource media type, the server shall return an HTTP status code of 400 Bad Request.

If a response message body is present, the server shall provide a Content-Type header.

This international standard may further qualify content negotiation (e.g., in 9.3, the absence of a Content-Type header has a specific meaning).
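The Content-Type rule for requests can be sketched as a small server-side check. The following Python example is informative only: the function name and its parameters are hypothetical, and it models only the general rule of this subclause, not the operation-specific qualifications mentioned above (e.g., 9.3).

```python
from typing import Mapping, Optional


def content_type_error(headers: Mapping[str, str], has_body: bool,
                       resource_media_type: Optional[str] = None) -> Optional[int]:
    """Return 400 when the Content-Type rule is violated, otherwise None."""
    if not has_body:
        return None
    # HTTP header names are case insensitive.
    normalized = {name.lower(): value for name, value in headers.items()}
    content_type = normalized.get("content-type")
    if content_type is None:
        return 400  # a body is present but no Content-Type was supplied
    # Compare the media type only, ignoring parameters such as charset.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if resource_media_type is not None and media_type != resource_media_type.lower():
        return 400  # media type does not match the existing resource
    return None
```

A server could call this before processing a PUT and return 400 Bad Request whenever the result is not `None`.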

5.13.3   Range Support

The server shall support HTTP Range headers and partial content responses (see Section 14.16 of RFC 2616).
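As an informative illustration of partial content responses, the following Python sketch serves a single byte range from an in-memory value. The function name and return shape are hypothetical, and only one range per request is handled; a request with several ranges, or with a Range header that cannot be parsed, is answered with the full value, which RFC 2616 permits.

```python
import re
from typing import Dict, Optional, Tuple

_SINGLE_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")

Response = Tuple[int, Dict[str, str], bytes]


def serve_range(value: bytes, range_header: Optional[str]) -> Response:
    """Return (status, headers, body) for a GET with an optional Range header."""
    size = len(value)
    full: Response = (200, {"Content-Length": str(size)}, value)
    if range_header is None:
        return full
    match = _SINGLE_RANGE.match(range_header.strip())
    if match is None:
        return full  # unsupported or malformed Range header is ignored
    first, last = match.group(1), match.group(2)
    if first == "" and last == "":
        return full
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the value.
        suffix = int(last)
        start = max(size - suffix, 0) if suffix > 0 else size
        end = size - 1
    else:
        start = int(first)
        if last != "" and int(last) < start:
            return full  # last position before first: header is invalid
        end = size - 1 if last == "" else min(int(last), size - 1)
    if start >= size:
        return 416, {"Content-Range": f"bytes */{size}"}, b""
    body = value[start:end + 1]
    headers = {"Content-Range": f"bytes {start}-{end}/{size}",
               "Content-Length": str(len(body))}
    return 206, headers, body
```

For a ten-byte value, the header "bytes=2-5" yields a 206 response carrying bytes 2 through 5 and a Content-Range of "bytes 2-5/10".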

5.13.4   URI Escaping

Percent escaping of reserved characters specified in RFC 3986 shall be applied to all text strings used in URIs. This includes user-supplied field names, metadata names, object names, container names and domain names when used in URIs.

Field names and values shall not be escaped when stored and sent in request and response message bodies.

EXAMPLE    A client retrieving a metadata item named "@user" from a container object with the name of "@MyContainer" would perform the following request:

GET /%40MyContainer/?objectName;metadata:%40user HTTP/1.1
Host: cloud.example.com
Accept: application/cdmi-container
X-CDMI-Specification-Version: 1.0.2

The response shall be:

HTTP/1.1 200 OK
Content-Type: application/cdmi-container
X-CDMI-Specification-Version: 1.0.2

{
   "objectName": "@MyContainer",
   "metadata": {
       "@user": "test"
   }
}
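The escaping used in the request line of this example can be reproduced with the Python standard library. The following informative sketch uses `urllib.parse.quote`; the helper names are hypothetical, and the request target is assembled only in the form shown in the example.

```python
from urllib.parse import quote, unquote


def escape_segment(text: str) -> str:
    """Percent-escape a name for use in a URI, leaving no reserved character raw."""
    return quote(text, safe="")


def metadata_request_target(container_name: str, metadata_name: str) -> str:
    """Build the request target for fetching one metadata item of a container."""
    return (f"/{escape_segment(container_name)}/"
            f"?objectName;metadata:{escape_segment(metadata_name)}")
```

`metadata_request_target("@MyContainer", "@user")` returns `/%40MyContainer/?objectName;metadata:%40user`, while the names in the JSON response body remain unescaped. `unquote` reverses the escaping.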

5.13.5   Use of URIs

The format and syntax of URIs are defined by RFC 3986.

Every CDMI client shall maintain one or more root URIs that each correspond to a root container on the CDMI server. Since all URIs to CDMI containers end in a trailing slash, all root URIs will end in a trailing slash.

All URIs in this international standard are relative to the root URI unless otherwise noted. As a consequence, the algorithm used for calculating the resolved URI is as described in Section 5.2 of RFC 3986.

Table 5 shows how relative URIs are resolved against root URIs.

Table 5 - Relative URIs Resolved Against Root URIs

| Root URI | + Relative URI | => Resolved URI |
|---|---|---|
| http://cloud.example.com/ | cdmi_object/testObject | http://cloud.example.com/cdmi_object/testObject |
| http://cloud.example.com/ | /cdmi_object/testObject | http://cloud.example.com/cdmi_object/testObject |
| http://cloud.example.com/p1/ | cdmi_object/testObject | http://cloud.example.com/p1/cdmi_object/testObject |
| http://cloud.example.com/p1/ | /cdmi_object/testObject | http://cloud.example.com/cdmi_object/testObject |
| http://cloud.example.com/p1/p2/ | cdmi_object/testObject | http://cloud.example.com/p1/p2/cdmi_object/testObject |
| http://cloud.example.com/p1/p2/ | /cdmi_object/testObject | http://cloud.example.com/cdmi_object/testObject |

This international standard places no restrictions on root and relative URIs, and all of the combinations in Table 5 are valid. The examples in this international standard use a root URI of http://cloud.example.com/ and absolute path references, as shown in the second row of Table 5.
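The resolutions listed in Table 5 can be reproduced with `urllib.parse.urljoin` from the Python standard library, which performs RFC 3986 reference resolution. The following sketch is informative; the names `TABLE_5` and `resolve` are hypothetical.

```python
from urllib.parse import urljoin

# (root URI, relative URI, resolved URI), copied from Table 5.
TABLE_5 = [
    ("http://cloud.example.com/", "cdmi_object/testObject",
     "http://cloud.example.com/cdmi_object/testObject"),
    ("http://cloud.example.com/", "/cdmi_object/testObject",
     "http://cloud.example.com/cdmi_object/testObject"),
    ("http://cloud.example.com/p1/", "cdmi_object/testObject",
     "http://cloud.example.com/p1/cdmi_object/testObject"),
    ("http://cloud.example.com/p1/", "/cdmi_object/testObject",
     "http://cloud.example.com/cdmi_object/testObject"),
    ("http://cloud.example.com/p1/p2/", "cdmi_object/testObject",
     "http://cloud.example.com/p1/p2/cdmi_object/testObject"),
    ("http://cloud.example.com/p1/p2/", "/cdmi_object/testObject",
     "http://cloud.example.com/cdmi_object/testObject"),
]


def resolve(root_uri: str, relative_uri: str) -> str:
    """Resolve a relative URI against a root URI that ends in a trailing slash."""
    return urljoin(root_uri, relative_uri)
```

A relative URI that begins with "/" replaces the whole path of the root URI, whereas one without a leading "/" is appended to it; this is why the trailing slash on root URIs matters.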

5.13.6   Reserved Characters

The name of CDMI data objects, container objects, queue objects, domain objects and capability objects shall not contain the "/" or "?" characters, as these characters are reserved for delimiters.

5.14   Time Representations

Unless otherwise specified, all date/time values are in the ISO 8601:2004 extended representation (YYYY-MM-DDThh:mm:ss.ssssssZ). The full precision shall be specified, the sub-second separator shall be a ".", the Z UTC zone indicator shall be included, and all timestamps shall be in UTC time zone. The YYYY-MM-DDT24:00:00.000000Z hour shall not be used, and instead, it shall be represented as YYYY-MM-DDT00:00:00.000000Z.

Unless otherwise specified, all date/time intervals are in the ISO 8601:2004 start date/end date representation (YYYY-MM-DDThh:mm:ss.ssssssZ/YYYY-MM-DDThh:mm:ss.ssssssZ). The end-date shall be equal to or later than the start-date. The full precision shall be specified, the sub-second separator shall be a ".", the Z UTC zone indicator shall be included, and all timestamps shall be in UTC time zone. The YYYY-MM-DDT24:00:00.000000Z hour shall not be used, and instead, it shall be represented as YYYY-MM-DDT00:00:00.000000Z.
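A value in the required representation can be produced with the Python standard library, as in the following informative sketch. The function names are hypothetical. Because `datetime` never yields an hour of 24, the prohibition on the 24:00:00 form is satisfied by construction.

```python
from datetime import datetime, timezone


def cdmi_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss.ssssssZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    # %f always emits six sub-second digits, giving the full precision.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def cdmi_interval(start: datetime, end: datetime) -> str:
    """Format a start date/end date interval; the end shall not precede the start."""
    if end < start:
        raise ValueError("end-date shall be equal to or later than start-date")
    return f"{cdmi_timestamp(start)}/{cdmi_timestamp(end)}"
```

For example, midnight UTC on 10 April 2012 is rendered as `2012-04-10T00:00:00.000000Z`.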

5.15   Backwards Compatibility

5.15.1   Value Transfer Encoding

CDMI version 1.0.1 introduces the concept of value transfer encoding to enable the storage and retrieval of arbitrary binary data via CDMI content-type operations. Data objects created by CDMI 1.0 clients through CDMI content-type operations shall have a value transfer encoding of "utf-8", and data objects created through non-CDMI content-type operations shall have a value transfer encoding of "base64".

Data objects with a value transfer encoding of "base64" shall not have their value field accessible to CDMI 1.0 clients through CDMI content-type operations. Attempts to read the value of these objects shall return an empty value field ("") to these clients. CDMI 1.0 clients can detect this condition when the cdmi_size metadata is not 0 and the value field is empty.
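The compatibility rules of this subclause can be summarized in an informative Python sketch. The function names and parameters are hypothetical; in particular, cdmi_size is passed here as an integer for simplicity, and the way a server learns which version a client speaks is not modelled.

```python
import base64


def default_value_transfer_encoding(via_cdmi_content_type: bool) -> str:
    """Encoding assigned at creation, depending on the kind of operation used."""
    return "utf-8" if via_cdmi_content_type else "base64"


def value_field_for_client(encoding: str, stored: bytes, client_is_1_0: bool) -> str:
    """Value field returned through a CDMI content-type read."""
    if encoding == "base64":
        if client_is_1_0:
            return ""  # base64 values are not accessible to CDMI 1.0 clients
        return base64.b64encode(stored).decode("ascii")
    return stored.decode("utf-8")


def value_is_hidden(cdmi_size: int, value_field: str) -> bool:
    """Detection rule for CDMI 1.0 clients: non-zero size but empty value."""
    return cdmi_size != 0 and value_field == ""
```

A CDMI 1.0 client that reads an object stored with "base64" encoding receives an empty value field and can recognize the condition with `value_is_hidden`.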

5.15.2   Container Export Capabilities

CDMI version 1.0.2 normalizes the names of capabilities used by a client to discover if a container can be exported via various protocols and deprecates the following container export capability names:

   cdmi_cifs_export,

   cdmi_nfs_export,

   cdmi_iscsi_export, and

   cdmi_occi_export.