Cloud Computing (3)

Release Date: 2010-09-13  Author: Wang Bai, Xu Liutong

Editor's Desk:
In the preceding two parts of this series, several aspects of cloud computing—including its definition, classification, characteristics, typical applications, and service levels—were discussed. This part continues with a discussion of the Cloud Computing Open Architecture and the market-oriented cloud. A comparison is made between cloud computing and other distributed computing technologies, and Google’s cloud platform is analyzed to determine how distributed computing is implemented in its particular model.

 

6.4 Open Architecture
Virtualization is a core technology for enabling cloud resource sharing, and Service-Oriented Architecture (SOA) enables flexibility, scalability, and reusability. By combining these two technologies, researchers have developed the Cloud Computing Open Architecture and used it as a reference model for implementing cloud computing systems, as shown in Figure 6.

 

 


    This architecture encompasses the cloud ecosystem, cloud infrastructure and its management, service-orientation, core provisioning and subscription, composite cloud offerings, cloud information architecture and management, and cloud quality analytics. In designing the architecture, seven basic principles are adopted:


    (1) Integrated Management for Cloud Ecosystem
    An architecture must support cloud computing ecosystem management. Such an ecosystem includes all services and solutions vendors, partners, and end users that provide or consume shared resources within the cloud computing environment.  Collaboration between vendors and their partners is emphasized in the cloud computing value chain.

 


    (2) Virtualization for Cloud Infrastructure
    Hardware virtualization involves managing hardware equipment in plug-and-play mode;  software virtualization involves using software image management or code virtualization technology to enable software sharing. Dynamic code assembly and execution is another software virtualization technology. In an Internet application, some JavaScript code elements can be dynamically retrieved and inserted into an Ajax package to create new functions or features for a web client.


    (3) Service-Orientation
    Service-orientation is a driving force that gives cloud computing business value in terms of asset reusability, composite applications, and mashup services. Common services can be reused to enable the cloud’s core provisioning and subscription services as well as to build cloud offerings in Infrastructure as a Service (IaaS), Software as a Service (SaaS), and even Business Process as a Service (BPaaS).


    (4) Extensible Service Provisioning
    This feature is unique to cloud computing systems. Without extensibility, the provisioning part of the cloud architecture can only support a certain type of resource sharing. Free-use and paying users can periodically change their roles as service providers or consumers, and this change can occur at three levels of service provisioning.


    (5) Configurable Enablement for Cloud Offerings
    The architecture must ensure configurability of the cloud computing platform and services. The modularized ecosystem management, virtualization, service-orientation, and cloud core form a solid foundation for a computing platform that is configurable, combinable, and manageable.


    (6) Unified Information Representation and Exchange Framework
    The collaborative feature of cloud computing comprises information representation and message exchange between cloud computing resources. Cloud computing resources include all business entities (e.g. cloud clients, partners, and vendors) and supporting resources such as virtualization related modules, service-orientation related modules, cloud core, and cloud offerings. The cloud information architecture module enables representation of cloud entities in a unified cloud computing entity description framework. Message routing and exchange protocols as well as message transformation capability form the foundation of cloud information architecture.


    (7) Cloud Quality and Governance
    This module identifies and defines quality indicators for the cloud computing environment and a set of guidelines to govern the design, deployment, operation, and management of cloud offerings.


    In short, the objective of such an architecture is to combine SOA and virtualization technologies in order to exploit the business potential of cloud computing.

 

6.5 Market-Oriented Cloud
Cloud computing is a new Internet-based resource-sharing model that is distinguished in particular by its business orientation. How, then, does this orientation shape cloud computing? Researchers have proposed, and intensively investigated, a market-oriented cloud architecture together with a global cloud exchange and market infrastructure for trading services.

 

6.5.1 Market-Oriented Cloud Architecture
In the article Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility, researchers from the Cloud Computing and Distributed Systems (CLOUDS) Laboratory of the University of Melbourne presented a market-oriented architecture. This architecture supports Quality of Service (QoS) negotiation and Service Level Agreement (SLA)-based resource allocation in the context of cloud computing, as shown in Figure 7.

 


    In this architecture, there are four main entities:
    (1) Users/Brokers
    Users (or brokers acting on their behalf) submit service requests from anywhere in the world to the Cloud Computing Center to be processed.

 


    (2) SLA Resource Allocator
    The SLA Resource Allocator acts as the interface between the cloud service provider and external users/brokers. It relies on the interaction of the following mechanisms to support SLA-oriented resource management; a minimal sketch of such an admission-control decision is given after the list of entities below.

 

  • Service Request Examiner and Admission Control
    When a service request is first submitted, the Service Request Examiner and Admission Control mechanism interprets it for QoS requirements before determining whether to accept or reject it. The mechanism also requires updated status information on resource availability from the Virtual Machine (VM) Monitor mechanism and workload processing from the Service Request Monitor mechanism in order to make effective resource allocation decisions. It then assigns the request to a VM and determines resource entitlements for the allocated VM.
  • Pricing
    The Pricing mechanism determines how service requests will be charged based on submission time (peak/off-peak), pricing rates, or resource availability.
  • Accounting
    The Accounting mechanism meters the actual usage of resources by each request so that the final cost can be calculated and charged to the user.
  • VM Monitor
    The VM Monitor mechanism oversees the availability of VMs and their resource entitlements.
  • Service Request Monitor
    The Service Request Monitor mechanism oversees the execution progress of service requests.


    (3) VMs
    Multiple VMs can be activated or stopped dynamically on a single physical machine to meet accepted service requests.


    (4) Physical Machines
    Multiple computing servers form a resource cluster to meet service demands.


    Commercial market-oriented cloud systems must be able to:

  • Support customer-driven service management;
  • Define computational risk management tactics to identify, assess, and manage risks involved in the execution of applications;
  • Devise appropriate market-based resource management strategies that encompass both customer-driven service management and computational risk management in order to sustain SLA-oriented resource allocation;
  • Incorporate autonomic resource management models that effectively self-manage changes in service requirements in order to satisfy both new service demands and existing service obligations;
  • Leverage VM technology to dynamically assign resource shares according to service requirements.

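    To make the interplay of these mechanisms more concrete, the following is a minimal, hypothetical Java sketch (it is not part of the cited architecture; all class and method names are invented for illustration) of how an admission-control check might weigh a request's resource needs and deadline against current VM availability and SLA terms:

        import java.time.Duration;

        // Hypothetical types, invented purely to illustrate the decision described above.
        record ServiceRequest(String userId, Duration deadline, int requiredVms) {}
        record SlaTerms(Duration maxResponseTime) {}

        class AdmissionControlSketch {
            private int idleVms;   // in the architecture, this figure would come from the VM Monitor

            AdmissionControlSketch(int idleVms) { this.idleVms = idleVms; }

            /** Accept a request only if capacity is available and the SLA response time fits the deadline. */
            boolean admit(ServiceRequest request, SlaTerms sla) {
                boolean enoughCapacity = request.requiredVms() <= idleVms;
                boolean deadlineFeasible = sla.maxResponseTime().compareTo(request.deadline()) <= 0;
                if (enoughCapacity && deadlineFeasible) {
                    idleVms -= request.requiredVms();   // reserve VMs for the accepted request
                    return true;
                }
                return false;                           // reject rather than risk violating the SLA
            }

            public static void main(String[] args) {
                AdmissionControlSketch control = new AdmissionControlSketch(8);
                ServiceRequest req = new ServiceRequest("user-42", Duration.ofMinutes(30), 4);
                System.out.println(control.admit(req, new SlaTerms(Duration.ofMinutes(10))));  // prints true
            }
        }

    In a real allocator this decision would also draw on pricing, accounting, and workload information, as described above; the sketch only shows the shape of the check.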
6.5.2 Cloud Service Exchanges and Markets
Enterprises currently employ cloud services to improve the scalability of their services and to deal with bursts in resource demand. However, at present, the proprietary interfaces and pricing strategies of service providers prevent consumers from swapping one provider for another. For cloud computing to become mature, services must follow standard interfaces. This would enable services to be commoditized and would pave the way for the creation of a market infrastructure for trading in services.


    In cloud computing markets, service consumers expect their specific QoS requirements to be met with minimal expense, and service providers hope to retain their clients while achieving the highest possible Return on Investment (ROI). To achieve this, mechanisms, tools, and technologies must be developed to represent, convert, and enhance resource value. Figure 8 illustrates a cloud exchange and market system model based on real-world exchanges.

 


    In this model, the market directory allows participants to locate providers or consumers with suitable offers. Auctioneers periodically clear bids and requests received from market participants, and the banking system carries out financial transactions.

 

 
    Brokers perform the same function in such a market as they do in real-world markets: they mediate between consumers and providers by purchasing capacity from the provider and sub-leasing it to the consumer. Consumers, brokers, and providers are bound to their requirements and the related compensations through SLAs. An SLA specifies the details of the service to be provided in terms of metrics agreed upon by all parties, as well as the penalties for violating these expectations. Such markets can bridge disparate clouds, allowing consumers to choose a suitable provider either by executing SLAs in advance or by purchasing capacity on the spot. Providers can set the price of a resource based on market conditions, user demand, or the current level of utilization of the resource. The admission-control mechanism at the provider end is responsible for selecting the auctions to participate in or the brokers to negotiate with. The negotiation process continues until an SLA is formed or the participants decide to break off. Brokers profit from the difference between what it costs them to lease resources from providers and what they charge consumers for a share of those resources; a broker must therefore choose its consumers and providers carefully. Consumer demands include deadlines, fidelity of results, turnaround time of applications, and budget limitations. Enterprise consumers can deploy their own limited IT resources into the cloud as a guarantee for enterprise computing, or they can lease providers’ resources to scale up their applications.


    The idea of utility markets for computing resources has been around for a long time, and recent research projects have focused particularly on trading VM-based resource allocations by time slice. In the above model, a resource broker can negotiate with resource providers. Building on enterprise Grid technology, the University of Melbourne’s CLOUDS Laboratory has implemented a market-oriented platform called "Aneka", a .NET-based service-oriented resource management platform that exhibits many of the properties of the cloud computing model.

 

6.6 Comparison of Cluster, Grid, and Cloud Computing
The first part in this series briefly introduced some characteristics of cloud computing that can be directly experienced by users. This part, however, discusses some of the technical characteristics that distinguish cloud computing platforms from cluster and grid computing.
Although cloud platforms share some common characteristics with clusters and Grids, they have their own unique attributes and capabilities. These include support for virtualization, services with Web Service interfaces that can be dynamically composed, and support for the creation of third-party, value-added services by building on cloud compute, storage, and application services. Table 1 compares the key characteristics of cluster, grid and cloud computing systems.

 


7 Cloud Computing Model
Although enterprises and academic researchers have proposed various cloud system models, most of these do not reveal the computing paradigm used for problem solving in a cloud. To enable communication and collaboration between server clusters within a cloud, Google has introduced the Google File System (GFS), BigTable, and MapReduce technologies, the so-called "three sharp weapons" of cloud computing. With these technologies, Google has formed a cloud with thousands or even millions of computers, creating a powerful data center.

 

 

7.1 GFS File System
Desktop applications differ from Internet applications in many respects, and GFS reflects the needs of the latter. GFS is a proprietary distributed file system developed by Google Inc. It is designed to provide efficient and reliable access to data using large clusters of commodity hardware, and it is optimized for Google’s core data storage and usage needs (primarily the search engine), which generate enormous amounts of data that must be retained. Google’s search computing borrows from the functional programming paradigm, in which operations do not modify the original data but instead generate new data. One feature of GFS, therefore, is that it stores a large number of very large files that are written mainly to be read: they can be appended to but are rarely rewritten. GFS is also characterized by high data throughput.


    There are two types of GFS nodes: one master node and a large number of chunkservers. Chunkservers store the data files, with each file broken up into fixed-size chunks of 64 megabytes. Each chunk is assigned a unique 64-bit label, which maintains the logical mapping of files to their constituent chunks. The master node stores only the metadata associated with the chunks, such as the tables mapping the 64-bit labels to chunk locations and to the files they make up, the locations of chunk replicas, which processes are reading or writing a particular chunk, and whether a chunk is being "snapshotted" for replication. This metadata is kept current by the master node as it periodically receives updates from each chunkserver.


    Modification permissions are handled by means of time-limited "leases": the master node grants a process permission to modify a chunk within a given period. The modifying chunkserver, which is always the primary chunk holder, then propagates the changes to the chunkservers holding backup copies so that they stay synchronized. With several redundant copies, reliability and availability are guaranteed. Programs access chunks by first querying the master node for the locations of the desired chunks; the master replies with these locations, and the program then contacts the appropriate chunkserver and receives the data from it directly.
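    As an illustration of this read path, the following is a minimal, hypothetical Java sketch. GFS has no public API, so the master and chunkserver interfaces below are invented solely to show the two-step lookup-then-read flow and the fixed 64 MB chunk size:

        import java.util.List;

        // Hypothetical interfaces: GFS is proprietary and exposes no public API.
        interface GfsMaster {
            // Returns the 64-bit chunk handle and replica locations for a file's n-th chunk.
            ChunkLocation lookup(String filePath, long chunkIndex);
        }
        interface GfsChunkServer {
            byte[] read(long chunkHandle, long offsetInChunk, int length);
        }
        record ChunkLocation(long chunkHandle, List<GfsChunkServer> replicas) {}

        class GfsClientSketch {
            static final long CHUNK_SIZE = 64L * 1024 * 1024;   // fixed 64 MB chunks

            private final GfsMaster master;
            GfsClientSketch(GfsMaster master) { this.master = master; }

            byte[] read(String filePath, long fileOffset, int length) {
                long chunkIndex = fileOffset / CHUNK_SIZE;       // which chunk contains the offset
                long offsetInChunk = fileOffset % CHUNK_SIZE;    // position within that chunk
                // Step 1: ask the master only for metadata (chunk handle and replica locations).
                ChunkLocation location = master.lookup(filePath, chunkIndex);
                // Step 2: read the data directly from a chunkserver; the master is not on the data path.
                // For simplicity, reads that span a chunk boundary are not handled here.
                return location.replicas().get(0).read(location.chunkHandle(), offsetInChunk, length);
            }
        }

    Keeping the master out of the data path is what allows a single metadata node to serve clusters of thousands of chunkservers.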


    Google currently has over 200 GFS clusters, each of which consists of 1,000 to 5,000 servers. Using GFS, Google has proven that clouds built on cheap machines can also deliver reliable computing and storage.

 

7.2 BigTable Database System
BigTable is a compressed, high-performance, proprietary database system built mainly on GFS and Chubby Lock Service. It is also a distributed system for storing structured data. A BigTable is a sparse, distributed, multi-dimensional sorted map that is indexed by a row key, a column key, and a timestamp. By allowing a client to dynamically control data layout, storage format, and storage location, BigTable meets application demands for localized access. Tables are optimized for GFS by being split into multiple tablets of about 200 megabytes each. The GFS locations of these tablets are recorded as database entries in a number of special tablets called "META1" tablets. The META1 tablets are in turn found by querying the single "META0" tablet. The META0 tablet typically has a machine to itself; clients query it for the locations of the META1 tablets and, consequently, for the location of the actual data.
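    The indexing scheme of this data model can be illustrated with a small, single-machine Java sketch using nested sorted maps. The toy class below is an assumption made only for illustration: it mimics the (row key, column key, timestamp) lookup but none of BigTable’s distribution, compression, or tablet mechanics:

        import java.util.NavigableMap;
        import java.util.TreeMap;

        // Toy illustration of the (row key, column key, timestamp) -> value map.
        class BigTableModelSketch {
            private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> table =
                    new TreeMap<>();

            void put(String row, String column, long timestamp, byte[] value) {
                table.computeIfAbsent(row, r -> new TreeMap<>())
                     .computeIfAbsent(column, c -> new TreeMap<>())
                     .put(timestamp, value);
            }

            /** Returns the most recent version written at or before the given timestamp, or null. */
            byte[] get(String row, String column, long timestamp) {
                NavigableMap<String, NavigableMap<Long, byte[]>> columns = table.get(row);
                if (columns == null) return null;
                NavigableMap<Long, byte[]> versions = columns.get(column);
                if (versions == null) return null;
                var latest = versions.floorEntry(timestamp);
                return latest == null ? null : latest.getValue();
            }
        }

    Because the map is sorted by row key, rows with nearby keys end up close together, which is what gives applications control over access locality.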


    BigTable is designed for databases of petabyte scale, with data spread across thousands of servers. It is also designed to accommodate more machines without the need for reconfiguration.

 

7.3 MapReduce Distributed Programming Paradigm
GFS and BigTable are used by Google for reliable storage of data in a large-scale distributed environment. Google’s MapReduce is a software framework designed to support parallel computing on large data sets (often greater than 1 terabyte) on a large cluster. It is therefore a computing model specifically designed for cloud computing.

 

7.3.1 Software Framework
The MapReduce software framework design is inspired by two common programming functions: "Map" and "Reduce". It was developed within Google as a mechanism for processing large amounts of raw data; for example, counting the number of occurrences of each word in a large set of documents. In functional programming, map and reduce are common higher-order functions.


    Map applies a given function to a list of elements (element by element) and returns a new list whose elements are the results of applying the function to each element of the original list. For example, Map f [v1, v2, ..., vn] = [f (v1), f (v2), ..., f (vn)]. Because each application of the function is independent, the applications can be computed in parallel. The MapReduce computing model is therefore suitable for applications requiring high-performance parallel computing: if the same computation is required on a large data set, the data set can be divided and assigned to different machines.
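    For example, the following Java fragment (an illustrative sketch using the standard Stream API, not Google’s implementation) applies a function to every element of a list independently, which is exactly what makes the Map step easy to parallelize:

        import java.util.List;

        public class MapSketch {
            public static void main(String[] args) {
                // Map f [v1, ..., vn] = [f(v1), ..., f(vn)]; here f is String::length.
                List<Integer> lengths = List.of("cloud", "grid", "cluster").stream()
                        .map(String::length)
                        .toList();
                System.out.println(lengths);   // prints [5, 4, 7]
                // Each application of f is independent, so the elements could equally
                // be processed in parallel, e.g. with parallelStream().
            }
        }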


    Reduce combines the elements of a list using a binary operation (function). Unfolding the binary operation f into an n-ary operation over the list gives: Reduce f [v1, v2, ..., vn] = f (v1, Reduce f [v2, ..., vn]) = f (v1, f (v2, Reduce f [v3, ..., vn])) = ... = f (v1, f (v2, f (... f (vn-1, vn) ...))). The MapReduce computing model uses Reduce operations to combine the intermediate results produced by Map operations until the final result is obtained.
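    A corresponding Java fragment (again only an illustrative sketch) folds a binary function over a list with the Stream API’s reduce operation:

        import java.util.List;

        public class ReduceSketch {
            public static void main(String[] args) {
                List<Integer> values = List.of(1, 2, 3, 4, 5);
                // Folds addition over the list: ((((0 + 1) + 2) + 3) + 4) + 5.
                int sum = values.stream().reduce(0, Integer::sum);
                System.out.println(sum);   // prints 15
                // In MapReduce, a Reduce of this kind is applied per intermediate key to
                // combine the partial results produced by the Map phase.
            }
        }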

 

7.3.2 Execution Procedure
Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of splits or shards. Reduce invocations are distributed by partitioning the intermediate key space into pieces using a partitioning function. When the user program calls the MapReduce function, the overall operation flow is illustrated in Figure 9.

 


    The MapReduce library in the user program first splits the input files into M pieces. It then starts up many copies of the program on a cluster of machines.

 


    One of the copies of the program—the master—is special; the rest are workers. The master picks idle workers and assigns each one a map task or a reduce task.


    A worker that is assigned a Map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-defined map function. The intermediate key/value pairs produced by the Map function are buffered in memory.


    The buffered pairs are periodically written to local disk and partitioned into R regions by the partitioning function. The locations of these buffered pairs on the local disk are passed back to the master, which is responsible for forwarding these locations to the Reduce workers.
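    The partitioning function itself is typically very simple; the MapReduce paper describes hashing the intermediate key modulo R as the default. A minimal Java sketch (the class and method names are assumptions for illustration) looks like this:

        public class PartitionSketch {
            /** Maps an intermediate key to one of numReduceTasks (R) reduce partitions. */
            static int partition(String intermediateKey, int numReduceTasks) {
                // floorMod keeps the result non-negative even when hashCode() is negative.
                return Math.floorMod(intermediateKey.hashCode(), numReduceTasks);
            }

            public static void main(String[] args) {
                System.out.println(partition("cloud", 4));   // some value in 0..3
            }
        }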


    When a Reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of map workers. When a Reduce worker has read all intermediate data for its partition, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together.


    The Reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the user’s reduce function. The output of the reduce function is appended to a final output file for this reduce partition.


    When all Map and Reduce tasks have been completed, the master wakes up the user program. At this point, the MapReduce call in the user program returns to the user code.


    Upon completion, the output of the MapReduce execution is available in the R output files. Typically, users do not need to combine these R output files into one file; they often pass these files as input to another MapReduce call or use them from another distributed application.

 

7.4 Apache Hadoop Distributed System Infrastructure
The designs of Google’s GFS, BigTable, and MapReduce have been published, but their implementations remain proprietary. The best-known open-source implementation of these ideas is the Apache Hadoop project. Inspired by Google’s MapReduce and GFS, Hadoop is an open-source Java software framework consisting of a concurrent computing model based on functional programming (MapReduce) and a distributed file system. Hadoop’s HBase, a distributed database similar to BigTable, enables data-intensive distributed applications to work with thousands of nodes and petabytes of data.
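    As a concrete example of the programming model, the canonical word-count job can be written against Hadoop’s org.apache.hadoop.mapreduce API roughly as follows. This is a sketch assuming a reasonably recent Hadoop release and follows the word-count example shipped with Hadoop:

        import java.io.IOException;
        import java.util.StringTokenizer;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        public class WordCount {
            // Map: emit (word, 1) for every word in the input split.
            public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
                private final static IntWritable one = new IntWritable(1);
                private final Text word = new Text();
                public void map(Object key, Text value, Context context)
                        throws IOException, InterruptedException {
                    StringTokenizer itr = new StringTokenizer(value.toString());
                    while (itr.hasMoreTokens()) {
                        word.set(itr.nextToken());
                        context.write(word, one);
                    }
                }
            }

            // Reduce: sum the counts emitted for each word.
            public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
                private final IntWritable result = new IntWritable();
                public void reduce(Text key, Iterable<IntWritable> values, Context context)
                        throws IOException, InterruptedException {
                    int sum = 0;
                    for (IntWritable val : values) sum += val.get();
                    result.set(sum);
                    context.write(key, result);
                }
            }

            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                Job job = Job.getInstance(conf, "word count");
                job.setJarByClass(WordCount.class);
                job.setMapperClass(TokenizerMapper.class);
                job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation of map output
                job.setReducerClass(IntSumReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                FileInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }

    Hadoop takes care of splitting the input, scheduling the map and reduce tasks across the cluster, and re-executing failed tasks, so the programmer supplies only the two functions above.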


    Hadoop was originally developed to support distribution for the Nutch search engine project. Yahoo has invested a great deal of money into the project and uses Hadoop extensively in areas such as web search and advertising. IBM and Google have launched an initiative to use Hadoop to support university courses in distributed computer programming. All these have been instrumental in promoting and popularizing cloud computing worldwide.


(To be continued)