Remote deployment and lifecycle management of VMs
Nimbus has been developed in part within the Globus Toolkit 4 framework and provides interfaces to VM management functions based on the WSRF set of protocols. An alternative interface implements the Amazon EC2 WSDL. (The underlying service implementation is protocol agnostic.)
Nimbus clients can deploy, pause, restart and shut down VMs.
On deployment, the client presents the workspace service with:
- meta-data (containing a pointer to the VM image to use as well as configuration information such as networking)
- resource allocation (specifying what resources: deployment time, CPUs, memory, etc. should be assigned to the VM)
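The two parts of a deployment request can be pictured as a pair of small records. The following is a minimal sketch only: every class and field name here is hypothetical and chosen for illustration, and it does not reproduce the actual Nimbus request schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NicRequest:
    name: str      # e.g. "eth0" (hypothetical field)
    network: str   # name of the network the NIC should join (hypothetical)


@dataclass
class WorkspaceMetadata:
    image_uri: str                      # pointer to the VM image to use
    nics: List[NicRequest] = field(default_factory=list)  # networking configuration


@dataclass
class ResourceAllocation:
    duration_minutes: int   # deployment time
    cpus: int
    memory_mb: int


@dataclass
class DeploymentRequest:
    metadata: WorkspaceMetadata
    allocation: ResourceAllocation

    def validate(self) -> None:
        """Reject obviously incomplete requests before they are submitted."""
        if not self.metadata.image_uri:
            raise ValueError("meta-data must point to a VM image")
        if min(self.allocation.duration_minutes,
               self.allocation.cpus,
               self.allocation.memory_mb) <= 0:
            raise ValueError("resource allocation values must be positive")


request = DeploymentRequest(
    metadata=WorkspaceMetadata("file://example-image", [NicRequest("eth0", "public")]),
    allocation=ResourceAllocation(duration_minutes=60, cpus=1, memory_mb=512),
)
request.validate()
```

Keeping the meta-data and the resource allocation as separate records mirrors the split described above: one says what to run, the other says how much of the site it should get.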
Once a request for VM deployment is accepted by the workspace service, a client can inspect various VM properties (e.g., its lifecycle state, time-to-live, the IP address assigned to the VM on deployment, or the resources assigned to the VM) via WSRF resource properties/notifications or polling (such as EC2 describe-instances).
Before deployment, clients can discover the properties of site configurations (e.g. what VMM is being supported on the site) and match them against the meta-data of workspaces they want to deploy (which describe for example what VMM is required for the workspace).
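As an illustration of this matching step, the sketch below compares a site's advertised properties with the requirements in a workspace's meta-data. The property names (`vmm`, `cpu_arch`) and the dictionary layout are assumptions made for the example, not the format the service actually publishes.

```python
def unmet_requirements(site: dict, required: dict) -> list:
    """Return the requirement keys that the site does not satisfy."""
    unmet = []
    for key, wanted in required.items():
        offered = site.get(key)
        # A site may advertise one value or a collection of supported values.
        if isinstance(offered, (list, tuple, set)):
            satisfied = wanted in offered
        else:
            satisfied = offered == wanted
        if not satisfied:
            unmet.append(key)
    return unmet


# Hypothetical site configuration and workspace requirements.
site = {"vmm": "xen", "cpu_arch": ["x86", "x86_64"]}
workspace_needs = {"vmm": "xen", "cpu_arch": "x86_64"}

if not unmet_requirements(site, workspace_needs):
    print("site can host this workspace")
```

Returning the list of unmet keys, rather than a bare boolean, lets a client report why a site was rejected.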
Multiple protocol support / Compartmentalized dependencies
The workspace service is an implementation of a strong "pure Java" internal interface (see What is the RM API?) which allows multiple remote protocols to be supported as well as differing underlying manager implementations.
There is currently one known manager implementation (the workspace service) and two supported remote protocol sets:
WSRF based: protocol implementation in longstanding use by previous workspace services and clients, including the cloud-client.
EC2 based: an implementation of the Amazon EC2 WSDL.
These protocols happen to both be Web Services based, and both run in the Apache Axis based GT Java container. But neither is a necessity:
Nothing in the workspace service implementation is specific to web services based remote protocols; the messaging system just needs to be able to speak to Java based libraries.
Workspace service dependencies have nothing to do with the container it runs in; they are normal Java application dependencies such as Spring, ehcache, backport-util-concurrent, and JDBC (currently using the embedded Derby database).
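The separation between remote protocols and the manager can be sketched as one abstract interface with thin protocol front ends on top of it. The real internal interface is Java; the Python below is only an illustration of the pattern, and every class, method, and size name in it is hypothetical.

```python
from abc import ABC, abstractmethod


class Manager(ABC):
    """Protocol-neutral management interface (hypothetical shape)."""

    @abstractmethod
    def create(self, image: str, memory_mb: int, minutes: int) -> str: ...

    @abstractmethod
    def state(self, vm_id: str) -> str: ...


class InMemoryManager(Manager):
    """Stand-in manager used only to make the sketch runnable."""

    def __init__(self):
        self._vms = {}

    def create(self, image, memory_mb, minutes):
        vm_id = f"vm-{len(self._vms) + 1}"
        self._vms[vm_id] = "Running"
        return vm_id

    def state(self, vm_id):
        return self._vms[vm_id]


class WsrfStyleFrontend:
    """Translates a WSRF-like request document into Manager calls."""

    def __init__(self, manager: Manager):
        self.manager = manager

    def handle_create(self, request: dict) -> dict:
        vm_id = self.manager.create(
            request["image"], request["memory_mb"], request["minutes"])
        return {"workspace_id": vm_id}


class Ec2StyleFrontend:
    """Translates EC2-like calls into the same Manager calls."""

    SIZES_MB = {"small": 512, "large": 2048}   # hypothetical size table

    def __init__(self, manager: Manager, default_minutes: int = 60):
        self.manager = manager
        self.default_minutes = default_minutes

    def run_instance(self, image_id: str, size: str) -> str:
        return self.manager.create(
            image_id, self.SIZES_MB[size], self.default_minutes)


manager = InMemoryManager()
wsrf = WsrfStyleFrontend(manager)
ec2 = Ec2StyleFrontend(manager)
first = wsrf.handle_create({"image": "img-a", "memory_mb": 512, "minutes": 30})
second = ec2.run_instance("img-b", "small")
```

Both front ends hold only a reference to the `Manager` interface, so a further protocol or a different manager implementation could be added without touching the other side.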
Flexible group management
The workspace service can start and manage groups of workspaces at a time, as well as groups of groups ("ensembles") where each group's VM images, resource allocation, duration, and number of nodes can be different. Groups and ensembles are co-scheduled: either all group/cluster members are scheduled to run at the same time or none run, even when using best-effort schedulers (see the pilot section below).
Auto-configuration of these clusters is also supported (see the cloud clusters page).
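The all-or-nothing behavior of co-scheduling can be sketched as a placement routine that commits only if every group in an ensemble fits. This is a simplified illustration under assumed inputs (a map of free VM slots per node and a list of group sizes); it is not the scheduler the workspace service uses.

```python
from typing import Dict, List, Optional


def co_schedule(free_slots: Dict[str, int],
                group_sizes: List[int]) -> Optional[List[List[str]]]:
    """Place every group, or place nothing.

    free_slots maps a node name to its number of free VM slots and is
    updated only when the whole ensemble fits.
    """
    remaining = dict(free_slots)   # work on a copy so failure changes nothing
    placement = []
    for size in group_sizes:
        hosts = []
        for node in sorted(remaining):
            while remaining[node] > 0 and len(hosts) < size:
                remaining[node] -= 1
                hosts.append(node)
        if len(hosts) < size:
            return None            # one group does not fit: reject the ensemble
        placement.append(hosts)
    free_slots.update(remaining)   # commit only after every group was placed
    return placement


slots = {"node1": 2, "node2": 1}
print(co_schedule(slots, [2, 1]))   # both groups fit, slots are consumed
print(co_schedule(slots, [1]))      # nothing left, so the request is rejected
```

Working on a copy of the slot table and committing at the end is what keeps a partial failure from leaving some members of a group running.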
Per-client usage tracking
The service can track deployment time (both used and currently reserved) on a per-client basis which can be used in authorization decisions about subsequent deployments. Clients may query the service about their own usage history.
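A minimal sketch of this bookkeeping is shown below, assuming a single per-client limit expressed in minutes. The class, its methods, and the limit itself are hypothetical; the sketch only illustrates how used and reserved time can both feed an authorization decision.

```python
from collections import defaultdict


class UsageTracker:
    def __init__(self, limit_minutes: int):
        self.limit = limit_minutes
        self.used = defaultdict(int)       # minutes already consumed, per client
        self.reserved = defaultdict(int)   # minutes held by running deployments

    def authorize(self, client: str, minutes: int) -> bool:
        """Reserve time for a new deployment if the client stays within the limit."""
        total = self.used[client] + self.reserved[client] + minutes
        if total > self.limit:
            return False
        self.reserved[client] += minutes
        return True

    def finish(self, client: str, reserved_minutes: int, used_minutes: int) -> None:
        """Release a reservation and record what was actually consumed."""
        self.reserved[client] -= reserved_minutes
        self.used[client] += used_minutes

    def history(self, client: str) -> dict:
        """What a client would see when querying its own usage."""
        return {"used": self.used[client], "reserved": self.reserved[client]}


tracker = UsageTracker(limit_minutes=100)
tracker.authorize("alice", 60)       # accepted, 60 minutes reserved
tracker.authorize("alice", 50)       # rejected, would exceed the limit
tracker.finish("alice", 60, 30)      # VM shut down early after 30 minutes
print(tracker.history("alice"))
```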
Flexible request authentication and authorization
The workspace service uses GSI to authenticate and authorize creation requests. Among other things, it allows a client to be authorized based on VO/role information contained in VOMS credentials and on attributes obtained via GridShib. Authorization policies can also be applied to networking requests, VM image files, resource requests, and time used/reserved by the client.
An included authorization setup (not enabled by default) allows for straightforward group management. You can assign identities to logical groups and then write policies about those groups. You can set simultaneous reservation limits, reservation limits that take past workspace usage into account, and detailed repository node and path checks.
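The group-based checks described above could look roughly like the following. This is a sketch under stated assumptions: the policy fields, the group name, the example identity string, and the image path layout are all invented for illustration and do not reflect the configuration format of the included authorization setup.

```python
import posixpath
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GroupPolicy:
    max_simultaneous_vms: int    # simultaneous reservation limit
    max_total_minutes: int       # limit that also counts past usage
    image_path_prefix: str       # repository path check


# Hypothetical policy table and identity-to-group assignment.
POLICIES = {"students": GroupPolicy(2, 600, "/images/students/")}
MEMBERS = {"/O=Example/CN=Alice": "students"}


def authorize(identity: str, running_vms: int, past_minutes: int,
              want_vms: int, want_minutes: int,
              image_path: str) -> Tuple[bool, str]:
    group = MEMBERS.get(identity)
    if group is None:
        return False, "identity is not assigned to a group"
    policy = POLICIES[group]
    if running_vms + want_vms > policy.max_simultaneous_vms:
        return False, "simultaneous reservation limit exceeded"
    if past_minutes + want_vms * want_minutes > policy.max_total_minutes:
        return False, "usage limit exceeded"
    # Normalize first so "../" segments cannot escape the permitted directory.
    if not posixpath.normpath(image_path).startswith(policy.image_path_prefix):
        return False, "image path not permitted for this group"
    return True, "ok"


print(authorize("/O=Example/CN=Alice", 1, 100, 1, 60, "/images/students/base.img"))
```

Writing the policy against the group rather than the individual identity is what keeps the setup manageable: adding a user means adding one membership entry, not a new set of limits.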
Easy user management
As of version TP2.1, an administrator wizard (cloud-admin.sh) is provided to make adding and removing cloud users simple. This includes adjusting authorization policies and creating repository directories with sample images, etc. This is intended for the "cloudkit" setup and requires the group management authorization described in the previous section.
Configuration management (deployment request)
Some configuration operations need to be finished at deployment-time because they require information that becomes available only late in the deployment process (such as network address assignments, physical host assignments, etc.).
The workspace service provides optional mechanisms to carry out such configuration management actions. Configuration actions available are DHCP delivery of network assignments and arbitrary file based customizations (mount + alter image).
Also see one-click clusters
One-click clusters (contextualization)
See the cloud clusters page for how auto-configuration of entire clusters (contextualization) is supported by the science clouds. This allows the cloud client to launch "one-click" clusters whose nodes securely configure themselves to operate in new network and security environments.
The workspace client allows authorized clients to access all Workspace Service features. The current release contains a Java reference implementation.
The workspace cloud client allows authorized clients to access many Workspace Service features in a user friendly way. It is compatible with a certain configuration of the workspace service and aims to get users up and running in a matter of minutes, even from laptops, NATs, etc.
VM network configuration (deployment request)
The workspace service allows a client to configure networking for the VM with several flexible options (allocating a new network address from a site pool, bridging an existing address, etc.).
In particular, a client can request configuring a VM on startup with several different NICs allocating different addresses from different pools (e.g., public and private, thus implementing the Edge Service requirement).
There are mechanisms for a site to set aside such address pools for the VMs as well as tools intercepting the VM's DHCP requests to deliver the right addresses.
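To make the idea of per-NIC address pools concrete, here is a small sketch in which each NIC of a VM draws an address from a named pool. The pool names, the addresses (taken from documentation and private ranges), and the rollback behavior are assumptions for the example, not a description of the site tools themselves.

```python
class AddressPool:
    def __init__(self, name, addresses):
        self.name = name
        self._free = list(addresses)
        self._leased = {}              # address -> VM that holds it

    @property
    def free_count(self):
        return len(self._free)

    def allocate(self, vm_id):
        if not self._free:
            raise RuntimeError(f"pool {self.name!r} is exhausted")
        address = self._free.pop(0)
        self._leased[address] = vm_id
        return address

    def release(self, address):
        del self._leased[address]
        self._free.insert(0, address)


def assign_nics(pools, vm_id, nic_networks):
    """Give each NIC an address from its requested pool, or assign nothing."""
    assigned = {}                      # NIC name -> (pool name, address)
    try:
        for nic, pool_name in nic_networks.items():
            assigned[nic] = (pool_name, pools[pool_name].allocate(vm_id))
    except (KeyError, RuntimeError):
        # Unknown or exhausted pool: hand back whatever was already taken.
        for pool_name, address in assigned.values():
            pools[pool_name].release(address)
        raise
    return {nic: address for nic, (_, address) in assigned.items()}


pools = {
    "public": AddressPool("public", ["192.0.2.10", "192.0.2.11"]),
    "private": AddressPool("private", ["10.0.0.10", "10.0.0.11"]),
}
print(assign_nics(pools, "vm-1", {"eth0": "public", "eth1": "private"}))
```

The resulting NIC-to-address map is the kind of information a DHCP-side tool would need in order to answer each VM's request with the right address.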
Xen backend plugin
The current workspace backend plugin is for the Xen hypervisor, an open source, efficient implementation.
Local resource management plugin
The workspace service provides a local resource manager with the capability to manage a pool of nodes on which VMs are deployed to accommodate the service deployment model (as opposed to a batch deployment model).
To use it, the pool nodes are configured with a lightweight Python management script called workspace-control.
Besides interfacing with Xen, workspace-control maps networking requests to the proper bridge interfaces, controls file isolation between different workspace instances, interfaces with ebtables and DHCP for IP address delivery, and can accomplish local transfers (file propagation from the WAN accessible image node) in daemonized mode.
Non-invasive site scheduler integration
When using the local resource management plugin (the default), a set of VMM resources will be managed entirely by the workspace service. Alternatively, the service can be integrated with a site's scheduler/resource manager (such as PBS) using the workspace pilot program.
This allows a dual use grid cluster to be achieved: regular jobs can run on a VMM node that is hosting no guest VMs, but if the node is allocated to the workspace service (at the service's request), VMs can be used. The site resource manager maintains full control over the cluster and does not need to be modified.
Many safeguards are included to ensure nodes are cleanly returned to their normal non-VM-hosting state, including protection against the workspace service being unavailable, early cancellation by the site (resource manager based), and node reboots. As a "worst case scenario" contingency, a one-command "kill 9" facility is also included for administrators.
VM fine-grain resource usage enforcement (resource allocation)
The workspace service allows the client to specify (ask for) the resource allocation to be assigned to a VM and manage that resource allocation during deployment. In the current release only memory and deployment time are managed.