Warning: This document describes an old release. Check here for the current version.
Nimbus 2.6 Admin Reference
This section explains some side tasks as well as some non-default configurations.
Notes on conf files (#)
The Nimbus conf files have many comments around each configuration. Check those out. Their content will be inlined here in the future.
See the $NIMBUS_HOME/services/etc/nimbus/workspace-service directory.
Cloud configuration overview (#)
The cloud configuration is a particular configuration of Nimbus that allows the cloud-client to operate out of the box. It is what is set up when you are done with the Zero To Cloud Guide.
This overview section and subsequent reference sections give a more in-depth explanation of how it works in order to provide context for administrators running Nimbus in production that want to customize settings or understand things in a more "under the hood" way for security/networking reasons.
This is all information for deployers of the cloud configuration to learn about it and customize it. This is not necessary for cloud users to read and understand. If you are a cloud user just looking to understand how to launch and manage VMs on an existing cloud, start with the clouds pages.
In the Nimbus 2.6 release the repository service (Cumulus) and the IaaS services MUST be on the same node. In future releases this restriction will be lifted.
The server addresses must be directly reachable from the Internet or otherwise configured to deal with being NAT'd. The IaaS services container can be setup for NAT or other port forwarding situations. Cumulus should be NAT friendly so long as its listening port (default 8888) is forwarded through the NAT.
The diagram above depicts the basic setup.
- A special workspace client called the "cloud-client" invokes operations on the IaaS services and Cumulus server. A number of defaults are assumed which make this work out of the box (these defaults are discussed later).
- Files are transferred from the cloud-client to a client-specific storage system on the repository node (manual or other types of S3 protocol based transfers are also possible for advanced users).
- The service invokes commands on the VMMs to trigger file transfers to/from the repository node, VM lifecycle events, and destruction/clean up.
- If the workspace state changes, the cloud-client will reflect this to the screen (and log files) and depending on the change might also take action in response.
Cloud configuration user experience (#)
Working backwards from the user's cloud-client experience is a good way to understand how the service needs to be set up.
Here is an abbreviated depiction of a simple user interaction with a cloud, to give you an idea if you've never used it. This does not depict an image transfer to the repository node but that is similarly brief.
-
A grid credential is needed; there is an embedded grid-proxy-init program if that is necessary.
-
You can list what's in your repository directory:
$ ./bin/cloud-client.sh --list
Sample output:
[Image] 'base-cluster-01.gz'       Read only
        Modified: Jul 06 @ 17:34   Size: 578818017 bytes (~552 MB)

[Image] 'hello-cloud'              Read only
        Modified: May 30 @ 14:16   Size: 524288000 bytes (~500 MB)

[Image] 'hello-cluster'            Read only
        Modified: Jun 30 @ 20:18   Size: 524288000 bytes (~500 MB)
-
And pick one to run (ignore the 'cluster' images for now)
$ ./bin/cloud-client.sh --run --name hello-cloud --hours 1
Sample output:
SSH public keyfile contained tilde:
- '~/.ssh/id_rsa.pub' --> '/home/guest/.ssh/id_rsa.pub'

Launching workspace.

Using workspace factory endpoint:
https://cloudurl.edu:8443/wsrf/services/WorkspaceFactoryService

Creating workspace "vm-023"... done.

       IP address: 123.123.123.123
         Hostname: ahostname.cloudurl.edu
       Start time: Fri Feb 29 09:36:39 CST 2008
    Shutdown time: Fri Feb 29 10:36:39 CST 2008
 Termination time: Fri Feb 29 10:46:39 CST 2008

Waiting for updates.
Some time elapses as the image file is copied to the VMM node. Then a running notification is printed:
State changed: Running
Running: 'vm-023'
-
The client had picked up your default public SSH key and sent it to be installed on the fly into the VM's authorized_keys file for the root account. So after launching you can use the printed hostname to log in as root:
$ ssh [email protected]
You can see an example of a cluster cloud-client deployment on the one-click clusters page.
Cloud configuration assumptions and defaults (#)
A number of things go into making the cloud client work out of the box, but it is in large part accomplished by giving the user a downloadable package with a number of default configurations.
These defaults limit functionality options in some cases, but that is the idea: eliminate decisions that need to be made and set working defaults. There are avenues left open for experienced users to do more (for example, by overriding the defaults or even switching over to the regular workspace client).
In the previous section, the first thing that probably stands out is that there are no contact addresses being entered on the command line.
The service and repository URLs are derived from a properties file that is included in the toplevel "conf" directory of the cloud-client package.
Note: How properties files and commandline overrides work is covered in detail in a later section; it is all designed to be flexible under the covers. If you don't want to follow the conventions laid out in this current "assumptions" section, it will be important to understand the later section to know how to change things for a good client package or properties file(s) that your users can use. Continue reading this section first, though, to get the basic ideas.
There are three main groups of assumptions and defaults. The first is the contact and identity information of the workspace service (see above for the configuration sample where these are specified). The other two groups make up the rest of this "Assumptions" section:
Deriving per-user repository directories (#)
For Cumulus S3 based commands (like --list, --delete, and --transfer) the server to contact is based on the contact in the cloud properties file. The s3id and s3 secret used to authenticate with the Cumulus server are also stored in the cloud properties file.
When you transfer a local file, the target of the transfer is the same filename in your personal repository directory. When you refer to the name of a workspace to run, this name must correspond to a filename in your personal repository directory.
We know where the repository contact comes from, but how is the per-user directory derived?
There are two other components to derive the directory used: the configured base bucket property and the user S3 ID.
- The configured base bucket property. The default base bucket on the repository node is "Repo". This value can be changed in the cloud.properties file so long as it matches the particular cloud's setup. It is best not to alter this value.
There is a cloud-client option to input any name or local file path and see what the derived URL is. See the --extrahelp description of the --print-file-URL option.
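To make the derivation concrete, here is a minimal sketch (hypothetical, not the cloud-client's actual code) of how such a URL could be composed from the base bucket, a base key, the user's ID, and the image name; the --print-file-URL option is the authoritative way to see the real result for your cloud:

# Hypothetical illustration only: the authoritative answer is whatever
# --print-file-URL reports for your cloud. The bucket and base key values
# come from the vws.repository.s3bucket and vws.repository.s3basekey
# properties described in the property reference below.
def derive_image_url(repository, bucket, basekey, user_id, image_name):
    return "cumulus://%s/%s/%s/%s/%s" % (
        repository, bucket, basekey, user_id, image_name)

# e.g. cumulus://cloudurl.edu:8888/Repo/<base key>/<user id>/hello-cloud
print(derive_image_url("cloudurl.edu:8888", "Repo", "<base key>",
                       "<user id>", "hello-cloud"))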
Runtime assumptions (#)
The second set of assumptions to cover is how a given image file is going to actually work. There are many options that you can specify in regular workspace requests. For example, the memory size, the number of network interfaces to construct, the pool name(s) to lease network addresses from, and the partition name the VM is expecting for the base partition.
Some fixed assumptions are made:
- There can be only one network interface
- The network interface is expecting its address via DHCP
- There can be only one partition file, for the root partition, configured with an ext2/ext3 filesystem. Other filesystems may not work correctly (this has to do with the cloud's default kernel as well as its ability to edit the image's files before boot).
The rest of the launch request is filled by default configurations, here they are:
- Request 3584 MB of memory (this is usually overridden)
- Request networking address from a pool named public
- Mount the partition to sda1
Cloud configuration, necessary configurations (#)
The previous section summed up the defaults and main assumptions. Opting to follow these conventions in your cloud leads to these configuration conclusions:
-
Install the workspace service in resource pool mode.
-
Configure a network for addresses to lease from and call it "public".
-
Create a cloud.properties file for your cloud with the values in this example file changed to reflect the correct URLs and identities (a hedged sketch of such a file appears after this list).
It is best to distribute a unique cloud.properties for each user with the Cumulus credentials in the file already, this is easily set up when using the nimbus-new-user program and the Nimbus web application. See the user management section for details.
-
If you need to adjust the default memory request, add a line of text like so to the cloud.properties file you will distribute: vws.memory.request=256
-
Create a Repo base bucket on the repository node. This should be done by the Nimbus installer.
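To tie these steps together, here is a hedged sketch of what a distributed cloud.properties might look like. The property names are the ones documented in the property reference below; every value shown is a placeholder to replace with your cloud's real host names and the user's real credentials (nimbus-new-user prints them for you):

ssh.pubkey=~/.ssh/id_rsa.pub
vws.factory=cloudurl.edu:8443
vws.factory.identity=<X509 DN of the service host certificate>
vws.repository=cloudurl.edu:8888
vws.repository.type=cumulus
vws.repository.canonicalid=<canonical id from nimbus-new-user>
vws.repository.s3id=<access id from nimbus-new-user>
vws.repository.s3key=<access secret from nimbus-new-user>
vws.repository.s3bucket=Repo
vws.repository.s3basekey=<base key>
# optional: override the default memory request
vws.memory.request=256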
Cloud configuration property reference (#)
This section goes into more detail about the property file and commandline configurations. This is especially important to understand if you want to diverge from the defaults above.

All commands go through cloud-client.sh which in turn invokes the actual cloud client program. The cloud client is written in Java and installed at lib/globus/lib/workspace_client.jar.
Before calling this program, the script sets up some things:
- ../conf/cloud.properties is set as the user properties file
- ../lib/globus becomes the new GLOBUS_LOCATION (overriding anything previously set)
- ../lib/certs is set as a directory to add to the trusted X509 certificate directories for identity validations (the client verifies it is talking to the right servers). Adding the CA cert(s) of the workspace service certificates to this directory ensures that the user will not run into CA (trusted certificates) problems.
The cloud client program respects settings from three different places, listed here in the order of precedence:
-
Commandline arguments - If the client uses one of the optional flags listed in ./bin/cloud-client.sh --extrahelp, these values are used. Many things can be overridden this way, including the service contacts.
-
User properties file - An example of this is distributed with the cloud client.
Note that you can include different properties files and have your users switch between clouds using ./bin/cloud-client.sh --conf ./conf/some-file.
If no --conf argument is supplied, the default file cloud.properties needs to exist. If you need to change this in your client distribution for cosmetic reasons, you can do so by editing the one relevant line at the top of ./bin/cloud-client.sh
-
Embedded properties file - A properties file lives inside the workspace client jar (which is installed into lib/globus/lib/workspace_client.jar). This controls all the remaining configurations.
There are (intentionally) no fallback settings for many of the properties; they will be included in the cloud.properties file you give to a user:
- ssh.pubkey (Path to SSH public key to log in with)
- vws.factory (Host+port of the Virtual Workspace Service)
- vws.factory.identity (Virtual Workspace Service X509 identity)
- vws.repository (Host+port of the Cumulus image repository)
- vws.repository.type=cumulus (currently cumulus is the only supported repository protocol)
- vws.repository.canonicalid (The user's canonical ID)
- vws.repository.s3id (The user's S3 ID)
- vws.repository.s3key (The user's S3 secret key)
- vws.repository.s3bucket (The bucket where the system stores images)
- vws.repository.s3basekey (The basename of all images in the bucket)
These are the embedded properties that are shipped with the cloud client; they can also exist in the cloud properties files to override the defaults:
# Default ms between polls
vws.poll.interval=2000

# Default client behavior is to poll, not use asynchronous notifications
vws.usenotifications=false

# Default memory request
vws.memory.request=3584

# Image repository base directory (only used for older GridFTP based clouds)
vws.repository.basedir=/cloud/

# CA hash of target cloud (only used for advice in --security)
vws.cahash=6045a439

# propagation setup for cloud (only used for older GridFTP based clouds)
vws.propagation.scheme=scp
vws.propagation.keepport=false

# Metadata defaults
vws.metadata.association=public
vws.metadata.mountAs=sda1
vws.metadata.nicName=eth0
vws.metadata.cpuType=x86
vws.metadata.vmmType=Xen
vws.metadata.vmmVersion=3

# Filename defaults for history directory
vws.metadata.fileName=metadata.xml
vws.depreq.fileName=deprequest.xml
User management (#)
In order to manage Nimbus users more easily, a set of command line tools has been created.
- nimbus-new-user
- nimbus-list-users
- nimbus-edit-user
- nimbus-remove-user
All of the tools take as a single mandatory argument an email address, with the slight exception of nimbus-list-users, which allows an administrator to query for users on their system and therefore takes a query pattern as its argument. For example, if you wanted to look up all the users with email addresses at gmail.com you would run:
$ nimbus-list-users %@gmail.com
All of the command line tools take the argument --help and further information can be found there. For didactic purposes, an example of a common session in which a new user is created, changed, listed, and removed follows:
$ ./bin/nimbus-new-user [email protected]
cert             : /home/bresnaha/NIM/var/ca/tmpm3s0Vccert/usercert.pem
key              : /home/bresnaha/NIM/var/ca/tmpm3s0Vccert/userkey.pem
dn               : /O=Auto/OU=a645a24d-6183-4bbd-9537-b7260749c716/[email protected]
canonical id     : aa55655a-8552-11df-a58d-001de0a80259
access id        : p3BR1WQTpio8JShc8YD7S
access secret    : LRY6lMgIFE5BioK5XRu7eZKecBDHjB35PVOqAmCLDm
url              : None
web id           : None
cloud properties : /home/bresnaha/NIM/var/ca/tmpm3s0Vccert/cloud.properties
$
$ ./bin/nimbus-edit-user -p NewPassWord [email protected]
dn            : /O=Auto/OU=a645a24d-6183-4bbd-9537-b7260749c716/[email protected]
canonical id  : aa55655a-8552-11df-a58d-001de0a80259
access id     : p3BR1WQTpio8JShc8YD7S
access secret : NewPassWord
$
$ ./bin/nimbus-list-users %@nimbus.test
dn            : /O=Auto/OU=a645a24d-6183-4bbd-9537-b7260749c716/[email protected]
canonical id  : aa55655a-8552-11df-a58d-001de0a80259
access id     : p3BR1WQTpio8JShc8YD7S
access secret : NewPassWord
display name  : [email protected]
$
$ ./bin/nimbus-remove-user [email protected]
$
Per-user rights and allocations (#)
In the services/etc/nimbus/workspace-service/group-authz/ directory are the default policies for each user. You pick a pre-configured policy to apply to a new user. The "groups" are not a shared allocation but rather each group is a policy that describes a "type" of user.
Todo: describe what can be tracked on a per-user basis
Todo: speak of nimbus-new-user integration
Enabling the EC2 SOAP frontend (#)
After installing, the EC2 query frontend should be immediately operational. However, if you wish to use the SOAP frontend as well, you must make a few configuration changes. To begin, see the $NIMBUS_HOME/services/etc/nimbus/elastic directory. The elastic.conf file here specifies what the EC2 "instance type" allocations should translate to and what networks should be requested from the underlying workspace service when VM create requests are sent.
By default, a Nimbus installation will enable this service:
https://10.20.0.1:8443/wsrf/services/ElasticNimbusService
But before the service will work, you must adjust a container configuration. This accounts for some security-related customs of EC2:
-
Secure message is used, but only on the request. No secure message envelope is sent around EC2 responses, so EC2 clients do not expect one; the protocol relies on the fact that https is being used to protect responses.
Both integrity and encryption problems are therefore relevant, so be wary of any plain-http endpoint being used with this protocol. For example, you probably want to make sure that add-keypair private key responses are encrypted (!).
-
Also, adjusting the container configuration gets around a timestamp format incompatibility we discovered (the timestamp is normalized after the message envelope signature/integrity is confirmed).
There is a sample container server-config.wsdd configuration to compare against here.
Edit the container deployment configuration:
Find the <requestFlow> section and comment out the existing WSSecurityHandler and add this new one:
<handler type="java:org.globus.wsrf.handlers.JAXRPCHandler"> <!-- enabled: --> <parameter name="className" value="org.nimbustools.messaging.gt4_0_elastic.rpc.WSSecurityHandler" /> <!-- disabled: --> <!--<parameter name="className" value="org.globus.wsrf.impl.security.authentication.wssec.WSSecurityHandler"/> --> </handler>
Now find the <responseFlow> section and comment out the existing X509SignHandler and add this new one:
<handler type="java:org.apache.axis.handlers.JAXRPCHandler"> <!-- enabled: --> <parameter name="className" value="org.nimbustools.messaging.gt4_0_elastic.rpc.SignHandler" /> <!-- disabled: --> <!--<parameter name="className" value="org.globus.wsrf.impl.security.authentication.securemsg.X509SignHandler"/>--> </handler>
If you don't make this configuration, you will see this error when trying to use an EC2 client.
A container restart is required after the configuration change.
Configuring the EC2 Query frontend (#)
The EC2 Query frontend supports the same operations as the SOAP frontend. However, it does not run in the same container. It listens on HTTPS using Jetty. Starting with Nimbus 2.4, the query frontend is enabled and listens on port 8444. For instructions on changing this port, see the service ports section.
Configuration for the query frontend lives in the $NIMBUS_HOME/services/etc/nimbus/query/query.conf file.
The Query interface does not rely on X509 certificates for security. Instead, it uses a symmetric signature-based approach. Each user is assigned an access identifier and secret key; these credentials are also maintained by the service. Each request is "signed" by the client by generating a hash over parts of the request and attaching it. The service performs this same signature process and compares its result with the one included in the request.
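As a rough illustration of this symmetric scheme (a sketch of the general approach, not the exact string-to-sign or hash algorithm the Nimbus query frontend uses), both sides compute an HMAC over the agreed-upon parts of the request with the shared secret and compare the results:

import base64
import hashlib
import hmac

def sign(secret_key, string_to_sign):
    # Both the client and the service run this over the same parts of the
    # request; the service rejects the request if its result does not match
    # the signature the client attached.
    digest = hmac.new(secret_key, string_to_sign, hashlib.sha256).digest()
    return base64.b64encode(digest)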
There is support for creating query credentials in the nimbus-new-user program, for more information see the user management section.
There is support for distributing query tokens via the Nimbus Web application.
Changing the network ports of Nimbus and Cumulus (#)
The assorted Nimbus and Cumulus services use several network ports. Each is configured with sensible defaults, but you may change them if needed. Below are instructions for changing the port of each service.
Nimbus core services (default 8443)
Edit $NIMBUS_HOME/libexec/run-services.sh and change the PORT line to your desired port number. You must restart the service for changes to take effect.
Nimbus EC2 query frontend (default 8444)
Edit $NIMBUS_HOME/services/etc/nimbus/query/query.conf and adjust the https.port line. You must restart the service for changes to take effect.
Cumulus (default 8888)
The configuration options for Cumulus can be found in $NIMBUS_HOME/cumulus/etc/cumulus.ini. Under the heading [cb] there is a port entry. Change that value and restart Cumulus with the nimbusctl program.
[cb]
installdir = <nimbus home>
port = 8888
hostname = <hostname>
note: If this change is made after you have distributed a cloud.properties file to users then you will need to instruct your cloud-client users to change the value vws.repository=<hostname>:<port> in their local cloud.properties file.
Nimbus Web application (default 1443)
When enabled, the web application listens by default on port 1443. This and other configuration options are located in the $NIMBUS_HOME/web/nimbusweb.conf file. Changes to this file require a restart of the service.
Configuring the Nimbus Web interface (#)
Starting with Nimbus 2.4, the Nimbus Web application is bundled with the service distribution but is disabled by default. To enable it, edit the $NIMBUS_HOME/nimbus-setup.conf file and change the value of web.enabled to True. Next you should run nimbus-configure to propagate the change. Now you can use nimbusctl to start/stop the web application.
Once the web application has been configured, you can start to use it with the nimbus-new-user program (see the help output); see the user management section for details.
Using the Nimbus Web interface (#)
The Nimbus web application provides basic facilities for distributing new X509 credentials, EC2 query tokens, and cloud.properties files to users.
Previously this was a tedious process that was difficult to do in a secure way that was also user friendly. Nimbus Web allows an admin to upload credentials for a user and then send them a custom URL which invites them to create an account.
Once the web application has been configured, you can start to use it with the nimbus-new-user program (see the help output, -W flag), which allows you to very quickly add a user to the system and get the URL to distribute in your welcome email. The user management tools provide machine parsable output options that make it easy to incorporate into scripts as well (perhaps you would like to go further and create those emails entirely programmatically).
To get started, log into the web interface as a superuser and go to the Administrative Panel. This page has a section for creating users as well as viewing pending and existing users. The best way to do this is by using the nimbus-new-user tool, but this option is available for you to create accounts manually.
If you go the manual route: fill in the appropriate fields and upload an X509 certificate and (passwordless) key for the user. Note that the application expects plain text files, so depending on your browser you may need to rename files to have a .txt extension before you can upload them. Once the new account is created, you will be provided with a custom URL. You must paste this URL into an email to the user along with usage instructions.
When the user accesses the custom URL, they will be asked to create a password and login. Inside their account, they can download the certificate and key which were provided for them by the admin. Note that the design of the application attempts to maximize the security of the process, with several important features:
- The URL token can only be successfully used once. After a user creates a password and logs in, future attempts to access that URL will fail. This is to prevent someone from intercepting the URL and using it to access the user's credentials. If this happens, the real user will be unable to login and will (hopefully) contact the administrator immediately (there is a message urging them to do so).
- In the same spirit, the URL token will expire after a configurable number of hours (default: 12).
- The user's X509 private key can be downloaded once and only once. After this download occurs, the key will be deleted from the server altogether. In an ideal security system, no person or system will ever be in possession of a private key, except for the user/owner of the key itself. Because we don't follow this for the sake of usability, we attempt to minimize the time that the private key is in the web app database.
- When a URL token is accessed or a private key is downloaded, the time and IP address of this access is logged and displayed in the administrative panel.
- The nimbus-new-user tool can create a custom cloud.properties file to use with your cloud. This will have all the right configurations as well as the user's query credentials in the "vws.repository.s3id" and "vws.repository.s3key" fields for using Cumulus.
Configuring a different host certificate (#)
The Nimbus installer creates a Certificate Authority which is used for (among other things) generating a host certificate for the various services. There are three files involved in your host certificate and they are all generated during the install by the nimbus-configure program. By default, these files are placed in "$NIMBUS_HOME/var/" but you can control their placement with properties in the "$NIMBUS_HOME/nimbus-setup.conf" file.
- hostcert.pem - The host certificate. The certificate for the issuing CA must be in the Nimbus trusted-certs directory, in hashed format.
- hostkey.pem - The private key. Must be unencrypted and readable by the Nimbus user.
- keystore.jks - Some Nimbus services require this special Java Key Store format. The nimbus-configure program generates this file from the host cert and key. If you delete the file, it can be regenerated by running nimbus-configure again.
To use a custom host certificate, you can delete (or relocate) these three files, copy in your own hostcert.pem and hostkey.pem, and run nimbus-configure, which will generate the keystore.
CA Certs
It is important that the issuing CA cert is trusted by Nimbus (and any clients used to access the Nimbus services). This is done by placing the hashed form of the CA files in the trusted-certs directory, by default "$NIMBUS_HOME/var/ca/trusted-certs/". For example, these three files:
3fc18087.0
3fc18087.r0
3fc18087.signing_policy
Cumulus https
NOTE: If you are using Cumulus with https you will need to point it at the correct certificates as well. This is further explained here.
If you simply want to generate new host certificates using the Nimbus internal CA (perhaps using a different hostname), you can follow a similar procedure. Delete or relocate the hostcert.pem, hostkey.pem, and keystore.jks files and then run nimbus-configure. New files will be generated.
You can also keep these files outside of the Nimbus install (for example if you use the same host certificate for multiple services on the same machine). Just edit the $NIMBUS_HOME/nimbus-setup.conf file and adjust the hostcert, hostkey, and keystore properties. Then run nimbus-configure. If these files do not exist, they will be created.
Configuring Nimbus basics manually without the auto-configuration program (#)
What follows are the instructions for setting up a container as they existed before the auto-configuration program or the installer came into being (see here for information about the auto-configuration program). We are leaving them in the docs because they provide some insight, especially for administrators that are preparing programmatic node configurations for their clusters (using systems such as Chef).
* Service hostname:
Navigate to the workspace-service configuration directory:
Edit the "ssh.conf" file:
Find this setting:
service.sshd.contact.string=REPLACE_WITH_SERVICE_NODE_HOSTNAME:22
... and replace the CAPS part with your service node hostname. This hostname and port should be accessible from the VMM nodes.
(The guide assumes you will have the same privileged account name on the service node and VMM nodes, but if not, this is where you would make the changes as you can read in the ssh.conf file comments).
* VMM names:
See the resource pool section to learn how to add VMM names.
* Networks:
Navigate to the workspace service networks directory:
The service is packaged with two sample network files, public and private.
You can name these files anything you want. The file names will be the names of the networks that are offered to clients. It's a convention to provide "public" and "private" but these can be anything. If you do change this, the cloud client configuration for what network(s) to request will need to be overridden in the cloud.properties file that you distribute to users.
The public file has some comments in it. Edit this file to have the one DNS line at the top and one network address to give out. The subnet and network you choose should be something the VMM node can bridge to (there are some advanced configs to be able to do DHCP and bridge addresses for addresses that are foreign to the VMM, but this is not something addressed in this guide).
192.168.0.1
fakepub1 192.168.0.3 192.168.0.1 192.168.0.255 255.255.255.0
It is possible to force specific MAC addresses for each IP address, see the file for syntax details. Usually the service will pick these for you from a pool of MAC addresses starting with a prefix that is configured in the "$NIMBUS_HOME/services/etc/nimbus/workspace-service/network.conf" file.
Resource pool and pilot configurations (#)
There are modules for two resource management strategies currently distributed with Nimbus: the default "resource pool" mode and the "pilot" mode.
The "resource pool" mode is where the service has direct control of a pool of VMM nodes. The service assumes it can start VMs
The "pilot" mode is where the service makes a request to a cluster's Local Resource Management System (LRMS) such as PBS. The VMMs are equipped to run regular jobs in domain 0. But if pilot jobs are submitted, the nodes are secured for VM management for a certain time period by the workspace service. If the LRM or administrator preempts/kills the pilot job earlier than expected, the VMM is no longer available to the workspace service.
The "services/etc/nimbus/workspace-service/other/resource-locator-ACTIVE.xml" file dictates what mode is in use (container restart required if this changes). See the available "services/etc/nimbus/workspace-service/other/resource-locator-*" files.
Resource pool (#)
This is the default, see the overview.
In the Zero to Cloud Guide, the configuration script that you interacted with at the end of the SSH Setup section took care of configuring the workspace service with the first VMM to use.
A cloud with one VMM is perfectly reasonable for a test setup, but when it comes time to offer resources to others for real use, we bet you might want to add a few more. Maybe a few hundred more.
As of Nimbus 2.6, it is possible to configure the running service dynamically. You can interact with the scheduler and add and remove nodes on the fly. The nimbus-nodes program is what you use to do this.
Have a look at the help output:
cd $NIMBUS_HOME
./bin/nimbus-nodes -h
The following example assumes you have homogeneous nodes. Each node has, let's say, 8GB RAM and you want to dedicate the nodes exclusively to hosting VMs. Some RAM needs to be saved for the system (in Xen, for example, this is "domain 0" memory), so we decide to offer 7.5GB to VMs. For RAM, there is no overcommit possible with Nimbus.
nimbus-nodes needs a running service
If the service is not running, the nimbus-nodes program will fail to adjust anything. Make sure the workspace service is running with "./bin/nimbusctl start".
You can SSH to each node without a password from the nimbus account, right?
service-node $ whoami
nimbus
service-node $ ssh nequals01

nequals01 $ ...
The nodes in the cluster are named based on numbers, so for example "nequals01", "nequals02", etc. This means we can construct the command with a for loop.
$ NODES="nequals01" $ for n in `seq -w 2 10`; do NODES="$NODES,nequals$n"; done $ echo $NODES nequals01,nequals02,nequals03,nequals04,nequals05,nequals06,nequals07,nequals08,nequals09,nequals10
With the $NODES variable in hand, we can make the node-addition call.
$ ./bin/nimbus-nodes --add $NODES --memory 7680
At any time you can use the "--list" action to see what the current state of the pool is.
There are several other options discussed in the nimbus-nodes -h text, we will highlight one of the most important ones here.
If you ever want to disable a VMM, use the live-update feature. After running the following command, no new VMs can be launched on the node. Any current VMs, however, will continue running. So this is a way to "drain" your nodes of work if there is maintenance coming up, etc.
$ ./bin/nimbus-nodes --update nequals08 --inactive
Pilot (#)
-
The first step to switching to the pilot based infrastructure is to make sure you have at least one working node configured with workspace-control, following the instructions in this guide as if you were not going to use dynamically allocated VMMs via the pilot.
If the only nodes available are in the LRM pool, it would be best to drain the jobs from one and take it offline while you confirm the setup.
-
Next, make sure that the system account the container is running in can submit jobs to the LRM. For example, run echo "/bin/true" | qsub
-
Next, decide how you would like to organize the cluster nodes, such that the request for time on the nodes from the workspace service in fact makes it end up with usable VMM nodes.
For example, if there are only a portion of nodes configured with Xen and workspace-control, you can set up a special node property (e.g. 'xen') or perhaps a separate queue or server. The service supports submitting jobs with node property requirements and also supports the full Torque/PBS '[queue][@server]' destination syntax if desired.
-
Copy the "services/etc/nimbus/workspace-service/other/resource-locator-pilot.xml" to "services/etc/nimbus/workspace-service/other/resource-locator-ACTIVE.xml"
The configuration comments in "services/etc/nimbus/workspace-service/pilot.conf" should be self explanatory. There are a few to highlight here (and note that advanced configs are in resource-locator-ACTIVE.xml).
-
HTTP digest access authentication based notifications are one mechanism for pilot notifications. Each message from a pilot process to the workspace service takes on the order of 10ms on our current testbed, which is reasonable.
The contactPort setting is used to control what port the embedded HTTP server listens on. It is also the contact URL passed to the pilot program, an easy way to get this right is to use an IP address rather than a hostname.
Note the accountsPath setting. Navigate to that file ("services/etc/nimbus/workspace_service/pilot-authz.conf" by default) and change the shared secret to something not dictionary based and 15 or more characters. A script in that directory will produce suggestions.
This port may be blocked off entirely from WAN access via firewall if desired, only the pilot programs need to connect to it. If it is not blocked off, the use of HTTP digest access authentication for connections is still guarding access.
Alternatively, you can configure only SSH for these notifications, or configure both and use SSH as a fallback mechanism. When used as a fallback mechanism, the pilot will try to contact the HTTP server and, if that fails, will then attempt to use SSH. Those messages are written to a file and will be read when the workspace service recovers. This is an advanced configuration; setting up the infrastructure without this configured is recommended for the first pass (reduce your misconfiguration chances).
-
The maxMB setting is used to set a hard maximum memory allotment across all workspace requests (no matter what the authorization layers allow). This is a "fail fast" setting, making sure dubious requests are not sent to the LRM.
To arrive at that number, you must arrive at the maximum amount of memory to give domain 0 in non-hosting mode. This should be as much as possible and you will also configure this later into the pilot program settings (the pilot will make sure domain 0 gets this memory back when returning the node from hosting mode to normal job mode).
When the node boots and xend is first run, you should configure things such that domain 0 is already at this memory setting. This way, it will be ready to give jobs as many resources as possible from its initial boot state.
Domain 0's memory is set in the boot parameters. On the "kernel" line you can add a parameter like this: dom0_mem=2007M
If it is too high you will make the node unbootable; 2007M is an example from a 2048M node and was arrived at experimentally. We are working on ways to automatically figure out the highest number this can be without causing boot issues.
Take this setting and subtract at least 128M from it, allocating the rest for guest workspaces. Let's label 128M in this example as dom0-min and 2007 as dom0-max. Some memory is necessary for domain 0 to at least do privileged disk and net I/O for guest domains.
These two memory settings will be configured into the pilot to make sure domain 0 is always in the correct state. Domain 0's memory will never be set below the dom0-min setting and will always be returned to dom0-max when the pilot program vacates the node.
Instead of letting the workspace request fail on the backend just before instantiation, the maxMB setting is configured in the service so that requests for more memory will be rejected up front.
So [ dom0-max minus dom0-min equals maxMB ]; a worked example appears after these steps. And again, maxMB is the maximum allowed for guest workspaces.
( You could make it smaller. But it would not make sense to make it bigger than [ dom0-max minus dom0-min ] because this will cause the pilot program itself to reject the request. )
-
The pilotPath setting must be correct and double checked. See this bugzilla item.
-
Next, note your pilotPath setting and put a copy of workspacepilot.py there. Run chmod +x on it and that is all that should be necessary for the installation.
Python 2.3 or higher (though not Python 3.x) is also required but this was required for workspace-control as well.
A sudo rule to the xm program is also required but this was configured when you set up workspace-control. If the account the pilot jobs are run under is different than the account that runs workspace-control, copy the xm sudo rule for the account.
-
Open the workspacepilot.py file in an editor. These things must be configured correctly and require your intervention (i.e., the software cannot guess at them):
- Search for "secret: pw_here" around line 80. Replace "pw_here" with the shared secret you configured above.
- Below that, set the "minmem" setting to the value you chose above that we called dom0-min.
- Set the "dom0_mem" setting to the value you chose above that we called dom0-max.
The other configurations should be explained enough in the comments and they also usually do not need to be altered.
You might like to create a directory for the pilot's logfiles instead of the default setting of "/tmp" for the "logfiledir" configuration. You might also wish to separate out the config file from the program. The easiest way to do that is to configure the service to call a shell script instead of workspacepilot.py. This in turn could wrap the call to the pilot program, for example: "/opt/workspacepilot.py -p /etc/workspace-pilot.conf [email protected]"
- Now restart the GT container and submit test workspace requests as usual (cloud requests work too).
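As a worked example of the maxMB arithmetic from the memory step above, using the sample numbers in this guide (a dom0-max of 2007M and a dom0-min reservation of 128M):

dom0_max = 2007   # MB domain 0 gets when the node is not hosting VMs
dom0_min = 128    # MB domain 0 always keeps for privileged disk/net I/O
max_mb = dom0_max - dom0_min
print(max_mb)     # 1879 -> the most memory guest workspaces may request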
Network configuration details (#)
While addresses for VMs are configured and chosen within the Nimbus service, they are physically obtained by the VMs from an external DHCP service. There are two ways of arranging the DHCP configuration.
- Centralized -- a new or existing DHCP service that you configure with Nimbus-specific MAC to IP mappings. This is generally simpler to set up and is covered in the Zero-to-Cloud guide.
- Local -- a DHCP server is installed on every VMM node and automatically configured with the appropriate addresses just before a VM boots. This is more complicated to set up initially but can be preferable in certain scenarios.
Because Nimbus chooses the MAC address, it controls which DHCP entry will be retrieved by the VM. Additionally, ebtables rules are configured to ensure that a malicious or misconfigured VM cannot use another MAC or IP.
In a local DHCP scenario, workspace-control on each VMM manages the DHCP configuration file and injects entries just before each VM boots. To prevent DHCP broadcast requests from getting out to the LAN, an ebtables rule is enacted to force packets to a specific local interface.
Configuring local DHCP is not difficult, but you should exercise caution to ensure that the DHCP daemons on each VMM do not interfere with other networks. First of all, you must install an ISC-compatible DHCP server. This should be available on all Linux distributions.
Once installed, find the DHCP configuration location. Typically this is something like /etc/dhcp/dhcpd.conf or /etc/dhcp3/dhcpd.conf. Replace this file with the example in the workspace-control package: share/workspace-control/dhcp.conf.example and then edit it to include proper subnet declarations for your network. Afterwards, try restarting DHCP and checking logs to ensure that it started without error.
Next, edit the networks.conf file in etc/workspace-control/. Set the localdhcp option to true and take a look at the dhcp-bridges section to configure where DHCP packets are bridged to.
Finally, you may need to edit the sudo script that workspace-control uses to alter dhcp.conf and restart the service. This script is located at libexec/workspace-control/dhcp-config.sh. It expects the following defaults:
# Policy file for script to adjust
DHCPD_CONF="/etc/dhcpd.conf"

# Command to run before policy adjustment
DHCPD_STOP="/etc/init.d/dhcpd stop"

# Command to run after policy adjustment
DHCPD_START="/etc/init.d/dhcpd start"
You should also ensure that this script can be called via sudo as the nimbus user.
Configuring a standalone context broker (#)
The context broker is used to facilitate one click clusters.
The context broker is installed and configured automatically starting with Nimbus 2.4, but there is not a dependency on any Nimbus service component. It can run by itself in a GT container. You can use it for deploying virtual clusters on EC2 for example without any other Nimbus service running (the cloud client #11 has an "ec2script" option that will allow you to do this).
If you want to install the broker separately from Nimbus, download the Nimbus source tarball, extract it, and run scripts/broker-build-and-install.sh with an appropriate $GLOBUS_LOCATION set.
To set up a standalone broker that is compatible with post-010 cloud clients, follow these steps:
-
Create a passwordless CA certificate.
You can do this from an existing CA. To unencrypt an RSA key, run: openssl rsa -in cakey.pem -out cakey-unencrypted.pem
Alternatively, you can use the CA created by the Nimbus installer under $NIMBUS_HOME/var/ca
-
Make very sure that the CA certificate and key files are read-only and private to the container running account.
-
Add the CA certificate to your container's trusted certificates directory. The context broker (running in the container) creates short term credentials on the fly for the VMs. The VMs use this to contact the broker: the container needs to be able to verify who is calling.
-
Navigate to "$NIMBUS_HOME/services/etc/nimbus-context-broker" and adjust the "caCertPath" and "caKeyPath" parameters in the "jndi-config.xml" file to point to the CA certificate and key files you created in previous steps.
Note that the old and new context brokers can both use the same CA certificate and key file.
-
Container restart is required.
Cumulus (#)
Cumulus is the S3 compliant repository management service for Nimbus.
Cumulus is an open source implementation of the Amazon S3 REST API. It is packaged with Nimbus; however, it can be used without Nimbus as well. Cumulus allows you to serve files to users via a known and adopted REST API. Your clients will be able to access your data service with the Amazon S3 clients they already use.
Cumulus Configuration (#)
When the Cumulus server is run it expects to find a configuration file (typically called cumulus.ini) in one or all of the following locations:
- /etc/nimbus/cumulus.ini
- ~/.nimbus/cumulus.ini
- the same directory from which the program was launched
- file pointed to by the environment variable CUMULUS_SETTINGS_FILE
cumulus.ini in Nimbus
For Nimbus installations this file can be found at $NIMBUS_HOME/cumulus/etc/cumulus.ini
Each file in the path is read in (provided it exists). The values found in each file override the values found in the previous file in this list.
Repository Location (#)
The backend storage system in Cumulus has been created with a modular interface that will allow us to add more sophisticated plugins in the future, giving the administrator many powerful options. In the current implementation there is a single storage module which stores user files on a mounted file system. The reliability and performance of Cumulus will thus be limited by the reliability and performance of that file system. Because of this, Cumulus administrators will often want to specify a location for the repository.
Within the cumulus.ini file there is the [posix]:directory directive. This is the directory in which all of the files in the Cumulus repository will be stored. The names of the files in that directory will be obfuscated based on the bucket/key name. In order to discover what file belongs to what bucket/key you must use the user management tools (included with the Cumulus installation). There are a series of tools under the bin directory which start with nimbusauthz-* that can help with this. In most cases there will be no need for a system administrator to use these tools; they are provided for expert usage in problematic situations.
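For example, the directive looks like this in cumulus.ini (a sketch only; the directory path is an illustration, point it at whatever file system has the space and reliability you need):

[posix]
directory = /path/to/large/filesystem/cumulus-repo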
Using the boto client (#)
To use boto it is important to disable virtual host based buckets and to point the client at the right server. Here is example code that will instantiate a boto S3Connection for use with Cumulus:
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

id = "<access id>"        # the user's S3 access id
pw = "<access secret>"    # the user's S3 secret key

cf = OrdinaryCallingFormat()
hostname = "somehost.com"
# use the port Cumulus is listening on (default 8888; 80 if proxied)
conn = S3Connection(id, pw, host=hostname, port=80, is_secure=False, calling_format=cf)
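Once connected, ordinary boto calls work against Cumulus. For example, to list your buckets and keys (the names printed are simply whatever exists in your repository):

for bucket in conn.get_all_buckets():
    print(bucket.name)
    for key in bucket.list():
        print("  %s (%d bytes)" % (key.name, key.size))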
Using the s3cmd client (#)
Once you have s3cmd successfully installed and configured, you must modify the file $HOME/.s3cfg in order to direct it at this server. Make sure the following key/value pairs reflect the following changes:
access_key = <access id>
secret_key = <access secret>
host_base = <hostname of service>
host_bucket = <hostname of server>
use_https = False
Using HTTPS with Cumulus (#)
In order to use a secure https connection with Cumulus you must edit the cumulus.ini file and provide it with a certificate and key pair. In a typical Nimbus installation these are generated for you and placed at $NIMBUS_HOME/var/hostcert.pem and $NIMBUS_HOME/var/hostkey.pem. To add them to the cumulus.ini file, add the following lines:
[https]
enabled=True
key=/home/nimbus/var/hostkey.pem
cert=/home/nimbus/var/hostcert.pem
Disk Usage Quotas (#)
Cumulus allows administrators to set disk space limits on a per-user basis. By default users are created with unlimited space. To set a disk quota limit, use the program $NIMBUS_HOME/ve/bin/cumulus-quota. Here is an example that will set the user [email protected] to a 100 byte limit:
$ ./ve/bin/cumulus-quota [email protected] 100
$ ./ve/bin/cumulus-list-users [email protected]
friendly     : [email protected]
ID           : Ar2yXcfdhImjMNeWGUHJZ
password     : ddOWFSC5rol9L6Tk14hA0QeS7valQdy38xeVvkFZwq
quota        : 100
canonical id : 21161ebe-862a-11df-a9ca-001de0a80259
Configuring Cumulus Options in Nimbus (#)
There are a few variables that Nimbus relies on to find information about its co-located Cumulus server. These variables are found in the file ./services/etc/nimbus/workspace-service/cumulus.conf. They are normally written automatically by the nimbus-configure program, but if an administrator makes some manual changes to their Nimbus installation some of these variables may need to be changed as well.
- cumulus.authz.db The cumulus authz database. By default it is an sqlite database and is located at $NIMBUS_HOME/cumulus/etc/authz.db
- cumulus.repo.dir The location of the cumulus posix backend file repository. By default this is $NIMBUS_HOME/cumulus/posixdata. Quite often users will want to change this to a more favorable location, likely one with more disk space or faster disks.
- cumulus.repo.bucket The cumulus bucket in which all cloud client images are stored.
- cumulus.repo.prefix The prefix with which all image names are prepended.
LANTorrent (#)
LANTorrent is a fast multicast file distribution protocol designed to saturate all the links in a switch. There are several optimizations planned for future releases of LANTorrent.
LANTorrent works best for the following scenarios:
- large file transfers (VM images are typically measured in gigabytes)
- local area switched network (typical for data center computer racks)
- file recipients are willing peers. Unlike other peer-to-peer transfer protocols, BitTorrent for example, LANTorrent is not designed with leeches in mind. It is designed under the assumption that every peer is a willing and able participant.
- many endpoints request the same file at roughly the same time
LANTorrent Protocol (#)
When an endpoint wants a file it submits a request to a central agent. This agent aggregates requests for files so that they can be sent out in an efficient single multicast session. Each request for a source file is stored until either N requests for that file have been made or N' seconds have passed since the last request on that source file was made. This allows a user to request a single file in several unrelated sessions yet still have the file transferred in an efficient multicast session.
Once N requests for a given source file have been made or N' seconds have passed, the destination set for the source file is determined. A chain of destination endpoints is formed such that each node receives from and sends to one other node. The first node receives from the repository and sends to a peer node, that peer node sends to another, and so on until all receive the file. In this way all links of the switch are utilized to send directly to another endpoint in the switch. This results in the most efficient transfer on a LAN switched network.
Oftentimes in an IaaS system a single network endpoint (VMM) will want multiple copies of the same file. Each file is booted as a virtual machine and that virtual machine will make distinct changes to the file as it runs, thus it needs its own copy of the file. However, that file does not need to be transferred across the network more than once. LANTorrent will send the file to each endpoint once and instruct that endpoint to write it to multiple files if needed.
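The aggregation rule above can be pictured with a small sketch (illustrative Python, not the actual LANTorrent implementation; the values of N and N' are whatever the deployment configures):

import time

N = 4            # assumed: start a transfer once this many requests arrive
N_PRIME = 30.0   # assumed: or this many seconds after the last request

pending = {}     # source file -> (destination list, time of last request)

def start_multicast_chain(source, destinations):
    # Placeholder for the real transfer: the repository streams to the first
    # peer, which streams to the next, and so on down the chain.
    print("sending %s along: %s" % (source, " -> ".join(destinations)))

def request(source, destination):
    dests, _ = pending.get(source, ([], None))
    dests.append(destination)
    pending[source] = (dests, time.time())

def flush():
    # Called periodically: dispatch any source whose batch is full or idle.
    for source in list(pending):
        dests, last = pending[source]
        if len(dests) >= N or time.time() - last >= N_PRIME:
            start_multicast_chain(source, dests)
            del pending[source]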
LANTorrent Configuration (#)
LANTorrent is not enabled in a default Nimbus installation. A few additional steps are required to enable it.
The following software is required on both service and VMM nodes:
- python2.4
- python simplejson
LANTorrent is run out of xinetd, thus xinetd must also be installed on all VMMs.
To install LANTorrent you must take the following steps:
- edit $NIMBUS_HOME/nimbus-setup.conf and change lantorrent.enabled: False -> lantorrent.enabled: True
- edit $NIMBUS_HOME/services/etc/nimbus/workspace-service/other/common.conf and change the value of propagate.extraargs to: propagate.extraargs=$NIMBUS_HOME/lantorrent/bin/lt-request (be sure to expand $NIMBUS_HOME to its full and actual path)
- install LANTorrent on each VMM:
- recursively copy $NIMBUS_HOME/lantorrent to /opt/nimbus/lantorrent.
- run ./vmm-install.sh on each node. Either run it as your workspace-control user or specify the workspace-control user as the first and only argument to the script.
- install LANTorrent into xinetd. The vmm-install.sh script creates the file lantorrent. This file is ready to be copied into /etc/xinetd.d/. Once this is done, restart xinetd (/etc/init.d/xinetd restart).
- change the propagation method. In the file $NIMBUS_HOME/services/etc/nimbus/workspace-service/other/authz-callout-ACTIVE.xml change:
<property name="repoScheme" value="scp" />
to:
<property name="repoScheme" value="lantorrent" />
- restart the service: $NIMBUS_HOME/bin/nimbusctl restart
- [optional] if the path to nimbus on the workspace control nodes (VMMs) is not /opt/nimbus you will also need to edit a configuration file on all backends.
- <workspace control path>/control/etc/workspace-control/propagation.conf make sure the value of:
- lantorrentexe: /opt/nimbus/bin/ltclient.sh points to the proper location of your ltclient.sh script. This should be a simple matter of changing /opt/nimbus to the path where you chose to install workspace control.