
Nimbus 2.4 Admin Reference

This section explains some side tasks as well as some non-default configurations.


Notes on conf files (#)

The Nimbus conf files have many comments around each configuration. Check those out. Their content will be inlined here in the future.

See the $NIMBUS_HOME/services/etc/nimbus/workspace-service directory.
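
For example, to skim the settings and their comments (paths assume a default install; the exact set of files can differ between releases):

nimbus $ cd $NIMBUS_HOME/services/etc/nimbus/workspace-service
nimbus $ ls
nimbus $ less ssh.conf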


Enabling the EC2 SOAP frontend (#)

After installing, see the $NIMBUS_HOME/services/etc/nimbus/elastic directory. The .conf file here specifies what the EC2 "instance type" allocations should translate to and what networks should be requested from the underlying workspace service when VM create requests are sent.

By default, a Nimbus installation will enable this service:

https://10.20.0.1:8443/wsrf/services/ElasticNimbusService

But before the service will work, you must adjust a container configuration. This accounts for some security-related conventions of EC2:

  • Secure message is used, but only on the request. EC2 does not send a secure message envelope around responses, so EC2 clients do not expect one; they rely on HTTPS to protect responses.

    Both integrity and encryption problems are relevant, so be wary of any plain-HTTP endpoint being used with this protocol. For example, you probably want to make sure that add-keypair private key responses are encrypted (!).

  • Also, adjusting the container configuration gets around a timestamp format incompatibility we discovered (the timestamp is normalized after the message envelope signature/integrity is confirmed).

There is a sample container server-config.wsdd configuration to compare against here.

Edit the container deployment configuration:

nimbus $ nano -w etc/globus_wsrf_core/server-config.wsdd

Find the <requestFlow> section and comment out the existing WSSecurityHandler and add this new one:

    <handler type="java:org.globus.wsrf.handlers.JAXRPCHandler">

        <!-- enabled: -->
        <parameter name="className"
                   value="org.nimbustools.messaging.gt4_0_elastic.rpc.WSSecurityHandler" />

        <!-- disabled: -->
        <!--<parameter name="className"
                   value="org.globus.wsrf.impl.security.authentication.wssec.WSSecurityHandler"/> -->
    </handler>

Now find the <responseFlow> section and comment out the existing X509SignHandler and add this new one:

    <handler type="java:org.apache.axis.handlers.JAXRPCHandler">

        <!-- enabled: -->
        <parameter name="className"
                   value="org.nimbustools.messaging.gt4_0_elastic.rpc.SignHandler" />

        <!-- disabled: -->
        <!--<parameter name="className" 
                       value="org.globus.wsrf.impl.security.authentication.securemsg.X509SignHandler"/>-->
    </handler>

If you do not make this configuration change, you will see an error when trying to use an EC2 client.

A container restart is required after the configuration change.
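
For example, if you manage the container with the bundled nimbusctl wrapper (the exact target name is an assumption here; check the nimbusctl usage output on your install):

nimbus $ $NIMBUS_HOME/bin/nimbusctl services restart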


Configuring the EC2 Query frontend (#)

The EC2 Query frontend supports the same operations as the SOAP frontend. However, it does not run in the same container. It listens on HTTPS using Jetty. Starting with Nimbus 2.4, the query frontend is enabled and listens on port 8444.

Configuration for the query frontend lives in the $NIMBUS_HOME/services/etc/nimbus/query directory. It contains a configuration file query.conf and a sample user mapping file users.txt.

The Query interface does not rely on X509 certificates for security. Instead, it uses a symmetric signature-based approach. Each user is assigned an access identifier and secret key; these credentials are also maintained by the service. Each request is "signed" by the client by generating a hash over parts of the request and attaching it. The service performs the same signature process and compares its result with the one included in the request.
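
As a rough sketch of that signing scheme, here is the EC2-style HMAC-SHA256 computation that both sides perform. This is only an illustration: the canonical "string to sign" layout follows the EC2 query API, the hostname and query parameters are hypothetical, and the secret key is the sample value from the cloud-admin output shown below.

    # hypothetical access ID and secret key issued by the admin
    ACCESS_ID="b9747c9a"
    SECRET_KEY="vay/1xelRSr9Koq2MX09S+SvD3vrSQIsmfO4Cq16fZY="

    # EC2-style canonical string: verb, host, path, then the sorted query parameters
    STRING_TO_SIGN=$(printf 'GET\nmyservice.example.org:8444\n/\nAWSAccessKeyId=%s&Action=DescribeInstances&SignatureMethod=HmacSHA256&SignatureVersion=2' "$ACCESS_ID")

    # the client attaches this value as the Signature parameter; the service recomputes it and compares
    printf '%s' "$STRING_TO_SIGN" | openssl dgst -sha256 -hmac "$SECRET_KEY" -binary | openssl base64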

For the initial release of the Query frontend, Nimbus users are mapped to Query credentials in a flat text file. By default, this file is located at $GLOBUS_LOCATION/etc/nimbus/query/users.txt but it may be placed elsewhere by altering the query.usermap.path configuration value in query.conf. This file must be manually managed but in the near future it will be tied into the administrative web application. Changes to this file do not require a container restart.

To ease the process of authorizing users for the Query interface during the period before support is added to the administrative web application, we have added a utility to the cloud-admin.sh tool. This utility generates a secret key for a user and adds an entry to the users.txt file.

# ./cloud-admin.sh --add-query-dn "/O=Grid/OU=GlobusTest/OU=uchicago.edu/CN=Some User"
Generated query credentials for user:
	Access ID: b9747c9a
	Secret key: vay/1xelRSr9Koq2MX09S+SvD3vrSQIsmfO4Cq16fZY=
*Securely* distribute these tokens to the user.

This utility is by no means a complete administrative tool for query users. For all other management, you'll need to edit the users.txt file directly.

In addition to this utility, there is basic support for distributing query tokens via the Nimbus Web application. The admin can paste a user's credentials into the app and the user can retrieve them with their username and password. In the near future this functionality will be greatly expanded to allow management of tokens directly in this interface.


Configuring the Nimbus Web interface (#)

Starting with Nimbus 2.4, the Nimbus Web application is bundled with the service distribution but is disabled by default. To enable it, edit the $NIMBUS_HOME/nimbus-setup.conf file and change the value of web.enabled to True. Next you should run nimbus-configure to propagate the change. Now you can use nimbusctl to start/stop the web application.
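
For example (assuming a default layout; the exact nimbusctl target name is an assumption, so check the nimbusctl usage output on your install):

nimbus $ nano -w $NIMBUS_HOME/nimbus-setup.conf      # set web.enabled to True
nimbus $ $NIMBUS_HOME/bin/nimbus-configure
nimbus $ $NIMBUS_HOME/bin/nimbusctl web start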

By default, the web application listens on port 1443. This and other configuration options are located in the $NIMBUS_HOME/web/nimbusweb.conf file. Changes to this file require a restart of the service.


Using the Nimbus Web interface (#)

While the Nimbus Web application is still under heavy development, its current functionality is useful enough to release in its early form. In the 2.4 release, Nimbus Web provides basic facilities for distributing new X509 credentials and EC2 query tokens to users. Previously this was a tedious process that was difficult to do in a way that was both secure and user friendly. Nimbus Web allows an admin to upload credentials for a user and then send them a custom URL which invites them to create an account.

To get started, log into the web interface as a superuser and go to the Administrative Panel. This page has a section for creating users as well as viewing pending and existing users. The initial release does not have embedded CA functionality (it is being planned). You must use an external CA to generate the user credentials and there is no way to distribute a password so the key should be passwordless. You must also manually authorize the new user for the Nimbus service (it is recommended that you use the cloud-admin tool for quickly adding new users).

Create a new user by filling in the appropriate fields and uploading an X509 certificate and key for the user. Note that the application expects plain text files, so depending on your browser you may need to rename files to have a .txt extension before you can upload them. Once the new account is created, you will be provided with a custom URL. You must paste this URL into an email to the user along with usage instructions.

When the user accesses the custom URL, they will be asked to create a password and login. Inside their account, they can download the certificate and key which were provided for them by the admin. Note that the design of the application attempts to maximize the security of the process, with several important features:

  • The URL token can only be successfully used once. After a user creates a password and logs in, future attempts to access that URL will fail. This is to prevent someone from intercepting the URL and using it to access the user's credentials. If this happens, the real user will be unable to login and will (hopefully) contact the administrator immediately (there is a message urging them to do so).
  • In the same spirit, the URL token will expire after a configurable number of hours (default: 12).
  • The user private key can be downloaded once and only once. After this download occurs, the key will be deleted from the server altogether. In an ideal security system, no person or system will ever be in possession of a private key, except for the user/owner of the key itself. Because we don't follow this for the sake of usability, we attempt to minimize the time that the private key is in the web app database.
  • When a URL token is accessed or a private key is downloaded, the time and IP address of this access is logged and displayed in the administrative panel.

Configuring a different host certificate (#)

The Nimbus installer creates a Certificate Authority which is used for (among other things) generating a host certificate for the various services. There are three files involved in your host certificate and they are all generated during the install by the nimbus-configure program. By default, these files are placed in "$NIMBUS_HOME/var/" but you can control their placement with properties in the "$NIMBUS_HOME/nimbus-setup.conf" file.

  • hostcert.pem - The host certificate. The certificate for the issuing CA must be in the Nimbus trusted-certs directory, in hashed format.
  • hostkey.pem - The private key. Must be unencrypted and readable by the Nimbus user.
  • keystore.jks - Some Nimbus services require this special Java Key Store format. The nimbus-configure program generates this file from the host cert and key. If you delete the file, it can be regenerated by running nimbus-configure again.

To use a custom host certificate, you can delete (or relocate) these three files, copy in your own hostcert.pem and hostkey.pem, and run nimbus-configure, which will generate the keystore.
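
A minimal sketch of that procedure, assuming the default file locations (the .orig names are only a suggestion for keeping the generated files around):

nimbus $ cd $NIMBUS_HOME/var
nimbus $ mv hostcert.pem hostcert.pem.orig
nimbus $ mv hostkey.pem hostkey.pem.orig
nimbus $ mv keystore.jks keystore.jks.orig
nimbus $ cp /path/to/your/hostcert.pem hostcert.pem
nimbus $ cp /path/to/your/hostkey.pem hostkey.pem
nimbus $ chmod 400 hostkey.pem
nimbus $ $NIMBUS_HOME/bin/nimbus-configure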

NOTE: It is important that the issuing CA cert is trusted by Nimbus (and any clients used to access the Nimbus services). This is done by placing the hashed form of the CA files in the trusted-certs directory, by default "$NIMBUS_HOME/var/ca/trusted-certs/". For example, these three files:

3fc18087.0
3fc18087.r0
3fc18087.signing_policy
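
If you are adding a new CA, the hash portion of these file names comes from openssl (the 3fc18087 value above is just an example, the cacert.pem name is hypothetical, and note that newer openssl releases compute a different default hash than openssl 0.9.x, so use the form your tools expect):

nimbus $ openssl x509 -hash -noout -in cacert.pem
3fc18087
nimbus $ cp cacert.pem $NIMBUS_HOME/var/ca/trusted-certs/3fc18087.0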

If you simply want to generate new host certificates using the Nimbus internal CA (perhaps using a different hostname), you can follow a similar procedure. Delete or relocate the hostcert.pem, hostkey.pem, and keystore.jks files and then run nimbus-configure. New files will be generated.

You can also keep these files outside of the Nimbus install (for example, if you use the same host certificate for multiple services on the same machine). Just edit the $NIMBUS_HOME/nimbus-setup.conf file and adjust the hostcert, hostkey, and keystore properties. Then run nimbus-configure. If these files do not exist, they will be created.


Configuring Nimbus basics manually without the auto-configuration program (#)

What follows are the instructions for setting up a container as they existed before the auto-configuration program and the installer came into being (see here for information about the auto-configuration program).


* Service hostname:

Navigate to the workspace-service configuration directory:

nimbus $ cd $GLOBUS_LOCATION/etc/nimbus/workspace-service

Edit the "ssh.conf" file:

nimbus $ nano -w ssh.conf

Find this setting:

service.sshd.contact.string=REPLACE_WITH_SERVICE_NODE_HOSTNAME:22

... and replace the CAPS part with your service node hostname. This hostname and port should be accessible from the VMM nodes.
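
For example, if your service node is reachable by the VMMs as svc.example.org (a placeholder hostname):

service.sshd.contact.string=svc.example.org:22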

(The guide assumes you will have the same privileged account name on the service node and the VMM nodes; if not, this is where you would make the change, as described in the ssh.conf file comments.)

* VMM names:

Navigate to the workspace service VMM pools directory:

nimbus $ cd $GLOBUS_LOCATION/etc/nimbus/workspace-service/vmm-pools

Each file in this directory represents a distinct pool of VMM nodes that are available for the service to run VMs on. Leave this as one pool (one file, the example file).

Edit the example "pool1" file to list only one test VMM node, the node where you installed Xen above. List the amount of memory you would like allocated to the guest VMs.

some-vmm-node 1024

You can SSH there without password from the nimbus account, right?

nimbus $ ssh some-vmm-node
nimbus@some-vmm-node $ ...

* Networks:

Navigate to the workspace service networks directory:

nimbus $ cd $GLOBUS_LOCATION/etc/nimbus/workspace-service/network-pools/

The service is packaged with two sample network files, public and private.

You can name these files anything you want. The file names will be the names of the networks that are offered to clients. It's a convention to provide "public" and "private" but these can be anything.

The public file has some comments in it. Edit this file to have one DNS line at the top and one network address to give out. The subnet and network you choose should be something the VMM node can bridge to (there are some advanced configs for handling DHCP and bridging of addresses that are foreign to the VMM, but that is not addressed in this guide).

nimbus $ nano -w public
192.168.0.1
fakepub1 192.168.0.3 192.168.0.1 192.168.0.255 255.255.255.0

Resource pool and pilot configurations (#)

There are modules for two resource management strategies currently distributed with Nimbus: the default "resource pool" mode and the "pilot" mode.

The "resource pool" mode is where the service has direct control of a pool of VMM nodes. The service assumes it can start VMs

The "pilot" mode is where the service makes a request to a cluster's Local Resource Management System (LRMS) such as PBS. The VMMs are equipped to run regular jobs in domain 0. But if pilot jobs are submitted, the nodes are secured for VM management for a certain time period by the workspace service. If the LRM or administrator preempts/kills the pilot job earlier than expected, the VMM is no longer available to the workspace service.

The "etc/nimbus/workspace-service/other/resource-locator-ACTIVE.xml" file dictates what mode is in use (container restart required if this changes). See the available "etc/nimbus/workspace-service/other/resource-locator-*" files.

Resource pool (#)

This is the default; see the overview.

A directory of files is located at "etc/nimbus/workspace-service/vmm-pools/"

The pool file format is currently very simple: for each node in the pool, list the hostname and the amount of RAM it can spare for running guest VMs.

Optionally, you can also specify that certain hosts can only support a subset of the available networking associations (see the file comments for syntax).
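
For example, a pool file with a few VMMs and the RAM each can spare might look like this (hostnames and sizes are hypothetical; the optional third column for networking associations is described in the file comments):

vmm01.example.org 1024
vmm02.example.org 1024
vmm03.example.org 2048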

If you change these configuration files after starting the container, only a fresh container reboot will actually pick up the changes.

  • If you add a node, this will be available immediately after the container reboot.
  • If you remove a node that is currently in use, no new deployments will be mapped to this VMM. However, this will not destroy (or migrate) any current VMs running there. If that is necessary it currently needs to be accomplished explicitly.
  • If you change a node that is currently in use, the change will take effect for the next lease.

    If you've removed support for an association on a VMM that the current VM(s) is using, this change will not destroy (or migrate) the VM(s) to adjust to this restriction. If that is necessary it currently needs to be accomplished explicitly.

    If you've reduced the memory allocation below what the current VM(s) on the node is/are currently using, this will not destroy (or migrate) the current VM(s) to adjust to this restriction. If that is necessary it currently needs to be accomplished explicitly. Once the VM(s) are gone, the maximum memory available on that VMM will be the new, lower maximum.

Pilot (#)

  1. The first step in switching to the pilot-based infrastructure is to make sure you have at least one working node configured with workspace-control, following the instructions in this guide as if you were not going to use dynamically allocated VMMs via the pilot.

    If the only nodes available are in the LRM pool, it would be best to drain the jobs from one and take it offline while you confirm the setup.

  2. Next, make sure that the system account the container is running in can submit jobs to the LRM. For example, run echo "/bin/true" | qsub

  3. Next, decide how you would like to organize the cluster nodes so that the workspace service's request for time on the nodes actually ends up on usable VMM nodes.

    For example, if only a portion of the nodes are configured with Xen and workspace-control, you can set up a special node property (e.g. 'xen') or perhaps a separate queue or server. The service supports submitting jobs with node property requirements and also supports the full Torque/PBS '[queue][@server]' destination syntax if desired.

  4. Copy the "etc/nimbus/workspace-service/other/resource-locator-pilot.xml" to "etc/nimbus/workspace-service/other/resource-locator-ACTIVE.xml"

    The configuration comments in "etc/nimbus/workspace-service/pilot.conf" should be self-explanatory. A few are highlighted here (and note that advanced configs are in resource-locator-ACTIVE.xml).

    • Notifications based on HTTP digest access authentication are one mechanism for pilot notifications. Each message from a pilot process to the workspace service takes on the order of 10 ms on our current testbed, which is reasonable.

      The contactPort setting controls which port the embedded HTTP server listens on. It is also used to form the contact URL passed to the pilot program; an easy way to get this right is to use an IP address rather than a hostname.

      Note the accountsPath setting. Navigate to that file ("etc/nimbus/workspace-service/pilot-authz.conf" by default) and change the shared secret to something that is not dictionary based and is 15 or more characters long. A script in that directory will produce suggestions.

      This port may be blocked off entirely from WAN access via a firewall if desired; only the pilot programs need to connect to it. If it is not blocked off, the use of HTTP digest access authentication for connections still guards access.

      Alternatively, you can configure SSH as the only mechanism for these notifications, or configure both and use SSH as a fallback. When used as a fallback, the pilot will try to contact the HTTP server and, if that fails, will then attempt to use SSH. Those messages are written to a file and will be read when the workspace service recovers. This is an advanced configuration; setting up the infrastructure without it is recommended for the first pass (to reduce your chances of misconfiguration).

    • The maxMB setting is used to set a hard maximum memory allotment across all workspace requests (no matter what the authorization layers allow). This is a "fail fast" setting, making sure dubious requests are not sent to the LRM.

      To arrive at that number, first determine the maximum amount of memory to give domain 0 in non-hosting mode. This should be as much as possible, and you will also configure it later in the pilot program settings (the pilot will make sure domain 0 gets this memory back when returning the node from hosting mode to normal job mode).

      When the node boots and xend is first run, you should configure things such that domain 0 is already at this memory setting. This way, it will be ready to give jobs as many resources as possible from its initial boot state.

      Domain 0's memory is set in the boot parameters. On the "kernel" line you can add a parameter like this: dom0_mem=2007M

      If it is too high you will make the node unbootable; 2007M is an example from a 2048M node and was arrived at experimentally. We are working on ways to automatically figure out the highest value that works without causing boot issues.

      Take this setting and subtract at least 128M from it, allocating the rest for guest workspaces. Let's label 128M in this example as dom0-min and 2007M as dom0-max. Some memory is necessary for domain 0 to at least do privileged disk and network I/O for guest domains.

      These two memory settings will be configured into the pilot to make sure domain 0 is always in the correct state. Domain 0's memory will never be set below the dom0-min setting and will always be returned to dom0-max when the pilot program vacates the node.

      Instead of letting the workspace request fail on the backend just before instantiation, the maxMB setting is configured in the service so that requests for more memory than this are rejected up front.

      So [ dom0-max minus dom0-min equals maxMB ]. In the running example that is 2007 minus 128, giving a maxMB of 1879. And again, maxMB is the maximum allowed for guest workspaces.

      ( You could make it smaller. But it would not make sense to make it bigger than [ dom0-max minus dom0-min ] because this will cause the pilot program itself to reject the request. )

    • The pilotPath setting must be correct and double-checked. See this bugzilla item.

  5. Next, note your pilotPath setting and put a copy of workspacepilot.py there. Run chmod +x on it and that is all that should be necessary for the installation.

    Python 2.3 or higher (though not Python 3.x) is also required, but this was already required for workspace-control.

    A sudo rule for the xm program is also required, but this was configured when you set up workspace-control. If the account the pilot jobs run under is different from the account that runs workspace-control, copy the xm sudo rule for that account.

  6. Open the workspacepilot.py file in an editor. These things must be configured correctly and require your intervention (i.e., the software cannot guess at them):

    • Search for "secret: pw_here" around line 80. Replace "pw_here" with the shared secret you configured above.
    • Below that, set the "minmem" setting to the value you chose above that we called dom0-min.
    • Set the "dom0_mem" setting to the value you chose above that we called dom0-max.

    The other configurations should be explained well enough by the file comments, and they usually do not need to be altered. (A sketch of the three edited values appears after this list.)

    You might like to create a directory for the pilot's logfiles instead of the default setting of "/tmp" for the "logfiledir" configuration. You might also wish to separate the config file from the program. The easiest way to do that is to configure the service to call a shell script instead of workspacepilot.py; this script would in turn wrap the call to the pilot program, for example: "/opt/workspacepilot.py -p /etc/workspace-pilot.conf $@"

  7. Now restart the GT container and submit test workspace requests as usual (cloud requests work too).
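
As mentioned in step 6, the three values you edit in workspacepilot.py end up looking roughly like this, using the example numbers from this guide (the surrounding syntax is whatever the pilot's own configuration section uses, and the secret shown is obviously a placeholder):

secret: aL0ngR4ndomNonDictionarySecret
minmem: 128
dom0_mem: 2007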

Network configuration details (#)

For the Workspace backend to support networking information delivery to VMs, you are required to install DHCP and ebtables on each hypervisor node. When networking information in a workspace needs to change at its startup (which is typical upon deployment), workspace-control will make a call via sudo to a program that adds a MAC address to IP mapping into the local DHCP server for each of the workspace's NICs that need to be configured. It will also adjust ebtables rules for each of the workspace's NICs that need to be configured: these make sure the NICs are using the proper MAC and IP address as well as directing DHCP requests to the local DHCP server only.

To actually enact networking changes, the VM must set its own L3 networking information (IP address, default gateway, etc) from inside the VM. Currently we only support delivery of the information via DHCP. Booting into DHCP client mode is well supported in virtually every operating system in existence. Previously we passed information via kernel parameters which required a special understanding inside the VM. The result of using DHCP is that workspace images are easier to create and easier to maintain.

A DHCP server is required to run on each hypervisor node. The purpose of this server is to respond to broadcast requests from workspaces that are booting locally. Before starting a VM, if any of its NICs need to be configured via DHCP, workspace-control will call out to "dhcp-config.sh" via sudo, passing it a specific MAC address to IP mapping (as well as other information to be passed to the VM, such as hostname, DNS servers, broadcast address, default gateway, etc.).

  • "Won't this interfere with my current DHCP server?" No.
  • "Will this cause unwanted packets on my physical LAN?" No.
  • "Will other workspaces be able to send broadcasts and get the wrong DHCP lease?" No.

In addition to a DHCP server, we also insert custom ebtables rules when the workspace is deployed. These rules accomplish three things (a schematic example follows the list):

  1. Only packets with the correct MAC address for this virtual interface are permitted.
  2. Broadcasted DHCP requests are only permitted to be bridged to the correct virtual interface (configured in workspace-control's configuration file).
  3. Only packets with the correct IP address for this virtual interface are permitted (since the NIC does not have an IP address yet when making a DHCP request, the IP check only happens if the packet is not a DHCP request).
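
Schematically, the inserted rules for one virtual NIC look something like the following. This is illustrative only: the real rules are generated by workspace-control, they also exempt DHCP requests from the IP check as described above, and the interface name, MAC, and IP here are hypothetical.

ebtables -A FORWARD -i vif1.0 -s ! AA:00:00:11:22:33 -j DROP
ebtables -A FORWARD -i vif1.0 -p IPv4 --ip-src ! 192.168.0.3 -j DROP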

A version of the workspace DHCP design document is available here: pdf.


Configuring a standalone context broker (#)

The context broker is used to facilitate one-click clusters.

The context broker (see above) is installed and configured automatically starting with Nimbus 2.4, but it does not depend on any other Nimbus service component. It can run by itself in a GT container. For example, you can use it to deploy virtual clusters on EC2 without any other Nimbus service running (cloud client #11 has an "ec2script" option that allows you to do this).

If you want to install the broker separately from Nimbus, download the Nimbus source tarball, extract it, and run scripts/broker-build-and-install.sh with an appropriate $GLOBUS_LOCATION set.

To set up a standalone broker that is compatible with post-010 cloud clients, follow these steps:

  1. Create a passwordless CA certificate.

    You can do this from an existing CA. To decrypt an RSA key, run: openssl rsa -in cakey.pem -out cakey-unencrypted.pem

    Alternatively, you can use the CA created by the Nimbus installer under $NIMBUS_HOME/var/ca

  2. Make very sure that the CA certificate and key files are read-only and private to the container running account.

  3. Add the CA certificate to your container's trusted certificates directory. The context broker (running in the container) creates short-term credentials on the fly for the VMs. The VMs use these to contact the broker, and the container needs to be able to verify who is calling.

  4. Navigate to "$GLOBUS_LOCATION/etc/nimbus-context-broker" and adjust the "caCertPath" and "caKeyPath" parameters in the "jndi-config.xml" file to point to the CA certificate and key files you created in previous steps.

    Note that the old and new context brokers can both use the same CA certificate and key file.

  5. Container restart is required.
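
For reference, the two parameters from step 4 have the usual GT4 JNDI deployment shape, roughly as below (the paths are hypothetical; edit the existing entries in jndi-config.xml rather than adding new ones):

    <parameter>
      <name>caCertPath</name>
      <value>/opt/nimbus/var/ca/ca-certs/cacert.pem</value>
    </parameter>
    <parameter>
      <name>caKeyPath</name>
      <value>/opt/nimbus/var/ca/ca-certs/cakey-unencrypted.pem</value>
    </parameter>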