Mirantis Official Blog: May 2011

Friday, May 27, 2011

OpenStack Nova and Dashboard authorization using existing LDAP

Our current integration task involves using goSA as the central management utility. goSA internally uses the LDAP repository for all of its data. So we had to find a solution to make both OpenStack Nova and Dashboard authenticate and authorize users using goSA's LDAP structures.

LDAP in Nova

Nova stores its users, projects and roles (global and per-project) in LDAP. Necessary schema files are in /nova/auth dir in the Nova source distribution. The following describes how Nova stores each of these object types.

Users are stored as objects with a novaUser class. They have mandatory accessKey, secretKey and isNovaAdmin (self-explanatory) attributes along with customizable attributes set by the flags ldap_user_id_attribute (uid by default) and ldap_user_name_attribute (cn). To use the latter, Nova assigns the person, organizationalPerson and inetOrgPerson classes to all newly created users. All users are stored and searched for in the LDAP subtree defined by ldap_user_subtree and ldap_user_unit.

If you want to manage user creation and deletion from some other place (such as goSA in our case), you can set the ldap_user_modify_only flag to True.

Projects are objects with the widely used groupOfNames class in the subtree defined by the ldap_project_subtree flag. Nova uses the cn attribute for the project name, description for description, member for the list of members' DNs, owner for the project manager's DN. All of these attributes are common for user (and any object) groups management, so it's easy to integrate Nova projects with an existing user groups management system (e.g. goSA).

Roles are also stored as groupOfNames, with similar cn, description and member attributes. Nova has hard-coded roles: cloudadmin, itsec, sysadmin, netadmin, developer. Global roles are stored in a subtree defined by role_project_subtree, cn's are defined by the ldap_cloudadmin, ldap_itsec, ldap_sysadmin, ldap_netadmin and ldap_developer flags respectively. Per-project roles are stored right under the project's DN with cn set to the role's name.
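
As a rough illustration of the layout described above, the following Python sketch composes the DNs under which Nova would look for a user, a project and a per-project role. The subtree values are hypothetical placeholders, not goSA's or Nova's defaults, and the helpers do not escape special characters in DNs.

```python
# Hypothetical subtree values; substitute the ones from your own Nova flags.
LDAP_USER_ID_ATTRIBUTE = "uid"                         # ldap_user_id_attribute
LDAP_USER_SUBTREE = "ou=people,dc=example,dc=com"      # ldap_user_subtree
LDAP_PROJECT_SUBTREE = "ou=groups,dc=example,dc=com"   # ldap_project_subtree


def user_dn(user_id):
    """DN of a user object below the user subtree."""
    return "%s=%s,%s" % (LDAP_USER_ID_ATTRIBUTE, user_id, LDAP_USER_SUBTREE)


def project_dn(project_name):
    """DN of a project: a groupOfNames entry named by its cn."""
    return "cn=%s,%s" % (project_name, LDAP_PROJECT_SUBTREE)


def project_role_dn(role, project_name):
    """Per-project roles sit directly under the project's DN."""
    return "cn=%s,%s" % (role, project_dn(project_name))


print(user_dn("alice"))
print(project_dn("demo"))
print(project_role_dn("sysadmin", "demo"))
```

The point of the sketch is that a per-project role needs no subtree of its own: its DN is derived entirely from the project's DN.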

LDAP in Dashboard

To make Dashboard authorize users in LDAP, I use the django-auth-ldap module.
First, you need to install it using your preferred package manager (easy_install django-auth-ldap is sufficient). Second, you need to add it to Dashboard's local_settings.py in AUTHENTICATION_BACKENDS and set up AUTH_LDAP_SERVER_URI to your LDAP URI and AUTH_LDAP_USER_DN_TEMPLATE to Python's template of users' DN; in our case, it should be "ldap_user_id_attribute=%(user)s,ldap_user_subtree".

Note that local_settings.py overrides the default settings, so if you only want to add a backend to AUTHENTICATION_BACKENDS, you should use +=. If you want to disable ModelBackend entirely, as we did, use = instead.
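
A minimal sketch of the LDAP-related part of local_settings.py might look as follows. The setting names and the backend path come from django-auth-ldap; the user subtree is a hypothetical value, and the two LDAP_* helper names are introduced only for this illustration.

```python
# Values mirroring Nova's flags (hypothetical; adjust to your deployment).
LDAP_USER_ID_ATTRIBUTE = "uid"                      # ldap_user_id_attribute
LDAP_USER_SUBTREE = "ou=people,dc=example,dc=com"   # ldap_user_subtree

# Plain assignment replaces the default backends, which disables ModelBackend.
AUTHENTICATION_BACKENDS = ("django_auth_ldap.backend.LDAPBackend",)

AUTH_LDAP_SERVER_URI = "ldap://ldap.example.com"

# "%%(user)s" survives the formatting below as "%(user)s", the placeholder
# that django-auth-ldap later fills in with the login name.
AUTH_LDAP_USER_DN_TEMPLATE = "%s=%%(user)s,%s" % (
    LDAP_USER_ID_ATTRIBUTE,
    LDAP_USER_SUBTREE,
)

print(AUTH_LDAP_USER_DN_TEMPLATE)
```

With these values, a user logging in as alice would be bound as uid=alice,ou=people,dc=example,dc=com.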

Also note that to make Dashboard work, you'll have to create an account in Nova with admin privileges and a project with the same name as the account. You can either set all parameters in LDAP by hand or create the account with nova-manage user admin, using one of the usernames from LDAP.

Configuration examples

Let's say goSA is managing the organization exampleorg in the domain example.com on LDAP at ldap://ldap.example.com. To make use of its users and groups for Nova's user, projects and roles, we wrote configs like this:

By the way, to make goSA the central user management utility, we created a special plugin that manages Nova users. The plugin can be found here. It looks like this:

Thursday, May 19, 2011

Shared storage for OpenStack based on DRBD

Storage is a tricky part of the cloud environment. We want it to be fast, to be network-accessible and to be as reliable as possible. One way is to go to the shop and buy yourself a SAN solution from a prominent vendor for solid money. Another way is to take commodity hardware and use open source magic to turn it into distributed network storage. Guess what we did?

We have several primary goals ahead. First, our storage has to be reliable. We want to survive both minor and major hardware crashes - from HDD failure to host power loss. Second, it must be flexible enough to slice it fast and easily and resize slices as we like. Third, we will manage and mount our storage from cloud nodes over the network. And, last but not least, we want decent performance from it.

For now, we have decided on the DRBD driver for our storage. DRBD® refers to block devices designed as a building block to form high availability (HA) clusters. This is done by mirroring a whole block device via an assigned network. DRBD can be understood as network-based RAID-1. It has lots of features, has been tested and is reasonably stable.

DRBD has been supported by the Linux kernel since version 2.6.33. It is implemented as a kernel module and included in the mainline. We can install the DRBD driver and command line interface tools using a standard package distribution mechanism; in our case it is Fedora 14:

The DRBD configuration file is /etc/drbd.conf, but usually it contains only 'include' statements. The configuration itself resides in global_common.conf and *.res files inside /etc/drbd.d/. An important parameter in global_common.conf is 'protocol'. It defines the sync level of the replication:

A (async). Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has been placed in the local TCP send buffer. Data loss is possible in case of fail-over.

B (semi-sync or memory-sync). Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has reached the peer node. Data loss is unlikely unless the primary node is irrevocably destroyed.

C (sync). Local write operations on the primary node are considered completed only after both the local and the remote disk write have been confirmed. As a result, loss of a single node is guaranteed not to lead to any data loss. This is the default replication mode.
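
The difference between the three protocols can be summarized as a small Python model. This is only a sketch of the completion conditions listed above, not DRBD code; the function and argument names are made up for illustration.

```python
def write_completed(protocol, local_disk, in_send_buffer, on_peer, peer_disk):
    """Return True once the primary may report a write as completed.

    The boolean arguments describe how far the write has progressed:
    written to the local disk, placed in the local TCP send buffer,
    received by the peer node, and written to the peer's disk.
    """
    if protocol == "A":
        return local_disk and in_send_buffer
    if protocol == "B":
        return local_disk and on_peer
    if protocol == "C":
        return local_disk and peer_disk
    raise ValueError("unknown protocol: %r" % protocol)


# A write that is on the local disk and queued for sending,
# but has not yet reached the peer.
state = dict(local_disk=True, in_send_buffer=True, on_peer=False, peer_disk=False)
for proto in ("A", "B", "C"):
    print(proto, write_completed(proto, **state))
```

Only protocol A reports this write as complete, which is why a fail-over at that moment can lose data.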

Other sections of the common configuration are usually left blank and can be redefined in per-resource configuration files. To create a usable resource, we must create a configuration file for our resource in /etc/drbd.d/drbd0.res. Basic parameters for the resource are:

Name of the resource. It is defined with the 'resource' parameter, which opens the main configuration section.

'on' directive opens the host configuration section. Only 2 'on' host sections are allowed per resource. Common parameters for both hosts can be defined once in the main resource configuration section.

'address' directive is unique to each host and must contain the IP address and port number on which the DRBD driver listens.

'device' directive defines the path to the device created on the host for the DRBD resource.

'disk' is the path to the back-end device for the resource. This can be a hard drive partition (e.g. /dev/sda1), a software or hardware RAID device, an LVM Logical Volume or any other block device configured by the Linux device-mapper infrastructure.

'meta-disk' defines how DRBD stores meta-data. It can be 'internal' when meta-data resides on the same back-end device as user data, or 'external' on a separate device.
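
To show how these directives fit together, here is a Python sketch that renders a two-host resource section. The second host name (host2), the IP addresses and the port are assumptions made for the example; the common parameters are placed once at the resource level, as described above.

```python
def render_resource(name, device, disk, meta_disk, hosts):
    """Render a DRBD resource section; hosts maps hostname -> "ip:port"."""
    if len(hosts) != 2:
        raise ValueError("exactly two 'on' host sections are expected")
    lines = ["resource %s {" % name]
    # Parameters common to both hosts are defined once at the resource level.
    lines.append("  device    %s;" % device)
    lines.append("  disk      %s;" % disk)
    lines.append("  meta-disk %s;" % meta_disk)
    # The address is the only per-host parameter in this sketch.
    for hostname, address in sorted(hosts.items()):
        lines.append("  on %s {" % hostname)
        lines.append("    address %s;" % address)
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


config = render_resource(
    "drbd0", "/dev/drbd0", "/dev/md3", "internal",
    {"host1": "10.0.0.1:7789", "host2": "10.0.0.2:7789"},  # assumed addresses
)
print(config)
```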

Configuration Walkthrough

We are creating a relatively simple configuration: one DRBD resource shared between two nodes. On each node, the back-end for the resource is the software RAID-0 (stripe) device /dev/md3 made of two disks. The hosts are connected back-to-back by Gigabit Ethernet interfaces with private addresses.

As we need write access to the resource on both nodes, we must make it 'primary' on both nodes. A DRBD device in the primary role can be used unrestrictedly for read and write operations. This mode is called 'dual-primary' mode. Dual-primary mode requires additional configuration. In the 'startup' section, the 'become-primary-on' directive is set to 'both'. In the 'net' section, the following is recommended:

The 'allow-two-primaries' directive allows both ends to send data.
The next three parameters define I/O error handling.
The 'sndbuf-size' is set to 0 to allow dynamic adjustment of the TCP buffer size.

Resource configuration with all of these considerations applied will be as follows:

Enabling Resource For The First Time

To create the device /dev/drbd0 for later use, we use the drbdadm command:

After the front-end device is created, we bring the resource up:

This command set must be executed on both nodes. We may collapse the steps drbdadm attach, drbdadm syncer, and drbdadm connect into one, by using the shorthand command drbdadm up.
Now we can observe the /proc/drbd virtual status file and get the status of our resource:

We must now synchronize resources on both nodes. If we want to replicate data that are already on one of the drives, it's important to run the next command on the host which contains the data. Otherwise, it can be issued on either of the two hosts.

This command puts the node host1 in 'primary' mode and makes it the synchronization source. This is reflected in the status file /proc/drbd:

We can adjust the syncer rate to make initial and background synchronization faster. To speed up the initial sync, the drbdsetup command is used:

This allows us to consume almost all bandwidth of Gigabit Ethernet. The background syncer rate is configured in the corresponding config file section:

The exact rate depends on available bandwidth and should be about 0.3 of the throughput of the slowest I/O subsystem (network or disk). DRBD seems to slow synchronization down if it interferes with the data flow.
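
As a worked example of this rule of thumb, the following Python sketch computes a background rate from two throughput figures. The numbers (about 110 MB/s for Gigabit Ethernet and 200 MB/s for the striped disks) are assumptions, not measurements from our hosts.

```python
def background_sync_rate(network_mb_s, disk_mb_s, fraction=0.3):
    """Suggested background syncer rate in MB/s.

    Takes the given fraction of the slowest I/O subsystem's throughput.
    """
    slowest = min(network_mb_s, disk_mb_s)
    return round(fraction * slowest)


# Assumed throughput figures; measure your own network and disks.
rate = background_sync_rate(network_mb_s=110, disk_mb_s=200)
print("suggested syncer rate: %dM" % rate)
```

Here the network is the bottleneck, so the suggested rate is 0.3 × 110 ≈ 33 MB/s.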

LVM Over DRBD Configuration

Configuration of LVM over DRBD requires changes to /etc/lvm/lvm.conf. First, a physical volume is created:

This command writes LVM Physical Volume data on the drbd0 device and also on the underlying md3 device. This can pose a problem, as LVM's default behavior is to scan all block devices for LVM PV signatures. This means two devices with the same UUID will be detected and an error issued. This can be avoided by excluding /dev/md3 from scanning in the /etc/lvm/lvm.conf file by using the 'filter' parameter:

The vgscan command must be executed after the file is changed. It forces LVM to discard its configuration cache and re-scan the devices for PV signatures.
Different 'filter' configurations can be used, but they must ensure that: 1. DRBD devices used as PVs are accepted (included); 2. Corresponding lower-level devices are rejected (excluded).
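
A candidate filter can be checked against these two conditions before it goes into lvm.conf. The Python sketch below emulates the way LVM evaluates filter patterns: rules are tried in order, the first matching rule decides ('a' accepts, 'r' rejects), and a device that matches no rule is accepted. The filter shown is one possible choice, not necessarily the one we used.

```python
import re


def lvm_accepts(device, filter_rules):
    """Emulate LVM filter evaluation for a single device path."""
    for rule in filter_rules:
        action, delim = rule[0], rule[1]
        # The regex sits between the first and the last delimiter.
        pattern = rule[2:rule.rindex(delim)]
        if re.search(pattern, device):
            return action == "a"
    # Devices not matched by any rule are accepted.
    return True


# One possible filter: accept DRBD devices, reject the backing RAID device.
candidate = ["a|/dev/drbd.*|", "r|/dev/md3|"]

for dev in ("/dev/drbd0", "/dev/md3", "/dev/sda1"):
    print(dev, lvm_accepts(dev, candidate))
```

With this filter /dev/drbd0 is accepted and /dev/md3 is rejected, while unrelated devices such as /dev/sda1 remain visible to LVM.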

It is also necessary to disable the LVM write cache:

These steps must be repeated on the peer node. Now we can create a Volume Group using the configured PV /dev/drbd0 and a Logical Volume in this VG. Execute these commands on one of the nodes:

To make use of this VG and LV on the peer node, we must make it active on it:

When the new PV is configured, it is possible to proceed to adding it to the Volume Group or creating a new one from it. This VG can be used to create Logical Volumes as usual.

Conclusion
We are going to install OpenStack on nodes with shared storage as a private cloud controller. The architecture of our system presumes that storage volumes will reside on the same nodes as nova-compute. This makes it very important to have some level of disaster survival on the cloud nodes.

With DRBD we can survive any I/O errors on one of the nodes. DRBD internal error handling can be configured to mask any errors and go to diskless mode. In this mode, all I/O operations are transparently redirected from the failed node to the replica. This gives us time to restore a faulty disk system.

If we have a major system crash, we still have all of the data on the second node. We can use them to restore or replace the failed system. Network failure can put us into a 'split brain' situation, when data differs between hosts. This is dangerous, but DRBD also has rather powerful mechanisms to deal with these kinds of problems.

Wednesday, May 18, 2011

OpenStack Deployment on Fedora using Kickstart

Overview

In this article, we discuss our approach to performing an OpenStack installation on Fedora using our RPM repository and Kickstart. When we first started working with OpenStack, we found that the most popular platform for deploying OpenStack was Ubuntu, which seemed like a viable option for us, as there are packages for it available, as well as plenty of documentation. However, because our internal infrastructure is running on Fedora, instead of migrating the full infrastructure to Ubuntu, we decided to make OpenStack Fedora-friendly. The challenge in using Fedora, however, is that there aren't any packages, nor is there much documentation available. Details of how we worked around these limitations are discussed below.

OpenStack RPM Repository

Of course, installing everything from sources and bypassing the system's package manager is always an option, but this approach has some limitations:

OpenStack has a lot of dependencies, so it's hard to track them all
Installations that bypass the system's package manager take quite some time (compared to executing a single Yum installation)
When some packages are installed from repositories, and some are installed from sources, managing upgrades can become quite tricky

Because of these limitations, we decided to create RPMs for Fedora. In order to avoid reinventing the wheel, we've based these RPMs on RHEL6 OpenStack Packages, as RHEL6 and Fedora are fairly similar. There are two sets of packages available for various OpenStack versions:

Cactus - click here for the latest official release
Hourly - click here for hourly builds from trunk

There are two key metapackages:

node-full: installing a complete cloud controller infrastructure, including RabbitMQ, dnsmasq, etc.
node-compute: installing only node-compute services

To use the repository, just install the RPM:

In addition to installing everything with a single "yum install" command, we also need to perform the configuration. For a bare metal installation, we've created a Kickstart script. Kickstart by itself is a set of answers for the automated installation of the Fedora distribution. We use it for automated host provisioning with PXE. The post-installation part of the Kickstart script was extended to include the OpenStack installation and configuration procedures.

Cloud Controller

To begin with, you can find the post-installation part of the Kickstart file for deploying a cloud controller below.
There are basic settings you will need to change. In our case, we are using a MySQL database.

Your server must be accessible by hostname, because RabbitMQ uses "node@host" identification. Also, because OpenStack uses hostnames to register services, if you want to change the hostname, you must stop all nova services and RabbitMQ, and then start them again after making the change. So make sure you set a resolvable hostname.
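
A quick pre-flight check along these lines can be written in a few lines of Python. This is only a sketch and not part of our Kickstart script; the resolver argument exists solely so that the function can be exercised without touching DNS.

```python
import socket


def hostname_resolves(hostname, resolver=socket.gethostbyname):
    """Return True if the given hostname resolves to an address."""
    try:
        resolver(hostname)
    except socket.error:
        # Covers resolution failures such as socket.gaierror.
        return False
    return True


if __name__ == "__main__":
    name = socket.gethostname()
    print("%s resolvable: %s" % (name, hostname_resolves(name)))
```

Running it on the future cloud controller before installing RabbitMQ and the nova services shows whether the machine's own hostname resolves.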

Add required repos and install the cloud controller.

qemu 0.14+ is needed to support creating custom images.
(UPD: Fedora 15 release already has qemu 0.14.0 in repository)

If you're running nova under a non-privileged user ("nova" in this case), libvirt configs should be changed to provide access to the libvirtd unix socket for nova services. Access over TCP is required for live migration, so all of our nodes should have read/write access to the TCP socket.

Now we can apply our db credentials to the nova config and generate the root certificate.

And finally, we add services to "autostart", prepare the database, and run the migration. Don't forget to set up the root password for the MySQL server.

Compute Node

The Compute Node script is much simpler:

The config section differs very little; there is a cloud controller IP variable, which points to the full nova infrastructure and other supporting services, such as MySQL and RabbitMQ.

That code is very similar to cloud controller, except that it installs the openstack-nova-node-compute package, instead of node-full.

The Cloud Controller IP address (CC_IP variable) must be changed for the Compute Node installation.

IMPORTANT NOTE: All of your compute nodes should have synchronized time with the cloud controller for heartbeat control.

Monday, May 16, 2011

Make your bet on open source infrastructure computing

Today we are launching our company blog, focused on open source infrastructure computing. We plan to cover various emerging technologies and market paradigms related to this segment of IT. As you might imagine, we did not choose this topic by accident. Aside from being the focus for our blog, it is also the focus of Mirantis as a company. Employing Silicon Valley industry veterans backed by 150 open source hackers and programming champions from Russia, we have built this company because we believe in a few basic principles. I felt there was no better way to open our blog than to share these principles with the world. So here we go:

1. Cloud Drives Adoption of Open Source

Until recently, the biggest selling point of commercial enterprise software was its reliability and scalability when it comes to mission-critical tasks. Open source was considered OK by enterprises for tactical purposes, but a no-no for mission-critical, enterprise-wide stuff. Now that Amazon, Rackspace, salesforce.com etc. have built out their systems on top of what’s now largely available in open source, the argument of OSS being unreliable no longer holds water.

Moreover, today, cloud essentially refers to the new paradigm for delivery of IT services, i.e. it is an economic model that revolves around “pay for what you get, when you get it.” Surprisingly, it took enterprises a very long time to accept this approach, but last year was pivotal in showing that it is gaining traction and is the way of the future. Open source has historically been monetized using a model that is much closer to “cloud” than that of commercial software. With commercial software, you buy the license and pay for implementation upfront. If you are lucky enough to implement it, you continue to pay a subscription, which is sold in various forms: support, service assurance etc. With open source, you always implement first; if it works, you may (or may not) buy commercial support, which is also frequently sold as a subscription service. Therefore, as enterprises wrap their minds around cloud, they shy further away from the traditional commercial software model and move closer to the open source / services-focused model.

2. OSS is The Future of Enterprise Infrastructure Computing

I expect that enterprise adoption of open source will be particularly concentrated in the infrastructure computing space, i.e. open source databases (NoSQL, MySQL instead of Oracle, DB2 etc.), application servers (SpringDM, JBoss vs. WebSphere, WebLogic), messaging engines (RabbitMQ vs. Tibco), infrastructure monitoring and security tools etc. Adoption of OSS initiatives higher up the stack (Alfresco, Compiere ERP, Pentaho etc.) in my opinion will lag behind infrastructure projects. One of the reasons is greater end user dependence on tools that are higher up the stack. If you have 100 employees that are used to getting their BI reports in Cognos, it is hard to get them to switch to Pentaho and get used to the new user interface and report formats. However, if your Cognos BI runs on Oracle, switching it to MySQL will likely only affect a few IT folks, while 100 users will not notice the difference.

More importantly, however, the lower down the stack you are, the more “techie” the consumer of your product is. The more techie your consumer, the more likely he is to a) prefer customizing the product to the process and not the other way around; b) ultimately contribute to the open source product. Lower level OSS products tend to be more popular and more in demand overall. The extreme example would be to look at operating system vs. end user apps. Linux powers more than half of enterprise servers, but how many people use open source text editing software?

3. Public PaaS is not for Everyone

An alternative to dealing with infrastructure computing is to not deal with it at all and use a platform like Google App Engine or Force.com to build your apps. Why deal with lower end of the stack at all if the guys that know how to do it best already today allow you to use their platform? I believe that PaaS will become the dominant answer in the SMB market, however, organizations that fall in the category of “technology creators” such as cloud service vendors themselves, financial services, large internet portals etc. will always want to keep control over their entire stack to be able to innovate ahead of the curve and remain vendor independent. Therefore, technology driven companies (those that differentiate with technology) will be the primary market for proprietary OSS based infrastructure computing.

4. Infrastructure Computing is Nobody’s Core Competency

Although infrastructure computing is a necessary component in every organization and most technology driven companies want to have full control over their entire stack, there are no technology companies out there that differentiate themselves based on the awesomeness of their infrastructure stack. Yes, everybody knows that Google’s application infrastructure is great and so is that of salesforce.com, but in the end, the customers don’t care if it takes 2K servers to power salesforce.com or 100K servers, as long as the features are there. In that context, it almost always makes sense to outsource infrastructure computing functions to some third party so as to enable the company to focus on those aspects of its technology that differentiate it from the competition.