Running Ceph inside Docker

Ceph is a fully open source distributed object store, network block device, and file system designed for reliability, performance, and scalability from terabytes to exabytes. Ceph uses a novel placement algorithm (CRUSH), active storage nodes, and peer-to-peer gossip protocols to avoid the scalability and reliability problems associated with centralized controllers and lookup tables. Ceph is part of a tremendous and growing ecosystem: it is integrated into virtualization platforms (Proxmox), cloud platforms (OpenStack, CloudStack, OpenNebula), containers (Docker), and big data (Hadoop, as a replacement for HDFS).

Almost two years have passed since my first attempt to run Ceph inside Docker. I hadn’t really had the time to resume this work until recently. For the last couple of months, I have been devoting some of my time to contributing to the effort to deploy Ceph in Docker.

(Before we start, I would like to highlight that none of this work would have been possible without the help of Seán C. McCord. Indeed, the current ceph-docker repository is based on Seán’s initial work.)

Now let’s dive in and see how you can get this running!

Rationale

Running Ceph inside Docker is a bit controversial, as many people might believe there is no point in doing this. While it’s not really a problem for the monitors, the metadata server, and the RADOS gateway to be containerized, things get tricky when it comes to the OSDs (object storage daemons). The Ceph OSD is optimized for the machine it runs on and has a strong relationship with the hardware. The OSD cannot work if the disk it relies on dies, and this is a bit of an issue in this container world.

To be honest, at one point I found myself thinking:

I don’t know why I’m doing this. I just know that people out there want it (and yes, they probably don’t know why either). I think it’s important to try anyway, so let’s do it.

This does not sound really optimistic, I know, but it’s the truth. My view has changed slightly though, so for what it’s worth, let me explain why. Maybe it will change your mind as well. (And yes, my explanation will be more than: Docker is fancy, so let’s Dockerize everything.)

People have started investing a lot of engineering effort into running containerized software on their platforms, and they have been using various tools to build and orchestrate their environments. I wouldn’t be surprised to see Kubernetes become the orchestration tool for this. Some people also love to run bleeding-edge technologies in production, as they might find everything else boring. With the “containerize everything” approach, they will be happy that something is happening with their favorite open source storage solution.

Unlike yum or apt-get, where rolling back is not always easy, containers make upgrades and rollbacks simpler: you can use docker stop and docker run to roll out a new version of your daemons, as shown in the sketch below. You can also potentially run different clusters in an isolated fashion on the same machine, which is ideal for development.
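
As a minimal sketch (using the ceph/daemon image described in the next section, with a hypothetical image tag), an upgrade or rollback boils down to swapping the container while the configuration and data stay on the host through the bind mounts:

$ sudo docker stop <daemon-container-id>
$ sudo docker run -d --net=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph \
ceph/daemon:<new-or-previous-tag> <daemon>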

The project

As mentioned, everything started from the work of Seán C. McCord, and we have been iterating on his work together. If you use ceph-docker, you can currently run every single Ceph daemon on either Ubuntu or CentOS. We have a lot of images available on the Docker Hub. We use the Ceph namespace, so our images are prefixed as ceph/<daemon>. We use automated builds; as a result, every time we merge a new patch, a new build gets triggered and produces a new version of the container image.

As we are currently in the middle of a refactoring, you will see that a lot of images are available. Historically, we had (and we still do, until we merge this patch) one image per daemon: one container image each for monitor, osd, mds, and radosgw. This is not really ideal and, in practice, not needed. That is why we worked on a single container image called daemon. This image contains all the Ceph daemons, and you activate the one you want with a parameter while invoking the docker run command. That being said, if you want to get started, I encourage you to use the ceph/daemon image directly. I’ll show an example of how to run it in the next section.

Containerize Ceph

Monitors

Given that monitors cannot communicate through a NATed network, we need to use --net=host to expose the host machine’s network stack to the container:

$ sudo docker run -d --net=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph \
-e MON_IP=192.168.0.20 \
-e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
ceph/daemon mon

Here are the options available to you.

  • MON_IP is the IP address of your host running Docker.
  • MON_NAME is the name of your monitor (DEFAULT: $(hostname)).
  • CEPH_PUBLIC_NETWORK is the CIDR of the host running Docker. It should be in the same network as the MON_IP.
  • CEPH_CLUSTER_NETWORK is the CIDR of a secondary interface of the host running Docker. Used for the OSD replication traffic.
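
Once the monitor container is up, a quick sanity check (a sketch, assuming the bootstrap generated the admin keyring under the bind-mounted /etc/ceph) is to ask the monitor for the cluster status:

$ sudo docker ps
$ sudo docker exec <mon-container-id> ceph -s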

Object Storage Daemon

The current implementation allows you to run a single OSD process per container. Following the microservices mindset, we should not run more than one service inside a container. In our case, running multiple OSD processes in a single container would break this rule and would likely introduce undesirable behaviors. It would also increase the setup and maintenance complexity of the solution.

In this configuration, using --privileged=true is strictly required because we need full access to /dev/ and other kernel functions. However, we also support a configuration based on exposing OSD directories, where the operator prepares the devices appropriately beforehand. He or she then simply exposes the OSD directory, and the entry point takes care of populating it (ceph-osd mkfs). The configuration I’m presenting now is easier to start with, because you only need to specify a block device and the entry point does the rest.

Those who do not want to use --privileged=true can fall back on the second example below.

$ sudo docker run -d --net=host \
--privileged=true \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
ceph/daemon osd_ceph_disk

If you don’t want to use --privileged=true, you can always prepare the OSDs yourself with the configuration management tool of your choice.

Here is an example without privileged mode. In this example, we assume that you have already partitioned the disk, created a filesystem on it, and mounted the OSD partition. To create your OSDs, simply run the following command:

$ sudo docker exec <mon-container-id> ceph osd create

Then run your container, mounting each prepared OSD directory (for example, -v /osds/1:/var/lib/ceph/osd/ceph-1 -v /osds/2:/var/lib/ceph/osd/ceph-2):

$ sudo docker run -d --net=host \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph \
-v /osds/1:/var/lib/ceph/osd/ceph-1 \
ceph/daemon osd_disk_directory

Here are the options available to you.

  • OSD_DEVICE is the OSD device, e.g., /dev/sdb.
  • OSD_JOURNAL is the device that will be used to store the OSD’s journal, e.g., /dev/sdz.
  • HOSTNAME is the hostname of the container where the OSD runs (DEFAULT: $(hostname)).
  • OSD_FORCE_ZAP will force zapping the content of the given device (DEFAULT: 0; set to 1 to force it).
  • OSD_JOURNAL_SIZE is the size of the OSD journal in MB (DEFAULT: 100).
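
For instance (a sketch; the device names are only examples), you could point the OSD at a dedicated journal device and force zapping of any existing content by combining the options above:

$ sudo docker run -d --net=host \
--privileged=true \
-v /etc/ceph:/etc/ceph \
-v /var/lib/ceph/:/var/lib/ceph \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
-e OSD_JOURNAL=/dev/vdz \
-e OSD_FORCE_ZAP=1 \
ceph/daemon osd_ceph_disk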

Metadata Server

This one is pretty straightforward and easy to bootstrap. The only caveat at the moment is that we require the Ceph admin key to be available inside the container. This key will be used to create the CephFS pools and the filesystem.

If you run an old version of Ceph (prior to 0.87), you don’t need this, but you might want to know anyway, since it’s always best to run the latest version!

$ sudo docker run -d --net=host \
-v /var/lib/ceph/:/var/lib/ceph \
-v /etc/ceph:/etc/ceph \
-e CEPHFS_CREATE=1 \
ceph/daemon mds

Here are the options available to you.

  • MDS_NAME is the name of the Metadata server (DEFAULT: mds-$(hostname)).
  • CEPHFS_CREATE will create a filesystem for your Metadata server (DEFAULT: 0; set to 1 to enable it).
  • CEPHFS_NAME is the name of the Metadata filesystem (DEFAULT: cephfs).
  • CEPHFS_DATA_POOL is the name of the data pool for the Metadata Server (DEFAULT: cephfs_data).
  • CEPHFS_DATA_POOL_PG is the number of placement groups for the data pool (DEFAULT: 8).
  • CEPHFS_METADATA_POOL is the name of the metadata pool for the Metadata Server (DEFAULT: cephfs_metadata).
  • CEPHFS_METADATA_POOL_PG is the number of placement groups for the metadata pool (DEFAULT: 8).
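
To confirm that the filesystem was created (a sketch, assuming you run Ceph 0.87 or later, where the ceph fs subcommands are available), you can query it from the monitor container:

$ sudo docker exec <mon-container-id> ceph fs ls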

RADOS gateway

For the RADOS gateway, we deploy it with civetweb enabled by default. However, it is possible to use different CGI frontends by simply providing the remote address and port of the CGI process.

$ sudo docker run -d --net=host \
-v /var/lib/ceph/:/var/lib/ceph \
-v /etc/ceph:/etc/ceph \
ceph/daemon rgw

Here are the options available to you.

  • RGW_REMOTE_CGI defines whether you use the RADOS gateway’s embedded web server (DEFAULT: 0; set to 1 to disable it and use a remote CGI frontend instead).
  • RGW_REMOTE_CGI_HOST is the remote host running a CGI process.
  • RGW_REMOTE_CGI_PORT is the remote port of the host running a CGI process.
  • RGW_CIVETWEB_PORT is the listening port of civetweb (DEFAULT: 80).
  • RGW_NAME is the name of the RADOS gateway instance (DEFAULT: $(hostname)).
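
As an illustration (a sketch; the port is arbitrary), you could run civetweb on a non-default port and check that the gateway answers:

$ sudo docker run -d --net=host \
-v /var/lib/ceph/:/var/lib/ceph \
-v /etc/ceph:/etc/ceph \
-e RGW_CIVETWEB_PORT=8080 \
ceph/daemon rgw
$ curl http://localhost:8080

An anonymous request like this should come back with a small XML (S3-style) response, which tells you the gateway is serving traffic.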

Further work

Configuration store backends

By default, the ceph.conf and all the Ceph keys are generated during the initial monitor bootstrap. This process assumes that, to extend your cluster to multiple nodes, you have to distribute these configurations across all the nodes. This is not really flexible, and we want to improve it. One thing that I will propose soon is to use Ansible to generate the configuration and keys and to distribute them to all the machines.

Alternatively, we want to be able to store various configuration files on different backends like etcd and consul.

Orchestrating the deployment

The very first step is to use ceph-ansible, where the logic is already implemented. I still need to push some changes, but most of the work is already present. For Kubernetes, a preview on how to bootstrap monitors is already available.

Extending to Rocket and beyond

There’s not much to do here, as you can simply port your Docker images into Rocket and launch them (pun intended).

Want to learn more? A video demonstration of the process is available below.

What I Learned About MapR

MapR, based in San Jose, California, provides a commercial version of Hadoop noted for its fast performance.  This week at the Strata Conference, I got a chance to talk to the folks at MapR and found out how MapR differentiates itself from other Hadoop offerings.

MapR Booth at Strata Conference

MapR’s speed appears to come from its filesystem design. It is fully compatible with standard open source Hadoop, including Hadoop 2.x, YARN, and HBase, but with a more optimized filesystem structure that provides the additional speed boost.

MapR promotes these benefits below.

  • No single point of failure
    Normally the NameNode is the single point of failure for a Hadoop installation.  MapR’s design avoids this issue.
  • NFS mount data files
    MapR allows you to NFS mount files into an HDFS cluster.  This saves you from having to copy files into MapR, and you might not even need tools like Flume.  Writing directly into the files also opens up additional options, such as querying Hadoop on near-real-time data (see the example mount after this list).
  • Fast access
    MapR has clocked the fastest data processing on record, sorting 1.5 trillion bytes in one minute using its Hadoop software on the Google Compute Engine cloud service.
  • Binary compatible with Hadoop
    MapR is binary compatible with open source Hadoop, which gives you more flexibility in adding other third-party components or migrating to and from other Hadoop distributions.
  • Enterprise support
    Professional services, enterprise support, and training and certification are available.
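
As a concrete illustration of the NFS point above (a hypothetical mount; the node name, cluster name, and paths are only examples), you could mount the cluster over NFS and copy files into it with ordinary tools:

$ sudo mkdir -p /mapr
$ sudo mount -t nfs -o nolock node1:/mapr /mapr
$ cp access.log /mapr/my.cluster.com/user/alice/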

MapR has attracted a number of featured customers including the following:

  • Comscore
  • Cision
  • Linksmart
  • HP
  • Return Path
  • Dotomi
  • Solutionary
  • Trueffect
  • Sociocast
  • Zions Bank
  • Live Nation
  • Cisco
  • Rubicon Project

MapR is also partnering with both Google and Amazon Web Services for cloud-based Hadoop systems.

MapR currently comes in three editions:

  • M3 Standard Edition
  • M5 Enterprise Edition (with “99.999% high availability and self-healing”)
  • M7 Enterprise Edition for Hadoop (with fast database)

Additionally, in conjunction with the Strata Conference this week, MapR has announced the release of the MapR Sandbox.  Any user can download the MapR Sandbox for free and run a full MapR Hadoop installation within a VMware or Virtualbox virtual machine.  This sandbox provides a suitable learning environment for those who want to experience the use and operation of MapR Hadoop without investing a lot of effort in the installation.  I haven’t downloaded and installed the MapR Sandbox yet.  If you have already done this and tried it out, tell me what you think in the comments below.

MapR website: http://www.mapr.com

Big Data Analytics – What is that?

In a recent study, IBM estimated that 2.5 quintillion bytes of data are created every day – so much that 90% of the data in the world today has been created in the last two years. It is a mind-boggling figure, and the irony is that we feel less informed in spite of having more information available today.

The surprising growth in data volumes has had a major impact on today’s businesses. Online users create content such as blog posts, tweets, social networking interactions, and photos, and servers continuously log messages about what those users are doing.

This online data comes from posts on social media sites like Facebook and Twitter, YouTube videos, cell phone conversation records, and so on. This data is called Big Data.

WHAT IS BIG DATA?

The Big Data concept refers to datasets that grow so large that they become difficult to manage using existing database management concepts and tools. The difficulty can relate to data capture, storage, search, sharing, analytics, visualization, and so on.

Big Data spans three dimensions: Volume, Velocity, and Variety.

  • Volume – The size of the data is very large, measured in terabytes and petabytes.
  • Velocity – Data should be used as it streams into the enterprise in order to maximize its value to the business; the role of time is critical here.
  • Variety – The data extends beyond structured data to include unstructured data of all varieties: text, audio, video, posts, log files, and so on.

WHY BIG DATA?

When an enterprise can leverage all of the information available in its data, rather than just a subset, it gains a powerful advantage over its market competitors. Big Data can help it gain insights and make better decisions.

Big Data presents an opportunity to create unprecedented business advantage and better service delivery. It also requires new infrastructure and a new way of thinking about how business and the IT industry work. The concept of Big Data is going to change the way we do things today.

An International Data Corporation (IDC) study predicts that overall data will grow by 50 times by 2020, driven in large part by more embedded systems such as sensors in clothing, medical devices, and structures like buildings and bridges. The study also determined that unstructured information – such as files, email, and video – will account for 90% of all data created over the next decade. But the number of IT professionals available to manage all that data will grow to only 1.5 times today’s levels.

The digital universe is 1.8 trillion gigabytes in size, stored in 500 quadrillion files, and it more than doubles in size every two years. To put that in perspective, there are nearly as many bits of information in the digital universe as there are stars in our physical universe.

CHARACTERISTICS OF BIG DATA

A Big Data platform should provide a solution designed specifically with the needs of the enterprise in mind. The following are the basic features of a Big Data offering:

  • Comprehensive – It should offer a broad platform that addresses all three dimensions of the Big Data challenge – Volume, Variety, and Velocity.
  • Enterprise-ready – It should include performance, security, usability, and reliability features.
  • Integrated – It should simplify and accelerate the introduction of Big Data technology into the enterprise, and enable integration with the information supply chain, including databases, data warehouses, and business intelligence applications.
  • Open source based – It should be based on open source technology with enterprise-class functionality and integration.
  • Low latency reads and updates
  • Robust and fault-tolerant
  • Scalability
  • Extensible
  • Allows ad hoc queries
  • Minimal maintenance

BIG DATA CHALLENGES

The main challenges of Big Data are data variety, volume, analytical workload complexity, and agility. Many organizations are struggling to deal with increasing volumes of data. To solve this problem, they need to reduce the amount of data being stored and exploit new storage techniques that can further improve performance and storage utilization.

SUMMARY AND CONCLUSION

Big Data is a new gold rush and a key enabler of social business. A large or medium-sized company can neither make sense of all the user-generated content online nor collaborate effectively with customers, suppliers, and partners on social media channels without Big Data analytics. Collaboration with customers and insights from user-generated online content are critical for success in the age of social media.

In a study, McKinsey’s Business Technology Office and the McKinsey Global Institute (MGI) calculated that the U.S. faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of Big Data.

The biggest gap, by a factor of 10x, is the lack of skilled managers who can make decisions based on analysis. Growing talent and building teams that make analytics-based decisions is the key to realizing the value of Big Data.

Thank you for reading. Happy Learning !!