Cool Solution - Ceph Storage with UCS

Introduction

Ceph is a massively scalable, open source, distributed storage system. It is composed of an object store, a block store, and a distributed file system. Ceph allows the operation of highly available, highly scalable storage clusters with no single point of failure. It is designed to store data with a defined replication level across multiple storage nodes over a TCP/IP network. Ceph is able to scale to exabyte levels and guarantees that data is stored in a redundant, resilient way even if complete storage nodes become inoperative.

Before you begin installing ceph, it is highly recommended to read the intro (http://docs.ceph.com/docs/master/start/intro/) and to dig deeper into the documentation (http://docs.ceph.com/docs/jewel/rados/) to gain a better understanding of ceph. The following wording is used in this article:

  • node: a single server that runs ceph-services
  • object: a chunk of data that ceph uses to achieve redundancy over TCP/IP networks
  • mon: a server that works as a monitor for the ceph-cluster
  • mds: a server that handles the needed metadata for the cephfs-filesystem
  • osd: a device that ceph uses to store objects, typically a single hard disk drive
  • journal: a device for write-caching

This article covers the installation and integration of Ceph Luminous in UCS 4.4 as well as the integration of cephfs - a distributed filesystem based on ceph - on all 3 nodes.

Note: Ceph is only available for Debian Stretch up to the Luminous release! Other releases like Mimic or Nautilus are not officially available for Debian Stretch (UCS 4.4 is based on Debian Stretch). (Status: July 2019)



Requirements

To roll out a ceph cluster we need at least 3 nodes, each with a minimum of 2 hard disk drives: one as a system drive and another as a ceph OSD. It is highly recommended to use dedicated hard disks for the ceph OSDs. In this article, we use the nodes ucs-ceph01, ucs-ceph02 and ucs-ceph03 as ceph storage nodes. Since ceph runs over a TCP/IP network, the nodes in this example have the following IPv4 addresses:

  • ucs-ceph01: 10.0.0.101
  • ucs-ceph02: 10.0.0.102
  • ucs-ceph03: 10.0.0.103

Also, we have to declare one of the nodes as the admin node, which runs ceph-deploy, a tool that handles the installation and configuration of ceph on all nodes. In this example, our admin node is ucs-ceph01. All nodes are newly installed UCS 4.4 systems, and all steps of this guide are performed on the admin node. SSH is used to execute commands remotely, so you do not have to switch between the nodes.



Preparing the admin-node

To install ceph we have to activate the online repositories, add the official ceph repository and install some required packages. Please perform all of the following steps on the admin node:

ucr set repository/online/unmaintained='yes'
wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
echo -e "deb https://download.ceph.com/debian-jewel/ stretch main\ndeb https://download.ceph.com/debian-luminous/ stretch main" | tee /etc/apt/sources.list.d/ceph.list
univention-install -y ceph-deploy patch
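
To quickly verify that the ceph-deploy tool is available (an optional check, not part of the original instructions), you can print its version:

ceph-deploy --version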

Since ceph-deploy does not recognize UCS as Debian Stretch, we need to patch ceph-deploy:
(ATTENTION: if the ceph-deploy package is updated later, you have to reapply the patch before using ceph-deploy again.)

First, we create a file with the following content:
vim /tmp/remotes.py.patch



@@ -34,6 +34,11 @@
     if not codename and 'oracle' in distro.lower(): # this could be an empty string in Oracle linux
         codename = 'oracle'
 
+    if 'univention' in distro.lower() and '4.4' in release:
+        distro = 'Debian'
+        release = '9'
+        codename = 'stretch'
+
     return (
         str(distro).rstrip(),
         str(release).rstrip(),

After that, use patch to apply the change to the original file:

patch /usr/lib/python2.7/dist-packages/ceph_deploy/hosts/remotes.py < /tmp/remotes.py.patch
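
As an optional check (not required by the original instructions), you can verify that the hunk was applied by searching for the added line:

grep -n "univention" /usr/lib/python2.7/dist-packages/ceph_deploy/hosts/remotes.py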




Configuring SSH, DNS and univention-firewall

Since ceph relies on SSH and DNS, we need to make sure that our hosts can resolve each other and that the admin node is able to log in via SSH public key on all systems. Generate SSH keys and copy them to the remote systems. You have to accept their fingerprints and enter the root password:

ucr set hosts/static/10.0.0.102='ucs-ceph02' && ucr set hosts/static/10.0.0.103='ucs-ceph03'
ssh-keygen && ssh-copy-id root@ucs-ceph02 && ssh-copy-id root@ucs-ceph03

ssh root@ucs-ceph02 "ucr set hosts/static/10.0.0.101='ucs-ceph01' \ 
&& ucr set hosts/static/10.0.0.103='ucs-ceph03' && ucr set repository/online/unmaintained='yes'"

ssh root@ucs-ceph03 "ucr set hosts/static/10.0.0.101='ucs-ceph01' \ 
&& ucr set hosts/static/10.0.0.102='ucs-ceph02' && ucr set repository/online/unmaintained='yes'"
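
Optionally, you can verify that name resolution and passwordless SSH work before continuing (a suggested check, not part of the original article); both commands should return the respective host name without asking for a password:

ssh root@ucs-ceph02 "hostname" && ssh root@ucs-ceph03 "hostname"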

To allow all ceph nodes to communicate via the TCP/IP network, we set the firewall rules and open ports 6789 and 6800-7300:

ucr set \
security/packetfilter/package/ceph/tcp/6789/all="ACCEPT" \
security/packetfilter/package/ceph/tcp/6800:7300/all="ACCEPT" && /etc/init.d/univention-firewall restart
ssh root@ucs-ceph02 "ucr set \
security/packetfilter/package/ceph/tcp/6789/all='ACCEPT' \
security/packetfilter/package/ceph/tcp/6800:7300/all='ACCEPT' && /etc/init.d/univention-firewall restart"
ssh root@ucs-ceph03 "ucr set \
security/packetfilter/package/ceph/tcp/6789/all='ACCEPT' \
security/packetfilter/package/ceph/tcp/6800:7300/all='ACCEPT' && /etc/init.d/univention-firewall restart"
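
If you want to double-check the resulting packet filter rules (an optional step, not in the original article), you can list them on each node, for example:

iptables-save | grep -E "6789|6800:7300"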




Deploying ceph

Now that all requirements for the deployment of ceph are fulfilled, we can start creating the initial config:

mkdir ~/ceph-cluster && cd ~/ceph-cluster
ceph-deploy new ucs-ceph01 ucs-ceph02 ucs-ceph03
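
ceph-deploy new writes the initial cluster configuration into the current directory. As an orientation (not part of the original article, and the exact file names can differ slightly between ceph-deploy versions), you should see roughly ceph.conf, ceph-deploy-ceph.log and ceph.mon.keyring when listing the directory:

ls ~/ceph-cluster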

We can now install ceph on the nodes via ceph-deploy.
Attention! The names of the nodes have to match the host names!

ceph-deploy install --release luminous ucs-ceph01 ucs-ceph02 ucs-ceph03

After the installation has finished, we can create the initial monitors, so our cluster becomes operative:

ceph-deploy mon create-initial

After finishing the last step the auth keys must be copied to all nodes:

scp ceph.client.admin.keyring root@ucs-ceph03:/etc/ceph/
scp ceph.client.admin.keyring root@ucs-ceph02:/etc/ceph/
scp ceph.client.admin.keyring root@ucs-ceph01:/etc/ceph/

That’s it. To check the status of our newly created ceph-cluster run:

ceph -s

If you get a cluster uuid like “cluster 20aeb40a-43ce-4114-aee2-5af4da04c1c6”, everything went correctly and you now have a working ceph cluster on UCS.



Adding OSDs

We will create the OSDs on a btrfs backend. Ceph also needs NTP to work correctly. Install the required packages:

univention-install -y btrfs-tools ntp
ssh ucs-ceph02 "univention-install -y btrfs-tools ntp"
ssh ucs-ceph03 "univention-install -y btrfs-tools ntp"

At this stage, it is recommended to restart all ceph nodes. We experienced some difficulties with the btrfs formatting when we did not reboot the nodes.

ssh ucs-ceph02 "reboot" \
ssh ucs-ceph03 "reboot"
reboot

After the reboot, reconnect to the admin node via SSH and do the following to create the storage OSDs. Make sure to adjust the hard disk devices to match your system:

cd ~/ceph-cluster

Take a look at the identifiers of your disks before you create the OSDs. Make sure you do not use your system disk!

ceph-deploy disk list ucs-ceph01 ucs-ceph02 ucs-ceph03
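
If the ceph-deploy output is hard to read, you can also inspect the block devices directly on each node (an alternative check, not part of the original article):

lsblk
ssh ucs-ceph02 "lsblk"
ssh ucs-ceph03 "lsblk"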

In the next step you must “zap” the drives on all 3 nodes that you want to add as OSDs. Zapping cleans and prepares the disks for ceph. Otherwise, ceph-deploy won’t create the OSDs.

BE REALLY SURE WHICH DRIVE YOU CHOOSE AND HOW (journal SSDs, for example).
In our example we have the simplest setup: 2 drives on each node. So we take the second HDD (/dev/sdb) for our OSDs, without separate journal drives. Execute the following commands, one for each node:

ceph-deploy disk zap ucs-ceph01 /dev/sdb
ceph-deploy disk zap ucs-ceph02 /dev/sdb
ceph-deploy disk zap ucs-ceph03 /dev/sdb

After the zapping we can finally create the OSDs:

ceph-deploy osd create --fs-type btrfs --data /dev/sdb ucs-ceph01
ceph-deploy osd create --fs-type btrfs --data /dev/sdb ucs-ceph02
ceph-deploy osd create --fs-type btrfs --data /dev/sdb ucs-ceph03




Configuration of MGRs

Ceph Luminous needs manager daemons (mgr), which can be installed on all 3 nodes:

ceph-deploy mgr create ucs-ceph01 ucs-ceph02 ucs-ceph03

If everything went correctly, you can check the cluster. We should now have 3 OSDs and the cluster will start distributing the first objects:

ceph -s
ceph osd tree




Activate Dashboard

You can also activate a status dashboard for Ceph. For this you must configure a firewall rule, because the dashboard will be available on port 7000:

ceph mgr module enable dashboard
ucr set security/packetfilter/package/ceph/tcp/7000/all="ACCEPT"
/etc/init.d/univention-firewall restart

The dashboard should now be reachable in a browser on port 7000 on the node that runs the active manager.
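
If you are unsure which node currently hosts the active manager, you can query the published service endpoints (a suggested check, not part of the original article). The output should contain the dashboard URL, for example http://ucs-ceph01:7000/ if ucs-ceph01 runs the active manager:

ceph mgr services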




Configuring mds and CephFS

Installing the mds
To use cephfs, we have to deploy a minimum of one mds. We deploy two: one is used as the active mds and the other as a standby mds.

ceph-deploy mds create ucs-ceph02 ucs-ceph03

Creating CephFS-pools
After this, we create the cephfs-pools in the cluster:

ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128
ceph fs new cephfs cephfs_metadata cephfs_data

(Note: The number “128” is the number of placement groups (PGs). The value required depends on your cluster setup. It can be calculated at: https://ceph.com/pgcalc/)
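
As a rough orientation (this rule of thumb is not from the original article): a commonly cited guideline is about (number of OSDs × 100) / replica count placement groups, rounded up to the next power of two. With the 3 OSDs and the default replica count of 3 used in this example, that gives (3 × 100) / 3 = 100, which rounds up to 128.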

If all went well, we now have a working cephfs mds. You can check with the following command:

ceph mds stat

You should now see an fsmap like: 1/1/1 up {0=ucs-ceph02=up:active}, 1 up:standby



Generate and distribute the auth-keys

So that the nodes can mount the newly created CephFS target, an authentication key must be created and the keys must be distributed to all nodes:

ceph auth get-or-create client.cephfs mon 'allow r' osd 'allow rwx pool=cephfs_data' -o /etc/ceph/client.cephfs.keyring
ceph auth caps client.cephfs mon 'allow r' mds 'allow rw' osd 'allow rwx pool=cephfs_metadata,allow rwx pool=cephfs_data'
ceph-authtool -p -n client.cephfs /etc/ceph/client.cephfs.keyring > /etc/ceph/client.cephfs
scp /etc/ceph/client.cephfs root@ucs-ceph02:/etc/ceph/
scp /etc/ceph/client.cephfs root@ucs-ceph03:/etc/ceph/




Mount cephfs / fstab

First, we create a mountpoint on all nodes:

mkdir /mnt/cephfs
ssh ucs-ceph02 "mkdir /mnt/cephfs"
ssh ucs-ceph03 "mkdir /mnt/cephfs"

To mount the CephFS-Target we must create an entry in the /etc/fstab on all 3 nodes:

ucs-ceph01,ucs-ceph02,ucs-ceph03:6789:/     /mnt/cephfs    ceph   name=cephfs,secretfile=/etc/ceph/client.cephfs,noatime,_netdev    0       2
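
One possible way to add this line on all 3 nodes from the admin node (a convenience sketch, not part of the original article; double-check /etc/fstab on each node afterwards):

FSTAB_LINE='ucs-ceph01,ucs-ceph02,ucs-ceph03:6789:/     /mnt/cephfs    ceph   name=cephfs,secretfile=/etc/ceph/client.cephfs,noatime,_netdev    0       2'
echo "$FSTAB_LINE" >> /etc/fstab
ssh root@ucs-ceph02 "echo '$FSTAB_LINE' >> /etc/fstab"
ssh root@ucs-ceph03 "echo '$FSTAB_LINE' >> /etc/fstab"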

And finally mount the CephFS-Target:

mount /mnt/cephfs
ssh ucs-ceph02 "mount /mnt/cephfs"
ssh ucs-ceph03 "mount /mnt/cephfs"

If everything was successful, we can now test the cluster and check if all nodes share the same cephfs mount. Create a file on the admin node and check if it exists on the other nodes:

dd if=/dev/zero of=/mnt/cephfs/100M_test.bin bs=1024 count=100K
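
To confirm that the file is visible on the other nodes (completing the check described above; the exact listing commands are our suggestion, not part of the original article):

ssh ucs-ceph02 "ls -lh /mnt/cephfs/"
ssh ucs-ceph03 "ls -lh /mnt/cephfs/"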