Ceph Practice Notes, Part 1: Deploying a Ceph Cluster

1. Environment Preparation

host         ip         role    components                 os          resources
ceph-node01  10.0.0.30  master  monitor, osd, ceph-deploy  CentOS 7.9  2C/4G
ceph-node02  10.0.0.31  worker  osd                        CentOS 7.9  2C/4G
ceph-node03  10.0.0.32  worker  osd                        CentOS 7.9  2C/4G

Perform the following on all three nodes.

Install ntpdate, synchronize the clock, and add a periodic sync job:
yum -y install ntpdate
ntpdate 10.0.0.19
echo "0 0,6,12,18 * * * /usr/sbin/ntpdate ntp1.aliyun.com;/sbin/hwclock -w" >> /etc/crontab && systemctl restart crond

Add the Ceph (jewel) repository:
rpm -Uvh https://download.ceph.com/rpm-jewel/el7/noarch/ceph-release-1-0.el7.noarch.rpm
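A quick, optional check that the repository is now visible to yum:

yum repolist | grep -i ceph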

On ceph-node01, add hostname resolution for the cluster nodes:

cat << EOF | tee -a /etc/hosts
10.0.0.30 ceph-node01
10.0.0.31 ceph-node02
10.0.0.32 ceph-node03
EOF
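ceph-deploy drives the other nodes over SSH, so the admin node also needs passwordless SSH to every host. A minimal sketch, assuming everything runs as root:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa      # generate a key pair if one does not already exist
for node in ceph-node01 ceph-node02 ceph-node03; do
    ssh-copy-id root@$node                    # copy the public key to each node
done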

2. Deployment

2.1 Cluster Initialization

On ceph-node01, install ceph-deploy:
yum -y install ceph-deploy

If the installation fails with an error complaining about the Ceph repository, recover as follows:
1. Remove ceph-deploy: yum -y remove ceph-deploy
2. Find and remove the old repository package:
rpm -qa | grep ceph-release
rpm -e --nodeps ceph-release-1-1.el7.noarch
(or pipe the query straight into removal: rpm -qa | grep ceph-release | xargs rpm -e --nodeps)
3. Add the new Ceph repository indicated by the error message.

Create the Ceph cluster; this generates the cluster configuration file and key files in the current directory:
mkdir -p /etc/ceph && cd /etc/ceph && ceph-deploy new ceph-node01
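The generated ceph.conf should look roughly like the following sketch (the auth lines are ceph-deploy defaults, and the fsid is the cluster id that ceph -s reports later); ceph-deploy new also drops a ceph.mon.keyring and a deployment log next to it:

[global]
fsid = 3aab7f86-f7f7-40f4-afd7-2d1918301266
mon_initial_members = ceph-node01
mon_host = 10.0.0.30
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx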

Install Ceph on all three nodes:
ceph-deploy install ceph-node01 ceph-node02 ceph-node03

Check the installed version; ceph -v returns:

ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)

This indicates Ceph is installed correctly; log in to the other nodes and run the same check there, as in the sketch below.
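A quick way to do that from ceph-node01, assuming passwordless SSH is set up as above:

for node in ceph-node01 ceph-node02 ceph-node03; do
    echo "== $node =="
    ssh $node ceph -v
done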

Create the initial monitor:
cd /etc/ceph && ceph-deploy mon create-initial

Check the cluster; ceph -s returns:

cluster 3aab7f86-f7f7-40f4-afd7-2d1918301266
     health HEALTH_ERR
            no osds
     monmap e1: 1 mons at {ceph-node01=10.0.0.30:6789/0}
            election epoch 3, quorum 0 ceph-node01
     osdmap e1: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds
      pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                  64 creating

At this point the cluster is unhealthy (HEALTH_ERR) because no OSDs have been deployed yet.

2.2 Deploying OSDs

On ceph-node01:
1. List all disks on ceph-node01; ceph-deploy disk list ceph-node01 returns:

[ceph-node01][INFO  ] Running command: /usr/sbin/ceph-disk list
[ceph-node01][DEBUG ] /dev/sr0 other, iso9660
[ceph-node01][DEBUG ] /dev/vda :
[ceph-node01][DEBUG ]  /dev/vda1 other, ext4, mounted on /
[ceph-node01][DEBUG ] /dev/vdb other, unknown

2. Pick one or more disks for the Ceph OSDs (not the system disk; the whole device is used, no manual partitioning) and wipe them; ceph-deploy disk zap ceph-node01:vdb returns:

[ceph-node01][INFO  ] Running command: /usr/sbin/ceph-disk zap /dev/vdb
[ceph-node01][DEBUG ] Creating new GPT entries.
[ceph-node01][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[ceph-node01][DEBUG ] other utilities.
[ceph-node01][DEBUG ] Creating new GPT entries.
[ceph-node01][DEBUG ] The operation has completed successfully.

3. Create the OSD on the wiped disk (ceph-deploy prepares and activates it); ceph-deploy osd create ceph-node01:vdb returns:

[ceph-node01][INFO  ] Running command: systemctl enable ceph.target
[ceph-node01][INFO  ] checking OSD status...
[ceph-node01][DEBUG ] find the location of an executable
[ceph-node01][INFO  ] Running command: /bin/ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host ceph-node01 is now ready for osd use

4. Check the cluster again; ceph -s returns:

  cluster 3aab7f86-f7f7-40f4-afd7-2d1918301266
     health HEALTH_ERR
            64 pgs are stuck inactive for more than 300 seconds
            64 pgs degraded
            64 pgs stuck inactive
            64 pgs stuck unclean
            64 pgs undersized
     monmap e1: 1 mons at {ceph-node01=10.0.0.30:6789/0}
            election epoch 3, quorum 0 ceph-node01
     osdmap e5: 1 osds: 1 up, 1 in
            flags sortbitwise,require_jewel_osds
      pgmap v8: 64 pgs, 1 pools, 0 bytes data, 0 objects
            107 MB used, 97124 MB / 97231 MB avail
                  64 undersized+degraded+peered

The cluster is still unhealthy. This is because the default pool keeps three replicas of every object, so OSDs on the other nodes must be added before the cluster can become healthy.
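The replica requirement can be confirmed on the default rbd pool:

ceph osd pool get rbd size        # number of replicas, 3 by default
ceph osd pool get rbd min_size    # minimum replicas required to serve I/O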

2.3 Expanding the Ceph Cluster

The steps above produced a Ceph cluster running on ceph-node01 with one mon and one OSD. A cluster needs at least one mon to run at all; for high availability, mons are also deployed on the other two nodes here. Note that the number of mons should be odd so that a majority quorum can always be formed.

2.3.1 Expanding the mons

On ceph-node01, first add the public network to the configuration file, then create mons on ceph-node02 and ceph-node03:
echo "public network = 10.0.0.0/24" >> /etc/ceph/ceph.conf
ceph-deploy mon create ceph-node02
ceph-deploy mon create ceph-node03
Check with ceph -s:

 cluster 3aab7f86-f7f7-40f4-afd7-2d1918301266
     health HEALTH_ERR
            64 pgs are stuck inactive for more than 300 seconds
            64 pgs degraded
            64 pgs stuck degraded
            64 pgs stuck inactive
            64 pgs stuck unclean
            64 pgs stuck undersized
            64 pgs undersized
     monmap e3: 3 mons at {ceph-node01=10.0.0.30:6789/0,ceph-node02=10.0.0.31:6789/0,ceph-node03=10.0.0.32:6789/0}
            election epoch 18, quorum 0,1,2 ceph-node01,ceph-node02,ceph-node03
     osdmap e5: 1 osds: 1 up, 1 in
            flags sortbitwise,require_jewel_osds
      pgmap v8: 64 pgs, 1 pools, 0 bytes data, 0 objects
            107 MB used, 97124 MB / 97231 MB avail
                  64 undersized+degraded+peered

ceph mon stat

e3: 3 mons at {ceph-node01=10.0.0.30:6789/0,ceph-node02=10.0.0.31:6789/0,ceph-node03=10.0.0.32:6789/0}, election epoch 18, quorum 0,1,2 ceph-node01,ceph-node02,ceph-node03

The cluster is still not healthy: OSDs also need to be configured on the other nodes. By default a Ceph pool keeps three replicas of each object, and the replicas are placed on OSDs on three different nodes.
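A quick way to watch how placement spreads across hosts is the CRUSH tree; at this point it should show a single host bucket (ceph-node01) holding osd.0, and after the expansion below each node appears as its own host bucket:

ceph osd tree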

2.3.2 Expanding the OSDs

On ceph-node01:

ceph-deploy disk list ceph-node02 ceph-node03 
ceph-deploy disk zap ceph-node02:vdb
ceph-deploy disk zap ceph-node03:vdb
ceph-deploy osd create ceph-node02:vdb
ceph-deploy osd create ceph-node03:vdb

Check again with ceph -s:

    cluster 3aab7f86-f7f7-40f4-afd7-2d1918301266
     health HEALTH_OK
     monmap e3: 3 mons at {ceph-node01=10.0.0.30:6789/0,ceph-node02=10.0.0.31:6789/0,ceph-node03=10.0.0.32:6789/0}
            election epoch 18, quorum 0,1,2 ceph-node01,ceph-node02,ceph-node03
     osdmap e14: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v121: 64 pgs, 1 pools, 0 bytes data, 0 objects
            322 MB used, 1208 GB / 1208 GB avail
                  64 active+clean

Adjusting the storage pool
The PG count is an important concept in a Ceph cluster: it determines how data is distributed across the cluster. PGs belong to a pool, and how many PGs a pool needs depends on the number of OSDs. A common formula for planning PGs is:
Total PGs for one pool = (Total_number_of_OSDs * 100) / max_replication_count
For this setup that gives 3 * 100 / 3 = 100, which is rounded up to the next power of two, 128, so pg_num and pgp_num are set to 128:
ceph osd pool set rbd pg_num 128
ceph osd pool set rbd pgp_num 128
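The new values can be verified per pool (pgp_num should be raised to match pg_num, otherwise the cluster reports a warning):

ceph osd pool get rbd pg_num
ceph osd pool get rbd pgp_num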

ceph -s

    cluster 3aab7f86-f7f7-40f4-afd7-2d1918301266
     health HEALTH_OK
     monmap e3: 3 mons at {ceph-node01=10.0.0.30:6789/0,ceph-node02=10.0.0.31:6789/0,ceph-node03=10.0.0.32:6789/0}
            election epoch 18, quorum 0,1,2 ceph-node01,ceph-node02,ceph-node03
     osdmap e18: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v146: 128 pgs, 1 pools, 0 bytes data, 0 objects
            324 MB used, 1208 GB / 1208 GB avail
                 128 active+clean

2.3.3 Checking the Cluster

Use ceph -s (or ceph status) to check the cluster state.
Use ceph -w to watch cluster health continuously; it returns:

    cluster 3aab7f86-f7f7-40f4-afd7-2d1918301266
     health HEALTH_OK
     monmap e3: 3 mons at {ceph-node01=10.0.0.30:6789/0,ceph-node02=10.0.0.31:6789/0,ceph-node03=10.0.0.32:6789/0}
            election epoch 18, quorum 0,1,2 ceph-node01,ceph-node02,ceph-node03
     osdmap e18: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v146: 128 pgs, 1 pools, 0 bytes data, 0 objects
            324 MB used, 1208 GB / 1208 GB avail
                 128 active+clean

2021-08-02 10:00:00.000193 mon.0 [INF] HEALTH_OK
...

ceph quorum_status --format json-pretty checks the monitor quorum status.

ceph df checks cluster usage.

ceph mon dump dumps the mon map.

ceph pg dump dumps PG information.

ceph osd stat shows OSD status.

ceph pg stat shows PG status.

ceph osd lspools lists the storage pools.

ceph osd tree shows the OSD CRUSH tree.

ceph auth list shows the cluster's keys:

installed auth entries:

osd.0
        key: AQDUiwVhm8BhJRAAwvxjJWIXR7c62mqMEFVY5Q==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: AQDiTQdhi+aeFRAAfvtawdk7bz2NRkCIF5llJg==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: AQAJTgdhgRaXExAA7YrvuAKd5JZLBDQ4tDszfg==
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQBriAVhhlUvNxAAU2KkI63T05sf6Jqwb4zzDQ==
        caps: [mds] allow *
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: AQBsiAVhEivxJxAAJ33N2D1IJCovuNsevxGwDg==
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
        key: AQBviAVh5/PYABAAFJpioWJe8hh71H9Fkn/ejQ==
        caps: [mon] allow profile bootstrap-mgr
client.bootstrap-osd
        key: AQBsiAVhqIFWChAA6lHtkus6zXX3Y9DJsyVy3g==
        caps: [mon] allow profile bootstrap-osd
client.bootstrap-rgw
        key: AQBsiAVhCzE9GRAAovmo/vj+TYkEoTgFrxojuA==
        caps: [mon] allow profile bootstrap-rgw

3. Problems Encountered

1. After initializing the Ceph cluster, the health check reports an error:

clock skew detected on mon

Solution (on ceph-node01):
1. Synchronize the time across the nodes.
2. Raise the clock-drift thresholds in the Ceph configuration:
echo "mon clock drift allowed = 2" >> /etc/ceph/ceph.conf
echo "mon clock drift warn backoff = 30" >> /etc/ceph/ceph.conf
3. Push the configuration to all nodes:
ceph-deploy --overwrite-conf config push ceph-node{01..03}
4. Restart the mons on all three nodes:
systemctl restart ceph-mon.target
5. Restart the OSDs:
systemctl restart ceph-osd.target
6. Check the result:
ceph health detail
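After step 2, the relevant part of /etc/ceph/ceph.conf should look roughly like this sketch (both options are standard mon settings and can simply live in the [global] section that ceph-deploy generated):

[global]
...
mon clock drift allowed = 2
mon clock drift warn backoff = 30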