This walkthrough performs etcd backup and restore in two environments: a single-node etcd installed by kubeadm, and a multi-node etcd cluster deployed from Kubernetes binaries. After taking a snapshot, the data is restored into the /var/lib/etcd--restore directory rather than overwriting the default /var/lib/etcd directory. The steps are shown below:
Single-node backup and restore
The single-node example uses a node installed with kubeadm:
Node name | Node IP |
---|---|
k8s-master01 | 192.168.4.101 |
The contents of /etc/kubernetes/manifests/etcd.yaml are as follows:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.4.101:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.4.101:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://192.168.4.101:2380
    - --initial-cluster=k8s-master01=https://192.168.4.101:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.4.101:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.4.101:2380
    - --name=k8s-master01
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
Single-node backup
Backing up etcd requires three certificate files: the cert used to connect to etcd, its key, and the trusted CA file. Their locations can be found in the etcd static Pod manifest /etc/kubernetes/manifests/etcd.yaml: in the command section, `--listen-client-urls` gives the listen address, while `--cert-file`, `--key-file`, and `--trusted-ca-file` give the certificate paths:
--listen-client-urls=https://127.0.0.1:2379,https://192.168.4.101:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--key-file=/etc/kubernetes/pki/etcd/server.key
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
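As a convenience, these flags can be pulled straight out of the manifest with grep. A small sketch, assuming the default kubeadm manifest path:

```shell
# Print the listen address and certificate flags from the etcd static Pod
# manifest. The manifest path assumes a default kubeadm layout.
MANIFEST=/etc/kubernetes/manifests/etcd.yaml
if [ -f "$MANIFEST" ]; then
  grep -E -- '--(listen-client-urls|cert-file|key-file|trusted-ca-file)=' "$MANIFEST"
else
  echo "manifest not found: $MANIFEST" >&2
fi
```

The pattern deliberately requires a leading `--`, so the peer-side flags (`--peer-cert-file`, `--peer-trusted-ca-file`) are not matched.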
Using the information gathered above, back up with etcdctl to /opt/etcd-snapshot.db:
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-snapshot.db \
--endpoints="https://127.0.0.1:2379" \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
A final message of `Snapshot saved at /opt/etcd-snapshot.db` indicates the backup succeeded.
Use the following command to inspect the snapshot:
ETCDCTL_API=3 etcdctl snapshot status /opt/etcd-snapshot.db --write-out=table
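In a backup script it is worth failing fast when the snapshot file is missing or empty, rather than discovering that at restore time. A minimal sketch, guarded so it is harmless on a machine without etcdctl (note that etcd 3.5+ also ships `etcdutl`, which is the non-deprecated home of `snapshot status`):

```shell
# Fail fast if the snapshot is missing or empty before trusting it.
SNAP=/opt/etcd-snapshot.db
if [ ! -s "$SNAP" ]; then
  echo "snapshot missing or empty: $SNAP" >&2
elif command -v etcdctl >/dev/null 2>&1; then
  ETCDCTL_API=3 etcdctl snapshot status "$SNAP" --write-out=table
fi
```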
Single-node restore
Stop the kube-apiserver service. Since the cluster was bootstrapped with kubeadm, kube-apiserver runs as a static Pod, so moving /etc/kubernetes/manifests/kube-apiserver.yaml out of the manifests directory is enough to stop it:
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /opt/
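Moving the manifest works because the kubelet watches the manifests directory and tears down static Pods whose files disappear. Before restoring, it can be worth confirming the apiserver has actually stopped listening. A guarded sketch, assuming the default apiserver port 6443:

```shell
# Wait briefly until nothing is listening on the apiserver port (default 6443).
PORT=6443
for i in 1 2 3 4 5; do
  if ! ss -ltn 2>/dev/null | grep -q ":$PORT "; then
    echo "port $PORT is closed"
    break
  fi
  sleep 2
done
```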
Once it has stopped, restore etcd. This example restores into the /var/lib/etcd--restore directory: the original /var/lib/etcd directory is left untouched, and etcd is later started with its data directory pointed at /var/lib/etcd--restore:
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-snapshot.db \
--initial-cluster="k8s-master01=https://192.168.4.101:2380" \
--initial-cluster-token=etcd-k8s-cluster \
--initial-advertise-peer-urls=https://192.168.4.101:2380 \
--name=k8s-master01 \
--data-dir=/var/lib/etcd--restore \
--wal-dir=/var/lib/etcd--restore/wal
On older versions this step may require `--skip-hash-check` to skip the hash verification.
Pay attention to where the WAL directory ends up inside /var/lib/etcd--restore. If `--wal-dir` is not specified, the WAL is restored to the default location inside the data directory; if the original /var/lib/etcd layout kept its WAL somewhere else, add the `--wal-dir=` flag to the restore command so the restored WAL location matches.
For a kubeadm install, edit /etc/kubernetes/manifests/etcd.yaml and change the path of the etcd-data hostPath volume:
  volumes:
  - hostPath:
      path: /var/lib/etcd--restore
      type: DirectoryOrCreate
    name: etcd-data
That is, change the original `path: /var/lib/etcd` to `path: /var/lib/etcd--restore`. After the change, the kubelet automatically restarts the etcd static Pod.
Finally, start kube-apiserver again:
mv /opt/kube-apiserver.yaml /etc/kubernetes/manifests/
systemctl restart kubelet
After a short wait, the environment can be checked with `kubectl`.
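A quick verification pass once the apiserver is back, guarded so the sketch is harmless on a machine without kubectl:

```shell
# Check node and control-plane Pod health after the restore.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get nodes
  kubectl get pods -n kube-system
else
  echo "kubectl not available" >&2
fi
```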
Multi-node backup and restore
The multi-node example uses a cluster deployed from binary files, with three etcd nodes:
Node name | Node IP |
---|---|
k8s-master01 | 192.168.4.11 |
k8s-master02 | 192.168.4.12 |
k8s-master03 | 192.168.4.13 |
The sample /etc/etcd/etcd.config.yml on k8s-master01 is as follows:
name: 'k8s-master01'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://192.168.4.11:2380'
listen-client-urls: 'https://192.168.4.11:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://192.168.4.11:2380'
advertise-client-urls: 'https://192.168.4.11:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'k8s-master01=https://192.168.4.11:2380,k8s-master02=https://192.168.4.12:2380,k8s-master03=https://192.168.4.13:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/etc/kubernetes/pki/etcd/etcd.pem'
  key-file: '/etc/kubernetes/pki/etcd/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/etc/kubernetes/pki/etcd/etcd.pem'
  key-file: '/etc/kubernetes/pki/etcd/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
Multi-node backup
A multi-node backup works the same way as the single-node case: only one etcd node needs to be backed up. This example runs on master01:
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-snapshot.db \
--endpoints="https://192.168.4.11:2379" \
--cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \
--cert=/etc/kubernetes/pki/etcd/etcd.pem \
--key=/etc/kubernetes/pki/etcd/etcd-key.pem
Inspect the snapshot:
ETCDCTL_API=3 etcdctl snapshot status /opt/etcd-snapshot.db --write-out=table
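In practice a single fixed /opt/etcd-snapshot.db is fragile; backups are usually taken on a schedule with timestamped names and simple retention. A cron-friendly sketch, where the backup directory, retention count, and endpoint are assumptions rather than values from the cluster:

```shell
# Timestamped etcd snapshot, keeping only the newest $KEEP copies.
BACKUP_DIR=/opt/etcd-backups
KEEP=7
SNAP="$BACKUP_DIR/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"
mkdir -p "$BACKUP_DIR"
if command -v etcdctl >/dev/null 2>&1; then
  ETCDCTL_API=3 etcdctl snapshot save "$SNAP" \
    --endpoints="https://192.168.4.11:2379" \
    --cacert=/etc/kubernetes/pki/etcd/etcd-ca.pem \
    --cert=/etc/kubernetes/pki/etcd/etcd.pem \
    --key=/etc/kubernetes/pki/etcd/etcd-key.pem
  # Drop everything older than the newest $KEEP snapshots.
  ls -1t "$BACKUP_DIR"/etcd-snapshot-*.db 2>/dev/null | tail -n +$((KEEP + 1)) | xargs -r rm -f
else
  echo "etcdctl not available; skipping save" >&2
fi
```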
Multi-node restore
Stop the kube-apiserver service on all Masters:
systemctl stop kube-apiserver
Stop all etcd members in the cluster:
systemctl stop etcd
Copy the snapshot taken above to the master02 and master03 nodes:
scp /opt/etcd-snapshot.db k8s-master02:/opt
scp /opt/etcd-snapshot.db k8s-master03:/opt
On the master01 node, run:
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-snapshot.db \
--initial-cluster="k8s-master01=https://192.168.4.11:2380,k8s-master02=https://192.168.4.12:2380,k8s-master03=https://192.168.4.13:2380" \
--initial-cluster-token=etcd-k8s-cluster \
--initial-advertise-peer-urls=https://192.168.4.11:2380 \
--name=k8s-master01 \
--data-dir=/var/lib/etcd--restore \
--wal-dir=/var/lib/etcd--restore/wal
On the master02 node, run:
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-snapshot.db \
--initial-cluster="k8s-master01=https://192.168.4.11:2380,k8s-master02=https://192.168.4.12:2380,k8s-master03=https://192.168.4.13:2380" \
--initial-cluster-token=etcd-k8s-cluster \
--initial-advertise-peer-urls=https://192.168.4.12:2380 \
--name=k8s-master02 \
--data-dir=/var/lib/etcd--restore \
--wal-dir=/var/lib/etcd--restore/wal
On the master03 node, run:
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-snapshot.db \
--initial-cluster="k8s-master01=https://192.168.4.11:2380,k8s-master02=https://192.168.4.12:2380,k8s-master03=https://192.168.4.13:2380" \
--initial-cluster-token=etcd-k8s-cluster \
--initial-advertise-peer-urls=https://192.168.4.13:2380 \
--name=k8s-master03 \
--data-dir=/var/lib/etcd--restore \
--wal-dir=/var/lib/etcd--restore/wal
On every node, change `data-dir` and `wal-dir` in /etc/etcd/etcd.config.yml to the new locations:
data-dir: /var/lib/etcd--restore
wal-dir: /var/lib/etcd--restore/wal
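The edit can also be scripted. Below is a sketch of the rewrite as a sed filter, demonstrated on an inline sample so it can be tried safely; on each node the same expressions would be applied in place (`sed -i`) to /etc/etcd/etcd.config.yml. The patterns anchor on the exact lines from the sample config above:

```shell
# Rewrite data-dir/wal-dir to the restored locations.
rewrite_dirs() {
  sed -e 's|^data-dir: /var/lib/etcd$|data-dir: /var/lib/etcd--restore|' \
      -e 's|^wal-dir: /var/lib/etcd/wal$|wal-dir: /var/lib/etcd--restore/wal|'
}
# Demonstrate on an inline sample of the two lines being changed.
rewrite_dirs <<'EOF'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
EOF
```

Anchoring with `^...$` keeps the substitution from touching other lines that merely mention /var/lib/etcd.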
Start etcd on all nodes in the cluster:
systemctl start etcd
Start the kube-apiserver service on all Masters:
systemctl start kube-apiserver
Once kube-apiserver is running on every Master, the environment can be checked with `kubectl`.