
K8s Environment Deployment and Installation (kubeadm)

1 CentOS Installation and Initialization

1.1 Switch to the Aliyun yum repository

Using CentOS-7.9-minimal as an example:

  1. Back up the existing repo file
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
  2. Install wget
yum install -y wget
  3. Download the new CentOS-Base.repo into /etc/yum.repos.d/
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
  4. Run yum makecache to rebuild the cache
  5. Notes

Users who are not on Alibaba Cloud ECS will see a "Couldn't resolve host 'mirrors.cloud.aliyuncs.com'" message; it does not affect usage. You can also remove the related entries yourself, e.g.:

sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
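
For step 4 above, rebuilding the yum metadata cache amounts to:

yum clean all
yum makecache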

1.2 Switch to the Aliyun EPEL repository

  1. Back up the existing files (if another EPEL repo is configured)
mv /etc/yum.repos.d/epel.repo /etc/yum.repos.d/epel.repo.backup
mv /etc/yum.repos.d/epel-testing.repo /etc/yum.repos.d/epel-testing.repo.backup
  2. Download the new repo file into /etc/yum.repos.d/
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo

1.3 Install common utilities

yum install -y vim wget pcre pcre-devel zlib zlib-devel openssl openssl-devel iproute net-tools iotop gcc

1.4 Install docker-ce

wget -O /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo # Aliyun mirror for Docker CE
yum install -y docker-ce docker-ce-cli containerd.io
# Start Docker and enable it at boot
systemctl start docker
systemctl enable docker
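
Optionally, a quick sanity check that the daemon is running and enabled at boot:

docker version
systemctl is-enabled docker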

1.5 K8s high-availability cluster planning

Plan and deploy either a single-master or a multi-master highly available K8s environment according to your actual needs.

1.5.1 Single-master environment

If your personal machine is not very powerful, you can deploy a minimal environment:

  • 1 master node: 1 CPU, 2 GB RAM, 50 GB disk
  • 1 etcd node: 1 CPU, 1 GB RAM, 50 GB disk
  • 2 worker nodes: 2 CPU, 2 GB RAM, 50 GB disk

1.5.2 Multi-master environment

Type              Notes
ansible * 2       K8s cluster deployment servers; can be shared with other servers
k8s master * 3    K8s control plane; made highly available via a single VIP (active/standby)
harbor * 2        Highly available image registry servers
etcd * 3          Servers storing the K8s cluster state
haproxy * 2       Highly available etcd proxy servers
node * (2 - N)    Servers that actually run the containers; at least two for a highly available environment

2 Installing haproxy and keepalived for high availability

In production, haproxy is widely used as a layer-4 and layer-7 reverse proxy / load balancer, while keepalived uses VRRP to provide a highly available virtual IP and thus makes haproxy itself highly available. This article focuses on keepalived and its configuration; haproxy is only used here as a test web proxy. Details below:

2.1 Install haproxy

[root@ha1 soft]# wget https://www.haproxy.org/download/2.5/src/haproxy-2.5.5.tar.gz
[root@ha1 soft]# tar -xf haproxy-2.5.5.tar.gz && cd haproxy-2.5.5
[root@ha1 haproxy-2.5.5]# less INSTALL # installation notes
[root@ha1 haproxy-2.5.5]# make TARGET=linux-glibc USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1 PREFIX=/usr/local/haproxy
[root@ha1 haproxy-2.5.5]# make install PREFIX=/usr/local/haproxy

Create the configuration directories

[root@ha1 haproxy-2.5.5]# mkdir -p /usr/local/haproxy/conf
[root@ha1 haproxy-2.5.5]# mkdir -p /etc/haproxy/

Copy a configuration file from the bundled examples and create symlinks:

[root@ha1 haproxy-2.5.5]# cp examples/option-http_proxy.cfg /usr/local/haproxy/conf/haproxy.cfg
[root@ha1 haproxy-2.5.5]# ln -s /usr/local/haproxy/conf/haproxy.cfg /etc/haproxy/haproxy.cfg
[root@ha1 haproxy-2.5.5]# ln -s /usr/local/haproxy/sbin/haproxy /usr/sbin/haproxy

Copy the error pages and symlink the directory (optional, HTTP mode only):

[root@ha1 haproxy-2.5.5]# cp -r examples/errorfiles /usr/local/haproxy/
[root@ha1 haproxy-2.5.5]# ln -s /usr/local/haproxy/errorfiles /etc/haproxy/errorfiles

Copy the init script and make it executable:

[root@ha1 haproxy-2.5.5]# cp examples/haproxy.init /etc/init.d/haproxy
[root@ha1 haproxy-2.5.5]# chmod +x /etc/init.d/haproxy

Add the haproxy group and user, and create the chroot directory:

[root@localhost haproxy]# groupadd haproxy
[root@localhost haproxy]# useradd -g haproxy haproxy
[root@localhost haproxy]# mkdir /usr/share/haproxy

Enable HAProxy at boot (optional):

[root@ha1 haproxy-2.5.5]# chkconfig --add haproxy
[root@ha1 haproxy-2.5.5]# chkconfig haproxy on

Edit the contents of haproxy.cfg:

[root@ha1 init.d]# cat /etc/haproxy/haproxy.cfg
global
chroot /usr/local/haproxy
#stats socket /var/lib/haproxy/haproxy.sock mode 600 level admin
user haproxy
group haproxy
daemon
#nbproc 1
pidfile /var/run/haproxy.pid
log 127.0.0.1 local3 info

defaults
option http-keep-alive
option forwardfor
mode http
timeout connect 300000ms
timeout client 300000ms
timeout server 300000ms

listen stats
mode http
bind 192.168.200.16:9999 # must match the keepalived VIP
stats enable
log global
stats uri /haproxy-status
stats auth haadmin:123456

listen web_port
bind 0.0.0.0:80
mode http
log global
balance roundrobin
server web1 192.168.68.152:8080 check inter 3000 fall 2 rise 5
server web2 192.168.68.153:8080 check inter 3000 fall 2 rise 5

At this point curl against the VIP gets a response (the backends are not up yet, so a 503 is expected):

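A minimal check from the shell, assuming the VIP 192.168.200.16 and the listen sections above:

curl -I http://192.168.200.16/ # expect an HTTP 503 until the backends come up
curl -u haadmin:123456 http://192.168.200.16:9999/haproxy-status # stats page from the "listen stats" section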

Additional notes on the haproxy configuration:

##
## HAProxy configuration file (annotated example)
##
###################################################################################################

## Parameters in the global section are process-level and usually related to the operating system it runs on
global
    log 127.0.0.1 local0 info ## define the global syslog server; at most two may be defined
    ### local0 is the log facility, matching the configuration in /etc/rsyslog.conf; the default level is info
    #log 127.0.0.1 local1 info
    chroot /usr/share/haproxy ## change HAProxy's working directory to the given path and chroot() before
    ### dropping privileges, which raises haproxy's security level
    group haproxy ## like gid, but given as a group name
    user haproxy ## like uid, but given as a user name
    daemon ## run haproxy as a background daemon
    nbproc 1 ## number of haproxy processes to start;
    ### only usable in daemon mode; by default a single process is started,
    ### multi-process mode is generally only used when a single process can open too few file descriptors
    maxconn 4096 ## maximum number of concurrent connections accepted per haproxy process,
    ### equivalent to the command-line option "-n"; the "ulimit -n" value is computed from this setting
    # pidfile /var/run/haproxy.pid ## pid file (default path /var/run/haproxy.pid)
    node edu-haproxy-01 ## name of this node, used in HA setups where multiple haproxy processes share one IP address
    description edu-haproxy-01 ## description of this instance

## defaults: provides default parameters for all following sections; these defaults can be overridden by the next "defaults" section
defaults
    log global ## inherit the log definition from global
    mode http ## processing mode (tcp: layer 4, http: layer 7, health: health check, only returns OK)
    ### tcp: the instance runs in pure TCP mode; a full-duplex connection is established between the client
    #### and the server and no layer-7 inspection is performed; this is the default mode
    ### http: the instance runs in HTTP mode; client requests are deeply analysed before being forwarded
    #### to the backend servers, and any request that is not RFC-compliant is rejected
    ### health: the instance only answers inbound requests with "OK" and closes the connection, without
    #### logging anything; this mode is used to answer health-check probes from external components
    option httplog
    retries 3
    option redispatch ## if the server bound to a serverId goes down, force redirection to another healthy server
    maxconn 2000 ## maximum concurrent connections on the frontend (default 2000)
    ### it cannot be used in a backend section; for large sites this value can be raised so that haproxy
    ### manages the connection queue rather than failing to answer user requests. It may not exceed the
    ### value defined in the "global" section. Also note that haproxy keeps two 8KB buffers per connection,
    ### so with the other data each connection uses roughly 17KB of RAM; this means a properly tuned host
    ### with 1GB of available RAM can sustain about 40000-50000 concurrent connections.
    ### Setting this too high may, in extreme cases, use more memory than the host has available and lead
    ### to unexpected results, so choose a sensible value; the default is 2000
    timeout connect 5000ms ## connect timeout (milliseconds by default; the units us, ms, s, m, h, d can be used)
    timeout client 50000ms ## client timeout
    timeout server 50000ms ## server timeout

## HAProxy statistics page
listen admin_stats
    bind :48800 ## listening port
    stats uri /admin-status ## URI of the statistics page
    stats auth admin:admin ## user and password for the statistics page; add more users on separate lines if needed
    mode http
    option httplog ## log HTTP requests

## listen: defines a complete proxy by tying a "frontend" and a "backend" together; usually only useful for TCP traffic
listen mycat_servers
    bind :3307 ## listening port -- note: when testing connections, use port 3307
    mode tcp
    option tcplog ## log TCP requests
    option tcpka ## allow sending keepalives to servers and clients
    option httpchk OPTIONS * HTTP/1.1\r\nHost:\ www ## backend health check
    ### sends an OPTIONS request to port 48700 of the backend servers (that port is configured on the
    ### backends via xinetd); HAProxy decides from the response whether the backend service is available:
    ### 2xx and 3xx response codes mean healthy, any other code or no response means the server is down.
    balance roundrobin ## load-balancing algorithm, usable in "defaults", "listen" and "backend"; defaults to round robin
    server mycat_01 192.168.9.169:8066 check port 48700 inter 2000ms rise 2 fall 3 weight 10 ## first mycat node
    server mycat_02 192.168.9.170:8066 check port 48700 inter 2000ms rise 2 fall 3 weight 10 ## second mycat node
    ## Syntax: server <name> <address>[:[port]] [param*]
    ### server declares a backend server; it can only be used in listen and backend sections.
    ### <name> is the internal name of this server and appears in logs and warnings
    ### <address> is the server's IPv4 address; resolvable hostnames are also supported, but they must
    ###           resolve to the corresponding IPv4 address at startup
    ### [:[port]] is the target port used when forwarding client connections to this server (optional)
    ### [param*] is a list of optional parameters for this server; a few common ones:
    #### weight: weight, default 1, maximum 256; 0 means the server does not take part in load balancing
    #### backup: marks this server as a backup, used only when every other server in the farm is unavailable
    #### check: enables health checks on this server; additional parameters allow finer-grained tuning
    #### inter: interval between health checks in milliseconds, default 2000;
    ##### fastinter and downinter can be used to tune this delay depending on the server state
    #### rise: number of successful checks before a server is considered up again (default 2)
    #### fall: number of failed checks before a server is considered down (default 3)
    #### cookie: sets a cookie value for this server; the value is checked on incoming requests, and the
    ##### server first selected for that value keeps receiving the subsequent requests (persistence)
    #### maxconn: maximum number of concurrent connections accepted by this server; connections above this
    ##### limit are queued until another connection is released
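
Whichever variant you use, it is worth validating the configuration file before starting the service; haproxy has a built-in check mode:

haproxy -c -f /etc/haproxy/haproxy.cfg # prints "Configuration file is valid" on success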

2.2 Install keepalived

[root@ha1 soft]# wget https://www.keepalived.org/software/keepalived-2.2.7.tar.gz
[root@ha1 soft]# tar -xf keepalived-2.2.7.tar.gz && cd keepalived-2.2.7
[root@ha1 keepalived-2.2.7]# ./configure --prefix=/usr/local/keepalived --sysconf=/etc
[root@ha1 keepalived-2.2.7]# make && make install

Copy the service file from the source package:

[root@ha1 soft]# cp ./keepalived/etc/init.d/keepalived /etc/init.d/
[root@ha1 soft]# systemctl daemon-reload
[root@ha1 soft]# systemctl start keepalived.service # edit /etc/keepalived/keepalived.conf before starting the service

Edit the keepalived configuration files on ha1 and ha2 to set up an active/standby (master/backup) pair. Here ha2 is configured as MASTER and ha1 as BACKUP:

[root@ha1 keepalived]# cat keepalived.conf
global_defs {
router_id keepalive_192.168.1.16
}
vrrp_script check_nginx_alive{
script "/etc/keepalived/check_nginx_alive_or_not.sh"
interval 2
weight 10
}
vrrp_instance VI_1 {
state BACKUP # note: the state name here does not determine the master/backup role; the role is decided by priority!
interface ens33
virtual_router_id 51
priority 50
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.200.16
}
}
[root@ha2 keepalived]# cat keepalived.conf
global_defs {
router_id keepalive_192.168.68.149
}
vrrp_script check_nginx_alive{
script "/etc/keepalived/check_nginx_alive_or_not.sh"
interval 2
weight 10
}
vrrp_instance VI_1 {
state MASTER
interface ens33
virtual_router_id 51
priority 100 # ha2's priority is higher than ha1's, so it will be elected master
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.200.16
}
}

By default, after one of ha1/ha2 has started, the service on the other may fail to start, because Linux by default does not allow binding to an IP address that is not configured locally. Adjust the kernel parameters in /etc/sysctl.conf:

net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1

Run sysctl -p to apply the new kernel parameters.

Now start the services and check the NIC state on each node.

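A quick way to confirm which node currently holds the VIP, assuming the interface ens33 and the VIP 192.168.200.16 from the configuration above:

ip addr show ens33 | grep 192.168.200.16 # the VIP is present only on the current MASTER
systemctl --no-pager status keepalived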

Now, while continuously pinging the VIP, suspend the ha2 host: keepalived fails the VIP over to ha1 and the ping is not interrupted.

This shows that the keepalived service is working as expected.

3 Deploying the Harbor registry

3.1 Install docker-ce

wget -O /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y docker-ce

Harbor hardware and software requirements:

Hardware

The following table lists the minimum and recommended hardware configurations for deploying Harbor.

Resource   Minimum   Recommended
CPU        2 CPU     4 CPU
Mem        4 GB      8 GB
Disk       40 GB     160 GB

Software

The following table lists the software versions that must be installed on the target host.

Software         Version                     Description
Docker engine    Version 17.06.0-ce+ or higher   For installation instructions, see the Docker Engine documentation
Docker Compose   Version 1.18.0 or higher        For installation instructions, see the Docker Compose documentation
Openssl          Latest is preferred             Used to generate certificate and keys for Harbor

3.2 Download and install Harbor

# The offline installer package is recommended
wget https://github.com/goharbor/harbor/releases/download/v2.5.0/harbor-offline-installer-v2.5.0.tgz
tar -xf harbor-offline-installer-v2.5.0.tgz
cd harbor && mkdir certs
# Create a self-signed certificate
openssl genrsa -out certs/harbor-ca.key # generate the private key
touch /root/.rnd
openssl req -x509 -new -nodes -key certs/harbor-ca.key -subj "/CN=harbor.magedu.net" -days 7120 -out certs/harbor-ca.crt # generate the self-signed certificate
cp harbor.yml.tmpl harbor.yml

Edit the harbor.yml configuration file:

[root@ha1 harbor]# grep -v '#' harbor.yml | tr -s '\n'
hostname: harbor.magedu.net
http:
port: 80
https:
port: 443
certificate: /data/soft/harbor/certs/harbor-ca.crt # path to the certificate
private_key: /data/soft/harbor/certs/harbor-ca.key # path to the private key
harbor_admin_password: 123456 # default admin password
database:
password: root123
max_idle_conns: 100
max_open_conns: 900
data_volume: /data/harbor
trivy:
ignore_unfixed: false
skip_update: false
insecure: false
jobservice:
max_job_workers: 10
notification:
webhook_job_max_retry: 10
chart:
absolute_url: disabled
log:
level: info
local:
rotate_count: 50
rotate_size: 200M
location: /var/log/harbor
_version: 2.3.0
proxy:
http_proxy:
https_proxy:
no_proxy:
components:
- core
- jobservice
- trivy

Run ./prepare to generate/update the configuration files.

Run ./install.sh --help to see the available options, and ./install.sh --with-trivy to perform the installation.

Once the installer prints its success message, the installation is complete; docker images and docker ps show the Harbor images and containers.

Open a browser, enter the Harbor IP (or hostname), and log in with user admin and the password set in harbor.yml to reach the management page.

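As a rough end-to-end check of the registry (a sketch assuming harbor.magedu.net resolves to the Harbor host, e.g. via /etc/hosts, and that a project named test exists in the UI):

docker login harbor.magedu.net # admin / the password set in harbor.yml
docker pull centos:7
docker tag centos:7 harbor.magedu.net/test/centos-base:7
docker push harbor.magedu.net/test/centos-base:7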

3.3 Pitfalls encountered installing and using Harbor

  1. After configuring Harbor, push did not work and reported the following error:
[root@ha1 harbor]# docker push harbor.magedu.net/test/centos-base:7
The push refers to repository [harbor.magedu.net/test/centos-base]
076db42d2a09: Preparing
174f56854903: Preparing
unauthorized: unauthorized to access repository: test/centos-base, action: push: unauthorized to access repository: test/centos-base, action: push

Solution:

Check ~/.docker/config.json:

[root@ha1 harbor]# cat ~/.docker/config.json
{
"auths": {
"harbor.magedu.net:443": {
"auth": "YWRtaW46MTIzNDU2"
}
}
}

Suspecting that the stored credentials were not being matched, log in again against the registry address:

[root@ha1 harbor]# docker login harbor.magedu.net/test/centos-base
Username: admin
Password:
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

Now ~/.docker/config.json contains:

[root@ha1 harbor]# cat ~/.docker/config.json
{
"auths": {
"harbor.magedu.net": {
"auth": "dGVzdDpEZHJiZGd6eUAwMDE="
},
"harbor.magedu.net:443": {
"auth": "YWRtaW46MTIzNDU2"
}
}
}

Pushing to Harbor now succeeds:

[root@ha1 harbor]# docker push harbor.magedu.net/test/centos-base:7
The push refers to repository [harbor.magedu.net/test/centos-base]
076db42d2a09: Pushed
174f56854903: Pushed
7: digest: sha256:f24bb6fa09e33d9737a7e9385106f1e40624de4be2a111c027b579d9a9e8054a size: 742

  2. push reports a certificate error
[root@ha1 harbor]# docker push harbor.magedu.net/test/centos-base:7
The push refers to repository [harbor.magedu.net/test/centos-base]
Get "https://harbor.magedu.net/v2/": x509: certificate signed by unknown authority

Solution:

Edit /usr/lib/systemd/system/docker.service and add the --insecure-registry option:

ExecStart=/usr/bin/dockerd --insecure-registry harbor.magedu.net -H fd:// --containerd=/run/containerd/containerd.sock

Reload the unit files and restart docker:

systemctl daemon-reload
systemctl restart docker

  3. push reports "legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0"

Solution:

Go 1.15 deprecates relying on the CommonName field, so SAN certificates are recommended. To keep the old behaviour, set the environment variable GODEBUG to x509ignoreCN=0. Reference for generating a SAN certificate:

openssl genrsa -out ca.key 2048
openssl req -new -x509 -days 365 -key ca.key -subj "/C=CN/ST=GD/L=SZ/O=Acme, Inc./CN=Acme Root CA" -out ca.crt

openssl req -newkey rsa:2048 -nodes -keyout server.key -subj "/C=CN/ST=GD/L=SZ/O=Acme, Inc./CN=*.example.com" -out server.csr
openssl x509 -req -extfile <(printf "subjectAltName=DNS:example.com,DNS:www.example.com") -days 365 -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt

Next, create the matching directory under /etc/docker/certs.d/ and copy the generated certificate:

[root@ha1 harbor]# mkdir -pv /etc/docker/certs.d/harbor.magedu.net:443
mkdir: created directory "/etc/docker/certs.d/harbor.magedu.net:443"
[root@ha1 harbor]# cp certs/harbor-ca.crt /etc/docker/certs.d/harbor.magedu.net:443/

4 Installing etcd

etcd is the brain of the Kubernetes cluster. Kubernetes relies on etcd's "watch" feature to monitor sequences of changes: through it, Kubernetes can subscribe to changes inside the cluster and act on state requests coming from the API server. etcd cooperates with the different components of the distributed cluster; it reacts to changes in component state, and other components may in turn react to those changes.

A situation can arise where, to keep identical copies of all state across the set of etcd members, the same data needs to be stored in more than one etcd instance. However, etcd must not update the same record independently on different instances.

In that situation, etcd does not process writes on every cluster node. Instead, a single instance is responsible for handling writes internally. That node is called the leader. The other nodes in the cluster elect a leader using the Raft algorithm; once the leader is chosen, the remaining nodes become followers.

When a write request reaches the leader, the leader processes the write and broadcasts copies of the data to the other nodes. If a follower is inactive or offline at that moment, the write is still acknowledged as complete based on the majority of available nodes: in general, a write is marked complete once the leader has the agreement of a majority of the cluster members.

This is how the leader is elected and how writes are guaranteed to propagate to all instances; etcd implements this distributed consensus with the Raft protocol.

An etcd cluster should have an odd number of members; here a highly available three-node cluster is built.

4.1 Create the TLS certificates and keys

Download CloudFlare's PKI toolkit cfssl to generate the Certificate Authority (CA) and the other certificates.
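
A minimal installation sketch, assuming the commonly mirrored cfssl R1.2 binaries are still available from pkg.cfssl.org (newer builds are published on the cfssl GitHub releases page):

wget -O /usr/local/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
wget -O /usr/local/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
wget -O /usr/local/bin/cfssl-certinfo https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson /usr/local/bin/cfssl-certinfo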

The generated CA certificate and key files are:

  • ca-key.pem
  • ca.pem
  • kubernetes-key.pem
  • kubernetes.pem
  • kube-proxy.pem
  • kube-proxy-key.pem
  • admin.pem
  • admin-key.pem

The components that use these certificates:

  • etcd: uses ca.pem, kubernetes-key.pem, kubernetes.pem;
  • kube-apiserver: uses ca.pem, kubernetes-key.pem, kubernetes.pem;
  • kubelet: uses ca.pem;
  • kube-proxy: uses ca.pem, kube-proxy-key.pem, kube-proxy.pem;
  • kubectl: uses ca.pem, admin-key.pem, admin.pem;
  • kube-controller-manager: uses ca-key.pem, ca.pem

Create the CA (Certificate Authority)

Create the CA configuration file

mkdir /root/ssl
cd /root/ssl
cfssl print-defaults config > config.json
cfssl print-defaults csr > csr.json
# Following the format of config.json, create the ca-config.json file below
# The expiry is set to 87600h (10 years)
cat > ca-config.json <<EOF
{
"signing": {
"default": {
"expiry": "87600h"
},
"profiles": {
"kubernetes": {
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
],
"expiry": "87600h"
}
}
}
}
EOF

Field descriptions

  • ca-config.json: several profiles can be defined, each with its own expiry, usage scenarios and other parameters; a specific profile is referenced later when signing certificates;
  • signing: indicates that the certificate can be used to sign other certificates; the generated ca.pem has CA=TRUE;
  • server auth: indicates that a client can use this CA to verify certificates presented by servers;
  • client auth: indicates that a server can use this CA to verify certificates presented by clients;

Create the CA certificate signing request

Create the ca-csr.json file with the following content:

{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
],
"ca": {
"expiry": "87600h"
}
}
  • "CN": Common Name; kube-apiserver extracts this field from the certificate and uses it as the requesting user name (User Name); browsers use it to check whether a website is legitimate;
  • "O": Organization; kube-apiserver extracts this field from the certificate and uses it as the group (Group) the requesting user belongs to;

Generate the CA certificate and private key

$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca
$ ls ca*
ca-config.json ca.csr ca-csr.json ca-key.pem ca.pem

(The certificates below are not for etcd but for the other K8s components.)

Create the kubernetes certificate

Create the kubernetes certificate signing request file kubernetes-csr.json:

{
"CN": "kubernetes",
"hosts": [
"127.0.0.1",
"172.20.0.112",
"172.20.0.113",
"172.20.0.114",
"172.20.0.115",
"10.254.0.1",
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
]
}
  • If the hosts field is not empty, it must list the IPs or domain names authorized to use this certificate. Because this certificate is later used by both the etcd cluster and the kubernetes master cluster, the list includes the host IPs of the etcd cluster and the kubernetes master cluster as well as the kubernetes service IP (usually the first IP of the service-cluster-ip-range passed to kube-apiserver, e.g. 10.254.0.1).
  • This is for a minimal kubernetes cluster consisting of a private image registry and three kubernetes nodes; the physical node IPs above can also be replaced with hostnames.

Generate the kubernetes certificate and private key

$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kubernetes-csr.json | cfssljson -bare kubernetes
$ ls kubernetes*
kubernetes.csr kubernetes-csr.json kubernetes-key.pem kubernetes.pem

Alternatively, pass the parameters directly on the command line:

echo '{"CN":"kubernetes","hosts":[""],"key":{"algo":"rsa","size":2048}}' | cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes -hostname="127.0.0.1,172.20.0.112,172.20.0.113,172.20.0.114,172.20.0.115,kubernetes,kubernetes.default" - | cfssljson -bare kubernetes

Create the admin certificate

Create the admin certificate signing request file admin-csr.json:

{
"CN": "admin",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "system:masters",
"OU": "System"
}
]
}
  • kube-apiserver later uses RBAC to authorize requests from clients (such as kubelet, kube-proxy, Pods);
  • kube-apiserver predefines some RoleBindings used by RBAC; for example, cluster-admin binds the Group system:masters to the Role cluster-admin, which grants permission to call all kube-apiserver APIs;
  • O sets the certificate's Group to system:masters. When kubelet accesses kube-apiserver with this certificate, authentication succeeds because the certificate is signed by the CA, and because the certificate's group is the pre-authorized system:masters, it is granted access to all APIs;

Note: this admin certificate is later used to generate the administrator kubeconfig file. Nowadays RBAC is the recommended way to control roles and permissions in kubernetes; kubernetes takes the CN field of the certificate as the User and the O field as the Group (see the "X509 Client Certs" section of the Kubernetes authentication and authorization documentation).

After the kubernetes cluster is up, running kubectl get clusterrolebinding cluster-admin -o yaml shows that the subjects of the clusterrolebinding cluster-admin have kind Group and name system:masters, and that the roleRef object is the ClusterRole cluster-admin. In other words, every user or serviceAccount in the system:masters Group has the cluster-admin role, which is why kubectl has administrative rights over the whole cluster.

$ kubectl get clusterrolebinding cluster-admin -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
creationTimestamp: 2017-04-11T11:20:42Z
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: cluster-admin
resourceVersion: "52"
selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/cluster-admin
uid: e61b97b2-1ea8-11e7-8cd7-f4e9d49f8ed0
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:masters

Generate the admin certificate and private key:

$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes admin-csr.json | cfssljson -bare admin
$ ls admin*
admin.csr admin-csr.json admin-key.pem admin.pem

Create the kube-proxy certificate

Create the kube-proxy certificate signing request file kube-proxy-csr.json:

{
"CN": "system:kube-proxy",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "System"
}
]
}
  • CN sets the certificate's User to system:kube-proxy;
  • kube-apiserver's predefined RoleBinding system:node-proxier binds the User system:kube-proxy to the Role system:node-proxier, which grants permission to call the proxy-related kube-apiserver APIs;

Generate the kube-proxy client certificate and private key

$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes  kube-proxy-csr.json | cfssljson -bare kube-proxy
$ ls kube-proxy*
kube-proxy.csr kube-proxy-csr.json kube-proxy-key.pem kube-proxy.pem

Verify the certificates

Using the kubernetes certificate as an example.

Using the openssl command

$ openssl x509  -noout -text -in  kubernetes.pem
...
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=CN, ST=BeiJing, L=BeiJing, O=k8s, OU=System, CN=Kubernetes
Validity
Not Before: Apr 5 05:36:00 2017 GMT
Not After : Apr 5 05:36:00 2018 GMT
Subject: C=CN, ST=BeiJing, L=BeiJing, O=k8s, OU=System, CN=kubernetes
...
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Subject Key Identifier:
DD:52:04:43:10:13:A9:29:24:17:3A:0E:D7:14:DB:36:F8:6C:E0:E0
X509v3 Authority Key Identifier:
keyid:44:04:3B:60:BD:69:78:14:68:AF:A0:41:13:F6:17:07:13:63:58:CD

X509v3 Subject Alternative Name:
DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster, DNS:kubernetes.default.svc.cluster.local, IP Address:127.0.0.1, IP Address:172.20.0.112, IP Address:172.20.0.113, IP Address:172.20.0.114, IP Address:172.20.0.115, IP Address:10.254.0.1
...
  • Confirm that the Issuer field matches ca-csr.json;
  • Confirm that the Subject field matches kubernetes-csr.json;
  • Confirm that the X509v3 Subject Alternative Name field matches kubernetes-csr.json;
  • Confirm that the X509v3 Key Usage and Extended Key Usage fields match the kubernetes profile in ca-config.json;

Using the cfssl-certinfo command

$ cfssl-certinfo -cert kubernetes.pem
...
{
"subject": {
"common_name": "kubernetes",
"country": "CN",
"organization": "k8s",
"organizational_unit": "System",
"locality": "BeiJing",
"province": "BeiJing",
"names": [
"CN",
"BeiJing",
"BeiJing",
"k8s",
"System",
"kubernetes"
]
},
"issuer": {
"common_name": "Kubernetes",
"country": "CN",
"organization": "k8s",
"organizational_unit": "System",
"locality": "BeiJing",
"province": "BeiJing",
"names": [
"CN",
"BeiJing",
"BeiJing",
"k8s",
"System",
"Kubernetes"
]
},
"serial_number": "174360492872423263473151971632292895707129022309",
"sans": [
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local",
"127.0.0.1",
"10.64.3.7",
"10.254.0.1"
],
"not_before": "2017-04-05T05:36:00Z",
"not_after": "2018-04-05T05:36:00Z",
"sigalg": "SHA256WithRSA",
...

Copy all the certificates to /etc/etcd/ for later use (note: the systemd unit below reads them from /etc/kubernetes/ssl/, so place or symlink them where your unit file expects them).

4.2 Install and configure etcd

Lab environment:

  • node-1: 192.168.68.149
  • node-2: 192.168.68.150
  • node-3: 192.168.68.148

Download the latest etcd release

https://github.com/etcd-io/etcd/releases
# Download the latest release for your platform, extract it, and copy etcd and etcdctl into /usr/bin
# Make them executable: chmod +x /usr/bin/etcd*
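
A concrete download sketch, assuming etcd v3.5.2 (the version that shows up in the error logs later in this section); the unit file below calls /usr/local/bin/etcd, so the binaries are copied there:

ETCD_VER=v3.5.2
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar -xf etcd-${ETCD_VER}-linux-amd64.tar.gz
cp etcd-${ETCD_VER}-linux-amd64/etcd etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin/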

Create the etcd systemd unit file

Create the file etcd.service under /usr/lib/systemd/system/ with the following content. Replace the IP addresses with the host IPs of your own etcd cluster.

[root@ha1 etcd]# cat /lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
ExecStart=/usr/local/bin/etcd \
--cert-file=/etc/kubernetes/ssl/kubernetes.pem \
--key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
--peer-cert-file=/etc/kubernetes/ssl/kubernetes.pem \
--peer-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
--trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--initial-cluster infra1=https://192.168.68.149:2380,infra2=https://192.168.68.150:2380,infra3=https://192.168.68.148:2380 \
--initial-cluster-state new
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
[root@ha1 etcd]#
  • etcd's working directory and data directory is /var/lib/etcd; create this directory before starting the service, otherwise startup fails with "Failed at step CHDIR spawning /usr/bin/etcd: No such file or directory";
  • To secure communication, specify etcd's own certificate and key (cert-file and key-file), the certificate/key and CA certificate used for peer communication (peer-cert-file, peer-key-file, peer-trusted-ca-file), and the CA certificate used to verify clients (trusted-ca-file);
  • The kubernetes-csr.json used when creating kubernetes.pem must list all etcd node IPs in its hosts field, otherwise certificate verification fails;
  • When --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list;

Configure the environment file

[root@ha1 etcd]# cat /etc/etcd/etcd.conf
# [member]
ETCD_NAME=infra1
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://192.168.68.149:2380"
ETCD_LISTEN_CLIENT_URLS="https://192.168.68.149:2379"

#[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.68.149:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.68.149:2379"
[root@ha1 etcd]#

The configuration on node-2 and node-3 is similar; change ETCD_NAME to infra2/infra3 and the URLs to the corresponding node IPs.

Start the etcd service:

# systemctl daemon-reload
# systemctl enable etcd
# systemctl start etcd
# systemctl status etcd
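
Once all three members are up, a hedged health check using the certificate paths from the unit file above:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.68.149:2379,https://192.168.68.150:2379,https://192.168.68.148:2379 \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/kubernetes.pem \
  --key=/etc/kubernetes/ssl/kubernetes-key.pem \
  endpoint health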

Pitfalls encountered

  • Starting the etcd service fails with:

    : {"level":"fatal","ts":1649936424.1166,"caller":"flags/flag.go:85","msg":"conflicting environment variable is shadowed by corresponding command-line flag (either unset environment variable or disable flag))","environment-variable":"ETCD_DATA_DIR","stacktrace":"go.etcd.io/etcd/pkg/v3/flags.verifyEnv\n\t/tmp/etcd-release-3.5.2/etc

The environment variables are defined twice: newer etcd versions automatically read the ETCD_* variables from /etc/etcd/etcd.conf (or whatever file is specified), so they must not be repeated as command-line flags in the ExecStart of /lib/systemd/system/etcd.service.

  • cannot access data directory: directory "/application/kubernetes/data/","drwxr-xr-x" exist without desired file permission "-rwx------".

Setting the directory permissions to 700 fixes this.

  • etcdctl member list shows no leader / fails with: cluster may be unhealthy: failed to list members Error: unexpected status code 404

    Fix 1: under ETCDCTL_API=3, member list simply does not include leader information; the v2 API is needed for that. Since etcd 3.4, ETCDCTL_API=3 etcdctl and etcd --enable-v2=false are the defaults, so to query through the v2 API you need: on the client, export ETCDCTL_API=2; on the server, add enable-v2: true to etcd.config.yaml and restart etcd.
    Fix 2: stay on API v3 and use: ETCDCTL_API=3 etcdctl endpoint status --cluster -w table. Note: every node in the etcd cluster must enable enable-v2: true, otherwise etcdctl under the v2 API sometimes succeeds and sometimes reports "unexpected status code 404".

5 Installing K8s with the kubeadm deployment tool

For ease of learning, kubeadm is used here; a binary deployment may be covered later. Lab environment:

Host     CPU   RAM (GB)   IP
master   2     2          192.168.68.151
node-1   2     2          192.168.68.149
node-2   2     2          192.168.68.150
node-3   2     2          192.168.68.148

OS: CentOS Linux release 7.9.2009 (Core), kernel 3.10.0-1160.62.1.el7.x86_64

5.1 Pre-installation preparation

Requirements

  • A compatible Linux host. The Kubernetes project provides generic instructions for Debian- and Red Hat-based Linux distributions, as well as for distributions without a package manager.
  • 2 GB or more of RAM per machine (less leaves little room for your applications).
  • 2 CPUs or more.
  • Full network connectivity between all machines in the cluster (a public or private network is fine).
  • Unique hostname, MAC address and product_uuid on every node; see below for details.
  • Certain ports open on the machines; see the Kubernetes documentation for details.
  • Swap disabled. You MUST disable swap for the kubelet to work properly.

Verify the MAC address and product_uuid are unique on every node

  • You can get the MAC address of the network interfaces with ip link or ifconfig -a
  • The product_uuid can be checked with sudo cat /sys/class/dmi/id/product_uuid

Hardware devices usually have unique addresses, but some virtual machines may have duplicates. Kubernetes uses these values to uniquely identify the nodes in the cluster; if they are not unique on every node, the installation may fail.

Let iptables see bridged traffic

Make sure the br_netfilter module is loaded. This can be checked by running lsmod | grep br_netfilter. To load it explicitly, run sudo modprobe br_netfilter.

For iptables on your Linux nodes to correctly see bridged traffic, make sure net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl configuration. For example:

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system

Install a container runtime

By default, Kubernetes uses the Container Runtime Interface (CRI) to interact with the container runtime of your choice.

If you do not specify a runtime, kubeadm tries to detect an installed runtime automatically by scanning a list of well-known Unix domain sockets. The following table lists some container runtimes and their socket paths:

Runtime      Domain socket
Docker       /var/run/dockershim.sock
containerd   /run/containerd/containerd.sock
CRI-O        /var/run/crio/crio.sock

If both Docker and containerd are detected, Docker takes precedence. This is necessary because Docker 18.09 ships with containerd, so both are detectable even if you only installed Docker. If any other combination of two or more runtimes is detected, kubeadm prints an error and exits.

The kubelet integrates with Docker through the built-in dockershim CRI implementation.

See the container runtimes page for more information.

Install kubeadm, kubelet and kubectl

You need to install the following packages on every machine:

  • kubeadm: the command used to bootstrap the cluster.
  • kubelet: runs on every node in the cluster and starts Pods and containers.
  • kubectl: the command-line tool used to talk to the cluster.

Here the Aliyun mirror is used as an example:

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

Note: because the upstream does not provide a way to sync the signatures, the GPG check of the index may fail. In that case install with yum install -y --nogpgcheck kubelet kubeadm kubectl, or set repo_gpgcheck=0; otherwise yum fails with: https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/repodata/repomd.xml: [Errno -1] repomd.xml signature could not be verified for kubernetes

Set SELinux to permissive mode (effectively disabling it):

setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

Install kubeadm, kubelet and kubectl:

yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable --now kubelet
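
If you want every node on exactly the same release (the cluster below ends up on v1.23.5), the package versions can be pinned; a sketch:

yum install -y kubelet-1.23.5 kubeadm-1.23.5 kubectl-1.23.5 --disableexcludes=kubernetes
systemctl enable --now kubelet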

Disable swap (mandatory since Kubernetes 1.8, otherwise kubelet reports an error)

  • swapoff -a disables swap temporarily:
[root@master etc]# swapoff -a  # temporary
[root@master etc]# free -m
              total        used        free      shared  buff/cache   available
Mem:           1980         312         134          17        1533        1486
Swap:             0           0           0
  • Comment out the swap entry in /etc/fstab (permanent):
[root@master etc]# sed -i "s/\/dev\/mapper\/centos-swap/\#\/dev\/mapper\/centos-swap/g" /etc/fstab
[root@master etc]# mount -a # remount all mount points after the change

5.2 Node initialization

Master node initialization

Initialize the master node. kubeadm init supports two ways of passing the key deployment settings: command-line options, or a dedicated YAML configuration file; the latter lets you customize every deployment parameter.

The options used in the command are briefly described below:

  • --pod-network-cidr: the Pod network address range in CIDR notation. Flannel's default is 10.244.0.0/16 and Project Calico's default is 192.168.0.0/16;
  • --service-cidr: the Service network address range in CIDR notation, default 10.96.0.0/12; usually only Flannel-like network plugins need it to be set explicitly;
  • --apiserver-advertise-address: the IP address the API server advertises to the other components, normally the master node's IP used for in-cluster communication; 0.0.0.0 means all available addresses on the node;
  • --token-ttl: lifetime of the shared token, default 24 hours, 0 meaning it never expires. To keep a leaked token (e.g. from insecure storage) from endangering the cluster, it is recommended to set an expiry. If the token has expired and you later want to join more nodes, recreate a token and the corresponding join command as shown below.
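
A sketch of the standard command for that (kubeadm prints the full join command for you):

kubeadm token create --print-join-command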

For more information on kubeadm init options, see the kubeadm reference guide.

To drive kubeadm init with a configuration file, see "Using kubeadm init with a configuration file".

[root@master ~]#   kubeadm init --apiserver-advertise-address=192.168.68.151 --service-cidr=10.1.0.0/16 --pod-network-cidr=10.244.0.0/16  # 192.168.68.151 is the master's own IP
[init] Using Kubernetes version: v1.23.5
[preflight] Running pre-flight checks
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master] and IPs [10.1.0.1 192.168.68.151]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master] and IPs [192.168.68.151 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master] and IPs [192.168.68.151 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 6.003818 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.23" in namespace kube-system with the configuration for the kubelets in the cluster
NOTE: The "kubelet-config-1.23" naming of the kubelet ConfigMap is deprecated. Once the UnversionedKubeletConfigMap feature gate graduates to Beta the default name will become just "kubelet-config". Kubeadm upgrade will handle this transition transparently.
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node master as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: v7a8ke.zqnhzofo3cn69lge
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.68.151:6443 --token v7a8ke.zqnhzofo3cn69lge \
--discovery-token-ca-cert-hash sha256:6d362f1c226c8e8ae42a87f57608264104b58abd7ec5cfde4795353eba8c4a11

Before initializing, you can run kubeadm config images pull to pre-pull the required images and speed up initialization.

If kubeadm init runs into problems, consult the official troubleshooting guide.

Following the output above, set up the cluster kubeconfig:

# regular user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# root user
export KUBECONFIG=/etc/kubernetes/admin.conf

Checking the pods at this point, coredns is Pending because it is waiting for a network plugin to be installed:

[root@master .kube]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-7d9cf 0/1 Pending 0 17m
kube-system coredns-64897985d-gdkpg 0/1 Pending 0 17m
kube-system etcd-master 1/1 Running 0 17m
kube-system kube-apiserver-master 1/1 Running 0 17m
kube-system kube-controller-manager-master 1/1 Running 0 17m
kube-system kube-proxy-4cv47 1/1 Running 0 17m
kube-system kube-scheduler-master 1/1 Running 1 17m

A very important remaining step is to deploy a network plugin (see the addons list); here Flannel is installed:

[root@master ~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

Check the pods again:

## the coredns pods have now moved to the Running state
[root@master ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-7d9cf 0/1 Running 0 130m
kube-system coredns-64897985d-gdkpg 0/1 Running 0 130m
kube-system etcd-master 1/1 Running 0 131m
kube-system kube-apiserver-master 1/1 Running 0 131m
kube-system kube-controller-manager-master 1/1 Running 0 131m
kube-system kube-flannel-ds-5kwrh 1/1 Running 0 95s
kube-system kube-proxy-4cv47 1/1 Running 0 130m
kube-system kube-scheduler-master 1/1 Running 1 131m

## only one node so far
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 132m v1.23.5

Join the worker nodes to the cluster:

[root@node-1 ~]# kubeadm join 192.168.68.151:6443 --token v7a8ke.zqnhzofo3cn69lge --discovery-token-ca-cert-hash sha256:6d362f1c226c8e8ae42a87f57608264104b58abd7ec5cfde4795353eba8c4a11
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

Do the same for node-2 and node-3; the node status then changes from NotReady to Ready:

[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 167m v1.23.5
node-1 NotReady <none> 40s v1.23.5
node-2 NotReady <none> 45s v1.23.5
node-3 NotReady <none> 61s v1.23.5
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane,master 173m v1.23.5
node-1 Ready <none> 6m33s v1.23.5
node-2 Ready <none> 6m38s v1.23.5
node-3 Ready <none> 6m54s v1.23.5

Check the pods:

[root@master ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-7d9cf 1/1 Running 0 177m
kube-system coredns-64897985d-gdkpg 1/1 Running 0 177m
kube-system etcd-master 1/1 Running 0 177m
kube-system kube-apiserver-master 1/1 Running 0 177m
kube-system kube-controller-manager-master 1/1 Running 0 177m
kube-system kube-flannel-ds-5kwrh 1/1 Running 0 48m
kube-system kube-flannel-ds-hgcb4 1/1 Running 0 10m
kube-system kube-flannel-ds-hmthh 1/1 Running 0 10m
kube-system kube-flannel-ds-w7wl5 1/1 Running 0 10m
kube-system kube-proxy-2mk4k 1/1 Running 0 10m
kube-system kube-proxy-4cv47 1/1 Running 0 177m
kube-system kube-proxy-s6d6h 1/1 Running 0 10m
kube-system kube-proxy-zlj2x 1/1 Running 0 10m
kube-system kube-scheduler-master 1/1 Running 1 177m

At this point, the basic infrastructure of a Kubernetes cluster with one master and three worker nodes has been deployed.

5.x Pitfalls encountered

  • kubeadm initialization fails with: 18797 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""

    Cause

    Kubernetes 1.14+ recommends the systemd cgroup driver, while Docker's default cgroup driver is cgroupfs, which makes the kubelet fail to start.

    Solution

    Modify the Docker configuration file:

    cat <<EOF | sudo tee /etc/docker/daemon.json
    {
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
    "max-size": "100m"
    },
    "storage-driver": "overlay2"
    }
    EOF

    Then restart docker:

    systemctl enable docker
    systemctl daemon-reload
    systemctl restart docker
  • Image pulls time out during kubeadm init

    Cause: the Google image registry is not reachable (blocked).

    Solution

    Pass --image-repository registry.aliyuncs.com/google_containers to kubeadm init, or switch to the Aliyun mirror in the previous step; see the sketch below.
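
    A hedged example combining the options used earlier in this section with the Aliyun image repository:

    kubeadm init --apiserver-advertise-address=192.168.68.151 \
      --service-cidr=10.1.0.0/16 --pod-network-cidr=10.244.0.0/16 \
      --image-repository registry.aliyuncs.com/google_containers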
