https://kubernetes.io/docs/concepts/cluster-administration/logging/
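There are three overall approaches:
Container logging drivers:
https://docs.docker.com/config/containers/logging/configure/
Check which logging driver the Docker host currently uses:
$ docker info --format '{{.LoggingDriver}}'
With the json-file driver, Docker by default saves the container's standard output and standard error to a file on the host, at:
/var/lib/docker/containers/<container-id>/<container-id>-json.log
Log rotation can also be configured: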
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3",
    "labels": "production_status",
    "env": "os,customer"
  }
}
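To make the rotation settings concrete, here is a minimal Python sketch (not part of Docker; the helper names are hypothetical). It works out how much log data the options above keep per container, and it splits one json-file entry into its parts. It assumes max-size is written as a number followed by a one-letter unit, and that each line of the log file is a JSON object with log, stream, and time keys.

```python
import json


def rotation_cap(log_opts):
    """Return (amount, unit): the most log data kept per container.

    Assumes max-size looks like "<number><unit>", e.g. "10m".
    """
    size = log_opts["max-size"]
    number, unit = int(size[:-1]), size[-1]
    return number * int(log_opts["max-file"]), unit


def parse_json_file_line(line):
    """Split one json-file entry into (stream, time, message)."""
    entry = json.loads(line)
    return entry["stream"], entry["time"], entry["log"].rstrip("\n")


log_opts = {"max-size": "10m", "max-file": "3"}
cap, unit = rotation_cap(log_opts)
print(f"at most {cap}{unit} of logs are kept per container")

# Hypothetical sample entry, shaped like a line in <container-id>-json.log.
sample = (
    r'{"log":"0: Thu Jul 16 08:40:35 UTC 2020\n",'
    r'"stream":"stdout","time":"2020-07-16T08:40:35.000000000Z"}'
)
print(parse_json_file_line(sample))
```

With three files of 10m each, a single container is capped at roughly 30m of log data on the host.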
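Advantages:
Disadvantages:
Approach: start a sidecar container in the pod that streams the log files inside the container to standard output, where the log collection agent on the host picks them up.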
$ cat count-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        echo "$(date) INFO $i" >> /var/log/2.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: count-log-1
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -f /var/log/1.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: count-log-2
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -f /var/log/2.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir: {}
$ kubectl create -f count-pod.yaml
$ kubectl logs -f counter -c count-log-1
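Advantages:
Disadvantages:
Approach: run a log collection component (such as fluentd) as a sidecar directly in the application pod, so that the collector can read the logs inside the container as local files.
Advantages: logs need not be stored on the host, and local log files can be collected in full.
Disadvantages: every application runs an extra logging agent, which costs additional resources.
For now, the most recommended approach is a node-level logging agent.
Option 1: a self-developed solution, implementing your own log collection agent. The general idea:
Option 2: collect logs with an open-source agent (the EFK stack). This applies broadly and covers the vast majority of log collection and display needs.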
Elasticsearch
An open-source, distributed, RESTful search and data analytics engine built on the open-source library Apache Lucene. It can be accurately described as follows:
Kibana
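Kibana is an open-source analytics and visualization platform designed to work with Elasticsearch. With Kibana you can search, view, and interact with the data stored in Elasticsearch indices. You can also easily perform advanced data analysis and visualize the data as charts, tables, and maps.
Fluentd
A system for collecting, processing, and forwarding logs. Through its rich plugin system it can collect logs from a wide range of systems and applications, convert them into a user-specified format, and forward them to the log storage system the user specifies.
Fluentd pulls log data from a given set of data sources, processes it (converting it into a structured data format), and forwards it to other services such as Elasticsearch, object storage, or Kafka. Fluentd supports more than 300 log storage and analysis services, so it is very flexible in this respect. The main steps are as follows:
Why is fluentd recommended as the log collection tool for the Kubernetes ecosystem?
Cloud native: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch
Structures log files as JSON
Pluggable architecture
Very small resource footprint
Written in C and Ruby; 30-40 MB of memory, 13,000 events/second/core
Very high reliability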
https://docs.fluentd.org/v/0.12/quickstart/life-of-a-fluentd-event
Input -> filter 1 -> ... -> filter N -> Buffer -> Output
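Directives:
source: the data source, corresponding to Input
The source directive selects and configures the required input plugin to enable a Fluentd input source; the source submits events to fluentd's routing engine. @type distinguishes the different kinds of data sources. The following configuration watches a given file for appended lines: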
<source>
@type tail
path /var/log/httpd-access.log
pos_file /var/log/td-agent/httpd-access.log.pos
tag myapp.access
format apache2
</source>
filter: Event processing pipeline
Filters can be chained into a pipeline that processes data serially before handing it to match for output. The following processes the event content:
<source>
@type http
port 9880
</source>
<filter myapp.access>
@type record_transformer
<record>
host_param "#{Socket.gethostname}"
</record>
</filter>
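After receiving the data, the filter calls the built-in @type record_transformer plugin, inserts the new field host_param into the event's record, and then hands the event to match for output.
The label directive
A @label can be specified in a source. Events triggered by that source are sent to the tasks contained in the specified label and are not picked up by any other tasks that follow.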
<source>
@type forward
</source>
<source>
### This source specifies the label @SYSTEM,
### so its events are sent to <label @SYSTEM>
### and not to the filter and match that immediately follow
@type tail
@label @SYSTEM
path /var/log/httpd-access.log
pos_file /var/log/td-agent/httpd-access.log.pos
tag myapp.access
format apache2
</source>
<filter access.**>
@type record_transformer
<record>
# …
</record>
</filter>
<match **>
@type elasticsearch
# …
</match>
<label @SYSTEM>
### Receives the events from the @type tail source above
<filter var.log.middleware.**>
@type grep
# …
</filter>
<match **>
@type s3
# …
</match>
</label>
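match: matching output
Finds events with matching tags and processes them. The most common use of the match directive is to output events to other systems (for this reason, the plugins that correspond to match are called "output plugins").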
<source>
@type http
port 9880
</source>
<filter myapp.access>
@type record_transformer
<record>
host_param "#{Socket.gethostname}"
</record>
</filter>
<match myapp.access>
@type file
path /var/log/fluent/access
</match>
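Since filter and match both select events by tag pattern, a small sketch of the matching rules may help. The Python below is an illustration only, not fluentd's implementation, and the function names are hypothetical. It models the two wildcards used in this article: `*` stands for exactly one tag part and `**` for zero or more tag parts, where parts are separated by dots.

```python
def _match_parts(pattern_parts, tag_parts):
    """Recursively match a list of pattern parts against a list of tag parts."""
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # "**" may consume zero or more tag parts.
        return any(
            _match_parts(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1)
        )
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        # "*" consumes exactly one part; a literal must be equal.
        return _match_parts(rest, tag_parts[1:])
    return False


def tag_matches(pattern, tag):
    """Return True if a dot-separated tag matches the pattern."""
    return _match_parts(pattern.split("."), tag.split("."))


for pattern in ("myapp.access", "access.**", "myapp.*", "**"):
    print(pattern, tag_matches(pattern, "myapp.access"))
```

Under these rules the tag myapp.access is selected by myapp.access, myapp.* and **, but not by access.**, because that pattern requires the first tag part to be access.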
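Structure of an event:
time: the time the event was processed
tag: the origin of the event, configured in fluentd.conf
record: the actual log content, a JSON object
For example, this raw log line: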
192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777
might look like this after being processed by the fluentd engine:
2020-07-16 08:40:35 +0000 apache.access: {"user":"-","method":"GET","code":200,"size":777,"host":"192.168.0.1","path":"/"}
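The step from raw line to (time, tag, record) can be sketched in Python. This is a simplified illustration, not the apache2 parser that fluentd ships: the regular expression is an assumption written for this one sample line, and `to_event` is a hypothetical helper. Following the description above, the event time is the processing time rather than the timestamp inside the log line.

```python
import re
from datetime import datetime, timezone

# Simplified pattern for the sample line only (an assumption, not fluentd's).
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d+) (?P<size>\d+)$'
)


def to_event(tag, line, now=None):
    """Turn one raw access-log line into a (time, tag, record) triple."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("line does not match the expected format")
    record = match.groupdict()
    # Numeric fields become integers, as in the processed example.
    record["code"] = int(record["code"])
    record["size"] = int(record["size"])
    event_time = now or datetime.now(timezone.utc)
    return event_time, tag, record


raw = '192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777'
event_time, tag, record = to_event("apache.access", raw)
print(event_time, tag, record)
```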
Input -> filter 1 -> ... -> filter N -> Buffer -> Output
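Because each event is usually small, and for reasons of transfer efficiency and stability, events are generally not written to the output the moment each one is processed. Fluentd therefore uses a buffering model built on two main concepts:
The main configurable parameters are:
The rough process is:
As fluentd events keep arriving and are written into a chunk, the chunk keeps growing. When it reaches buffer_chunk_limit, or when flush_interval has elapsed since the chunk was created, it is pushed onto the tail of the buffer queue, whose size is set by buffer_queue_limit.
Each time a new chunk is enqueued, the chunk at the head of the queue is immediately written to the configured storage backend; if the backend is Kafka, for example, the data is pushed to Kafka right away.
Ideally, every chunk that enters the queue is written to the backend immediately, and although new chunks keep arriving, they are enqueued no faster than they are dequeued. The queue then stays essentially empty, holding at most one chunk.
In practice, because of network conditions and similar factors, writing a chunk to the backend is often delayed or fails. When a write fails, the chunk stays in the queue and is retried after retry_wait; once the number of retries reaches retry_limit, the chunk is destroyed (the data is discarded).
Meanwhile new chunks keep entering the queue. If many chunks have not been written to the backend in time and the queue length reaches buffer_queue_limit, new events are rejected and fluentd reports the error error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data".
Another case is a slow network. If a new chunk is produced every 3 seconds but writing one to the backend takes 30 seconds, and the queue length is 100, then 10 new chunks arrive in the time it takes one chunk to leave. The queue fills up quickly and the error occurs.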
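The slow-network case can be checked with a toy simulation. The Python sketch below is a deliberately simplified model, not fluentd's buffer code: it assumes one chunk is produced every `produce_interval` seconds, a single writer that takes `write_time` seconds per chunk, and a queue that rejects new chunks once it holds `queue_limit` of them. The function name and parameters are hypothetical.

```python
def seconds_until_overflow(produce_interval, write_time, queue_limit, horizon=3600):
    """Return the second at which a new chunk is rejected, or None.

    The queue length counts waiting chunks plus the one being written.
    """
    queue = 0
    write_done_at = None  # when the in-flight write finishes, if any
    for t in range(1, horizon + 1):
        if write_done_at is not None and t >= write_done_at:
            queue -= 1  # the head chunk reached the backend
            write_done_at = None
        if t % produce_interval == 0:
            if queue >= queue_limit:
                return t  # queue is full: the new chunk is rejected
            queue += 1
        if write_done_at is None and queue > 0:
            write_done_at = t + write_time  # start writing the head chunk
    return None


slow = seconds_until_overflow(produce_interval=3, write_time=30, queue_limit=100)
healthy = seconds_until_overflow(produce_interval=3, write_time=1, queue_limit=100)
print("slow backend overflows after", slow, "seconds")
print("healthy backend overflows after", healthy)
```

In this model the queue gains about nine chunks for every chunk written, so a queue of 100 fills within a few minutes, while a backend that writes faster than chunks arrive never overflows.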
Goal: collect the access.log of the nginx application inside a container and parse the log fields into JSON. The raw log format is:
$ tail -f access.log
...
53.49.146.149 1561620585.973 0.005 502 [27/Jun/2019:15:29:45 +0800] 178.73.215.171 33337 GET https
Collect it and process it into:
{
"serverIp": "53.49.146.149",
"timestamp": "1561620585.973",
"respondTime": "0.005",
"httpCode": "502",
"eventTime": "27/Jun/2019:15:29:45 +0800",
"clientIp": "178.73.215.171",
"clientPort": "33337",
"method": "GET",
"protocol": "https"
}
Approach:
fluent.conf
<source>
@type tail
@label @nginx_access
path /fluentd/access.log
pos_file /fluentd/nginx_access.pos
tag nginx_access
format none
@log_level trace
</source>
<label @nginx_access>
<filter nginx_access>
@type parser
key_name message
format /(?<serverIp>[^ ]*) (?<timestamp>[^ ]*) (?<respondTime>[^ ]*) (?<httpCode>[^ ]*) \[(?<eventTime>[^\]]*)\] (?<clientIp>[^ ]*) (?<clientPort>[^ ]*) (?<method>[^ ]*) (?<protocol>[^ ]*)/
</filter>
<match nginx_access>
@type stdout
</match>
</label>
Start the service and append content to the file:
$ docker run -u root --rm -ti 192.168.136.10:5000/fluentd_elasticsearch/fluentd:v2.5.2 sh
/ # cd /fluentd/
/ # touch access.log
/ # fluentd -c /fluentd/etc/fluent.conf
/ # echo '53.49.146.149 1561620585.973 0.005 502 [27/Jun/2019:15:29:45 +0800] 178.73.215.171 33337 GET https' >>/fluentd/access.log
Use this site to validate the regular expression: http://fluentular.herokuapp.com
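The expression can also be checked locally. The Python sketch below is an illustration outside fluentd: it uses the same pattern as the parser filter, with the named groups rewritten from Ruby's `(?<name>...)` to Python's `(?P<name>...)`, and applies it to the sample line. The helper name `parse_access_line` is hypothetical.

```python
import re

# Same pattern as the fluentd filter, using Python's named-group syntax.
NGINX_RE = re.compile(
    r"(?P<serverIp>[^ ]*) (?P<timestamp>[^ ]*) (?P<respondTime>[^ ]*) "
    r"(?P<httpCode>[^ ]*) \[(?P<eventTime>[^\]]*)\] (?P<clientIp>[^ ]*) "
    r"(?P<clientPort>[^ ]*) (?P<method>[^ ]*) (?P<protocol>[^ ]*)"
)


def parse_access_line(line):
    """Return the named fields of one access-log line, or None if it does not match."""
    match = NGINX_RE.match(line)
    return match.groupdict() if match else None


line = (
    "53.49.146.149 1561620585.973 0.005 502 "
    "[27/Jun/2019:15:29:45 +0800] 178.73.215.171 33337 GET https"
)
print(parse_access_line(line))
```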
<source>
@type tail
@label @nginx_access
path /fluentd/access.log
pos_file /fluentd/nginx_access.pos
tag nginx_access
format none
@log_level trace
</source>
<label @nginx_access>
<filter nginx_access>
@type parser
key_name message
format /(?<serverIp>[^ ]*) (?<timestamp>[^ ]*) (?<respondTime>[^ ]*) (?<httpCode>[^ ]*) \[(?<eventTime>[^\]]*)\] (?<clientIp>[^ ]*) (?<clientPort>[^ ]*) (?<method>[^ ]*) (?<protocol>[^ ]*)/
</filter>
<filter nginx_access>
@type record_transformer
enable_ruby
<record>
host_name "#{Socket.gethostname}"
my_key "my_val"
tls ${record["protocol"].index("https") ? "true" : "false"}
</record>
</filter>
<match nginx_access>
@type stdout
</match>
</label>
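What the record_transformer step adds to each record can be mirrored in a few lines of Python. This is a sketch of the logic only, with a hypothetical `enrich` helper; in fluentd the work is done by the plugin with enable_ruby.

```python
import socket


def enrich(record):
    """Return a copy of the record with the three fields the filter adds."""
    enriched = dict(record)
    enriched["host_name"] = socket.gethostname()
    enriched["my_key"] = "my_val"
    # Ruby's String#index returns nil when the substring is absent and an
    # integer otherwise; 0 is truthy in Ruby, so a match at position 0 still
    # yields "true". A substring test expresses the same thing in Python.
    enriched["tls"] = "true" if "https" in record["protocol"] else "false"
    return enriched


print(enrich({"method": "GET", "protocol": "https"}))
print(enrich({"method": "GET", "protocol": "http"}))
```

Note that tls is the string "true" or "false", not a boolean, because the configuration puts the two values in quotes.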
efk/elasticsearch.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: elasticsearch-logging
    version: v7.4.2
  name: elasticsearch-logging
  namespace: logging
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: elasticsearch-logging
      version: v7.4.2
  serviceName: elasticsearch-logging
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v7.4.2
    spec:
      nodeSelector:
        es: "true"  ## Selects the node to deploy on; adjust for your environment
      containers:
      - env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: cluster.initial_master_nodes
          value: elasticsearch-logging-0
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        image: 192.168.136.10:5000/elasticsearch/elasticsearch:7.4.2
        name: elasticsearch-logging
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: elasticsearch-logging
      dnsConfig:
        options:
        - name: single-request-reopen
      initContainers:
      - command:
        - /sbin/sysctl
        - -w
        - vm.max_map_count=262144
        image: alpine:3.6
        imagePullPolicy: IfNotPresent
        name: elasticsearch-logging-init
        resources: {}
        securityContext:
          privileged: true
      - name: fix-permissions
        image: alpine:3.6
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: elasticsearch-logging
          mountPath: /usr/share/elasticsearch/data
      volumes:
      - name: elasticsearch-logging
        hostPath:
          path: /esdata
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: elasticsearch-logging
  name: elasticsearch
  namespace: logging
spec:
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
  selector:
    k8s-app: elasticsearch-logging
  type: ClusterIP
$ kubectl create namespace logging
## Label the slave1 node so that the es service is scheduled onto it
$ kubectl label node k8s-slave1 es=true
## Deploy the service; you can first pull the image on the node that will run es
$ kubectl create -f elasticsearch.yaml
statefulset.apps/elasticsearch-logging created
service/elasticsearch created
## Wait a moment, then check that the es pod was scheduled onto k8s-slave1 and its status is Running
$ kubectl -n logging get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE
elasticsearch-logging-0 1/1 Running 0 69m 10.244.1.104 k8s-slave1
# Then access the service with curl to verify that es was deployed successfully
$ kubectl -n logging get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch ClusterIP 10.109.174.58 <none> 9200/TCP 71m
$ curl 10.109.174.58:9200
{
"name" : "elasticsearch-logging-0",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "uic8xOyNSlGwvoY9DIBT1g",
"version" : {
"number" : "7.4.2",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
"build_date" : "2019-10-28T20:40:44.881551Z",
"build_snapshot" : false,
"lucene_version" : "8.2.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
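Kibana needs to expose its web page to users, so an ingress with a domain name is used to access kibana.
Kibana is a stateless application, so it is started directly with a Deployment.
Kibana needs to reach es; it can simply use Kubernetes service discovery with the address http://elasticsearch:9200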
efk/kibana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: logging
  labels:
    app: kibana
spec:
  selector:
    matchLabels:
      app: "kibana"
  template:
    metadata:
      labels:
        app: kibana
    spec:
      nodeSelector:
        kibana: "true"  ## Selects the node to deploy on; adjust for your environment
      containers:
      - name: kibana
        image: 192.168.136.10:5000/kibana/kibana:7.4.2
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        env:
          # Kibana 7.x reads ELASTICSEARCH_HOSTS (ELASTICSEARCH_URL was the 6.x name)
          - name: ELASTICSEARCH_HOSTS
            value: http://elasticsearch:9200
        ports:
        - containerPort: 5601
---
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: logging
  labels:
    app: kibana
spec:
  ports:
  - port: 5601
    protocol: TCP
    targetPort: 5601
  type: ClusterIP
  selector:
    app: kibana
---
# extensions/v1beta1 Ingress was removed in Kubernetes 1.22;
# newer clusters need networking.k8s.io/v1 and its backend schema
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kibana
  namespace: logging
spec:
  rules:
  - host: kibana.luffy.com
    http:
      paths:
      - path: /
        backend:
          serviceName: kibana
          servicePort: 5601
$ kubectl label node k8s-slave2 kibana=true
$ kubectl create -f kibana.yaml
deployment.apps/kibana created
service/kibana created
ingress/kibana created
# Then check the pods and wait for the status to become Running
$ kubectl -n logging get po
NAME READY STATUS RESTARTS AGE
elasticsearch-logging-0 1/1 Running 0 88m
kibana-944c57766-ftlcw 1/1 Running 0 15m
## Configure DNS resolution for kibana.luffy.com and open the service to verify; if it is reachable, the connection to es succeeded
efk/fluentd-es-config-main.yaml
apiVersion: v1
data:
  fluent.conf: |-
    # This is the root config file, which only includes components of the actual configuration
    #
    # Do not collect fluentd's own logs to avoid infinite loops.
    <match fluent.**>
      @type null
    </match>
    @include /fluentd/etc/config.d/*.conf
kind: ConfigMap
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: fluentd-es-config-main
  namespace: logging
Configuration file fluentd-configmap.yaml; points to note:
efk/fluentd-configmap.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-config
  namespace: logging
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  containers.input.conf: |-
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      localtime
      tag raw.kubernetes.*
      format json
      read_from_head true
    </source>
    # Detect exceptions in the log output and forward them as one log entry.
    # https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>
  output.conf: |-
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>
    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      include_tag_key true
      host elasticsearch
      port 9200
      logstash_format true
      request_timeout 30s
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 8
        overflow_action block
      </buffer>
    </match>
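The tail source above tags each event after the file it came from, and the file name itself carries the pod metadata. The Python sketch below illustrates both steps under stated assumptions: that the `*` in the tag is replaced by the file path with slashes turned into dots, and that files under /var/log/containers are named <pod>_<namespace>_<container>-<64-character container id>.log. The regular expression is a simplified stand-in, not the one the kubernetes_metadata plugin uses, and the helper names are hypothetical.

```python
import re

# Simplified pattern for names under /var/log/containers (an assumption).
NAME_RE = re.compile(
    r"^(?P<pod_name>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container_name>.+)-(?P<container_id>[a-f0-9]{64})\.log$"
)


def tail_tag(path, tag_template="raw.kubernetes.*"):
    """Expand '*' in the tag with the file path, using dots instead of slashes."""
    expanded = path.lstrip("/").replace("/", ".")
    return tag_template.replace("*", expanded)


def parse_container_log_name(filename):
    """Extract pod, namespace, container name, and container id from a file name."""
    match = NAME_RE.match(filename)
    return match.groupdict() if match else None


container_id = "a" * 64  # placeholder id, not a real container
path = f"/var/log/containers/counter_default_count-{container_id}.log"
print(tail_tag(path))
print(parse_container_log_name(path.rsplit("/", 1)[-1]))
```

After detect_exceptions removes the raw prefix, the tag starts with kubernetes., which is what the `<filter kubernetes.**>` block selects.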
DaemonSet definition file fluentd.yaml; points to note:
efk/fluentd.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-es
  namespace: logging
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: fluentd-es
  namespace: logging
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-es
  apiGroup: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: fluentd-es
  name: fluentd-es
  namespace: logging
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-es
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
    spec:
      containers:
      - env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        image: 192.168.136.10:5000/fluentd_elasticsearch/fluentd:v2.5.2
        imagePullPolicy: IfNotPresent
        name: fluentd-es
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - mountPath: /var/log
          name: varlog
        - mountPath: /var/lib/docker/containers
          name: varlibdockercontainers
          readOnly: true
        - mountPath: /fluentd/etc/config.d
          name: config-volume
        - mountPath: /fluentd/etc/fluent.conf
          name: config-volume-main
          subPath: fluent.conf
      nodeSelector:
        fluentd: "true"
      securityContext: {}
      serviceAccount: fluentd-es
      serviceAccountName: fluentd-es
      volumes:
      - hostPath:
          path: /var/log
          type: ""
        name: varlog
      - hostPath:
          path: /var/lib/docker/containers
          type: ""
        name: varlibdockercontainers
      - configMap:
          defaultMode: 420
          name: fluentd-config
        name: config-volume
      - configMap:
          defaultMode: 420
          items:
          - key: fluent.conf
            path: fluent.conf
          name: fluentd-es-config-main
        name: config-volume-main
## Label the slave nodes so that the fluentd log collection service is deployed on them
$ kubectl label node k8s-slave1 fluentd=true
$ kubectl label node k8s-slave2 fluentd=true
# Create the resources
$ kubectl create -f fluentd-es-config-main.yaml
configmap/fluentd-es-config-main created
$ kubectl create -f fluentd-configmap.yaml
configmap/fluentd-config created
$ kubectl create -f fluentd.yaml
serviceaccount/fluentd-es created
clusterrole.rbac.authorization.k8s.io/fluentd-es created
clusterrolebinding.rbac.authorization.k8s.io/fluentd-es created
daemonset.extensions/fluentd-es created
## Then check whether the pod is running on k8s-slave1
$ kubectl -n logging get po -o wide
NAME READY STATUS RESTARTS AGE
elasticsearch-logging-0 1/1 Running 0 123m
fluentd-es-246pl 1/1 Running 0 2m2s
kibana-944c57766-ftlcw 1/1 Running 0 50m
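The above is a simplified configuration for deploying log collection on Kubernetes; the full version is available at https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch
Start a workload on a slave node that prints test logs to standard output, then check in kibana whether they are collected.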
efk/test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  nodeSelector:
    fluentd: "true"
  containers:
  - name: count
    image: alpine:3.6
    args: [/bin/sh, -c,
            'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
$ kubectl get po
NAME READY STATUS RESTARTS AGE
counter 1/1 Running 0 6s
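Log in to the kibana UI and follow the steps in the screenshots in order:
You can filter log data by other metadata. For example, click any log entry to see additional metadata such as the container name, Kubernetes node, and namespace, e.g. kubernetes.pod_name : counter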
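When creating the index pattern in Kibana, it helps to know which index names to expect. The Python sketch below is an illustration that assumes the Elasticsearch output plugin's defaults when logstash_format is true: a "logstash" prefix, a %Y.%m.%d date suffix, and dates computed in UTC. The helper name `logstash_index` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def logstash_index(event_time, prefix="logstash"):
    """Return the daily index name for an event, using the UTC date."""
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc_time:%Y.%m.%d}"


utc_event = datetime(2020, 7, 16, 8, 40, 35, tzinfo=timezone.utc)
print(logstash_index(utc_event))

# An event stamped early on 17 July in UTC+8 still belongs to the 16 July index.
cst = timezone(timedelta(hours=8))
print(logstash_index(datetime(2020, 7, 17, 2, 0, 0, tzinfo=cst)))
```

Under these assumptions an index pattern of logstash-* covers every daily index.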
At this point EFK is successfully deployed on the Kubernetes cluster. To learn how to analyze log data with Kibana, see the Kibana user guide: https://www.elastic.co/guide/en/kibana/current/index.html
Original: https://www.cnblogs.com/Mr-Axin/p/14756592.html