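ElastAlert Log Monitoring and Alerting

New Features

SMS Alert Improvements

A new post alert method has been added; it can be used to send SMS messages with dynamic content.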

# Alert method
alert:
  - "post"

http_post_url: "http://adminhome.jinhui365.cn/sendSmsForAppAlert"

# phoneList: phone numbers as a comma-separated string, e.g. "15235446827,15235446827"
# content: the SMS text; ${} marks dynamic content
http_post_static_payload:
  phoneList: "15235446827"
  content: "${@timestamp}, iOS ${l}-level log alert. Version ${v}, device model ${d}, log count ${num_hits}."

The dynamic content is taken from the fields of the alert email body; currently only top-level keys are supported.

For example:

Given the following email body:
@timestamp: 2018-01-16T02:38:08.340Z
@version: 1
_id: AWD81Lv0TU1e05j5IHmu
_index: ios-2018.01.16
_type: logs
arg: {
  "appkey": "jh28a4c4bc6734f58b",
  "branchNo": "88",
  "client": "iOS",
  "encrypt": 0,
  "fundAccount": "881125524",
  "signcode": "2FBDC7A4842E30B8F9ACD112261ECF28",
  "timestamp": "1504442929685",
  "token": "u4ftBaBm-xg=",
  "uid": "1861574",
  "version": "5.15.0"
}
c: iOS
d: iPhone 6s Plus
host: ubuntu
i: B6C1AB45-C3CC-4C60-BD10-259778C6699B
ip: 223.104.95.169 Guiyang, China Mobile
l: warn
message: request failed:0 /receipt/list
n: WiFi
num_hits: 3182
num_matches: 3
o: China Mobile
p: com.jinhui365.iphone-pay
path: /data/node-dev-tools/logs/iOS.log
s: 10.3.1
t: 1504442928.55
uid: 1861574
v: 5.15.0

content: "Test, test, this is a test! message:${message},num_hits:${num_hits}"
The SMS that gets sent is: Test, test, this is a test! message:request failed:0 /receipt/list,num_hits:3182
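
The ${} substitution described above can be sketched in Python as follows. This is only an illustration, not the code behind the post alert method: the helper name render_sms and the decision to leave unknown placeholders untouched are assumptions. Only top-level keys of the match are looked up, mirroring the limitation stated earlier.

```python
import re

# Matches ${key}; the key may contain any character except "}" (e.g. "@timestamp").
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_sms(template, match):
    """Replace each ${key} with the top-level value match[key].

    Hypothetical helper: placeholders whose key is missing are left as-is.
    """
    def substitute(m):
        key = m.group(1)
        return str(match[key]) if key in match else m.group(0)

    return PLACEHOLDER.sub(substitute, template)


match = {
    "message": "request failed:0 /receipt/list",
    "num_hits": 3182,
    "arg": {"uid": "1861574"},  # nested values are not reachable via ${...}
}
print(render_sms("message:${message},num_hits:${num_hits}", match))
```

A placeholder such as ${arg.uid} is not a top-level key, so this sketch returns it unchanged.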

Recording Log Alerts in JIRA

# Alert subject
alert_subject: "{0} {1} {2} {3}"
alert_subject_args:
  - "c"
  - "v"
  - "l"
  - "@timestamp"

# Alert method
alert:
  - "jira"

# Settings (not written in the rule file)
jira_server: "http://jira.jinhui365.cn"
jira_project: "ALERT"
jira_account_file: "/home/jhjr/jinhui/java/elaticalert/java_rules/jira_acct.txt"

# Only Task is supported for now, a limitation on the JIRA side
jira_issuetype: "Task"

# JIRA issue priority (defaults to normal; valid values are 0-3, a smaller number means higher priority)
jira_priority: 0

# Add watchers (JIRA sends notification emails to their mailboxes)
jira_watchers:
  - xfwei
  - yli
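
To make the alert_subject / alert_subject_args pairing concrete, here is a small Python sketch of positional formatting. It assumes each entry in alert_subject_args names a field of the matched log entry; the helper name build_subject and the placeholder text used for missing fields are illustrative choices, not taken from this article.

```python
def build_subject(alert_subject, alert_subject_args, match, missing="<MISSING VALUE>"):
    """Fill the positional placeholders of alert_subject from fields of the match."""
    values = [match.get(field, missing) for field in alert_subject_args]
    return alert_subject.format(*values)


match = {
    "c": "iOS",
    "v": "5.15.0",
    "l": "warn",
    "@timestamp": "2018-01-16T02:38:08.340Z",
}
subject = build_subject("{0} {1} {2} {3}", ["c", "v", "l", "@timestamp"], match)
print(subject)  # iOS 5.15.0 warn 2018-01-16T02:38:08.340Z
```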

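Architecture

Installation

  1. Install the Kibana plugin and ElastAlert Server by following the ElastAlert Server installation guide.
  2. Install ElastAlert by following the official ElastAlert website.

Configuration

ElastAlert Server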

{
  "appName": "elastalert-server",
  "port": 3030, // port of the ElastAlert server, which serves the Kibana plugin
  "elastalertPath": "/opt/elastalert/elastalert", // path to ElastAlert
  "verbose": true,
  "es_debug": true,
  "debug": false,
  "rulesPath": { // relative path where the ElastAlert alert rules are stored
    "relative": true,
    "path": "/rules"
  },
  "templatesPath": { // relative path of the rule templates
    "relative": true,
    "path": "/rule_templates"
  },
  "dataPath": { // path where test data is stored
    "relative": true,
    "path": "/server_data"
  }
}

For the full set of options, see the ElastAlert Server configuration.

ElastAlert

es_host: 10.0.0.219
es_port: 9200

# Rules folder
rules_folder: rules

# Query frequency
run_every:
  minutes: 1

# Query time window (can be overridden)
buffer_time:
  minutes: 60

# Email settings
smtp_host: mail.rxhui.com
smtp_port: 25

smtp_auth_file: /opt/elastalert/config/smtp_auth_file.yaml
email_reply_to: monitor@rxhui.com
from_addr: monitor@rxhui.com

# Index that ElastAlert writes back to (visible in Kibana)
writeback_index: elastalert_status

# Retry window for failed alerts
alert_time_limit:
  days: 2

For the full set of options, see the ElastAlert configuration.
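
Duration settings such as run_every, buffer_time and alert_time_limit are written as a unit-to-amount mapping. The sketch below shows one way to turn such mappings into Python timedelta values. It assumes the YAML has already been parsed into a dict; the helper name to_timedelta is hypothetical.

```python
from datetime import timedelta


def to_timedelta(setting):
    """Turn a mapping such as {"minutes": 1} into a timedelta."""
    return timedelta(**setting)


# Values copied from the configuration above, as they would look after YAML parsing.
config = {
    "run_every": {"minutes": 1},
    "buffer_time": {"minutes": 60},
    "alert_time_limit": {"days": 2},
}
durations = {name: to_timedelta(value) for name, value in config.items()}
print(durations["buffer_time"].total_seconds())  # 3600.0
```

Because the keys are passed straight to timedelta, only units it accepts (seconds, minutes, hours, days, weeks and so on) work in this sketch.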

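Usage Guide

How to Open

  1. The feature is embedded in Kibana. Open Kibana at http://10.0.0.219:5601; to the right of Settings in the upper-left corner there is an expand button through which you can enter the ElastAlert feature.
  2. Or go directly to http://10.0.0.219:5601/app/elastalert.

Basic Operations

Animated demo of the operations

  • + New Rule adds a new rule
  • Click a rule to delete or edit it
  • The right side of the rule page lists templates that you can click to display
  • The upper-left corner of the rule page exits; the upper-right corner has [Test] and [Save]. After a test finishes, its output appears on the right

Notes

1. A new rule must be tested before it is saved; saving it directly may stop monitoring if the rule contains an error.
2. If you start adding a new rule and decide not to continue, remember to delete that rule from the directory page.
3. If monitoring stops because of a mistake, delete the faulty rule file, then visit http://10.0.0.219:3030/status/control/start to restart monitoring. Check the monitoring status at http://10.0.0.219:3030/status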
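
The recovery step in note 3 can also be scripted. The following Python sketch only builds the two URLs mentioned above and provides a plain GET helper; the function names are hypothetical, and the response body is returned unparsed because its exact shape is not described in this article.

```python
import urllib.request
from urllib.parse import urljoin

BASE_URL = "http://10.0.0.219:3030/"  # ElastAlert Server address used in this article


def status_url(base=BASE_URL):
    """URL that reports the state of the ElastAlert process."""
    return urljoin(base, "status")


def control_url(action, base=BASE_URL):
    """URL that starts or stops the ElastAlert process."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return urljoin(base, "status/control/" + action)


def http_get(url, timeout=5):
    """Issue a GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling http_get(control_url("start")) followed by http_get(status_url()) corresponds to restarting monitoring and then checking its state.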

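ElastAlert Configuration and Rule Reference

ElastAlert Configuration

Parameter | Description | Notes
es_host | Host address of Elasticsearch |
es_port | Port of Elasticsearch | Defaults to 9200
rules_folder | Name of the rules folder |
run_every | How often requests are sent to Elasticsearch |
buffer_time | Range of the time field covered by each request |
realert | Period after an alert during which further alerts are ignored | Defaults to 1 minute; can be set to 0
query_delay | Time delta subtracted from every query, so that queries run with a delay |
writeback_index | Index that ElastAlert creates in Elasticsearch for its own logs |
alert_time_limit | Time window during which failed alerts are retried |
es_send_get_body_as | HTTP method used to query Elasticsearch | Defaults to GET

ElastAlert Rules

Parameter | Description | Notes
name | Rule name | English only; must not contain Chinese characters
type | Type of check the rule performs |
alert | Alert method |
index | Index to monitor |
filter | Search conditions |
realert | Alert only once within the given period |
email | If email is one of the alert methods, the mailboxes that receive the alerts |
aggregation | Aggregates alerts: collects the alerts of a period and reports them together. A schedule can also be used to send all alerts of the period at set times | Consider whether to use it
import | Pulls in shared settings | Once there are more rules, consider extracting the shared parts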

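Rule Types

  • any: alert on every match;
  • blacklist: the value of the compare_key field matches any entry in the blacklist array;
  • whitelist: the value of the compare_key field matches none of the entries in the whitelist array;
  • frequency: for the same query_key, at least num_events matching events occur within timeframe;
  • change: for the same query_key, the value of the compare_key field changes within timeframe;
  • spike: for the same query_key, the ratio between the event counts of two consecutive timeframe windows exceeds spike_height. spike_type sets the direction: up, down, or both. threshold_ref sets a lower bound on the event count of the previous window and threshold_cur a lower bound on that of the current window; if a count is below its bound, no alert fires;
  • flatline: the event count within timeframe is below threshold;
  • new_term: a value appears in fields that is not among the terms_size (default 50) most common values of the preceding terms_window_size (default 30 days);
  • cardinality: for the same query_key, the number of distinct values of cardinality_field within timeframe exceeds max_cardinality or falls below min_cardinality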
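
As an illustration of the frequency type from the list above, the sketch below counts events in a sliding window. It is a simplified model written for this explanation, not ElastAlert's implementation: it ignores query_key and realert, and it assumes the counter restarts after each alert.

```python
from collections import deque
from datetime import datetime, timedelta


def frequency_alerts(timestamps, timeframe, num_events):
    """Return the timestamps at which num_events events fall within timeframe.

    timestamps must be sorted in ascending order.
    """
    window = deque()
    alerts = []
    for ts in timestamps:
        window.append(ts)
        # Drop events that are older than the timeframe relative to ts.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            alerts.append(ts)
            window.clear()  # assumption: start counting afresh after an alert
    return alerts


base = datetime(2018, 1, 16, 9, 0, 0)
events = [base + timedelta(minutes=m) for m in (0, 1, 10, 20, 21, 22)]
print(frequency_alerts(events, timedelta(minutes=5), num_events=2))
```

With timeframe set to 5 minutes and num_events set to 2, the events at minutes 0 and 1 trigger one alert and those at minutes 20 and 21 trigger another; the isolated event at minute 10 triggers nothing.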

Alert Methods

  • Command
  • email
  • jira
  • post

For how to write concrete rules, see the rule templates at the end of this article.

ElastAlert

Query Types

  • query_string
    Query string

    filter:
    - query_string:
        query: "username: bob"

    The query_string type follows Lucene query syntax; see Lucene Query for details.
    You can also take the JSON-format query shown in Kibana and convert it to YAML.

  • term
    Exact match on a key-value pair

  • terms
    Match a key against multiple values

    filter:
    - terms:
        field: ["value1", "value2"]

  • wildcard
    Standard shell-style wildcards

  • range
    Ranges

    filter:
    - range:
        status_code:
          from: 500
          to: 599

  • Negation, and, or
    Logical NOT, AND, OR

    filter:
    - or:
        - term:
            field: "value"
        - wildcard:
            field: "foo*bar"
        - and:
            - not:
                term:
                  field: "value"
            - not:
                term:
                  _type: "something"

All of the filters above are described in detail in the ElastAlert Filters documentation.
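
Elasticsearch evaluates these filters on the server side. To show how term, terms, wildcard, range and the and/or/not combinators nest, the following toy evaluator checks a filter tree against an in-memory dict. It is a teaching sketch under stated assumptions: documents are flat dicts, wildcards are approximated with Python's fnmatchcase, and range bounds are inclusive.

```python
from fnmatch import fnmatchcase


def matches(node, doc):
    """Evaluate one filter node (a single-key dict) against a flat document."""
    (kind, body), = node.items()
    if kind == "term":
        (field, value), = body.items()
        return doc.get(field) == value
    if kind == "terms":
        (field, values), = body.items()
        return doc.get(field) in values
    if kind == "wildcard":
        (field, pattern), = body.items()
        return fnmatchcase(str(doc.get(field, "")), pattern)
    if kind == "range":
        (field, bounds), = body.items()
        value = doc.get(field)
        return value is not None and bounds["from"] <= value <= bounds["to"]
    if kind == "and":
        return all(matches(child, doc) for child in body)
    if kind == "or":
        return any(matches(child, doc) for child in body)
    if kind == "not":
        return not matches(body, doc)
    raise ValueError("unsupported filter: " + kind)


# The and/or/not example from above, written as Python data.
rule_filter = {"or": [
    {"term": {"field": "value"}},
    {"wildcard": {"field": "foo*bar"}},
    {"and": [
        {"not": {"term": {"field": "value"}}},
        {"not": {"term": {"_type": "something"}}},
    ]},
]}
print(matches(rule_filter, {"field": "foo123bar", "_type": "something"}))  # True
print(matches(rule_filter, {"field": "other", "_type": "something"}))      # False
```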

Rule Templates

Complex query_string Queries

# Time units
# seconds: 0~60
# minutes: 0~60
# hours: 0~60
# days: n
type: frequency

# Rule name
name: android_log_test

# Polling frequency; run_every should be smaller than timeframe
run_every:
  minutes: 1

# Rolling log window; buffer_time must be larger than timeframe, ideally an integer multiple of it (2-3x)
buffer_time:
  minutes: 15

# Trigger condition: an alert fires when num_events events occur within timeframe
# Time range in which the events occur
timeframe:
  minutes: 5
# Number of events
num_events: 2

# Index to query; refer to the indices in Kibana
index: android-*

# Search conditions
# More detailed query syntax: http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
# query: "message: \"load patch error\" AND message: lo?d"
# message=="load patch error" && message: "lo?d"

# + - && || ! ( ) { } [ ] ^ " ~ * ? : \
# The characters above must be escaped

# Basic key-value query
# query: "key: \"value\""
# Counter-example: query: "key: value1 value2 value3" matches key==value1 || logs containing value2 || logs containing value3

# Fuzzy key-value query using wildcards
# query: "key: v*e"

# AND, OR, NOT
# key1 AND key2
# key1 OR key2
# NOT key1

# Grouping: () denotes a group
# query: "title:(return AND \"pink panther\")" title=="return" && title="pink panther" (return may use wildcards; the quoted pink panther may not)
# (key1 OR key2) AND key3


filter:
- query:
    query_string:
      query: "message: \"load patch error\" AND message: lo?d"

# The same logs will not alert again within this period; for frequency rules, an integer multiple of timeframe is recommended
realert:
  minutes: 5

# Alert method
alert:
- "email"
# - "command"

# phoneList is a comma-separated string of phone numbers; content is the text the phones receive
# command: ["curl", "-X", "POST", "--header", "Content-Type: application/json", "--header", "Accept: */*", "http://adminhome.jinhui365.cn:8009/sendSms?phoneList=15235446827&content=测试"]

email:
- "yli@rxhui.com"
# - "15235446827@139.com"
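
The rule comments above list the characters that must be escaped in a query_string query. A small helper for escaping literal text might look like the sketch below. It is an illustration, not part of ElastAlert, and it makes one assumption beyond the list in the comments: every single & and | is escaped, which is stricter than escaping only && and ||.

```python
import re

# Special characters: + - ! ( ) { } [ ] ^ " ~ * ? : \ plus & and | (see the assumption above).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|])')


def escape_lucene(text):
    """Prefix every special character in text with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_lucene("(1+1):2"))  # \(1\+1\)\:2
print(escape_lucene("lo?d"))     # lo\?d
```

Apply the helper only to text that should match literally: escaping ? or * turns off their wildcard meaning, so a pattern such as lo?d from the template must stay unescaped when the wildcard is intended.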

AND / OR / NOT Queries

# Time units
# seconds: 0~60
# minutes: 0~60
# hours: 0~60
# days: n
type: frequency

# Rule name
name: android_log_test

# Polling frequency; run_every should be smaller than timeframe
run_every:
  minutes: 1

# Rolling log window; buffer_time must be larger than timeframe, ideally an integer multiple of it (2-3x)
buffer_time:
  minutes: 15

# Trigger condition: an alert fires when num_events events occur within timeframe
# Time range in which the events occur
timeframe:
  minutes: 5
# Number of events
num_events: 2

# Index to query; refer to the indices in Kibana
index: android-*

# Search conditions
# Translated from a Kibana query; field names containing the special character @ must be wrapped in double quotes
# message=="text data message" && @version=="1" && (uid=="1111" || !v=="5.22.0")

filter:
- and:
  - query:
      match:
        message:
          query: "text data message"
          type: "phrase"
  - query:
      match:
        "@version":
          query: "1"
          type: "phrase"
  - or:
    - query:
        match:
          uid:
            query: "1111"
            type: "phrase"
    - not:
      - query:
          match:
            v:
              query: "5.22.0"
              type: "phrase"

# The same logs will not alert again within this period; for frequency rules, an integer multiple of timeframe is recommended
realert:
  minutes: 5

# Alert method
alert:
- "email"
# - "command"

# phoneList is a comma-separated string of phone numbers; content is the text the phones receive
# command: ["curl", "-X", "POST", "--header", "Content-Type: application/json", "--header", "Accept: */*", "http://adminhome.jinhui365.cn:8009/sendSms?phoneList=15235446827&content=测试"]

email:
- "yli@rxhui.com"
# - "15235446827@139.com"

ElastAlert Server API

This server exposes the following REST APIs:

  • GET /

    Exposes the current version running

  • GET /status

    Returns either ‘SETUP’, ‘READY’, ‘ERROR’, ‘STARTING’, ‘CLOSING’, ‘FIRST_RUN’ or ‘IDLE’ depending on the current ElastAlert process status.

  • GET /status/control/:action

    Where :action can be either ‘start’ or ‘stop’, which will respectively start or stop the current ElastAlert process.

  • [WIP] GET /status/errors

    When /status returns ‘ERROR’ this returns a list of errors that were triggered.

  • GET /rules

    Returns a list of directories and rules that exist in the rulesPath (from the config) and are being run by the ElastAlert process.

  • GET /rules/:id

    Where :id is the id of the rule returned by GET /rules, which will return the file contents of that rule.

  • POST /rules/:id

    Where :id is the id of the rule returned by GET /rules, which will allow you to edit the rule. The body sent should be:

    {
      // Required - The full yaml rule config.
      "yaml": "..."
    }

  • DELETE /rules/:id

    Where :id is the id of the rule returned by GET /rules, which will delete the given rule.

  • GET /templates

    Returns a list of directories and templates that exist in the templatesPath (from the config) and are being run by the ElastAlert process.

  • GET /templates/:id

    Where :id is the id of the template returned by GET /templates, which will return the file contents of that template.

  • POST /templates/:id

    Where :id is the id of the template returned by GET /templates, which will allow you to edit the template. The body sent should be:

    {
      // Required - The full yaml template config.
      "yaml": "..."
    }

  • DELETE /templates/:id

    Where :id is the id of the template returned by GET /templates, which will delete the given template.

  • POST /test

    This allows you to test a rule. The body sent should be:

    {
      // Required - The full yaml rule config.
      "rule": "...",

      // Optional - The options to use for testing the rule.
      "options": {

        // Can be either "all", "schemaOnly" or "countOnly". "all" will give the full console output.
        // "schemaOnly" will only validate the yaml config. "countOnly" will only find the number of matching documents and list available fields.
        "testType": "all",

        // Can be any number larger than 0 and this tells ElastAlert over a period of how many days the test should be run
        "days": "1",

        // Whether to send real alerts
        "alert": false
      }
    }

  • [WIP] GET /config

    Gets the ElastAlert configuration from config.yaml in elastalertPath (from the config).

  • [WIP] POST /config

    Allows you to edit the ElastAlert configuration from config.yaml in elastalertPath (from the config). The required body to be sent will be edited when the work on this API is done.
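
The body for POST /test described above can be assembled programmatically. The sketch below builds it with Python's json module; the function name build_test_body is hypothetical, and days is emitted as a quoted string only because the sample body above shows it that way.

```python
import json


def build_test_body(rule_yaml, test_type="all", days=1, alert=False):
    """Return the JSON body for POST /test as a string."""
    if test_type not in ("all", "schemaOnly", "countOnly"):
        raise ValueError("testType must be 'all', 'schemaOnly' or 'countOnly'")
    if days <= 0:
        raise ValueError("days must be larger than 0")
    body = {
        "rule": rule_yaml,
        "options": {
            "testType": test_type,
            "days": str(days),  # mirrors the quoted value in the sample body
            "alert": alert,     # False: do not send real alerts
        },
    }
    return json.dumps(body)


rule_yaml = "name: android_log_test\ntype: any\nindex: android-*\n"
print(build_test_body(rule_yaml, test_type="countOnly"))
```

Unlike the annotated sample, the generated body contains no // comments, since plain JSON does not allow them.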

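ElastAlert Polling Pattern

ElastAlert polls at the frequency set by run_every in the config, and the time block covered by each query is governed by buffer_time.

Basic query pattern:
Configuration: run_every: 20s, buffer_time: 1min

Monitoring is started at 9:00 on January 17.

Current time | Log time block
9:00:00 | 8:59:00~9:00:00
9:00:20 | 9:00:00~9:00:20
9:00:40 | 9:00:00~9:00:40
9:01:00 | 9:00:00~9:01:00
9:01:20 | 9:01:00~9:01:20
9:01:40 | 9:01:00~9:01:40
9:02:00 | 9:01:00~9:02:00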
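
The timetable above can be reproduced with a few lines of Python. This sketch models only the pattern as described in this article (each query starts at the most recent buffer_time boundary counted from the start time); it is not derived from ElastAlert's source, and the function name query_window is hypothetical.

```python
from datetime import datetime, timedelta


def query_window(start, now, buffer_time):
    """Return the (from, to) block queried at `now` for a monitor started at `start`."""
    elapsed = int((now - start).total_seconds())
    size = int(buffer_time.total_seconds())
    # Index of the buffer_time-sized block whose end is at or after `now`;
    # it is -1 at startup, which yields the block just before `start`.
    block = -(-elapsed // size) - 1
    return start + block * buffer_time, now


start = datetime(2018, 1, 17, 9, 0, 0)
run_every = timedelta(seconds=20)
buffer_time = timedelta(minutes=1)

for tick in range(7):
    now = start + tick * run_every
    begin, end = query_window(start, now, buffer_time)
    print(now.time(), begin.time(), "~", end.time())
```

Each printed row corresponds to one poll: the first column is the current time and the remaining columns are the log time block that is queried.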

Links

  • Bitsensor blog
  • ElastAlert Kibana Plugin
  • ElastAlert Server
  • ElastAlert GitHub
  • ElastAlert official website