An error budget covers:
- releasing new features
- expected system changes
- inevitable failures in hardware, networks, etc.
- planned downtime
- risky experiments

Error budgets also:
- share responsibility for reliability between Ops and Dev teams
- reduce feature iteration speed when systems are unreliable
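The idea above can be made concrete with a little arithmetic; the 99.9% target and traffic volume below are illustrative numbers, not from the source:

```python
# Worked example: turning an SLO target into an error budget.
slo_target = 0.999             # 99.9% availability objective (illustrative)
monthly_requests = 10_000_000  # expected requests in the compliance window

error_budget_fraction = 1 - slo_target             # 0.1% of requests may fail
budget_in_requests = monthly_requests * error_budget_fraction

# Spending the budget: every bad release, hardware outage, planned
# downtime, or risky experiment consumes part of these ~10,000 requests.
# When the budget is exhausted, feature releases slow down.
```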
Availability SLI
The proportion of valid requests served successfully.
One commonly used signifier of success or failure is the status code of an HTTP or RPC response. This requires careful, accurate use of status codes within your system so that each code maps distinctly to either success or failure.
A reasonable strategy here is to write that complex logic as code and export a boolean availability measure to your SLO monitoring systems, for use in a bad-minute style SLI like the example above.
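A minimal sketch of that strategy, assuming a policy where 5xx responses count as failures and 4xx responses count as valid-but-successful client errors; the names (`is_success`, `record_response`) and the policy itself are illustrative, not from the source:

```python
# "Complex logic as code" sketch: classify each response's status code
# and export the result as two simple counters that an SLO monitoring
# system can consume as "good events" and "total events".

GOOD = 0   # good events counter
TOTAL = 0  # total events counter

def is_success(status_code: int) -> bool:
    # Policy decision (assumed here): 5xx counts against availability,
    # 4xx is treated as a valid request the client got wrong.
    return status_code < 500

def record_response(status_code: int) -> None:
    global GOOD, TOTAL
    TOTAL += 1
    if is_success(status_code):
        GOOD += 1

for code in (200, 200, 404, 503, 200):
    record_response(code)

availability = GOOD / TOTAL  # 4 good events out of 5 total
```

Each team must decide its own status-code mapping; the point is that the decision lives in one place in code, and only the two counters are exported.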
Measuring SLIs:
Application-level Metrics

Pros:
- Often fast and cheap (in terms of engineering time) to add new metrics.
- Complex logic to derive an SLI can be turned into code and exported as two much simpler counters: "good events" and "total events".
Logs Processing

Processing server-side logs of requests or data to generate SLI metrics.

Pros:
- Existing request logs can be processed retroactively to backfill SLI metrics.
- Complex user journeys can be reconstructed using session identifiers.
- Complex logic to derive an SLI can be turned into code and exported as two much simpler counters: "good events" and "total events".

Cons:
- Application logs do not contain requests that never reached the servers.
- Processing latency makes logs-based SLIs unsuitable for triggering an operational response.
- Engineering effort is needed to generate SLIs from logs; session reconstruction can be time-consuming.
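A small sketch of the retroactive backfill idea: replay server-side request logs and bucket them into per-minute good/total counts. The log format (timestamp plus status code) and the 5xx-means-failure rule are assumptions for illustration:

```python
# Backfilling an availability SLI from server-side request logs.
from collections import defaultdict

log_lines = [  # assumed format: "<ISO timestamp> <status code>"
    "2024-01-01T00:00:01Z 200",
    "2024-01-01T00:00:02Z 500",
    "2024-01-01T00:01:00Z 200",
]

good = defaultdict(int)
total = defaultdict(int)

for line in log_lines:
    ts_raw, status = line.split()
    # Bucket events by minute so the result can feed a bad-minute style SLI.
    minute = ts_raw[:16]  # e.g. "2024-01-01T00:00"
    total[minute] += 1
    if int(status) < 500:
        good[minute] += 1

sli = {m: good[m] / total[m] for m in total}
```

Because this reads historical logs rather than live traffic, it can fill in SLI data from before the SLI existed; the same latency is why it is unsuitable for paging.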
Front-end Infrastructure Metrics

Pros:
- Metrics and recent historical data most likely already exist, so this option probably requires the least engineering effort to get started.
- Measures SLIs at the point closest to the user that is still within the serving infrastructure.

Cons:
- Not viable for data-processing SLIs or, in fact, any SLIs with complex requirements.
- Only measures the approximate performance of multi-request user journeys.
Probers

Pros:
- Approximates the user experience with synthetic requests.

Cons:
- Covering all corner cases is hard and can devolve into integration testing.
- High reliability targets require frequent probing for accurate measurement.
- Probe traffic can drown out real traffic.
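A minimal prober sketch: issue a synthetic request and record whether the endpoint answered successfully within a timeout. The URL and timeout are illustrative; a real prober would run on a schedule and rate-limit itself so probe traffic stays negligible next to real traffic:

```python
# Minimal synthetic prober using only the standard library.
import urllib.request

def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with a 2xx within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Timeouts, connection errors, and non-2xx raises all count as failure.
        return False
```

Each probe result becomes one event in the good/total counters, which is why high reliability targets need frequent probing: with few probes per window, a single failed probe swings the measured SLI dramatically.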
SLO & SLI
Source: https://www.cnblogs.com/anyu686/p/13493016.html