Distributed Tracing

Cloud Distributed Tracing System Design

Created: 2020-01-29 15:39


1. Why we need it?
1.1 Monolithic software
1.2 Distributed System
2. Key Concepts
3. Implementation
4. Why we need distributed tracing
Reference

1. Why we need it?

Short answer: because of distributed applications.

1.1 Monolithic software

Monolithic software is built on a large, sprawling legacy code base that is often so tightly coupled that a change in one small section breaks one or more features that depend on it. Such an application is likely to break, and we need tracing to follow the course of a request or system event from its source to its ultimate destination.

In this way, each trace becomes a narrative that tells the request's story as it travels through the system.

1.2 Distributed System

Use distributed tracing to profile and monitor microservice-based applications and architectures, locate failures, and improve performance.

2. Key Concepts

In general, distributed tracing starts with a single request, the entity or event being traced. As the request makes its journey, it generates trace data that records the processing operations performed on it by entities within a distributed system or network infrastructure.
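
To make the "journey" concrete, here is a minimal sketch of how a request could carry the same trace ID from one service to the next. The header name `X-Trace-Id` and both helper functions are assumptions made up for this illustration; they are not taken from any particular tracing system.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name, chosen for illustration


def ensure_trace_id(headers: dict) -> dict:
    """Return a copy of the headers that carries a trace ID, creating one if absent."""
    out = dict(headers)
    if TRACE_HEADER not in out:
        out[TRACE_HEADER] = uuid.uuid4().hex
    return out


def call_downstream(incoming_headers: dict) -> dict:
    """Build headers for an outgoing call, reusing the caller's trace ID."""
    return {TRACE_HEADER: incoming_headers[TRACE_HEADER]}


# The edge service receives a request without a trace ID and assigns one.
edge_headers = ensure_trace_id({"Accept": "text/html"})
# Every downstream hop forwards the same ID, so its work can be linked back.
backend_headers = call_downstream(edge_headers)
```

Because the ID is created once at the edge and only forwarded afterwards, everything recorded along the way can later be tied back to the same request.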

Each trace is assigned its own unique ID and passes through segments, each of which indicates a given activity that a host system performs on the request. Every segment represents a single step in the request's path and has a name, a unique ID, and a timestamp. A span (segment) can also carry additional metadata.
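
The fields described above map naturally onto a small record type. The sketch below is only one possible shape: the class name `Span`, the field names, and the example span names are my own choices for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str  # shared by every span of the same request
    name: str      # the activity performed, e.g. "db.query"
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    timestamp: float = field(default_factory=time.time)
    metadata: dict = field(default_factory=dict)  # optional extra data


# Two steps of one request share the trace ID but get their own span IDs.
trace_id = uuid.uuid4().hex
spans = [
    Span(trace_id, "gateway.receive"),
    Span(trace_id, "db.query", metadata={"table": "orders"}),
]
```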

The idea is that specific inflection points of a request must be identified within the system and instrumented. All of the trace data must then be coordinated and collated to provide a meaningful view of the request.
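
As a rough sketch of the collation step, the snippet below groups span records by trace ID and orders each group by timestamp. The record layout and the `collate` function are assumptions for illustration, and ordering by timestamp assumes the hosts' clocks are reasonably in sync.

```python
from collections import defaultdict

# Span records as (trace_id, span_name, timestamp) tuples,
# arriving out of order from different hosts.
records = [
    ("t1", "db.query", 3.0),
    ("t2", "gateway.receive", 1.5),
    ("t1", "gateway.receive", 1.0),
    ("t1", "auth.check", 2.0),
]


def collate(records):
    """Group records by trace ID and return each trace's steps in time order."""
    traces = defaultdict(list)
    for trace_id, name, ts in records:
        traces[trace_id].append((ts, name))
    return {tid: [name for _, name in sorted(steps)] for tid, steps in traces.items()}


view = collate(records)
# view["t1"] is ["gateway.receive", "auth.check", "db.query"]
```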

The challenge is processing the volume of data generated by increasingly large-scale systems.
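
One common way to keep that volume manageable is to record only a fraction of traces. This is not covered in the text above, so treat the sketch below as an assumption on my part: the function name `keep_trace` and the CRC32-based bucketing are illustrative choices. Deciding by trace ID means all spans of a trace are kept or dropped together.

```python
import zlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Decide deterministically whether to record a trace, based on its ID."""
    bucket = zlib.crc32(trace_id.encode("utf-8")) % 10_000
    return bucket < sample_rate * 10_000


# Every service reaches the same decision for the same trace ID,
# so a sampled trace is never recorded only in part.
ids = [f"trace-{i}" for i in range(10_000)]
kept = [t for t in ids if keep_trace(t, 0.1)]
```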

3. Implementation

Google created Dapper as middleware that supports the use of different languages within the system. According to Google, the value of tracing is only realised through:

- ubiquitous deployment: no part of the system under observation is left uninstrumented
- continuous monitoring: the system must be monitored constantly
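
Ubiquitous deployment is easier when instrumenting a function costs almost nothing. The decorator below is a minimal sketch of that idea, not how Dapper itself works: `traced`, `RECORDED`, and `lookup_order` are hypothetical names, and the in-memory list stands in for a real span collector.

```python
import functools
import time

RECORDED = []  # in-memory stand-in for a span collector


def traced(name):
    """Wrap a function so that each call records a named, timed span."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the span even if the wrapped call raises.
                RECORDED.append(
                    {"name": name, "duration_s": time.perf_counter() - start}
                )
        return wrapper
    return decorator


@traced("orders.lookup")
def lookup_order(order_id):
    return {"id": order_id, "status": "shipped"}


result = lookup_order(42)
```

The application code stays unchanged apart from the one-line decorator, which is what makes instrumenting every part of a system realistic.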

4. Why we need distributed tracing

Greg Linden commented in 2006 that experiments run by Amazon.com showed a significant drop in revenue when a 100 ms delay was added to page load. Although understanding the flow of a web request through a system can be challenging, there can be significant commercial gains when performance bottlenecks are identified and eliminated.

Reference
