Why Dr.Elephant?
- There are many Hadoop optimization tools out there, but most of them focus on simplifying the deployment and management of Hadoop clusters.
- Very few tools are designed to help Hadoop users optimize their flows.
- Dr.Elephant supports Hadoop with a variety of frameworks and can be easily extended to newer frameworks.
- You can plug in and configure as many custom heuristics as you like.
- It is designed to help the users of Hadoop and Spark understand the internals of their flow and to help them tune their jobs easily.
Key Features
- Pluggable and configurable rule-based heuristics that diagnose a job (a sketch of one such rule follows this list);
- Out-of-the-box integration with Azkaban scheduler and support for adding any other Hadoop scheduler, such as Oozie;
- Representation of historic performance of jobs and flows;
- Job-level comparison of flows;
- Diagnostic heuristics for MapReduce and Spark;
- Easily extensible to newer job types, applications, and schedulers;
- REST API to fetch all the information.
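
Each heuristic is essentially a rule that reads a job's counters and maps the result to a severity level. Dr. Elephant's real heuristics are Java classes registered through its configuration; the self-contained sketch below only illustrates the idea with a hypothetical spill-ratio rule, and the class name, counter names, and thresholds are assumptions for illustration, not the project's actual plugin interface.

```java
import java.util.Map;

// Illustrative only: a rule-based heuristic that maps a job metric to one of
// five severity levels, in the spirit of Dr. Elephant's heuristics. The class
// name, counter names, and thresholds here are hypothetical.
public class SpillRatioHeuristicSketch {

    enum Severity { NONE, LOW, MODERATE, SEVERE, CRITICAL }

    // Rule: the higher the ratio of spilled records to map output records,
    // the more the job pays in extra disk I/O during the shuffle.
    static Severity evaluate(Map<String, Long> counters) {
        long outputRecords = counters.getOrDefault("MAP_OUTPUT_RECORDS", 0L);
        long spilledRecords = counters.getOrDefault("SPILLED_RECORDS", 0L);
        if (outputRecords == 0) {
            return Severity.NONE;
        }
        double spillRatio = (double) spilledRecords / outputRecords;
        if (spillRatio <= 1.0) return Severity.NONE;      // no re-spills
        if (spillRatio <= 2.0) return Severity.LOW;
        if (spillRatio <= 2.5) return Severity.MODERATE;
        if (spillRatio <= 3.0) return Severity.SEVERE;
        return Severity.CRITICAL;
    }

    public static void main(String[] args) {
        Map<String, Long> counters =
                Map.of("MAP_OUTPUT_RECORDS", 1_000_000L, "SPILLED_RECORDS", 2_600_000L);
        System.out.println("Spill ratio severity: " + evaluate(counters));
    }
}
```

Adding a new rule is then just a matter of writing another such check and registering it in the heuristics configuration, which is what makes the heuristics pluggable.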
How does it work?
- Dr. Elephant gets a list of all recently succeeded and failed applications, at regular intervals, from the YARN ResourceManager (see the polling sketch after this list).
- The metadata for each application, namely the job counters, configurations, and task data, is fetched from the Job History server.
- Dr. Elephant runs a set of heuristics on them and generates a diagnostic report on how the individual heuristics and the job as a whole performed.
- These are then tagged with one of five severity levels, to indicate potential performance problems.
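
As a rough sketch of the first step, the snippet below polls the YARN ResourceManager's Cluster Applications REST API for applications that finished during the last polling interval. The ResourceManager address and the polling window are placeholder assumptions, and Dr. Elephant's own fetcher does more than this: it also pulls counters and task data from the Job History server before running the heuristics and tagging severities.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: ask the YARN ResourceManager REST API for applications that
// finished in the last polling interval. Host/port and the interval are
// placeholder assumptions; the JSON response lists the matching apps.
public class YarnFinishedAppsPoller {

    public static void main(String[] args) throws Exception {
        String resourceManager = "http://resourcemanager.example.com:8088"; // assumed RM address
        long intervalMs = 60_000;                                           // poll the last minute
        long finishedTimeBegin = System.currentTimeMillis() - intervalMs;

        // Cluster Applications API: filter by application state and finish time.
        String url = resourceManager + "/ws/v1/cluster/apps"
                + "?states=FINISHED,FAILED"
                + "&finishedTimeBegin=" + finishedTimeBegin;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // The body lists each application's id, name, final status, and so on;
        // a real fetcher would parse it and hand each application to the heuristics.
        System.out.println(response.body());
    }
}
```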
Sample Usage
- Once a job completes, it can be found in the Dashboard.
- The color Red means the job is in a critical state and requires tuning, while Green means the job is running efficiently.

- You can click into the application to get the complete report, including details on each of the individual heuristics and an [Explain] link, which provides suggestions on how to tune the job to improve that heuristic.
Getting Started
User Guide
FYI
Original article: http://www.cnblogs.com/wttttt/p/7203099.html