[Improve](StreamingJob) add more metrics to observe the streaming job by JNSimba · Pull Request #60493 · apache/doris

JNSimba · 2026-02-04T08:30:03Z

What problem does this PR solve?

Issue Number: close #xxx

Add more metrics to observe the streaming job:

| Metrics | Module | Description |
| --- | --- | --- |
| streaming_job_get_meta_latency | FE | Time spent fetching source metadata for streaming jobs |
| streaming_job_get_meta_count | FE | Number of times source metadata is fetched for streaming jobs |
| streaming_job_get_meta_fail_count | FE | Number of failures when fetching source metadata for streaming jobs |
| streaming_job_task_execute_time | FE | Total execution time of streaming job tasks |
| streaming_job_task_execute_count | FE | Total number of executed streaming job tasks |
| streaming_job_task_failed_count | FE | Total number of failed streaming job tasks |
| streaming_job_total_rows | FE | Total number of rows processed by streaming jobs |
| streaming_job_filter_rows | FE | Total number of rows filtered out by streaming jobs |
| streaming_job_load_bytes | FE | Total data volume loaded by streaming jobs (in bytes) |
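
As an illustration of how these metrics might be consumed, here is a minimal sketch that parses Prometheus-style text output and derives an average metadata-fetch latency and a task failure ratio. It assumes the latency and count metrics are cumulative totals, which the PR does not spell out. The `doris_fe_` prefix, the sample values, and the helper names (`parse_metrics`, `lookup`, `summarize`) are hypothetical and only serve the example.

```python
# Hypothetical sample of FE metrics output. The "doris_fe_" prefix and all
# values are assumptions made for this example, not taken from the PR.
SAMPLE_METRICS_TEXT = """\
# HELP comment lines are ignored by the parser
doris_fe_streaming_job_get_meta_latency 1200
doris_fe_streaming_job_get_meta_count 40
doris_fe_streaming_job_get_meta_fail_count 1
doris_fe_streaming_job_task_execute_time 90000
doris_fe_streaming_job_task_execute_count 50
doris_fe_streaming_job_task_failed_count 2
"""


def parse_metrics(text):
    """Map each metric name (labels stripped) to its numeric value."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_part, _, value_part = line.rpartition(" ")
        name = name_part.split("{", 1)[0]
        try:
            values[name] = float(value_part)
        except ValueError:
            continue  # skip lines that do not end in a number
    return values


def lookup(values, suffix):
    """Find a metric by name suffix so any exporter prefix is tolerated."""
    for name, value in values.items():
        if name.endswith(suffix):
            return value
    return None


def ratio(numerator, denominator):
    """Divide safely; return None when either side is missing or zero."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


def summarize(values):
    """Derive per-fetch latency and task failure ratio from raw totals."""
    return {
        "avg_get_meta_latency": ratio(
            lookup(values, "streaming_job_get_meta_latency"),
            lookup(values, "streaming_job_get_meta_count"),
        ),
        "task_failure_ratio": ratio(
            lookup(values, "streaming_job_task_failed_count"),
            lookup(values, "streaming_job_task_execute_count"),
        ),
    }


summary = summarize(parse_metrics(SAMPLE_METRICS_TEXT))
```

With the sample above, `summary` holds the average latency per metadata fetch and the share of tasks that failed. The latency is in whatever unit the metric uses, which the PR does not state.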

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-02-04T08:30:10Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

- What problem was fixed (it's best to include specific error reporting information). How it was fixed.
- Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
- What features were added. Why was this function added?
- Which code was refactored and why was this part of the code refactored?
- Which functions were optimized and what is the difference before and after the optimization?

JNSimba · 2026-02-04T08:34:40Z

run buildall

Copilot

Pull request overview

Adds new FE-side metrics to improve observability of streaming insert jobs, along with a regression test that validates the metrics are exposed via the FE /metrics endpoint.

Changes:

- Registers new streaming job counter metrics in MetricRepo and adds streaming job state gauges.
- Increments the new counters from StreamingInsertJob lifecycle points (meta fetch, task success/failure, offset commit).
- Adds a MySQL CDC regression test that polls FE metrics until all expected streaming-job metrics are present.
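
The polling approach described for the regression test can be sketched roughly as follows. This is not the Groovy test from the PR. It is a minimal Python illustration in which `fetch` stands in for an HTTP GET against the FE /metrics endpoint, and the function names, timeout, and interval are assumptions.

```python
import time

# Metric names from the PR description.
EXPECTED_METRICS = [
    "streaming_job_get_meta_latency",
    "streaming_job_get_meta_count",
    "streaming_job_get_meta_fail_count",
    "streaming_job_task_execute_time",
    "streaming_job_task_execute_count",
    "streaming_job_task_failed_count",
    "streaming_job_total_rows",
    "streaming_job_filter_rows",
    "streaming_job_load_bytes",
]


def missing_metrics(metrics_text, expected):
    """Return the expected metric names that do not appear in the text."""
    return [name for name in expected if name not in metrics_text]


def wait_for_metrics(fetch, expected, timeout_s=60.0, interval_s=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until every expected metric shows up or time runs out.

    fetch is a hypothetical callable returning the metrics page as text.
    Returns the names still missing; an empty list means success.
    """
    deadline = clock() + timeout_s
    while True:
        missing = missing_metrics(fetch(), expected)
        if not missing or clock() >= deadline:
            return missing
        sleep(interval_s)
```

Passing `fetch`, `sleep`, and `clock` as parameters keeps the loop testable without a running FE. A caller would typically assert that the returned list is empty.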

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job_metrics.groovy | New regression test validating FE exports the expected streaming job metrics. |
| fe/fe-core/src/main/java/org/apache/doris/metric/MetricRepo.java | Registers streaming job counters and adds streaming job state gauge metrics. |
| fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java | Emits streaming job metric increments during meta fetch, task completion, and offset/stat updates. |

JNSimba · 2026-02-05T06:34:10Z

run buildall

doris-robot · 2026-02-05T07:13:24Z

TPC-H: Total hot run time: 31401 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2e260d1c5e99a99bd35c76185fb8945060717a76, data reload: false

------ Round 1 ----------------------------------
q1	17643	4450	4288	4288
q2	1998	347	235	235
q3	10159	1259	711	711
q4	10218	891	317	317
q5	7540	2146	1930	1930
q6	196	175	143	143
q7	854	743	609	609
q8	9261	1384	1098	1098
q9	5061	4854	4812	4812
q10	6832	1938	1539	1539
q11	491	302	285	285
q12	337	380	236	236
q13	17786	4015	3250	3250
q14	232	240	221	221
q15	877	799	810	799
q16	686	675	617	617
q17	632	776	485	485
q18	6745	6467	6935	6467
q19	1379	1023	681	681
q20	411	396	282	282
q21	2936	2271	2108	2108
q22	382	333	288	288
Total cold run time: 102656 ms
Total hot run time: 31401 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4654	4475	4556	4475
q2	273	338	268	268
q3	2388	2884	2514	2514
q4	1433	1893	1407	1407
q5	4750	4547	4683	4547
q6	246	190	150	150
q7	2077	1935	1757	1757
q8	2575	2460	2426	2426
q9	7744	7456	7322	7322
q10	2860	3085	2562	2562
q11	544	481	481	481
q12	732	712	566	566
q13	3599	4017	3206	3206
q14	266	277	254	254
q15	818	785	785	785
q16	651	684	641	641
q17	1070	1242	1291	1242
q18	7477	7235	7225	7225
q19	853	806	802	802
q20	1955	2042	1868	1868
q21	4528	4220	4145	4145
q22	577	544	508	508
Total cold run time: 52070 ms
Total hot run time: 49151 ms
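
The Round 1 totals in the report above can be reproduced from the per-query rows. The sketch below assumes the columns are one cold run, two hot runs, and the faster of the two hot runs. The report does not label its columns, so this is an inference from the numbers. Only the first three columns are copied here; the fourth is recomputed.

```python
# Round 1 rows copied from the TPC-H report: (cold, hot run 1, hot run 2), in ms.
ROUND_1 = {
    "q1": (17643, 4450, 4288),
    "q2": (1998, 347, 235),
    "q3": (10159, 1259, 711),
    "q4": (10218, 891, 317),
    "q5": (7540, 2146, 1930),
    "q6": (196, 175, 143),
    "q7": (854, 743, 609),
    "q8": (9261, 1384, 1098),
    "q9": (5061, 4854, 4812),
    "q10": (6832, 1938, 1539),
    "q11": (491, 302, 285),
    "q12": (337, 380, 236),
    "q13": (17786, 4015, 3250),
    "q14": (232, 240, 221),
    "q15": (877, 799, 810),
    "q16": (686, 675, 617),
    "q17": (632, 776, 485),
    "q18": (6745, 6467, 6935),
    "q19": (1379, 1023, 681),
    "q20": (411, 396, 282),
    "q21": (2936, 2271, 2108),
    "q22": (382, 333, 288),
}


def totals(rows):
    """Sum the cold runs and, per query, the faster of the two hot runs."""
    cold = sum(cold_ms for cold_ms, _, _ in rows.values())
    hot = sum(min(hot_1, hot_2) for _, hot_1, hot_2 in rows.values())
    return cold, hot


cold_total, hot_total = totals(ROUND_1)
print(f"Total cold run time: {cold_total} ms")
print(f"Total hot run time: {hot_total} ms")
```

Under that reading, the computed totals match the reported 102656 ms cold and 31401 ms hot for Round 1.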

doris-robot · 2026-02-05T07:30:10Z

ClickBench: Total hot run time: 29.11 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2e260d1c5e99a99bd35c76185fb8945060717a76, data reload: false

query1	0.05	0.05	0.06
query2	0.09	0.04	0.05
query3	0.26	0.08	0.08
query4	1.60	0.11	0.11
query5	0.27	0.25	0.27
query6	1.16	0.66	0.66
query7	0.03	0.03	0.02
query8	0.05	0.04	0.04
query9	0.55	0.52	0.47
query10	0.56	0.56	0.53
query11	0.14	0.10	0.09
query12	0.14	0.11	0.11
query13	0.63	0.61	0.63
query14	1.07	1.06	1.05
query15	0.87	0.85	0.88
query16	0.41	0.40	0.43
query17	1.10	1.08	1.11
query18	0.21	0.22	0.21
query19	2.03	1.97	2.02
query20	0.02	0.01	0.01
query21	15.42	0.28	0.15
query22	5.11	0.05	0.06
query23	15.93	0.28	0.11
query24	0.99	1.25	1.07
query25	0.13	0.18	0.20
query26	0.14	0.14	0.14
query27	0.07	0.06	0.05
query28	5.39	1.12	0.96
query29	12.55	3.94	3.15
query30	0.28	0.13	0.11
query31	2.82	0.66	0.40
query32	3.23	0.60	0.49
query33	3.23	3.29	3.24
query34	15.86	5.40	4.75
query35	4.86	4.74	4.86
query36	0.65	0.50	0.49
query37	0.11	0.07	0.07
query38	0.08	0.04	0.04
query39	0.05	0.03	0.03
query40	0.19	0.16	0.16
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 98.51 s
Total hot run time: 29.11 s

hello-stephen · 2026-02-05T07:38:17Z

FE UT Coverage Report

Increment line coverage 53.33% (32/60) 🎉
Increment coverage report
Complete coverage report

JNSimba added 2 commits February 4, 2026 16:24

add metric for streaming job

33bf186

fix

8034de8

fix

b73d998

JNSimba added the dev/4.0.x label Feb 4, 2026

JNSimba requested a review from Copilot February 4, 2026 08:38

Copilot started reviewing on behalf of JNSimba February 4, 2026 08:39

Copilot AI reviewed Feb 4, 2026

JNSimba added 2 commits February 4, 2026 17:43

fix

b09ef16

Merge branch 'master-new' into add_streamingjob_metric

2e260d1

JNSimba wants to merge 5 commits into apache:master from JNSimba:add_streamingjob_metric
