
[Opt](load) Support using AES_ENCRYPT with internal custom ENCRYPTKEY in stream load and routine load #60503

Open
bobhan1 wants to merge 2 commits into apache:master from bobhan1:stream-load-AES_ENCRYPT-key-ref

Conversation

@bobhan1
Contributor

bobhan1 commented Feb 4, 2026

What problem does this PR solve?

Summary

Support custom keys defined with CREATE ENCRYPTKEY, referenced via the KEY db.keyname syntax, in the columns parameter of Stream Load and Routine Load.

Problem:
Previously, EncryptKeyRef was folded to the actual key string only during the FoldConstantRule phase. Load planning skips that phase (setDebugSkipFoldConstant(true)), so the unfolded EncryptKeyRef reached BE and failed with a "Could not find function encryptkeyref" error.

Solution:

  • Add a RewriteEncryptKeyRef rule in NereidsLoadUtils that folds EncryptKeyRef before ExpressionNormalization, independent of the skip-fold-constant flag
  • Fix the Routine Load privilege check failure by setting CurrentUserIdentity in non-cloud mode during RoutineLoadJob.plan()

Changes

FE Core

  • NereidsLoadUtils.java: Add a RewriteEncryptKeyRef inner class that invokes FoldConstantRuleOnFE.VISITOR_INSTANCE directly to fold EncryptKeyRef into a StringLiteral; the rule runs before ExpressionNormalization in the analyzer pipeline
  • RoutineLoadJob.java: Call ConnectContext.setCurrentUserIdentity() in the non-cloud path of plan(), so that privilege checks during expression rewrite see the proper user identity
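
To illustrate the idea behind the rewrite, here is a minimal, self-contained sketch of a dedicated pass that folds key references into string literals regardless of whether general constant folding runs. It is a toy model only: Expr, StringLiteral, ColumnRef, KeyRef, Call, foldKeyRefs, and the in-memory key map are hypothetical names invented for this example and are not the Nereids classes or the code in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: every class here is hypothetical and stands in for the real FE types.
final class EncryptKeyFoldSketch {
    interface Expr {}

    static final class StringLiteral implements Expr {
        final String value;
        StringLiteral(String value) { this.value = value; }
        @Override public String toString() { return "'" + value + "'"; }
    }

    static final class ColumnRef implements Expr {
        final String name;
        ColumnRef(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    // Stands in for an unresolved "KEY db.keyname" reference.
    static final class KeyRef implements Expr {
        final String db;
        final String keyName;
        KeyRef(String db, String keyName) { this.db = db; this.keyName = keyName; }
        @Override public String toString() { return "KEY " + db + "." + keyName; }
    }

    static final class Call implements Expr {
        final String function;
        final List<Expr> args;
        Call(String function, List<Expr> args) { this.function = function; this.args = args; }
        @Override public String toString() {
            List<String> parts = new ArrayList<>();
            for (Expr arg : args) parts.add(arg.toString());
            return function + "(" + String.join(", ", parts) + ")";
        }
    }

    // Dedicated pass: replaces each KeyRef with the key string and rebuilds
    // function calls around it; all other nodes are returned unchanged.
    static Expr foldKeyRefs(Expr expr, Map<String, String> keys) {
        if (expr instanceof KeyRef) {
            KeyRef ref = (KeyRef) expr;
            String qualified = ref.db + "." + ref.keyName;
            String key = keys.get(qualified);
            if (key == null) {
                throw new IllegalArgumentException("unknown encrypt key: " + qualified);
            }
            return new StringLiteral(key);
        }
        if (expr instanceof Call) {
            Call call = (Call) expr;
            List<Expr> folded = new ArrayList<>();
            for (Expr arg : call.args) folded.add(foldKeyRefs(arg, keys));
            return new Call(call.function, folded);
        }
        return expr;
    }

    public static void main(String[] args) {
        Map<String, String> keys = new HashMap<>();
        keys.put("db.my_key", "ABCD123456789"); // assumed key store for the sketch
        Expr expr = new Call("TO_BASE64", Arrays.asList(
                new Call("AES_ENCRYPT", Arrays.asList(
                        new ColumnRef("tmp_data"), new KeyRef("db", "my_key")))));
        System.out.println(expr);                     // TO_BASE64(AES_ENCRYPT(tmp_data, KEY db.my_key))
        System.out.println(foldKeyRefs(expr, keys));  // TO_BASE64(AES_ENCRYPT(tmp_data, 'ABCD123456789'))
    }
}
```

Because the pass touches only key references, the column reference stays symbolic, which mirrors why folding the key can be done safely even when general constant folding is disabled for load planning.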

Regression Tests

  • test_stream_load_with_aes_encrypt.groovy: Tests AES_ENCRYPT in Stream Load columns with both a direct key string and the KEY db.keyname syntax
  • test_routine_load_with_aes_encrypt.groovy: Tests AES_ENCRYPT in Routine Load columns with both a direct key string and the KEY db.keyname syntax

Example Usage

-- Create custom encrypt key
CREATE ENCRYPTKEY my_key AS "ABCD123456789";

-- Stream Load with ENCRYPTKEY in columns parameter
curl --location-trusted -u root: \
  -H "columns: id, name, tmp_data, encrypted_data=TO_BASE64(AES_ENCRYPT(tmp_data, KEY db.my_key))" \
  -T data.csv \
  http://host:port/api/db/table/_stream_load

-- Routine Load with ENCRYPTKEY in columns parameter
CREATE ROUTINE LOAD job ON table
COLUMNS(id, name, tmp_data, encrypted_data=TO_BASE64(AES_ENCRYPT(tmp_data, KEY db.my_key)))
FROM KAFKA (...);

Test Plan

  • Run test_stream_load_with_aes_encrypt to verify Stream Load with AES_ENCRYPT using both direct key and ENCRYPTKEY
  • Run test_routine_load_with_aes_encrypt to verify Routine Load with AES_ENCRYPT using both direct key and ENCRYPTKEY
  • Verify decrypted data matches original plaintext
  • Verify existing Stream Load and Routine Load tests are not affected
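
The "decrypted data matches original plaintext" check can be sketched outside Doris as a plain encrypt, Base64-encode, decode, decrypt round trip. This is only an illustration under stated assumptions: the class name AesRoundTripSketch, the AES/ECB/PKCS5Padding transformation, and the zero-pad-or-truncate-to-16-bytes key handling are choices made for this example and may not match how Doris's AES_ENCRYPT derives its key or selects its mode.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration; not part of the PR or of Doris.
final class AesRoundTripSketch {
    // Assumption: build a 128-bit key by zero-padding or truncating the key bytes.
    static SecretKeySpec keyFrom(String key) {
        byte[] raw = Arrays.copyOf(key.getBytes(StandardCharsets.UTF_8), 16);
        return new SecretKeySpec(raw, "AES");
    }

    // Encrypts the plaintext and returns the ciphertext as Base64 text.
    static String encryptToBase64(String plaintext, String key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, keyFrom(key));
            byte[] encrypted = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(encrypted);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decodes the Base64 text and decrypts it back to the original string.
    static String decryptFromBase64(String encoded, String key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, keyFrom(key));
            byte[] decrypted = cipher.doFinal(Base64.getDecoder().decode(encoded));
            return new String(decrypted, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String key = "ABCD123456789";
        String original = "sensitive value";
        String stored = encryptToBase64(original, key);
        String recovered = decryptFromBase64(stored, key);
        System.out.println(original.equals(recovered)); // true
    }
}
```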

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1
Contributor Author

bobhan1 commented Feb 4, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 31980 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 03871be0a85bcaba5e52087f6ab5b31d8188352f, data reload: false

------ Round 1 ----------------------------------
q1	17642	5230	5102	5102
q2	2067	311	203	203
q3	10174	1288	765	765
q4	10226	887	333	333
q5	7519	2182	1922	1922
q6	203	183	149	149
q7	884	732	610	610
q8	9287	1354	1074	1074
q9	5333	4834	4924	4834
q10	6781	1951	1575	1575
q11	531	272	274	272
q12	337	371	223	223
q13	17777	4061	3230	3230
q14	239	238	219	219
q15	926	836	805	805
q16	660	669	626	626
q17	630	778	554	554
q18	6704	6565	6505	6505
q19	1263	1007	599	599
q20	394	342	233	233
q21	2632	2124	1874	1874
q22	359	308	273	273
Total cold run time: 102568 ms
Total hot run time: 31980 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5319	5325	5307	5307
q2	253	343	260	260
q3	2173	2710	2289	2289
q4	1325	1725	1284	1284
q5	4262	4164	4185	4164
q6	218	186	142	142
q7	2274	2207	1922	1922
q8	2628	2436	2383	2383
q9	7473	7517	7627	7517
q10	2779	2997	2704	2704
q11	556	509	449	449
q12	658	735	687	687
q13	3925	4347	3620	3620
q14	303	347	303	303
q15	854	802	825	802
q16	689	733	702	702
q17	1139	1379	1316	1316
q18	8318	8185	8148	8148
q19	927	868	845	845
q20	2056	2204	2102	2102
q21	4869	4491	4366	4366
q22	601	583	517	517
Total cold run time: 53599 ms
Total hot run time: 51829 ms

gavinchou previously approved these changes Feb 4, 2026
github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Feb 4, 2026
@github-actions
Contributor

github-actions bot commented Feb 4, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Feb 4, 2026

PR approved by anyone and no changes requested.

github-actions bot removed the "approved" label (Indicates a PR has been approved by one committer.) Feb 4, 2026
@doris-robot

ClickBench: Total hot run time: 28.75 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 03871be0a85bcaba5e52087f6ab5b31d8188352f, data reload: false

query1	0.05	0.04	0.05
query2	0.09	0.04	0.04
query3	0.25	0.08	0.08
query4	1.60	0.11	0.11
query5	0.27	0.25	0.28
query6	1.18	0.67	0.66
query7	0.04	0.03	0.03
query8	0.06	0.04	0.04
query9	0.57	0.50	0.49
query10	0.54	0.56	0.55
query11	0.15	0.10	0.10
query12	0.14	0.11	0.10
query13	0.64	0.61	0.61
query14	1.06	1.06	1.07
query15	0.86	0.86	0.87
query16	0.38	0.43	0.39
query17	1.12	1.12	1.14
query18	0.22	0.21	0.21
query19	2.12	2.01	2.04
query20	0.01	0.01	0.02
query21	15.44	0.28	0.15
query22	5.16	0.06	0.05
query23	15.87	0.29	0.11
query24	1.51	0.62	1.54
query25	0.10	0.08	0.06
query26	0.14	0.14	0.14
query27	0.07	0.07	0.06
query28	5.00	1.13	0.96
query29	12.56	3.96	3.18
query30	0.28	0.15	0.13
query31	2.83	0.65	0.41
query32	3.23	0.60	0.52
query33	3.22	3.19	3.34
query34	16.08	5.37	4.75
query35	4.81	4.79	4.80
query36	0.64	0.49	0.52
query37	0.11	0.07	0.07
query38	0.07	0.05	0.04
query39	0.05	0.03	0.03
query40	0.19	0.17	0.15
query41	0.09	0.03	0.04
query42	0.05	0.03	0.03
query43	0.05	0.03	0.04
Total cold run time: 98.9 s
Total hot run time: 28.75 s

bobhan1 changed the title from "[Opt](load) Support using AES_ENCRYPT with internal custom ENCRYPTKEY in stream load" to "[Opt](load) Support using AES_ENCRYPT with internal custom ENCRYPTKEY in stream load and routine load" Feb 4, 2026
@bobhan1
Contributor Author

bobhan1 commented Feb 4, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 31793 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dbda5666773efc903e15fae0feeff946ab03aa94, data reload: false

------ Round 1 ----------------------------------
q1	17657	5424	5112	5112
q2	2026	316	213	213
q3	10222	1306	740	740
q4	10194	822	318	318
q5	7534	2148	1881	1881
q6	193	179	148	148
q7	891	754	602	602
q8	9269	1371	1103	1103
q9	5337	4826	4757	4757
q10	6811	1952	1566	1566
q11	497	295	265	265
q12	331	373	223	223
q13	17796	4021	3239	3239
q14	232	249	216	216
q15	923	839	809	809
q16	668	670	623	623
q17	631	774	501	501
q18	6693	6661	6491	6491
q19	1246	1028	616	616
q20	405	344	241	241
q21	2881	2170	1854	1854
q22	363	314	275	275
Total cold run time: 102800 ms
Total hot run time: 31793 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5206	5286	5362	5286
q2	249	337	263	263
q3	2181	2662	2262	2262
q4	1355	1724	1309	1309
q5	4281	4116	4151	4116
q6	216	189	136	136
q7	2285	2087	1824	1824
q8	2645	2471	2427	2427
q9	7693	7461	7450	7450
q10	2894	3032	2609	2609
q11	515	473	450	450
q12	675	712	617	617
q13	3908	4526	3494	3494
q14	310	339	297	297
q15	896	844	824	824
q16	682	723	722	722
q17	1198	1337	1378	1337
q18	8255	8063	7860	7860
q19	888	865	848	848
q20	2142	2267	2088	2088
q21	4803	4200	4095	4095
q22	549	540	507	507
Total cold run time: 53826 ms
Total hot run time: 50821 ms

@doris-robot

ClickBench: Total hot run time: 28.51 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit dbda5666773efc903e15fae0feeff946ab03aa94, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.04	0.05
query3	0.26	0.09	0.08
query4	1.61	0.11	0.11
query5	0.28	0.25	0.25
query6	1.17	0.68	0.68
query7	0.04	0.03	0.02
query8	0.05	0.03	0.04
query9	0.58	0.51	0.49
query10	0.54	0.54	0.55
query11	0.15	0.10	0.10
query12	0.14	0.11	0.10
query13	0.64	0.61	0.61
query14	1.07	1.06	1.08
query15	0.86	0.86	0.87
query16	0.40	0.39	0.39
query17	1.10	1.15	1.16
query18	0.23	0.22	0.22
query19	2.07	2.02	2.04
query20	0.01	0.02	0.01
query21	15.42	0.29	0.16
query22	5.14	0.06	0.05
query23	16.05	0.30	0.11
query24	1.43	0.41	0.20
query25	0.11	0.09	0.06
query26	0.13	0.13	0.13
query27	0.06	0.06	0.06
query28	4.28	1.15	0.97
query29	12.63	3.89	3.19
query30	0.27	0.13	0.12
query31	2.82	0.65	0.41
query32	3.23	0.60	0.49
query33	3.29	3.30	3.23
query34	16.35	5.48	4.83
query35	4.85	4.84	4.83
query36	0.66	0.50	0.48
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.20	0.16	0.15
query41	0.09	0.04	0.03
query42	0.05	0.03	0.04
query43	0.05	0.04	0.04
Total cold run time: 98.7 s
Total hot run time: 28.51 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/5) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (5/5) 🎉
Increment coverage report
Complete coverage report

bobhan1 force-pushed the stream-load-AES_ENCRYPT-key-ref branch from dbda566 to a89c3c1 February 5, 2026 01:59
@bobhan1
Contributor Author

bobhan1 commented Feb 5, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 31149 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a89c3c1e528c2e585a57404c4e7355664a0c8d54, data reload: false

------ Round 1 ----------------------------------
q1	17633	4417	4296	4296
q2	2008	340	248	248
q3	10155	1261	730	730
q4	10202	777	304	304
q5	7534	2157	1875	1875
q6	191	178	144	144
q7	874	708	575	575
q8	9268	1363	1074	1074
q9	5284	4894	4913	4894
q10	6814	1969	1577	1577
q11	516	298	281	281
q12	334	369	220	220
q13	17787	4047	3231	3231
q14	231	232	213	213
q15	891	820	804	804
q16	668	672	627	627
q17	626	749	518	518
q18	6794	6658	6427	6427
q19	1233	1000	632	632
q20	400	354	249	249
q21	2576	1983	1954	1954
q22	358	320	276	276
Total cold run time: 102377 ms
Total hot run time: 31149 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4329	4336	4367	4336
q2	273	334	254	254
q3	2162	2651	2304	2304
q4	1372	1758	1324	1324
q5	4409	4239	4162	4162
q6	226	183	137	137
q7	1866	1781	1724	1724
q8	2802	2421	2481	2421
q9	7579	7475	7303	7303
q10	2810	3079	2603	2603
q11	574	501	454	454
q12	701	773	617	617
q13	3964	4425	3682	3682
q14	426	362	281	281
q15	849	807	807	807
q16	670	713	678	678
q17	1142	1385	1379	1379
q18	8220	8152	7717	7717
q19	855	885	928	885
q20	2029	2112	1993	1993
q21	4811	4470	4094	4094
q22	586	529	498	498
Total cold run time: 52655 ms
Total hot run time: 49653 ms

@doris-robot

ClickBench: Total hot run time: 28.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a89c3c1e528c2e585a57404c4e7355664a0c8d54, data reload: false

query1	0.05	0.04	0.05
query2	0.10	0.05	0.05
query3	0.26	0.08	0.08
query4	1.61	0.11	0.11
query5	0.26	0.26	0.25
query6	1.17	0.70	0.67
query7	0.03	0.02	0.02
query8	0.06	0.04	0.05
query9	0.56	0.51	0.49
query10	0.56	0.56	0.55
query11	0.14	0.10	0.10
query12	0.15	0.11	0.11
query13	0.62	0.62	0.62
query14	1.05	1.08	1.03
query15	0.88	0.85	0.88
query16	0.43	0.41	0.40
query17	1.14	1.18	1.17
query18	0.23	0.22	0.21
query19	2.08	2.02	2.06
query20	0.02	0.02	0.01
query21	15.39	0.24	0.14
query22	5.32	0.05	0.06
query23	16.04	0.30	0.11
query24	0.92	0.70	0.69
query25	0.10	0.20	0.07
query26	0.14	0.15	0.13
query27	0.06	0.09	0.06
query28	4.78	1.16	0.97
query29	12.54	3.94	3.16
query30	0.29	0.14	0.12
query31	2.83	0.64	0.41
query32	3.24	0.59	0.48
query33	3.28	3.21	3.27
query34	16.51	5.38	4.68
query35	4.78	4.79	4.76
query36	0.65	0.50	0.50
query37	0.11	0.07	0.07
query38	0.07	0.05	0.04
query39	0.05	0.03	0.03
query40	0.19	0.18	0.17
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 98.86 s
Total hot run time: 28.76 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (5/5) 🎉
Increment coverage report
Complete coverage report
