[refactor](paimon) Per-catalog Paimon metadata cache with two-level table+snapshot structure #60478

Open
suxiaogang223 wants to merge 11 commits into apache:master from suxiaogang223:refac-paimon-meta-cache

Conversation

@suxiaogang223
Contributor

What problem does this PR solve?

Summary

Refactor Paimon metadata cache from a single global instance to per-catalog instances,
introduce a two-level Table+Snapshot cache structure, and unify TTL resolution logic
across Iceberg/Paimon/Schema caches.

Motivation

The previous design shared a single PaimonMetadataCache instance, and therefore a single
global snapshotCache, across all Paimon catalogs. This had several drawbacks:

  • Different catalogs could not independently configure cache TTL.
  • Cache keys had to carry catalogId for isolation; invalidation required scanning all
    keys and filtering.
  • PaimonExternalTable eagerly fetched the Table object at construction time, incurring
    remote calls even when the table was never subsequently accessed.

Changes

Per-catalog cache instantiation

  • Introduce CatalogScopedCacheMgr<T>, a generic catalog-keyed cache manager backed by
    ConcurrentHashMap.computeIfAbsent.
  • PaimonMetadataCacheMgr now extends CatalogScopedCacheMgr<PaimonMetadataCache>;
    each catalog owns an independent PaimonMetadataCache.
  • Migrate Iceberg's icebergCacheMap (previously hand-rolled double-checked locking)
    in ExternalMetaCacheMgr to the same CatalogScopedCacheMgr.
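The catalog-keyed manager described above can be sketched as follows. This is an illustrative outline, not the actual Doris implementation; the class and method names mirror the PR description, but the exact signatures are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of CatalogScopedCacheMgr<T>: a catalog-id-keyed registry that
// lazily creates one cache instance per catalog. computeIfAbsent gives
// atomic get-or-create without hand-rolled double-checked locking.
class CatalogScopedCacheMgr<T> {
    private final Map<Long, T> caches = new ConcurrentHashMap<>();
    private final Function<Long, T> factory;

    CatalogScopedCacheMgr(Function<Long, T> factory) {
        this.factory = factory;
    }

    // Atomically fetch or create the per-catalog cache instance.
    T getCache(long catalogId) {
        return caches.computeIfAbsent(catalogId, factory);
    }

    // Drop the whole cache for one catalog (e.g. DROP CATALOG or a
    // property change); no shared-key scanning is needed any more.
    void removeCache(long catalogId) {
        caches.remove(catalogId);
    }
}
```

Because each catalog owns its own cache object, invalidating a catalog is a single map removal instead of filtering catalogId out of every key in a shared cache.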

Two-level table + snapshot cache

  • Replace the single snapshotCache with tableCache
    (key: PaimonTableCacheKey, value: PaimonTableCacheValue).
  • PaimonTableCacheValue holds the Paimon Table object and lazy-loads
    PaimonSnapshotCacheValue via double-checked locking.
    • The Table object is managed by a Caffeine LoadingCache, subject to TTL/maxSize.
      When the TTL expires, Caffeine creates a new PaimonTableCacheValue, and the
      snapshot is lazily reloaded on the next access.
    • Normal queries hit tableCache directly; MTMV queries go through the explicit
      snapshot path; branch queries call the Paimon catalog directly (branches have
      independent schemas, not suitable for the main table cache).
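The lazy snapshot load inside a cache value can be sketched with the double-checked locking idiom the PR mentions. Names and generics here are hypothetical stand-ins for the Paimon Table and PaimonSnapshotCacheValue types; only the once-per-entry loading behavior is from the description.

```java
import java.util.function.Supplier;

// Sketch of the two-level value: the Table lives in the Caffeine-managed
// tableCache entry; the snapshot is resolved lazily, at most once per
// entry, via double-checked locking on a volatile field.
class PaimonTableCacheValueSketch<TableT, SnapshotT> {
    private final TableT table;               // loaded by the cache loader
    private final Supplier<SnapshotT> snapshotLoader;
    private volatile SnapshotT snapshot;      // lazily initialized

    PaimonTableCacheValueSketch(TableT table, Supplier<SnapshotT> snapshotLoader) {
        this.table = table;
        this.snapshotLoader = snapshotLoader;
    }

    TableT getTable() {
        return table;
    }

    SnapshotT getSnapshot() {
        SnapshotT local = snapshot;           // first, unsynchronized check
        if (local == null) {
            synchronized (this) {
                local = snapshot;             // second check under the lock
                if (local == null) {
                    local = snapshotLoader.get();
                    snapshot = local;
                }
            }
        }
        return local;
    }
}
```

When Caffeine evicts the entry on TTL expiry, the next load constructs a fresh value whose snapshot field is null again, which is exactly the "re-lazily-loaded on next access" behavior described.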

Unified TTL resolution

  • Extract ExternalCatalog.resolveCacheTtlSpec() to centralize TTL property parsing:
    • null → use global default (external_cache_expire_time_seconds_after_access)
    • -1 → no expiry (Caffeine does not set expireAfterAccess)
    • 0 → disable cache (maxSize=0, Caffeine evicts immediately)
    • >0 → use as expireAfterAccess
  • Applied uniformly to IcebergMetadataCache, PaimonMetadataCache, and
    ExternalSchemaCache.
  • Add paimon.table.meta.cache.ttl-second catalog property with validation in
    checkProperties(). ALTER CATALOG SET triggers cache reinitialization via
    notifyPropertiesUpdated(), consistent with Iceberg's existing pattern.
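The four TTL rules above can be expressed as a small pure function. This is a hedged sketch: the class name, field names, and signature are assumptions; only the null / -1 / 0 / >0 semantics come from the PR description.

```java
// Sketch of the centralized TTL resolution applied to Iceberg, Paimon,
// and schema caches. maxSize=0 models "cache disabled"; a null
// expireAfterAccessSec models "no expireAfterAccess configured".
final class CacheTtlSpec {
    final long maxSize;              // 0 => cache disabled, evict immediately
    final Long expireAfterAccessSec; // null => Caffeine sets no expiry

    private CacheTtlSpec(long maxSize, Long expireAfterAccessSec) {
        this.maxSize = maxSize;
        this.expireAfterAccessSec = expireAfterAccessSec;
    }

    static CacheTtlSpec resolve(Long catalogTtlSec, long globalDefaultSec, long defaultMaxSize) {
        // null: fall back to the global default
        // (external_cache_expire_time_seconds_after_access)
        long ttl = (catalogTtlSec == null) ? globalDefaultSec : catalogTtlSec;
        if (ttl == -1) {
            return new CacheTtlSpec(defaultMaxSize, null); // never expire by access
        }
        if (ttl == 0) {
            return new CacheTtlSpec(0, null);              // disable the cache
        }
        return new CacheTtlSpec(defaultMaxSize, ttl);      // use as expireAfterAccess
    }
}
```

The resulting spec would then feed a Caffeine builder (maximumSize plus an optional expireAfterAccess), giving all three caches identical TTL behavior.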

Lazy table access in PaimonExternalTable

  • Remove the eagerly-loaded paimonTable field from the constructor.
  • All Table object access now goes through PaimonUtils into
    PaimonMetadataCache.tableCache, making it lazy and cache-aware.
  • Introduce PaimonUtils as the centralized static accessor for Paimon cache
    operations, simplifying call sites.
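The before/after effect of removing the eager field can be sketched as below. All names are illustrative, and a plain lookup function stands in for the PaimonUtils/cache path; the point shown is that construction no longer triggers a remote call.

```java
import java.util.function.Function;

// Sketch: the constructor only records identifiers; the Table object is
// fetched on first use through the (per-catalog, cached) lookup path.
class PaimonExternalTableSketch {
    private final long catalogId;
    private final String db;
    private final String tbl;
    private final Function<String, Object> cacheLookup; // stand-in for PaimonUtils

    PaimonExternalTableSketch(long catalogId, String db, String tbl,
                              Function<String, Object> cacheLookup) {
        // No eager paimonTable field, no remote fetch: construction is cheap
        // even for tables that are never subsequently accessed.
        this.catalogId = catalogId;
        this.db = db;
        this.tbl = tbl;
        this.cacheLookup = cacheLookup;
    }

    // Lazy, cache-aware access: the first call triggers the (cached) load.
    Object getPaimonTable() {
        return cacheLookup.apply(catalogId + "." + db + "." + tbl);
    }
}
```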

Iceberg invalidation fast path

  • IcebergMetadataCache.invalidateTableCache() now attempts a direct key lookup
    (getIfPresent) first. On hit, invalidate immediately; on miss, fall back to full
    cache scan matching by local name. Avoids unnecessary iteration on the common path.
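The fast-path-then-scan shape can be sketched as follows. A ConcurrentMap stands in for the Caffeine cache here (remove ~ getIfPresent + invalidate); the real key type and name-matching rule in Doris may differ.

```java
import java.util.concurrent.ConcurrentMap;

// Sketch of invalidateTableCache(): try a direct key hit first, and only
// on a miss fall back to scanning all keys by local table name.
class InvalidationSketch {
    static <V> void invalidateTable(ConcurrentMap<String, V> cache,
                                    String exactKey, String localName) {
        if (cache.remove(exactKey) != null) {
            return; // fast path: direct hit, no iteration on the common path
        }
        // slow path: full scan, dropping entries that match the local name
        cache.keySet().removeIf(k -> k.endsWith("." + localName));
    }
}
```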

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run external

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 76.32% (145/190) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run external

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 79.93% (235/294) 🎉
Increment coverage report
Complete coverage report

