language: Make `TreeSitterData` only shared between snapshots of the same version #44198

Veykril · 2025-12-05T09:04:47Z

Currently we have a single cache for this data, shared between all snapshots. This is incorrect: we might update the cache to a new version while old snapshots are still around, and those snapshots may then try to access the new data with old offsets/rows.
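To make the problem concrete, here is a minimal sketch of the sharing model this PR moves to. It is heavily simplified and every name in it is hypothetical: `Version` stands in for the buffer version, `compute` for whatever fills the cache, and the cached data is just a list of chunk start rows. Snapshots taken at the same version share one `Arc<TreeSitterData>`. An edit replaces the buffer's `Arc` instead of mutating a shared cache, so an older snapshot keeps reading data computed for its own rows.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the buffer version.
type Version = u64;

/// Hypothetical, simplified cache: data derived from one specific version.
#[derive(Debug)]
struct TreeSitterData {
    version: Version,
    // Chunk start rows, computed against `version`'s text.
    chunk_starts: Vec<u32>,
}

#[derive(Clone)]
struct Snapshot {
    version: Version,
    tree_sitter_data: Arc<TreeSitterData>,
}

struct Buffer {
    version: Version,
    tree_sitter_data: Arc<TreeSitterData>,
}

/// Hypothetical cache filler: one chunk every 50 rows.
fn compute(version: Version, line_count: u32) -> TreeSitterData {
    const CHUNK_SIZE: usize = 50;
    TreeSitterData {
        version,
        chunk_starts: (0..line_count).step_by(CHUNK_SIZE).collect(),
    }
}

impl Buffer {
    fn new(line_count: u32) -> Self {
        let version = 0;
        Self { version, tree_sitter_data: Arc::new(compute(version, line_count)) }
    }

    /// Snapshots of the same version share one allocation.
    fn snapshot(&self) -> Snapshot {
        Snapshot { version: self.version, tree_sitter_data: Arc::clone(&self.tree_sitter_data) }
    }

    /// A new version gets a fresh cache; existing snapshots keep the old `Arc`.
    fn edit(&mut self, new_line_count: u32) {
        self.version += 1;
        self.tree_sitter_data = Arc::new(compute(self.version, new_line_count));
    }
}
```

With a single cache shared across versions, the old snapshot would instead see the new chunk list while still holding rows from before the edit.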

Release Notes:

N/A or Added/Fixed/Improved ...


SomeoneToIgnore

Nice, much more correct way to fill this cache.

SomeoneToIgnore · 2025-12-05T09:37:47Z

crates/language/src/buffer.rs

-            }
-            None => HashSet::default(),
-        };
+        let known_chunks = known_chunks.cloned().unwrap_or_default();


Nit: at this point, it seems more appropriate to accept these new `known_chunks` by value?
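A minimal sketch of the nit, assuming `known_chunks` is a set of row ranges and is extended after fetching; `ChunkId` and both function names are hypothetical. With a borrowed `Option<&HashSet<_>>`, the function has to clone before it can own the set, which is what `known_chunks.cloned().unwrap_or_default()` does. Taking `Option<HashSet<_>>` moves the clone decision to the caller, who often does not need one.

```rust
use std::collections::HashSet;
use std::ops::Range;

/// Hypothetical chunk identifier: the chunk's row range.
type ChunkId = Range<u32>;

/// Borrowing shape: the set must be cloned before it can be extended.
fn mark_fetched_by_ref(known_chunks: Option<&HashSet<ChunkId>>, fetched: &[ChunkId]) -> HashSet<ChunkId> {
    let mut known_chunks = known_chunks.cloned().unwrap_or_default();
    known_chunks.extend(fetched.iter().cloned());
    known_chunks
}

/// Owning shape: the caller moves the set in, so no clone happens here.
fn mark_fetched_by_value(known_chunks: Option<HashSet<ChunkId>>, fetched: &[ChunkId]) -> HashSet<ChunkId> {
    let mut known_chunks = known_chunks.unwrap_or_default();
    known_chunks.extend(fetched.iter().cloned());
    known_chunks
}
```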

SomeoneToIgnore · 2025-12-05T09:53:42Z

crates/language/src/buffer.rs

 pub struct TreeSitterData {
    chunks: RowChunks,
-    brackets_by_chunks: Vec<Option<Vec<BracketMatch<usize>>>>,
+    brackets_by_chunks: Mutex<Vec<Option<Vec<BracketMatch<usize>>>>>,


The way I understand the data flow now:

- we only start with clean data, and clear it on each tree-sitter reparse event
- each snapshot gets an `Arc<TreeSitterData>` (the same one) and, for the field it wants to access, either clones the cached value or initializes it once and then clones it

No locking or mutex seems to be needed for that?
I wonder if this could be a `Vec<OnceCell<Vec<BracketMatch<usize>>>>`, given that this code will be synchronous in the near future?
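A minimal sketch of the `OnceCell` idea, using a simplified, hypothetical `BracketMatch` and a caller-supplied `compute` closure. It uses `std::sync::OnceLock` rather than `std::cell::OnceCell`, assuming the shared `Arc<TreeSitterData>` may still be read from more than one thread: `OnceCell` is not `Sync`, so an `Arc` holding it cannot be sent to another thread. Each chunk slot is filled at most once and then only read, with no `Mutex`.

```rust
use std::sync::OnceLock;

/// Hypothetical, simplified bracket match (the real type is generic and richer).
#[derive(Clone, Debug, PartialEq)]
struct BracketMatch {
    open: usize,
    close: usize,
}

/// Sketch: one lazily filled slot per chunk.
struct TreeSitterData {
    brackets_by_chunks: Vec<OnceLock<Vec<BracketMatch>>>,
}

impl TreeSitterData {
    fn new(chunk_count: usize) -> Self {
        Self { brackets_by_chunks: (0..chunk_count).map(|_| OnceLock::new()).collect() }
    }

    /// Returns the brackets for `chunk`, running `compute` only on first access.
    fn brackets(&self, chunk: usize, compute: impl FnOnce() -> Vec<BracketMatch>) -> &[BracketMatch] {
        self.brackets_by_chunks[chunk].get_or_init(compute)
    }
}
```

Since a fresh `TreeSitterData` is created per version, the cells never need resetting.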

SomeoneToIgnore · 2025-12-05T09:55:53Z

crates/language/src/buffer.rs

-        self.tree_sitter_data.lock().clear();
+        let snapshot = self.text.snapshot();
+        match Arc::get_mut(&mut self.tree_sitter_data) {
+            Some(tree_sitter_data) => tree_sitter_data.clear(snapshot),


It seems that `clear` is used only once, here, and needs the same parameters to do the same thing `new` does.
The `get_mut` trick is neat, but it seems we could just always do `let tree_sitter_data = TreeSitterData::new(snapshot); self.tree_sitter_data = Arc::new(tree_sitter_data);`, remove `clear` along with this `match`, and keep things simpler?
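A minimal sketch comparing the two shapes, with a hypothetical `TreeSitterData` that only records its version. `Arc::get_mut` returns `Some` only when no other `Arc` (for example, a live snapshot) points at the value, so the reuse path saves an allocation only in that case. The `None` arm is an assumption, since the diff shows only the `Some` arm.

```rust
use std::sync::Arc;

/// Hypothetical, simplified cache tied to a single buffer version.
#[derive(Debug)]
struct TreeSitterData {
    version: u64,
}

impl TreeSitterData {
    fn new(version: u64) -> Self {
        Self { version }
    }

    /// Same inputs and same result as `new`.
    fn clear(&mut self, version: u64) {
        *self = Self::new(version);
    }
}

/// Current shape: reuse the allocation when no snapshot holds a clone.
fn on_reparse_reuse(data: &mut Arc<TreeSitterData>, version: u64) {
    match Arc::get_mut(data) {
        Some(inner) => inner.clear(version),
        // Assumed fallback: a snapshot still holds the old data.
        None => *data = Arc::new(TreeSitterData::new(version)),
    }
}

/// Suggested shape: always allocate a fresh value.
fn on_reparse_replace(data: &mut Arc<TreeSitterData>, version: u64) {
    *data = Arc::new(TreeSitterData::new(version));
}
```

Either way an outstanding snapshot keeps its old data; the difference is one allocation per reparse, which is the simplicity trade-off the comment suggests taking.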

SomeoneToIgnore · 2025-12-05T10:06:07Z

crates/language/src/buffer/row_chunk.rs

        }
    }

-    pub fn version(&self) -> &Global {


With this and `latest_tree_sitter_data` gone, my biggest concern is that we now technically have more chances of working with an outdated snapshot.

Now that the version is gone, I wonder whether we need to store this at all?
Looking at the usages:

- `impl std::fmt::Debug for RowChunks`: it does not hurt to know which version these chunks are for, but that is accessible from their parent buffer/snapshot.
- `chunk_range`: for that, we can precompute the ranges on creation.
- `applicable_chunks`: here it can explode due to an outdated snapshot, but the snapshot is used in a very limited way: it only converts input anchors into points. We can require callers to do that with their correct snapshots.

I still wonder if the version check has to happen in this case somehow...
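A minimal sketch of both suggestions, under stated assumptions: this `RowChunks` stores no snapshot or version, chunk ranges are precomputed in `new`, and `applicable_chunks` takes row ranges that the caller has already resolved from anchors against its own snapshot. The constructor parameters, the end-exclusive row ranges, and the `Vec<usize>` return type are all hypothetical.

```rust
use std::ops::Range;

/// Hypothetical version-free `RowChunks`: only precomputed row ranges.
#[derive(Debug)]
struct RowChunks {
    ranges: Vec<Range<u32>>,
}

impl RowChunks {
    /// Splits rows `0..=max_row` into end-exclusive chunks of `chunk_size` rows.
    fn new(max_row: u32, chunk_size: u32) -> Self {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        let ranges = (0..=max_row)
            .step_by(chunk_size as usize)
            .map(|start| start..(start + chunk_size).min(max_row + 1))
            .collect();
        Self { ranges }
    }

    /// Precomputed, so no snapshot is needed here.
    fn chunk_range(&self, chunk_id: usize) -> Option<Range<u32>> {
        self.ranges.get(chunk_id).cloned()
    }

    /// Ids of chunks overlapping any of `row_ranges`; the caller converted
    /// its anchors to rows with its own (correct) snapshot beforehand.
    fn applicable_chunks(&self, row_ranges: &[Range<u32>]) -> Vec<usize> {
        self.ranges
            .iter()
            .enumerate()
            .filter(|(_, chunk)| {
                row_ranges.iter().any(|rows| rows.start < chunk.end && chunk.start < rows.end)
            })
            .map(|(id, _)| id)
            .collect()
    }
}
```

In this shape, any version check would move to the call site that resolves anchors, which is where the snapshot's version is known.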

cla-bot bot added the `cla-signed` label (the user has signed the Contributor License Agreement) Dec 5, 2025

Veykril requested a review from SomeoneToIgnore December 5, 2025 09:04

Veykril assigned SomeoneToIgnore and Veykril Dec 5, 2025

Veykril force-pushed the push-tkpupqwyysxm branch from f769a27 to c96c5ed December 5, 2025 09:06

Veykril marked this pull request as draft December 5, 2025 09:12

language: Make TreeSitterData only shared between snapshots of the same version (e937058)

Veykril force-pushed the push-tkpupqwyysxm branch from c96c5ed to e937058 December 5, 2025 10:04

SomeoneToIgnore approved these changes Dec 5, 2025
