Skip to content

Conversation

@Veykril
Copy link
Member

@Veykril Veykril commented Dec 5, 2025

Currently we have a single cache for this data shared between all snapshots which is incorrect, as we might update the cache to a new version while having old snapshots around which then may try to access new data with old offsets/rows.

Release Notes:

  • N/A or Added/Fixed/Improved ...

@cla-bot cla-bot bot added the cla-signed The user has signed the Contributor License Agreement label Dec 5, 2025
@Veykril Veykril marked this pull request as draft December 5, 2025 09:12
Copy link
Contributor

@SomeoneToIgnore SomeoneToIgnore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, much more correct way to fill this cache.

}
None => HashSet::default(),
};
let known_chunks = known_chunks.cloned().unwrap_or_default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: at this state, seems more appropriate to accept by value these new known_chunks?

pub struct TreeSitterData {
chunks: RowChunks,
brackets_by_chunks: Vec<Option<Vec<BracketMatch<usize>>>>,
brackets_by_chunks: Mutex<Vec<Option<Vec<BracketMatch<usize>>>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I understand the data flow now:

  • we only start with clean data and clean it on each tree-sitter reparsed event

  • each snapshot gets an Arc<TreeSitterData>, the same one, with the field they want to access + clone, or initialize once + clone

No blocks and mutex are needed for that, it seems?
I wonder if this could be a Vec<OnceCell<Vec<BracketMatch<usize>>>> given that we're synchronous around this code in the near future?

self.tree_sitter_data.lock().clear();
let snapshot = self.text.snapshot();
match Arc::get_mut(&mut self.tree_sitter_data) {
Some(tree_sitter_data) => tree_sitter_data.clear(snapshot),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that clear is used once here and needs the same parameters to do the same thing new does.
The get_mut trick is neat, but seems that we can just always do

let tree_sitter_data = TreeSitterData::new(snapshot);
self.tree_sitter_data = Arc::new(tree_sitter_data);

, remove clear along with this match and keep things simpler?

}
}

pub fn version(&self) -> &Global {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this and latest_tree_sitter_data gone, my biggest concern now is that now we technically have more chances to have an outdated snapshot.

Now, without the version, I wonder whether we need to store this at all?
Looking at the usages:

  • impl std::fmt::Debug for RowChunks

Does not hurt to know the version these chunks are for, but it's accessible from their parent buffer/snapshot

  • chunk_range

For that, we can precompute the ranges on creation.

  • applicable_chunks

Here it can explode due to outdated snapshot, but it's used in a very stupid way: it converts input anchors into points.
We can require callers to do that with their correct snapshots.
I still wonder if the version check has to happen in this case somehow...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed The user has signed the Contributor License Agreement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants