-
Notifications
You must be signed in to change notification settings - Fork 6.2k
language: Make TreeSitterData only shared between snapshots of the same version
#44198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
f769a27 to
c96c5ed
Compare
c96c5ed to
e937058
Compare
SomeoneToIgnore
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, much more correct way to fill this cache.
| } | ||
| None => HashSet::default(), | ||
| }; | ||
| let known_chunks = known_chunks.cloned().unwrap_or_default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: at this state, seems more appropriate to accept by value these new known_chunks?
| pub struct TreeSitterData { | ||
| chunks: RowChunks, | ||
| brackets_by_chunks: Vec<Option<Vec<BracketMatch<usize>>>>, | ||
| brackets_by_chunks: Mutex<Vec<Option<Vec<BracketMatch<usize>>>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I understand the data flow now:
-
we only start with clean data and clean it on each tree-sitter reparsed event
-
each snapshot gets an
Arc<TreeSitterData>, the same one, with the field they want to access + clone, or initialize once + clone
No blocks and mutex are needed for that, it seems?
I wonder if this could be a Vec<OnceCell<Vec<BracketMatch<usize>>>> given that we're synchronous around this code in the near future?
| self.tree_sitter_data.lock().clear(); | ||
| let snapshot = self.text.snapshot(); | ||
| match Arc::get_mut(&mut self.tree_sitter_data) { | ||
| Some(tree_sitter_data) => tree_sitter_data.clear(snapshot), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that clear is used once here and needs the same parameters to do the same thing new does.
The get_mut trick is neat, but seems that we can just always do
let tree_sitter_data = TreeSitterData::new(snapshot);
self.tree_sitter_data = Arc::new(tree_sitter_data);, remove clear along with this match and keep things simpler?
| } | ||
| } | ||
|
|
||
| pub fn version(&self) -> &Global { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this and latest_tree_sitter_data gone, my biggest concern now is that now we technically have more chances to have an outdated snapshot.
Now, without the version, I wonder whether we need to store this at all?
Looking at the usages:
impl std::fmt::Debug for RowChunks
Does not hurt to know the version these chunks are for, but it's accessible from their parent buffer/snapshot
chunk_range
For that, we can precompute the ranges on creation.
applicable_chunks
Here it can explode due to outdated snapshot, but it's used in a very stupid way: it converts input anchors into points.
We can require callers to do that with their correct snapshots.
I still wonder if the version check has to happen in this case somehow...
Currently we have a single cache for this data shared between all snapshots which is incorrect, as we might update the cache to a new version while having old snapshots around which then may try to access new data with old offsets/rows.
Release Notes: