Skip to content

Conversation

@iansan5653
Copy link

@iansan5653 iansan5653 commented Aug 27, 2025

Initial checklist

  • I read the support docs
  • I read the contributing guide
  • I agree to follow the code of conduct
  • I searched issues and discussions and couldn’t find anything or linked relevant results below
  • I made sure the docs are up to date
  • I included tests (or that’s not needed)

Description of changes

TLDR: Adds support for extensions' enter and exit handlers to return false to fall back to earlier extensions or the default logic.

Context

I am working on a project where I'd like to attach additional information to nodes based on the information already in the tokenization output. In this particular case, I'm trying to record data about the specific syntax that formed a node in the tree (essentially building a sort of concrete syntax tree on top of the AST).

While building an extension that provides handlers for the relevant syntax tokens and attaches them to nodes in the stack, I noticed that I was breaking the built-in parsing functionality. Upon digging I realized that this was because my enter handlers were overriding and disabling the default handlers, even though I only wanted to encounter the tokens without doing the parsing and tree building myself.

I realized that a very useful feature of extension handlers would be the ability to 'fall through' to the default behavior if desired. To me the most intuitive approach for this was to support returning false to indicate that a token was not handled by the handler. This is a flexible API that allows for handlers to fall through conditionally, allowing them to choose to only take over the handling of a token in certain circumstances.

Since the code change here is fairly straightforward, I chose to go ahead and open a PR rather than start with an issue. I am very open to feedback here and happy to change the approach or reject the change entirely if you don't think it's useful.

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Aug 27, 2025
@wooorm
Copy link
Member

wooorm commented Aug 28, 2025

Hey!

Can you expand on your actual use case? Sounds like a lot of work to overwrite all handlers.
And, what if there is no handler before?
And, what for extensions that add more handlers?
How would you actually use this?

Why not reach for a beforeEnter/afterEnter/beforeExit/afterExit, something like that?

@iansan5653
Copy link
Author

iansan5653 commented Aug 28, 2025

Sure!

Can you expand on your actual use case? Sounds like a lot of work to overwrite all handlers.

My specific use case is that I am working on a Markdown editor and so I need to know the locations in code of many token types. I could just use micromark tokens directly, but the output from mdast-util-from-markdown is also useful as I also want to integrate some information about the document structure. Parsing out the boundaries of a listItem from the tokens, for example, is pretty difficult and I'd rather not duplicate that logic.

I could handle the mdast tree and the micromark tokens completely separately, but that would require inefficiently tokenizing the document twice and working with two data structures in parallel that would be hard to link together. I'd rather combine all the information I need into one tree by hooking into the existing functionality.

And, what if there is no handler before?

The combineHandlers logic I've defined would handle this case by simply returning the right handler.

And, what for extensions that add more handlers?

If extensions defined after this one define handlers for the same tokens, they would fall through in the same way:

  • If they return undefined, previously-defined handlers would not be called
  • If they return false, previously-defined handlers would still be called

How would you actually use this?

Here's a very simplified example (in practice I will handle many more token types and have a more generic handler):

function handleQuoteMarker(token) {
  this.stack.findLast(n => n.type === 'quote')?.children.push({
    type: "quoteMarker",
    position: {
      end: token.end,
      start: token.start,
    },
  });

  return false // allow the parser to still handle this token, if necessary, to build quote nodes
};

const tree = fromMarkdown(markdown, "utf8", {
  mdastExtensions: [{
    exit: {
      blockQuoteMarker: handleQuoteMarker
    }
  }],
});

Why not reach for a beforeEnter/afterEnter/beforeExit/afterExit, something like that?

That would also work by introducing new events that are less likely to conflict, although I would still be concerned about the possibility that the package internals or other extensions might start to depend on these in the future. I still wouldn't want my extension to override any default functionality, just augment it.

Perhaps the design for these new events could specify that all registered handlers for a token are always called in a consistent order, regardless of conflicts across extensions? Would it be confusing though if these work in a different way from enter and exit? But maybe that's fine since they would act more like "events" than the enter/exit handlers do.

@iansan5653
Copy link
Author

Hello @wooorm, I'm still interested in shipping this feature or some form of it if possible. We have been working around it by patching the installed NPM dependency but of course that's definitely not ideal.

Any chance we might be able to get this across the line? Happy to iterate on it and change the approach if you prefer something else.

@wooorm
Copy link
Member

wooorm commented Dec 4, 2025

Ah, very cool that GH is building exciting things on this :)

Thanks for your patience!

From your OG goal above, maybe the tokens are indeed what you should use.
Alternatively, when you are going with an AST, why not walk that AST? Instead of trying to intercept everything?

To get around: “that would require inefficiently tokenizing the document twice”, perhaps an alternative is to get access to the original tokens from here?
For “and working with two data structures in parallel that would be hard to link together”, well, you are dealing with 2 now, and yes, that comes with difficulties, but I don’t yet see how this makes that less difficult?
For “I'd rather combine all the information I need into one tree by hooking into the existing functionality.”, what kind of new different tree are you making?

Here's a very simplified example (in practice I will handle many more token types and have a more generic handler):

That’s interesting! What is this quote node? A custom construct? If that’s totally custom, then why are you worrying about things being overwritten? 🤔

That would also work by introducing new events that are less likely to conflict, although I would still be concerned about the possibility that the package internals or other extensions might start to depend on these in the future. I still wouldn't want my extension to override any default functionality, just augment it.

Perhaps the design for these new events could specify that all registered handlers for a token are always called in a consistent order, regardless of conflicts across extensions? Would it be confusing though if these work in a different way from enter and exit? But maybe that's fine since they would act more like "events" than the enter/exit handlers do.

Handlers (in the existing sense) are supposed to create something, mutate the result, and can be overwritten to do things differently. They assume control.
My idea for these more “listeners”, is to not really mutate the nodes. Maybe they add other fields (such as expose some extra stuff in data).

From your quote example, you are pushing children into some parent node. What if the other extensions you want to “fall through” expect those not to be there?

But, your example seems to be about a new node, so I don’t quite see how a fall through is needed, as no extension will handle it?

Sorry for the wait and all the questions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🤞 phase/open Post is being triaged manually

Development

Successfully merging this pull request may close these issues.

2 participants