Getting Precise Identifier Spans in a Chumsky Parser

2025-06-16 17:05:22  Author: 明树来

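When building syntax tree nodes with the Chumsky parser-combinator library, you often need the exact span of each element in the source code. This article shows how to control precisely which span you get when building a parser with Chumsky.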

Problem background

When parsing a C-like variable declaration such as int a = 0;, we typically need:

the span of the identifier a
the span of the expression 0
the span of the whole statement
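To make these three spans concrete, the following sketch tokenizes int a = 0; and records each token's byte range. It uses only the Rust standard library. The Token enum and the lex function are hypothetical illustrations, not part of Chumsky's API.

```rust
use std::ops::Range;

type Span = Range<usize>;

// Hypothetical token type for the declaration example.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Int,
    Ident(String),
    Equals,
    Num(i64),
    Semicolon,
}

// Split `src` into tokens, recording each token's byte range in `src`.
fn lex(src: &str) -> Vec<(Token, Span)> {
    let bytes = src.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let c = bytes[i];
        if c.is_ascii_whitespace() {
            i += 1; // whitespace produces no token
        } else if c.is_ascii_alphabetic() {
            while i < bytes.len() && bytes[i].is_ascii_alphanumeric() {
                i += 1;
            }
            let word = &src[start..i];
            let tok = if word == "int" { Token::Int } else { Token::Ident(word.to_string()) };
            out.push((tok, start..i));
        } else if c.is_ascii_digit() {
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            out.push((Token::Num(src[start..i].parse().unwrap()), start..i));
        } else {
            let tok = match c {
                b'=' => Token::Equals,
                b';' => Token::Semicolon,
                _ => panic!("unexpected character at byte {start}"),
            };
            i += 1;
            out.push((tok, start..i));
        }
    }
    out
}

fn main() {
    let src = "int a = 0;";
    let toks = lex(src);
    for (tok, span) in &toks {
        println!("{:?} at {:?} -> {:?}", tok, span, &src[span.clone()]);
    }
    // A statement spans from the start of its first token to the end of its last.
    let stmt_span = toks[0].1.start..toks[toks.len() - 1].1.end;
    println!("statement at {:?}", stmt_span);
}
```

For this input, the identifier a covers bytes 4..5 and the expression 0 covers 8..9. The whole statement covers 0..10.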

The problem with the initial approach

A first attempt might obtain the span like this:

let stmt_declare = just(Token::Int)
    .then(ident)
    .map_with(|(ty, ident), e| (ty, (ident, e.span())))
    // ...other combinators...

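The problem with this approach is that e.span() returns the span from Token::Int to the end of the identifier, not just the identifier's span. This happens because map_with is attached to the combined just(Token::Int).then(ident) parser, so e.span() covers all the input that parser consumed.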
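The behaviour can be reproduced without Chumsky. The sketch below is a hypothetical, std-only mini combinator library. The names P, then, map_with, just, ident, parse and sample_tokens are illustrative, not Chumsky's real types.

In this model, the map_with closure receives the span of every token consumed by the parser it is attached to. This mirrors what the text above describes for e.span(). Attaching map_with after .then(ident) therefore reports 0..5 (the text int a) instead of 4..5.

```rust
use std::ops::Range;

type Span = Range<usize>;

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Int,
    Ident(String),
    Equals,
    Num(i64),
    Semicolon,
}

// A token stream: each token paired with its byte span in the source.
type Toks = [(Token, Span)];

// Hypothetical stand-in for a parser (NOT Chumsky's types): it consumes tokens
// starting at `pos` and returns its output plus the position after them.
struct P<O>(Box<dyn Fn(&Toks, usize) -> Option<(O, usize)>>);

impl<O: 'static> P<O> {
    // Run from the first token; return the output if parsing succeeds.
    fn parse(&self, toks: &Toks) -> Option<O> {
        (self.0)(toks, 0).map(|(out, _)| out)
    }

    // Run `self`, then `other`, keeping both outputs.
    fn then<U: 'static>(self, other: P<U>) -> P<(O, U)> {
        P(Box::new(move |t: &Toks, pos: usize| {
            let (a, p1) = (self.0)(t, pos)?;
            let (b, p2) = (other.0)(t, p1)?;
            Some(((a, b), p2))
        }))
    }

    // The closure sees the span of every token consumed by *this* parser
    // (assumes at least one token was consumed).
    fn map_with<U: 'static>(self, f: impl Fn(O, Span) -> U + 'static) -> P<U> {
        P(Box::new(move |t: &Toks, pos: usize| {
            let (out, end) = (self.0)(t, pos)?;
            let span = t[pos].1.start..t[end - 1].1.end;
            Some((f(out, span), end))
        }))
    }
}

// Match one specific token.
fn just(expected: Token) -> P<Token> {
    P(Box::new(move |t: &Toks, pos: usize| match t.get(pos) {
        Some((tok, _)) if *tok == expected => Some((tok.clone(), pos + 1)),
        _ => None,
    }))
}

// Match any identifier and return its name.
fn ident() -> P<String> {
    P(Box::new(|t: &Toks, pos: usize| match t.get(pos) {
        Some((Token::Ident(name), _)) => Some((name.clone(), pos + 1)),
        _ => None,
    }))
}

// Hand-written tokens for `int a = 0;` with their byte spans.
fn sample_tokens() -> Vec<(Token, Span)> {
    vec![
        (Token::Int, 0..3),
        (Token::Ident("a".to_string()), 4..5),
        (Token::Equals, 6..7),
        (Token::Num(0), 8..9),
        (Token::Semicolon, 9..10),
    ]
}

// The initial pattern: map_with wraps `int` AND the identifier.
fn naive_declare() -> P<(Token, (String, Span))> {
    just(Token::Int)
        .then(ident())
        .map_with(|(ty, name), span| (ty, (name, span)))
}

fn main() {
    let (_ty, (name, span)) = naive_declare().parse(&sample_tokens()).unwrap();
    println!("{name}: {span:?}"); // the span covers `int a`, not just `a`
}
```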

The improved solution

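The correct approach is to call map_with on ident itself, immediately after it is parsed, so that e.span() covers only the identifier. The same rule gives every node its own span. The expression's span comes from expr(), which here presumably returns the expression together with its span (hence the name expr_and_span). The final map_with, attached to the whole declaration parser, captures the span of the entire statement: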

let stmt_declare = just(Token::Int)
    .then(ident.map_with(|ident, e| (ident, e.span())))
    .then(just(Token::Equals).ignore_then(expr()).or_not())
    .then_ignore(just(Token::Semicolon))
    .map_with(|((_ty, ident_and_span), expr_and_span), e| 
        (Stmt::Declare(ident_and_span, expr_and_span), e.span()))
    .boxed();
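The fix can be checked against the same kind of hypothetical, std-only model used above. This is again not Chumsky's real API.

A bare number parser, num, stands in for expr(). The .or_not() step is omitted so that every parser consumes at least one token, which the simplified map_with requires. The types Stmt, P, num and sample_tokens are illustrative names only.

```rust
use std::ops::Range;

type Span = Range<usize>;

#[derive(Debug, Clone, PartialEq)]
enum Token { Int, Ident(String), Equals, Num(i64), Semicolon }

type Toks = [(Token, Span)];

// Hypothetical stand-in for a parser (NOT Chumsky's types).
struct P<O>(Box<dyn Fn(&Toks, usize) -> Option<(O, usize)>>);

impl<O: 'static> P<O> {
    fn parse(&self, toks: &Toks) -> Option<O> {
        (self.0)(toks, 0).map(|(out, _)| out)
    }
    fn then<U: 'static>(self, other: P<U>) -> P<(O, U)> {
        P(Box::new(move |t: &Toks, pos: usize| {
            let (a, p1) = (self.0)(t, pos)?;
            let (b, p2) = (other.0)(t, p1)?;
            Some(((a, b), p2))
        }))
    }
    // Run both parsers, keep only the second output.
    fn ignore_then<U: 'static>(self, other: P<U>) -> P<U> {
        P(Box::new(move |t: &Toks, pos: usize| {
            let (_, p1) = (self.0)(t, pos)?;
            (other.0)(t, p1)
        }))
    }
    // Run both parsers, keep only the first output.
    fn then_ignore<U: 'static>(self, other: P<U>) -> P<O> {
        P(Box::new(move |t: &Toks, pos: usize| {
            let (a, p1) = (self.0)(t, pos)?;
            let (_, p2) = (other.0)(t, p1)?;
            Some((a, p2))
        }))
    }
    // Span of the tokens consumed by *this* parser (at least one assumed).
    fn map_with<U: 'static>(self, f: impl Fn(O, Span) -> U + 'static) -> P<U> {
        P(Box::new(move |t: &Toks, pos: usize| {
            let (out, end) = (self.0)(t, pos)?;
            Some((f(out, t[pos].1.start..t[end - 1].1.end), end))
        }))
    }
}

fn just(expected: Token) -> P<Token> {
    P(Box::new(move |t: &Toks, pos: usize| match t.get(pos) {
        Some((tok, _)) if *tok == expected => Some((tok.clone(), pos + 1)),
        _ => None,
    }))
}
fn ident() -> P<String> {
    P(Box::new(|t: &Toks, pos: usize| match t.get(pos) {
        Some((Token::Ident(name), _)) => Some((name.clone(), pos + 1)),
        _ => None,
    }))
}
// Hypothetical stand-in for expr(): a single number literal.
fn num() -> P<i64> {
    P(Box::new(|t: &Toks, pos: usize| match t.get(pos) {
        Some((Token::Num(n), _)) => Some((*n, pos + 1)),
        _ => None,
    }))
}

#[derive(Debug, PartialEq)]
enum Stmt {
    // (identifier, its span), (initializer value, its span)
    Declare((String, Span), (i64, Span)),
}

fn stmt_declare() -> P<(Stmt, Span)> {
    just(Token::Int)
        // map_with directly on ident: the span covers only the identifier.
        .then(ident().map_with(|name, span| (name, span)))
        .then(just(Token::Equals).ignore_then(num().map_with(|n, span| (n, span))))
        .then_ignore(just(Token::Semicolon))
        // The outermost map_with sees the whole statement.
        .map_with(|((_ty, ident_and_span), expr_and_span), span| {
            (Stmt::Declare(ident_and_span, expr_and_span), span)
        })
}

// Hand-written tokens for `int a = 0;` with their byte spans.
fn sample_tokens() -> Vec<(Token, Span)> {
    vec![
        (Token::Int, 0..3), (Token::Ident("a".to_string()), 4..5),
        (Token::Equals, 6..7), (Token::Num(0), 8..9), (Token::Semicolon, 9..10),
    ]
}

fn main() {
    let (stmt, span) = stmt_declare().parse(&sample_tokens()).unwrap();
    println!("{stmt:?} spans {span:?}");
}
```

Parsing the tokens of int a = 0; yields the following:

- Declare(("a", 4..5), (0, 8..9))
- statement span 0..10

Each map_with reports only the tokens consumed by the parser it wraps. This is why placing it directly on ident isolates the identifier's span.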