Explainer: Tree-sitter vs. LSP

32 points by anup a day ago on lobsters | 13 comments

conartist6 | a day ago

I don't think it's entirely fair to say that LSP "solves" M x N. In this case M and N are tools and languages. LSP is not prescriptive in terms of languages -- you can use it with pretty much whatever language you want. But it is quite prescriptive in terms of tools! It provides support for maybe ~7 specific tools depending on how you count: Go To Definition, Find References, Render tooltip, Render outline view, Autocomplete, Semantic Tokens, and Code Actions.

These are basically exactly and only the features you need to build something that looks and feels like a clone of VSCode. It's not M x N; it's more like 7 x N. This suits both Microsoft's strategic use for LSP (cementing VSCode's market dominance) and their stated engineering design principle, "bake the UI and UX into the protocol", which can only lead to that outcome.
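For readers who haven't looked inside LSP: each of those features is, roughly, its own request method that the editor sends to the server as a JSON-RPC 2.0 message framed with a `Content-Length` header. The sketch below maps the seven features listed above to the LSP method names they most plausibly correspond to and builds one framed `textDocument/definition` request. The mapping is an editorial reading of the list, the helper names are hypothetical, and the file URI is a made-up example.

```python
import json

# The seven features from the comment, mapped to the LSP request methods an
# editor would send for each (an editorial reading, not an official mapping).
FEATURE_METHODS = {
    "Go To Definition": "textDocument/definition",
    "Find References": "textDocument/references",
    "Render tooltip": "textDocument/hover",
    "Render outline view": "textDocument/documentSymbol",
    "Autocomplete": "textDocument/completion",
    "Semantic Tokens": "textDocument/semanticTokens/full",
    "Code Actions": "textDocument/codeAction",
}


def frame_request(request_id: int, method: str, params: dict) -> bytes:
    """Encode a JSON-RPC 2.0 request using LSP's Content-Length header framing."""
    body = json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    ).encode("utf-8")
    # Content-Length counts bytes of the body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def definition_request(request_id: int, uri: str, line: int, character: int) -> bytes:
    # Positions are zero-based line/character offsets.
    params = {
        "textDocument": {"uri": uri},
        "position": {"line": line, "character": character},
    }
    return frame_request(request_id, FEATURE_METHODS["Go To Definition"], params)


if __name__ == "__main__":
    # Hypothetical file URI, for illustration only.
    print(definition_request(1, "file:///tmp/example.rs", 10, 4).decode("utf-8"))
```

Every feature the editor exposes has to exist as a method like these in the spec, which is the sense in which the protocol fixes the set of "tools".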

technomancy | a day ago

> In this case M and N are tools and languages. LSP is not prescriptive in terms of languages -- you can use it with pretty much whatever language you want. But it is quite prescriptive in terms of tools! It provides support for maybe ~7 specific tools

That seems like a straw man; the article specifies very clearly that N is languages and M is text editors, not "tools". If you unilaterally redefine M then yes, obviously the claim in the article no longer holds.

That said, people do need to start treating the LSP spec as a starting point and not the final say in language tooling, which I think is your broader point. There are a few people doing this (see https://lobste.rs/s/ju12fl/bringing_emacs_support_ocaml_s_lsp_server), but it's hard to gain consensus when Microsoft controls the spec and has shown a clear disinterest in improving it.
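The arithmetic both commenters are arguing over is simple to state: with M editors and N languages, point-to-point integrations number M x N, while a shared protocol needs one client per editor plus one server per language, M + N. The objection above is that each of those M + N pieces only covers the features the protocol defines. A trivial sketch, with illustrative counts that are not from the thread or the article:

```python
def point_to_point(editors: int, languages: int) -> int:
    # Every editor needs its own integration for every language.
    return editors * languages


def via_protocol(editors: int, languages: int) -> int:
    # One protocol client per editor plus one protocol server per language.
    return editors + languages


if __name__ == "__main__":
    # Illustrative counts only.
    m, n = 5, 20
    print(point_to_point(m, n), via_protocol(m, n))  # 100 25
```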

gcupc | a day ago

One other thing tree-sitter could be used for, but I haven't really seen it used for yet, is providing syntax-aware editing for non-Lisp languages the way paredit does for Lisp. I use an Emacs package called smartparens that tries to add it to all programming modes, but it's possible for operations like slurp and barf to leave your file in a state that's not syntactically valid, unlike paredit.
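The property being asked for here, that structural commands can never leave the file unbalanced, falls out naturally when the commands operate on a parsed tree rather than on raw text. Below is a toy sketch for plain s-expressions, not how paredit or smartparens are actually implemented; `parse`, `unparse`, `slurp_forward` and `barf_forward` are hypothetical names. For a non-Lisp language a tree-sitter parse tree could play the role of the nested lists, although there syntactic validity involves more than balanced brackets.

```python
from typing import List, Union

Sexp = Union[str, List["Sexp"]]


def parse(text: str) -> List[Sexp]:
    """Parse atoms and parentheses into a list of top-level forms."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack: List[List[Sexp]] = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def unparse(form: Sexp) -> str:
    # Parentheses are emitted from the tree, so output is always balanced.
    if isinstance(form, str):
        return form
    return "(" + " ".join(unparse(f) for f in form) + ")"


def _locate(forms: List[Sexp], path: List[int]):
    """Return (parent list, index) for the list addressed by `path`."""
    parent = forms
    for i in path[:-1]:
        parent = parent[i]
    return parent, path[-1]


def slurp_forward(forms: List[Sexp], path: List[int]) -> None:
    """Pull the sibling after the target list inside it, as its last element."""
    parent, idx = _locate(forms, path)
    target = parent[idx]
    if not isinstance(target, list) or idx + 1 >= len(parent):
        raise ValueError("nothing to slurp")
    target.append(parent.pop(idx + 1))


def barf_forward(forms: List[Sexp], path: List[int]) -> None:
    """Push the target list's last element out, making it the next sibling."""
    parent, idx = _locate(forms, path)
    target = parent[idx]
    if not isinstance(target, list) or not target:
        raise ValueError("nothing to barf")
    parent.insert(idx + 1, target.pop())
```

For example, with `forms = parse("(a (b c) d e)")`, calling `slurp_forward(forms, [0, 1])` turns `unparse(forms[0])` into `(a (b c d) e)`, and `barf_forward(forms, [0, 1])` undoes it. A text-based implementation has to keep the parentheses consistent by hand; here there is simply no way to express an unbalanced result.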

vsvrd | a day ago

I write very little Lisp and I am not intimately familiar with paredit, but I believe that the combobulate Emacs package by Mickey Petersen may be what you are looking for.

gcupc | a day ago

Oh yes, that's right. I had heard of that, and forgot about it because I can't get tree-sitter working in the environment in which I actually need it.

technomancy | a day ago

That leads to the other huge difference that I wish the article had explained: LSP is a protocol and tree-sitter is a library. The way most people use tree-sitter, they download a random binary blob created by gods-know-what and stuff it into their editor's process, which as you have experienced first hand, often doesn't work very well but also encourages some pretty disgusting software hygiene practices.

gcupc | a day ago

For Emacs, it seems like the way to use tree-sitter is to download the source to that blob (from a URL defined by convention), compile it, and stuff it into your editor's process. Which isn't a whole lot better, but also doesn't work at all on the one major platform that does not include a C compiler.

technomancy | a day ago

I haven't used tree-sitter, but my understanding from following the many, many threads about people trying to figure out why it doesn't work is that you can't do this with a C compiler alone; you need some npm shit as well.

quasi_qua_quasi | 23 hours ago

AFAIR last time I built one, grammars are written as JS files that then get translated to C using a Rust tool that shells out to node to actually evaluate the grammar (so that's where the npm comes in). Once you have the C you can compile that as normal to whatever shared library you like. I think there's also a WASM setup but I don't know much about it.
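A rough sketch of the pipeline described above, written as a Python helper that only assembles the command lines rather than running them. Assumptions: a conventional grammar repository with `grammar.js` at the root and generated sources in `src/`, an optional hand-written `src/scanner.c` (some older grammars use a C++ `scanner.cc` instead, which this ignores), a Unix-like `cc`, and a `libtree-sitter-<lang>.so` output name of the kind Emacs's treesit looks for. On macOS the suffix would be `.dylib` and on Windows `.dll`; this sketch hardcodes `.so`. The name `build_commands` is hypothetical.

```python
from pathlib import Path
from typing import List


def build_commands(grammar_dir: Path, lang: str, cc: str = "cc") -> List[List[str]]:
    """Return command lines that turn a grammar checkout into a shared library.

    The commands are meant to be run with grammar_dir as the working directory.
    """
    src = grammar_dir / "src"
    commands: List[List[str]] = []
    # Step 1: the tree-sitter CLI evaluates grammar.js (this is where node is
    # needed) and writes src/parser.c. Skipped if parser.c is already present.
    if not (src / "parser.c").exists():
        commands.append(["tree-sitter", "generate"])
    # Step 2: compile the generated parser plus any external scanner.
    sources = [str(src / "parser.c")]
    if (src / "scanner.c").exists():
        sources.append(str(src / "scanner.c"))
    output = grammar_dir / f"libtree-sitter-{lang}.so"
    # -I src lets parser.c find its bundled tree_sitter/parser.h header.
    commands.append(
        [cc, "-shared", "-fPIC", "-O2", "-I", str(src), *sources, "-o", str(output)]
    )
    return commands
```

Skipping the generate step when `parser.c` is checked in removes the node dependency, but it means trusting that the checked-in file really corresponds to `grammar.js`.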

technomancy | 4 hours ago

Honestly it feels like a scheme deliberately designed to annoy anyone who actually cares about provenance. So many steps and unnecessary dependencies; three different languages all completely unrelated to the language you actually want to parse, just to do what could have been done as a declarative grammar.

Even if the C code is checked in, the implication here is that you must trust generated code. And generated code has a tendency to be the worst kind of code to review and audit, so I assume no one actually does this.

crmsnbleyd | 14 hours ago

The generated C files are usually kept in tree.

quasi_qua_quasi | a day ago

Helix has this; I can park my cursor inside a Rust function call and hit A-o to select the call, the statement, the enclosing if block, etc. Occasionally this doesn't do what I want (in fluent APIs, it's natural to want to select baz() in foo.bar.baz().quux(), but there's no AST node that corresponds to that), but in the 95% of cases where it works it's quite nice.
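A toy model of the expand-selection behavior described above: find the smallest syntax node under the cursor, then step outward through its ancestors. The tree is hand-built with hypothetical node kinds standing in for what a Rust grammar would produce, and this is not Helix's implementation. It shows concretely why `baz()` on its own is never offered: no node spans exactly that text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SRC = "foo.bar.baz().quux()"


@dataclass
class Node:
    kind: str
    start: int  # offset into SRC, inclusive
    end: int    # offset into SRC, exclusive
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def build_tree() -> Node:
    # Hypothetical node kinds; a real grammar's names and shapes will differ.
    foo_bar = Node("field", 0, 7).add(Node("ident", 0, 3), Node("ident", 4, 7))
    foo_bar_baz = Node("field", 0, 11).add(foo_bar, Node("ident", 8, 11))
    inner_call = Node("call", 0, 13).add(foo_bar_baz, Node("args", 11, 13))
    quux_field = Node("field", 0, 18).add(inner_call, Node("ident", 14, 18))
    return Node("call", 0, 20).add(quux_field, Node("args", 18, 20))


def smallest_node_at(root: Node, pos: int) -> Node:
    """Descend to the deepest node whose span contains `pos`."""
    node = root
    while True:
        for child in node.children:
            if child.start <= pos < child.end:
                node = child
                break
        else:
            return node


def expansions(root: Node, pos: int) -> List[Tuple[int, int]]:
    """Successive selections offered by repeatedly expanding from `pos`."""
    node: Optional[Node] = smallest_node_at(root, pos)
    spans: List[Tuple[int, int]] = []
    while node is not None:
        span = (node.start, node.end)
        if not spans or spans[-1] != span:  # skip ancestors with identical spans
            spans.append(span)
        node = node.parent
    return spans


if __name__ == "__main__":
    for start, end in expansions(build_tree(), SRC.index("baz")):
        print(SRC[start:end])
```

Starting on `baz`, the selections go `baz`, `foo.bar.baz`, `foo.bar.baz()`, `foo.bar.baz().quux`, and finally the whole expression; the method-call chain nests to the left, so the receiver is always included.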

danlamanna | a day ago

> One other thing tree-sitter could be used for, but I haven't really seen it used for yet, is providing syntax-aware editing for non-Lisp languages the way paredit does for Lisp.

tree-sitter is what powers cursorless. See also https://lobste.rs/s/l4em19/cursorless_spoken_language_for_editing.