Sem: New primitive for code understanding – not LSPs, but entities on top of Git

165 points by rohanucla a day ago on hackernews | 54 comments

Animats | a day ago

Is this for checking what Claude Code just did to your repo?

[OP] rohanucla | a day ago

It can do that, but that's a small slice of what it does. sem parses your codebase into entities (functions, classes, methods) and builds a dependency graph across files.

So instead of line-level analysis, the granularity of viewing and tracking changes shifts to entities. This helps focus your agent's attention and lets you track changes faster.

LSPs have been doing this for a long time, but tree-sitter is faster. Type awareness isn't great with this approach, but overall, working across multiple languages with a single tool can be quite helpful.
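Here is a minimal sketch of an entity-level dependency graph. It assumes Python source and uses Python's built-in `ast` module as a stand-in for tree-sitter, so it is not sem's implementation. The function name `build_entity_graph` is hypothetical. The sketch only tracks top-level functions and direct calls by name, and it assumes function names are unique across files.

```python
import ast


def build_entity_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each top-level function to the set of known functions it calls."""
    # Pass 1: collect every top-level function definition across all files.
    defs: dict[str, ast.FunctionDef] = {}
    for path, source in files.items():
        for node in ast.parse(source, filename=path).body:
            if isinstance(node, ast.FunctionDef):
                defs[node.name] = node  # assumes names are unique repo-wide

    # Pass 2: add an edge for every direct call to another known function,
    # even when caller and callee live in different files.
    graph: dict[str, set[str]] = {}
    for name, fn in defs.items():
        graph[name] = {
            call.func.id
            for call in ast.walk(fn)
            if isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id in defs
        }
    return graph


files = {
    "auth.py": "def authenticate_user(name):\n    return find_user(name)\n",
    "db.py": "def find_user(name):\n    return {'name': name}\n",
}
print(build_entity_graph(files))
# {'authenticate_user': {'find_user'}, 'find_user': set()}
```

The edge from `authenticate_user` to `find_user` crosses a file boundary. That cross-file edge is what a line diff of `auth.py` alone cannot show.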

throw1234567891 | 22 hours ago

The tool looks great! Thanks for sharing.

[OP] rohanucla | 17 hours ago

Appreciate it!

ssivark | 10 hours ago

1. Do you expose this dependency graph so folks can play with it / build interesting things on top? An interesting example would be to understand whether/how a version bump on one of your dependencies might affect your code.

2. What would it take to add a new language? I'm interested in using this with Julia.

hankbond | 23 hours ago

I am interested in subtle ways in which we can change how we write software to get better outcomes out of harnesses (model + tools + skills). I'm imagining that use of Sem will be more effective on code written in some shapes than others.

Can you describe what ways this might be beyond just breaking up code into smaller functions?

An example of this is that models tend to create unit tests that are mostly just mocks plus reimplementations of the imperative code in the functions they test. If you could force behavioral testing by only allowing test-creation agents to access the function docstring, name/args/types, branch statements, and log events, you could potentially avoid creating these classes of weak tests. But that would mean your code has to be optimized to provide signal via those elements.

This is just an example; I'm not sure it would actually work.

hankbond | 23 hours ago

Also I keep seeing solutions in this space that are doing inheritance and call stack dependency linkage, but I haven't seen the same level of exploration into data lifetime dependency. Not lifetime in the way it exists in Rust (to my knowledge), but like including when you copy data and transform that copy. The motivation is "if I change this variable, enumerate all the areas that change would propagate to". The idea is similar, to evaluate blast radius of modifications. Ideally something like this could make refactoring more token efficient and consistent as well.

I don't know if you can reliably do that with static analysis, though. I'd be interested in some sort of debugger-attachment-like process that does a code-coverage-style evaluation. If you can't tell, this is at least on the edge of (if not past) my depth of expertise.

[OP] rohanucla | 23 hours ago

This is a really interesting direction. You're essentially talking about data-flow or taint analysis, where you track how a value propagates through copies and transformations rather than just following call edges. Honestly, pure static analysis gets you partway there, but it hits real limits once you run into dynamic dispatch, runtime branching, or serialization boundaries where data gets written somewhere and read back in a completely different part of the codebase.

We're on the structural side right now with call graphs and dependency edges, but a hybrid approach that combines the static graph with runtime instrumentation to fill in the gaps is definitely something I'd love to explore. Thanks for the feedback.
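To make the distinction concrete, here is a toy forward taint propagation over straight-line Python assignments, again using `ast`. It is a hypothetical illustration, and `propagate_taint` is not part of sem. A variable becomes tainted when its right-hand side reads a tainted name, so copies and transformations are followed. Control flow, function calls, and mutation through attributes or subscripts are ignored, and that is where the static-analysis limits described above start to bite.

```python
import ast


def propagate_taint(source: str, seed: str) -> set[str]:
    """Return every variable whose value may be derived from `seed`.

    Straight-line, single-scope sketch: statements are visited in order and an
    assignment target becomes tainted when its right-hand side reads a tainted
    name. Control flow, calls, and mutation through attributes or subscripts
    are deliberately ignored.
    """
    tainted = {seed}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            targets, value = stmt.targets, stmt.value
        elif isinstance(stmt, (ast.AugAssign, ast.AnnAssign)):
            targets, value = [stmt.target], stmt.value
        else:
            continue
        if value is None:  # bare annotation such as `x: int`
            continue
        reads = {n.id for n in ast.walk(value) if isinstance(n, ast.Name)}
        if reads & tainted:
            for target in targets:
                # Only plain names being bound count as newly tainted.
                tainted |= {
                    n.id
                    for n in ast.walk(target)
                    if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
                }
    return tainted


src = (
    "raw = request_body\n"
    "copy = raw\n"
    "cleaned = copy.strip().lower()\n"
    "total = 0\n"
    "parts = [cleaned, 'x']\n"
)
print(sorted(propagate_taint(src, "request_body")))
# ['cleaned', 'copy', 'parts', 'raw', 'request_body']
```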

hankbond | 20 hours ago

https://en.wikipedia.org/wiki/Taint_checking

I'm sorry for distracting from your engaging and thoughtful reply but I can't help but giggle at the name of this concept.

[OP] rohanucla | 18 hours ago

haha definitely!

insensible | 13 hours ago

You think it’s funny but it taint.

[OP] rohanucla | 23 hours ago

What I've been more interested in lately is the field of structural intelligence as a whole.

Things break with LLMs because our infra was always designed for analyzing lines (tools like grep, fuzzy matching) and working on quite small sections of code. LLMs struggle when they have to analyze different parts of a codebase. They either get too much context, where you're throwing whole files at them, or too little, where they only see the function in isolation, with no real understanding of how the pieces actually connect to each other.

That's really the gap sem is trying to fill. With sem impact you can give an agent the precise blast radius of a change instead of guessing which files matter. sem diff --patch lets you enforce that a change only touches specific functions and reject anything that bleeds outside that boundary, something that's really hard to do with line-level diffs.
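This is a rough sketch of the two ideas in that paragraph, assuming the dependency graph is already available as a plain dict. The blast radius is the set of entities reached by walking "used by" edges backwards from the changed entity. A patch-boundary check is then just a set difference. `blast_radius` and `outside_boundary` are hypothetical names. The entity names echo the `sem impact` example elsewhere in this thread, and `apiServer` is invented for illustration.

```python
from collections import deque


def blast_radius(depends_on: dict[str, set[str]], changed: str) -> set[str]:
    """Every entity that directly or transitively depends on `changed`."""
    # Invert "A depends on B" into "B is used by A".
    used_by: dict[str, set[str]] = {}
    for entity, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(entity)

    # Breadth-first walk up the "used by" edges.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        for caller in used_by.get(queue.popleft(), set()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


def outside_boundary(touched: set[str], allowed: set[str]) -> set[str]:
    """Entities a patch touched that it was not permitted to touch."""
    return touched - allowed


graph = {
    "authenticateUser": {"db.findUser", "rateLimiter.check"},
    "loginRoute": {"authenticateUser"},
    "authMiddleware": {"authenticateUser"},
    "apiServer": {"authMiddleware"},
}
print(sorted(blast_radius(graph, "authenticateUser")))
# ['apiServer', 'authMiddleware', 'loginRoute']
print(outside_boundary({"authenticateUser", "loginRoute"}, {"authenticateUser"}))
# {'loginRoute'}
```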

Your testing idea is actually closer than you might think. sem already extracts entity signatures, dependencies, and call graphs, so you could build a harness that gives the test-writing agent only the function signature with its dependency graph and behavioral contract, while withholding the implementation entirely. That would force the agent toward behavioral tests because it literally can't see the internals to mock them. I haven't built this harness myself yet, but sem graph and sem inspect expose everything you'd need.
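Here is a minimal sketch of the "signature without implementation" view for Python code, using the standard-library `ast` module (Python 3.9+ for `ast.unparse`). `contract_view` is a hypothetical helper and does not reproduce sem graph or sem inspect output. It keeps each function's signature and docstring and replaces the body with `...`. For brevity it also drops the branch statements and log events that hankbond suggested exposing.

```python
import ast


def contract_view(source: str) -> str:
    """Return `source` with each function body reduced to its docstring plus `...`."""
    tree = ast.parse(source)
    # Collect first, then mutate, so the walk never sees half-rewritten nodes.
    functions = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    for fn in functions:
        doc = ast.get_docstring(fn)
        body: list[ast.stmt] = []
        if doc:
            body.append(ast.Expr(value=ast.Constant(value=doc, kind=None)))
        body.append(ast.Expr(value=ast.Constant(value=..., kind=None)))  # placeholder
        fn.body = body
    return ast.unparse(ast.fix_missing_locations(tree))


src = (
    "def add(a: int, b: int) -> int:\n"
    '    """Return the sum of a and b."""\n'
    "    return a + b\n"
)
print(contract_view(src))
```

A test-writing agent given this output sees the contract, but it has no implementation to mirror in mocks.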

The general principle is that sem gives you a structural map of the codebase to both constrain and validate what the model produces, rather than treating code as flat text and hoping the model figures out the relationships on its own.

Another use case is finding dead code in the codebase.
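One way to frame dead-code detection on a dependency graph is reachability from known entry points. This hypothetical sketch, which is not sem's algorithm, reports every entity that no entry point can reach. Using reachability instead of "no incoming edges" also catches groups of dead functions that only call each other.

```python
def dead_entities(depends_on: dict[str, set[str]], entry_points: set[str]) -> set[str]:
    """Entities that no entry point can reach through dependency edges."""
    reachable: set[str] = set()
    stack = list(entry_points)
    while stack:
        entity = stack.pop()
        if entity in reachable:
            continue
        reachable.add(entity)
        stack.extend(depends_on.get(entity, set()))
    return set(depends_on) - reachable


graph = {
    "main": {"parse_args", "run"},
    "run": {"helper"},
    "parse_args": set(),
    "helper": set(),
    "legacy_export": {"helper"},  # nothing calls this any more
    "old_a": {"old_b"},           # a dead cycle: each has an incoming edge
    "old_b": {"old_a"},
}
print(sorted(dead_entities(graph, {"main"})))
# ['legacy_export', 'old_a', 'old_b']
```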

Edit: One last thing. I started working on this while trying to solve the fundamental issue of why merge conflicts occur with git, so you might also like the merge driver I open-sourced in the same GitHub org: Weave.

hankbond | 20 hours ago

> a structural map of the codebase to both constrain and validate what the model produces

I think this is an apt and concise description of what this is trying to accomplish. I feel like we've had some really great gains in model improvements, both at the top end and the bottom, over the last 6-7 months, but the next period is likely to be defined by harness improvements. I appreciate that your effort is being applied to this particular problem set, because I think it's far more fundamental to improving agentic performance in codebases than yet another memory framework.

qudat | 23 hours ago

I really like this idea and have been experimenting with it for a week or so.

I think there’s an opportunity to use an AST diff system for code forges where you don’t present the user with line diffs in the UI — or at least not as the first diff the user sees.
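Here is a minimal sketch of an entity-level diff for Python files, using the `ast` module. Each top-level function or class is compared by its `ast.dump`, which omits line and column information. Edits that only change comments or formatting therefore do not register as changes. `entity_snapshot` and `entity_diff` are hypothetical names, not sem's API.

```python
import ast


def entity_snapshot(source: str) -> dict[str, str]:
    """Map top-level functions and classes to a layout-insensitive AST dump."""
    return {
        node.name: ast.dump(node)  # omits line/column info, so formatting-only edits vanish
        for node in ast.parse(source).body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def entity_diff(old: str, new: str) -> dict[str, list[str]]:
    """Classify entities as added, removed, or modified between two versions."""
    before, after = entity_snapshot(old), entity_snapshot(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


old = "def f():\n    return 1\n\ndef g():\n    return 2\n\ndef gone():\n    pass\n"
new = "def f():\n    # reformatted, same logic\n    return (1)\n\ndef g():\n    return 3\n\ndef h():\n    pass\n"
print(entity_diff(old, new))
# {'added': ['h'], 'removed': ['gone'], 'modified': ['g']}
```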

I firmly believe code review should happen in your editor.

[OP] rohanucla | 23 hours ago

Really glad you've been using it, and yeah, that's exactly the direction I've been thinking about. The line diff as the default view in code forges has always felt like an accident of history: it was easy to compute, but it's not what's actually useful for understanding what changed.

awoimbee | 23 hours ago

The benchmarks aren't great; they're super specific to sem's output. Why would I ask Claude how many "entities" were modified by a commit, and do I need a tool specifically for this request? Note that an "entity" is a sem-specific concept...

[OP] rohanucla | 23 hours ago

Thanks for pointing that out. I agree with you here: my testing process was quite specific to sem's output. I'd love any suggestions on how you would design the testing process for this kind of tool.

I can also share my thought process. I was more interested in figuring out the model's inherent search results and understanding without sem.

andai | 23 hours ago

  $ sem impact authenticateUser

  ⊕ function authenticateUser (src/auth/login.ts:26)

    → depends on:    db.findUser, rateLimiter.check
    ← used by:       loginRoute, authMiddleware
    ! 42 entities transitively affected
    ᛋ 7 tests affected

Okay, that is pretty cool. I appreciate this information as a human too.

I got about halfway through reinventing something like this last year (minus the git part). I was trying to make a graph of dependencies in the codebase. (I actually got pretty far with a regex!)

[OP] rohanucla | 22 hours ago

Ha, the regex approach is honestly how a lot of people start with this problem, and you can get surprisingly far with it until you hit the edge cases around aliased imports, re-exports, and nested scopes, where things start falling apart. That's basically why we went with tree-sitter under the hood. It gives you the actual parse tree, so you don't have to keep patching regex patterns for every new language construct.

Scaevolus | 21 hours ago

Something like https://www.kythe.io/?

docheinestages | 22 hours ago

I doubt this actually solves a real problem for humans or agents, especially in complex projects. It might help if the examples showed scenarios where this tool and its commands could make a difference.

[OP] rohanucla | 22 hours ago

Let me give you an example. Say you're working in a 100K-file TypeScript monorepo and you change a utility function that parses API responses. git diff tells you that you changed n lines in that function. What it doesn't tell you is which services, components, and tests actually depend on that function across the repo. You're left grepping for the function name, hoping nobody aliased the import or re-exported it through a barrel file. sem impact gives you that full downstream dependency list in seconds, so you know exactly what to review and test before you ship.

onlyrealcuzzo | 22 hours ago

Okay, this looks great, but for the love of God... please cut this out:

> AI agents are 2.3x more accurate when given sem output vs raw line diffs. See the benchmark.

No... This is not convincing of anything. These are not real world tasks.

You're trying to pretend like your tool makes AI agents 2.3x better at coding or bug fixing.

It doesn't.

Your benchmark doesn't prove that.

Your tool is cool. Sell it for what it is. Not for what it's not.

jawns | 22 hours ago

The "Try it. 10 seconds." section at the bottom of the page hijacks an existing tool (git diff) and installs a pre-commit hook.

But there are no instructions for how to reverse those actions if you don't like the tool. Feels a little user-hostile to me.

[OP] rohanucla | 22 hours ago

I'm sorry, I should have put up a warning there. You can run sem unsetup to reverse it, and if you go to the GitHub repo, you'll find more about how to reverse it.

bendjejdjdh | 21 hours ago

tone deaf comment. "read the docs to undo it" is user hostile.

iiuc they answered the question directly and then told them where they can find further answers, didn't seem tone deaf at all

op literally wrote `sem unsetup` in their comment, so, I don't see what is "tone deaf" in this comment!

Brian_K_White | 19 hours ago

"Ah yeah, you're right. I apologize for that. Here is what to do. I'll update the page."

What an asshole! Plus the uninstall steps were completely inconsiderate single 2 word command. Outrageous.

I can't even think of a better possible response.

dboreham | 21 hours ago

A step in the right direction, and interesting in that it layers over existing git rather than requiring a whole new (unfamiliar, untested) SCCS.

[OP] rohanucla | 20 hours ago

git is actually great, and it doesn't have as many issues as people say it does. Building complementary layers that make it even stronger is the best bet, I guess.

jiggunjer | 18 hours ago

Another potential use case: This may help jujutsu auto split a large revision into small orthogonal revs.

Sometimes an agent makes a monolithic commit, and it's a lot of work to manually split code you didn't write. After such an auto-split, I can manually squash related revs into feature/ticket level.
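Here is a hypothetical first pass for such an auto-split. It assumes no jj or sem commands. The entities changed in one revision are grouped into connected components of the dependency graph, and each component becomes a candidate smaller revision. Entities linked only through an unchanged intermediate land in separate groups, which may or may not be what you want.

```python
def split_by_dependency(changed: set[str], depends_on: dict[str, set[str]]) -> list[set[str]]:
    """Group changed entities into connected components of the dependency graph."""
    # Undirected adjacency, restricted to entities changed in this revision.
    neighbors: dict[str, set[str]] = {e: set() for e in changed}
    for entity in changed:
        for dep in depends_on.get(entity, set()):
            if dep in changed:
                neighbors[entity].add(dep)
                neighbors[dep].add(entity)

    groups: list[set[str]] = []
    seen: set[str] = set()
    for start in sorted(changed):  # sorted for deterministic output
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:
            entity = stack.pop()
            if entity not in component:
                component.add(entity)
                stack.extend(neighbors[entity] - component)
        seen |= component
        groups.append(component)
    return groups


changed = {"parse_config", "load_config", "render_header", "render_footer"}
deps = {
    "load_config": {"parse_config"},
    "render_header": {"escape_html"},  # escape_html is unchanged, so it is ignored
    "render_footer": {"render_header"},
}
print(split_by_dependency(changed, deps))
```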

[OP] rohanucla | 18 hours ago

That's a really compelling use case actually

jiggunjer | 6 hours ago

Thx! Oh, and maybe don't call it sem. It's not really semantic; it's more like a big-picture view vs. the ground-level git lines. How about "bye", short for bird's-eye?

cpard | 18 hours ago

This is really neat. I’m working on something similar, but for data artifacts, not just code. It’s very encouraging to see that this kind of tooling helps both humans and models; that was what got me started working on it.

[OP] rohanucla | 18 hours ago

Thanks! The data artifacts angle is really interesting. In some ways the problem is even harder there, because data pipelines have less explicit structure than code, I guess.

gwerbin | 18 hours ago

The artifacts themselves have more structure, but diffing is hard because of size: what exactly do you show in the diff? Row-level? Summary statistics? How do you keep it from getting slow on bigger datasets?

Then there are plots saved as images which have basically no structure at all exposed.

cpard | 16 hours ago

Row-level and summary stats are both diffs over values that can tell you that something changed, but not whether the *meaning* has changed. What I'm working on is providing more information on how the meaning changes.

The questions I'd like the diff to answer are more like: will the grain go from one-row-per-user to one-row-per-user-per-day, will a key stop being unique, will a join start fanning out and quietly double a measure, will something additive become non-additive?

This diff is over structure, but that structure is latent in the transformation that produces it. To make things harder, if a declarative language is being used (e.g. SQL), the code doesn't even describe how things get done, only what the output should be.

What I've ended up doing is recovering the structure from the code by analyzing it, and then using *cheap* profiling rather than a full row compare.

As an example, my equivalent impact sub-command output would be something like this: "this change makes account_id non-unique three models downstream"
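This is for illustration only and is not cpard's implementation. It shows two cheap profiling checks of the kind described, over rows represented as Python dicts. `key_is_unique` tests whether a key has stopped being unique. `max_join_fanout` reports how many right-side rows a single left row can join to. Anything above 1 means an additive measure on the left side would be double counted after the join.

```python
from collections import Counter


def key_is_unique(rows: list[dict], key: str) -> bool:
    """True if no value of `key` appears more than once."""
    return all(n == 1 for n in Counter(row[key] for row in rows).values())


def max_join_fanout(left: list[dict], right: list[dict], key: str) -> int:
    """Largest number of `right` rows any single `left` row joins to.

    A result above 1 means the join multiplies left rows, so an additive
    measure on the left side would be summed more than once.
    """
    matches = Counter(row[key] for row in right)
    return max((matches[row[key]] for row in left), default=0)


accounts = [{"account_id": 1, "revenue": 100}, {"account_id": 2, "revenue": 50}]
contacts = [{"account_id": 1}, {"account_id": 1}, {"account_id": 2}]
print(key_is_unique(accounts, "account_id"))                 # True
print(key_is_unique(contacts, "account_id"))                 # False
print(max_join_fanout(accounts, contacts, "account_id"))     # 2
```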

gwerbin | 18 hours ago

There is still no good "data diff" tool that I can run on, say, a big pile of CSV or Parquet. Something with DVC integration would be especially welcome.
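The simplest form of such a tool is a keyed row-level diff. This hypothetical sketch compares two CSV texts by a key column with the standard `csv` module. It loads both files fully into memory, which is the same size problem raised above for big CSV or Parquet datasets.

```python
import csv
import io


def csv_diff(old_text: str, new_text: str, key: str) -> dict[str, list[str]]:
    """Keys added, removed, or changed between two CSV snapshots."""
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_text))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_text))}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


old = "id,v\n1,a\n2,b\n3,c\n"
new = "id,v\n1,a\n2,x\n4,d\n"
print(csv_diff(old, new, "id"))
# {'added': ['4'], 'removed': ['3'], 'changed': ['2']}
```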

appplication | 17 hours ago

I’d imagine that’s because at the scales where most folks use Parquet files, you’re generally no longer really thinking in terms of individual diffs to your data (it also implies some level of batch processing, vs. e.g. a DB).

We have some custom data diff tools at my ultracorp that provide a browsable interface, but the customer tends to be more operations folk than engineers or DS etc who would be more familiar with actual version control concepts. But these work against the data store and not on something like csv or parquet.

globnomulous | 16 hours ago

Interesting idea. How does it, or how should it, perform in huge monorepos, where git performance suffers? I spend most of my time in a repo that contains hundreds of thousands of files, where just a simple `git status` can take >3.5 seconds even on very fast consumer hardware. (Thank God for sparse-checkout.)

[OP] rohanucla | 16 hours ago

This is actually the exact scenario we just spent the last few weeks optimizing for. On a 71K-file TypeScript monorepo, sem was previously choking entirely (DNF), and now completes in 6.5s with the topology cache warm. On a 100K-file generated fixture, sem impact went from 90s cold down to about 1s warm. The key was building a SQLite-backed cache that stores the dependency graph structure so repeat runs skip re-parsing unchanged files entirely.
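Here is a minimal sketch of a content-hash parse cache with the standard `sqlite3` module. It assumes a per-file parse step whose result can be stored as JSON. The table layout and the `ParseCache` name are hypothetical, not sem's schema. A file is re-parsed only when the SHA-256 of its contents differs from the stored digest.

```python
import hashlib
import json
import sqlite3
from typing import Callable


class ParseCache:
    """Per-file parse results, keyed by a SHA-256 digest of the file contents."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS parsed "
            "(path TEXT PRIMARY KEY, digest TEXT NOT NULL, entities TEXT NOT NULL)"
        )
        self.parses = 0  # how many real (uncached) parses have run

    def entities(self, path: str, source: str, parse: Callable[[str], list[str]]) -> list[str]:
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        row = self.db.execute(
            "SELECT digest, entities FROM parsed WHERE path = ?", (path,)
        ).fetchone()
        if row is not None and row[0] == digest:
            return json.loads(row[1])  # unchanged file: skip re-parsing
        self.parses += 1
        result = parse(source)
        self.db.execute(
            "INSERT OR REPLACE INTO parsed (path, digest, entities) VALUES (?, ?, ?)",
            (path, digest, json.dumps(result)),
        )
        self.db.commit()
        return result


def function_names(source: str) -> list[str]:
    """Toy stand-in for a real parser: names from lines starting with `def `."""
    return [
        line.split()[1].split("(")[0]
        for line in source.splitlines()
        if line.startswith("def ")
    ]


cache = ParseCache()
cache.entities("a.py", "def f(): pass\n", function_names)  # parses
cache.entities("a.py", "def f(): pass\n", function_names)  # cache hit
print(cache.parses)  # 1
```

Passing a file path instead of `":memory:"` makes the cache persist between runs, which is what makes warm runs cheap.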

mcintyre1994 | 14 hours ago

This looks really neat, but I see its diff output as complementary to git and wouldn’t want to replace git diff. Is there a way to just install the CLI and MCP server but not override git diff?

[OP] rohanucla | 13 hours ago

It doesn't override git diff at all; sem is its own standalone CLI, and git diff continues to work exactly as before. Run sem setup only when you want to change your default git diff behavior. Otherwise, after installing sem you can use the sem commands straight away.

OJFord | 13 hours ago

If I were you I'd remove setup/unsetup commands and replace with a note that if you want to use it for git diff here's what to put in your config, or suggest aliasing as git sdiff or whatever.

mcintyre1994 | 13 hours ago

Ah okay, thank you! Is the MCP server manually configured, or is there documentation on the suggested way to tell an agent to use sem? My guess was that setup was how to do that.

[OP] rohanucla | 6 hours ago

No, setup just configures your git diff to use sem by default. You'll find the sem MCP directory in the GitHub repository, and there's also a skill.md file that tells your agent how to use sem.

znnajdla | 13 hours ago

Looks very useful, but incredibly obnoxious that it overrides default git diff. And the only way to get the regular git diff output is to uninstall it? There is no way I would ever want to do that.

[OP] rohanucla | 13 hours ago

sem doesn't override git diff, it's a completely separate command (sem diff). Your regular git diff should work exactly as it always has after installing sem.

If you want to change your git diff default behavior then you can do sem setup.

znnajdla | 13 hours ago

That’s not clear at all from the docs. It shouldn’t be called “setup” then. Even after doing sem setup there should be a CLI flag to get the default diff output without unsetting up. Very annoying hijack.

[OP] rohanucla | 6 hours ago

Sorry if that comes across as a hijack; it was added because a user asked to use this as the default on their git. But I'll make that clear to users. Thanks for the feedback.

alex7o | 11 hours ago

I have been using all of Ataraxy Labs' tools for the past few months, and they have been indispensable for helping models make fewer mistakes for me, from better git diffs to impact analysis and code reviews.

[OP] rohanucla | 6 hours ago

Thanks a lot for this reply, Alex! It keeps us pumped.

paolomainardi | 26 minutes ago

This is very cool. Is there a compound skill.md that lets agents use the CLI in the right way, with some examples?