Lightweight protocol to assert authorship of content and vouch for humanity of others

69 points by beto a month ago on lobsters | 43 comments

madhadron | a month ago

AI-generated content is often plausible sounding but wrong.

This, of course, is true of masses of human generated content, too, which is why we have the system of citations and primary sources.

SamRW | a month ago

Yes, people often have their own opinions, and in some cases these are demonstrably wrong. However, in the case of AI, and more to the point of "plausible sounding but wrong", the worst part is that the writing doesn't reflect anyone's true opinion.

If I read your blog, and you are talking some nonsense, at least I have the added bonus of knowing what you think. I may not agree with what you think, but I can read your perspective.

If I read a blog and it was written by AI, it is a waste of my personal time. It would be great to have a way of knowing whether the content I am reading was written by a human. I'm not 100% sold on this standard's viability in practice, but the idea is good.

BenjaminRi | a month ago

I think the unique thing about AI is that it writes in the style of an academic, but thinks in the patterns of an infant. This mismatch is what makes its texts so dangerous. Before LLMs, the quality of the writing was often correlated with the quality of the thought within the text; this is no longer the case. However, careful readers have always separated the artistic quality of a text from the thoughts within it, so in that sense the issue is nothing new.

jjude | a month ago

writes in the style of an academic, but thinks in the patterns of an infant.

Never seen it put that way. Makes me think.

But let me push back a little. From time to time I've been so fascinated by certain books that I started following the author's blog, Twitter, etc. Their writing there would be so incoherent that it left me scratching my head, wondering if this was really the author I admired. But when they then wrote a book on a topic they had been blogging and tweeting about, it would suddenly make sense, because they had been "blurting" out their thoughts on that topic all along. So the question is: which writing are we talking about? A well-edited book, or blogs and tweets?

[OP] beto | a month ago

Fair point!

But keep in mind that the protocol has two aspects: (1) people declare their humanity, but also (2) you get to choose who you trust. So you could build a web-of-trust composed of only rigorous authors — people who will only post human-generated content that has linked sources and references and, in addition, that will only vouch for similar people.

My expectation is that small communities will arise as people start using the protocol and vouching for each other, with similar goals and practices. And given that the browser extension stops crawling at 5 hops, and warns you with a different color at 2 (yellow) and 3+ (orange) hops, the trust will likely remain inside these communities.
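
As a minimal sketch, that client-side rule might look like the following, using the thresholds above (the type and function names are mine, not part of the protocol):

// Map hops-from-a-trusted-seed to the warning colors described above:
// green within 1 hop, yellow at 2, orange at 3-5, nothing past 5.
type TrustColor = "green" | "yellow" | "orange" | "untrusted";

function hopColor(hops: number): TrustColor {
  if (hops > 5) return "untrusted"; // the crawl stops at 5 hops
  if (hops >= 3) return "orange";
  if (hops === 2) return "yellow";
  return "green";
}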

Corbin | a month ago

Why does (1) matter if you've already committed to (2)? If you get to choose who you trust, why not choose to only trust humans?

[OP] beto | a month ago

(1) is important because when you post the human.json file to your website you can also vouch for other people. So someone who trusts you would benefit from that shared trust.

It also provides the necessary metadata for the browser extension to work. When you trust a website with the browser extension, it crawls the web of vouches, discovering other websites that are also "human-generated".
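
Roughly, the crawl could work like this sketch, which assumes the file is served at /human.json and that each vouch carries a URL (both details are my assumptions, not the spec):

// Breadth-first crawl of the vouch graph, starting from trusted seeds.
async function crawlVouches(seeds: string[], maxHops = 5): Promise<Map<string, number>> {
  const hops = new Map<string, number>(); // origin -> hops from the nearest seed
  let frontier = seeds.map((url) => new URL(url).origin);
  for (const origin of frontier) hops.set(origin, 0);

  for (let depth = 0; depth < maxHops && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const origin of frontier) {
      try {
        const res = await fetch(`${origin}/human.json`);
        const doc = await res.json();
        for (const vouch of doc.vouches ?? []) {
          const target = new URL(typeof vouch === "string" ? vouch : vouch.url).origin;
          if (!hops.has(target)) {
            hops.set(target, depth + 1); // first discovery is the shortest path
            next.push(target);
          }
        }
      } catch {
        // Unreachable site or malformed human.json: skip it.
      }
    }
    frontier = next;
  }
  return hops;
}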

cpurdy | a month ago

Presence and absence appear to be the only options for vouching, which means there is no "reject list" equivalent. Think: certificate revocation, but applied to AI slop sites.
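
For illustration, a reject list could sit next to the vouches in the same file; nothing like this exists in the protocol today, so the field names here are hypothetical:

// Hypothetical schema extension: explicit rejects, analogous to
// certificate revocation. A reject from a trusted site would keep the
// crawl from ever marking the target as human.
const humanJson = {
  vouches: [{ url: "https://alice.example.com" }],
  rejects: [{ url: "https://slop-farm.example.net", reason: "AI content farm" }],
};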

thesnarky1 | a month ago

that will only vouch for similar people.

This seems hard to know. I can understand adding my favorite human-generated blog to my vouch list, but unless I have a personal relationship there, how can I presume to know the rigor they will put behind their vouching? It seems like more metadata would be needed to understand, for example, whether they will vouch only for human-authored blogs with sources and references, or for any human-authored blog.

Your proposal seems like it may work for a small group of people who are already communicating, but the minute you end up with a Venn diagram of sorts, that "vouch" may not mean the same thing.

[OP] beto | a month ago

You're right, and there have already been a few proposals for improving this with more metadata: for example, specifying how many hops you vouch for at a given URL (0 if you trust just them), assigning a weight to the trust, or specifying degrees of humanity (where maybe "written by a human and citing only human sources" would be the highest?). There's been a lot of good feedback like yours, and I'm hoping to compile it and use it to iterate on the protocol.

thesnarky1 | a month ago

Yeah, I noticed them and think they'll help. But none seemed to get at "what do I mean by a vouch", so I wanted to give you this feedback. Best of luck on the project.

eminence32 | a month ago

I guess the "web of trust" thing is interesting, and might work in small cliques, but I'm not sure I've seen examples of a web-of-trust working at large scale.

But I don't really see this proposal working.

  • If you're worried about trying to identify low-quality AI content farms, those places are already using deceptive practices to attract eyeballs, so why wouldn't they just lie in their human.json file?
  • If you're trying to learn about the authorship of some content that appears to be high-quality, then do you really care who wrote it? (This isn't a rhetorical question -- I can think of some times where the answer would be 'yes' and other times where the answer would be 'no')
  • If you're trying to use authorship info as a way to figure out if the content is high quality, then maybe there is some signal in a human.json file, but it feels like a weak one to me. I do recognize this is a problem, though. AI-generated content is often plausible sounding but wrong.
  • If you're just trying to reward human authors who eschew AI generators, then that's a fine goal, but I'm not sure a human.json file is the best way to do that

[OP] beto | a month ago

Author here. Thanks for the feedback!

To be honest, I don't think this is supposed to scale. It's more focused on the IndieWeb/smolweb/etc., where you build personal relationships with site authors, and you want to leverage the relationships they have built with other people. I have maybe ~20 sites where I can say I reasonably trust that the authors are writing their own content, and this protocol is a way to grow that number based on the people they trust.

AI-content farms could post a human.json file, but you would still have to trust one of those websites, which you probably wouldn't. For the process to work you need a few "seed" websites where you trust the authors enough to also trust the people they're vouching for, so a high level of trust is needed when adding a seed.

And like you said in your last point, I think quality is a non-goal here; it's more about rewarding human-generated content. You stumble upon a website, you see the green dot on the browser extension, and you know that you and the author are connected through a web of trust, and that the content you're reading is genuine — even if it's low quality or factually wrong, someone took the time to write it.

eminence32 | a month ago

If the goal is just to help promote and connect human authors, then I think I'd prefer (speaking as a reader, not as a website author) to just see some type of standardized footer that says something like "hey, I wrote all of this myself, and also here are some other people that I trust, go read them too"

[OP] beto | a month ago

Not the same, but some people have started using /ai to describe their AI policy. I agree a standard footer would be nice!

mcherm | a month ago

My biggest concern about the proposal is the scaling, and I think "this isn't supposed to scale" makes it essentially useless to me.

Your proposal does a breadth-first search through all of the items in your web of trust (up to a certain depth). That inherently limits you to a small number of sites that can be certified. Even if you had a few hard-working individuals who were willing to maintain large lists of trusted-to-be-human sources, you couldn't really add them to your tree of trust, because they would make the number of nodes you had to visit too large.
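
A quick back-of-the-envelope illustration of that blow-up, with a made-up branching factor:

// If each site vouched for ~20 others, the reachable set grows
// roughly as 20^hops: about 3.2 million sites at 5 hops.
const branching = 20;
for (let hops = 1; hops <= 5; hops++) {
  console.log(`${hops} hops: ~${branching ** hops} sites`); // 20, 400, 8000, 160000, 3200000
}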

So I think any proposal which relies on "keep or build a local list of all sites known to be human" will be unable to scale to a size which would be useful.

sjamaan | a month ago

My biggest concern about the proposal is the scaling, and I think "this isn't supposed to scale" makes it essentially useless to me.

This is actually the bit I like best about it. If it were large-scale and comprehensive, the system would easily be usable by LLM webscrapers to avoid model collapse. Even as it is, I'm already hesitant to adopt this protocol for the same reason. I don't want to help those fuckers.

[OP] beto | a month ago

Hmmm, that's something I didn't consider. You raise a good point.

But I assume that as the protocol gets wider adoption it will also be used by AI farms to pretend that they're human, which is why the trust model is important. And I expect that the trust model will only be able to define small communities, since it might be hard to trust anything past 3 hops.

[OP] beto | a month ago

That's fair! But also, there's nothing preventing someone from writing their own crawler and exposing it through an API — moving the database of trusted sources from the browser to a server.

jjude | a month ago

I am reminded of blogrolls. Blogs used to have them. Some WordPress sites still do. Dave Winer has one. Maybe this is an "automated and verified" blogroll.

The downside is that this could be abused as a black-hat SEO technique.

Gaelan | a month ago

I've been thinking about something similar, but implemented as a search engine crawler, on the assumption that (by and large) human-written web pages only link to other human-written web pages, because there's no reason to link to something of no value. Of course, implementing a search crawler is something of a fraught prospect at the minute.

But god, a search engine that only covered human-written pages would be a godsend these days.

fleebee | a month ago

It's not exactly what you're asking for, but Kagi has SlopStop, which aims to downrank AI-generated pages flagged by users.

I can't compare it to other search engines (because it's the only one I use), but I'm happy with my search experience and at least they're making an effort to fight slop.

koala | a month ago

Kagi has smolweb, which has a curated list of domains. https://marginalia-search.com/ is also a thing. There are a few similar projects.

(Unfortunately, they can be a bit hit and miss. Kagi smolweb only does English sites. My blog seems to have very few pages indexed in Marginalia.)

marginalia | a month ago

What's your website? If you don't mind sharing I can take a peek in the marginalia index to see if I can figure out what's up.

cceckman | a month ago

I'd suggest that the property N ("how far the web of trust extends") is a property of the vouch, not of the client.

As a voucher, I'd like to say: "I believe this is a human, but I do not trust their ability to evaluate whether others are likely to use LLMs" (radix 0). Or "I believe this is a human, and trust their ability to evaluate others for humanity, but not their ability to evaluate others for ability to evaluate..." etc etc, ad infinitum. ("Infinitum" being the unspecified radix, leaving it to the client.)
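
In human.json terms, that might look like a per-vouch field the crawler carries forward, something like this sketch (the "radix" field is this comment's idea, not part of the protocol):

// Hypothetical per-vouch "radix": how many further hops this vouch may
// extend trust. 0 = "human, but don't follow their vouches"; omitted =
// unspecified, so the client's own depth limit applies.
const vouch = { url: "https://alice.example.com", radix: 0 };

// A crawler would carry forward the smaller of its remaining hop
// budget and the vouch's radix:
const remaining = (budget: number, radix?: number) =>
  radix === undefined ? budget : Math.min(budget, radix);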

[OP] beto | a month ago

This is a good point! I've also considered giving the vouches a weight between 0 and 1. The browser would multiply the weights along the path, and ignore anything that falls below a configurable threshold. But I think a number of hops might be simpler and more intuitive.
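
With made-up numbers, the weighted variant would look like this:

// Trust along a path is the product of the per-vouch weights; the
// client ignores anything that falls below its threshold.
const pathWeights = [0.95, 0.9, 0.8]; // me -> A -> B -> C
const pathTrust = pathWeights.reduce((acc, w) => acc * w, 1); // 0.684
const threshold = 0.5;
console.log(pathTrust >= threshold); // true: C stays inside the web of trust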

csomar | a month ago

This doesn't address how it will scale, and it's the same as "a href" linking but with extra steps. At scale, it'll just create an industry similar to SEO, where people build trust chains with one another.

[OP] beto | a month ago

Yeah, I did consider just scraping for rel="is-human" links to build the web of trust, but with the JSON schema it's easier to evolve the protocol and add more details, e.g.:

"vouches": [{
  "url": "https://alice.example.com",
  "vouched_at": "2026-03-09",
  "trust": 0.95,
  "degree": "AI-assisted",
  "nore": "Personal friend"
}]

I'm not saying those are planned to be implemented, but they're definitely on my radar.

sjamaan | a month ago

I have questions. Is one supposed to vouch only for people one knows personally? And what about sites of people who hadn't used AI up to a certain date, and then start using it? I don't see myself keeping such a list up to date: after adding a site to it, it would probably stay there even if I've stopped following the site myself, so I might not even realize they've started posting slop.

[OP] beto | a month ago

Not necessarily. I've never met Cory Doctorow, but I can see myself vouching for his website.

The way I implemented the protocol on my site was as a dynamically generated JSON file, so I can manage the URLs of the people I vouch for from the site UI and from other channels. For example, I have my blogroll integrated with my site, so I could automatically add the people I follow to my vouch list.

Also, the vouched_at attribute can be used as a signal — we could configure the browser extension to not trust anything vouched more than a year ago, for example.
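
As a sketch, that staleness check is a one-liner on the client, assuming vouched_at is an ISO date as in the example above:

// Ignore vouches older than a configurable maximum age (one year here).
const MAX_AGE_MS = 365 * 24 * 60 * 60 * 1000;

function isFresh(vouchedAt: string, now: number = Date.now()): boolean {
  return now - Date.parse(vouchedAt) <= MAX_AGE_MS;
}

console.log(isFresh("2020-01-01", Date.parse("2026-03-09"))); // false: the vouch went stale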

bitrot | a month ago

Cries in semantic web

[OP] beto | a month ago

I know! I'm hoping that starting with something simple will make it easier to get wider adoption, and we can add more semantics later.

bitrot | a month ago

I'm just joking anyway! The idea of using a semantic, machine-legible protocol to establish trust is interesting because it fundamentally caters more to machine than human perusal. What do you think about that tension/irony?

[OP] beto | a month ago

There's also the irony that robots.txt is in a human format, and human.json is in a machine format! 😄

My main goal with this project/protocol is to one day be able to visit a website for the first time, and know instantly that what I'm reading was generated by a human, because my browser can compute a path from people I trust. And in order for that to happen we need a protocol for machines.

And to be clear, I'm not against AI. I use Claude Code daily, and I used it to write the browser extensions. I do think there is a lot of value in AI, but I also think there's value in having sites where the content is human generated.

th0ma5 | a month ago

A lot of the criticism of semantic web technologies came down to arguments similar to the spam discussion from forever ago: https://craphound.com/spamsolutions.txt ... You're advocating for a closed-world hypothetical subset on top of open-world assumptions, which has documented struggles.

xyproto | a month ago

Content from the time before AI should also be considered human.

[OP] beto | a month ago

It's a valid point, but I'm not sure how to incorporate it in the protocol, since it can be easily spoofed. If you trust someone, the date is irrelevant; and if you don't trust someone (or they're not vouched by someone you trust), then you can't trust the date either.

th0ma5 | a month ago

Which AI, and at what time? Horse Ebooks is an example of ... well, we don't know. Originally a Markov chain, possibly?

addison | a month ago

I like this. A while ago I was thinking about similar trust mechanics: webrings, but as directed graphs rather than proper rings. Ah well.

Regardless of impl, I always worry about how these things can be abused. Other folks have already mentioned the spam generators simply lying, but I also worry about it effectively focusing scraping efforts. I would probably only deploy this with iocaine, and even then, I just generally wonder if the open web must close somewhat...

addison | a month ago

I have since adopted this protocol and will try it out for a bit, just to see how it feels in practice.

JulianWgs | a month ago

Wouldn't it also make sense to add a backlink? The realistic scenario I have in mind is that I visit an untrusted domain and want to find out whether someone I trust trusts that domain. Building that graph outward from the domains I trust is rather tedious.
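
In schema terms, the idea might be a reverse field like this hypothetical sketch; the client can't trust the backlinks themselves (anyone can claim a backer), but it can fetch each listed site's human.json and confirm the vouch really exists there:

// Hypothetical "vouched_by" backlinks, to let a client walk the graph
// backwards from an untrusted site toward its trusted seeds.
const humanJson = {
  vouches: [{ url: "https://alice.example.com" }],
  vouched_by: ["https://trusted-friend.example.org"],
};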

[OP] beto | a month ago

Yeah, I've been thinking about this too! Feel free to leave an issue on the repo if you'd like to brainstorm the idea!