danking | a month ago
This post does a really great job explaining how the AT protocol works. I feel like I haven’t seen such a complete and coherent explanation before now.
One thing that caught my eye and seems like a difficult problem is the discussion of how to determine the like count:
> So, to get the reply count, we just need to count every such post:
This just seems really challenging to keep up-to-date, especially if folks are deleting records too?
rushsteve1 | a month ago
It is a hard problem, but the folks over at Microcosm have been doing amazing work on community network infrastructure, including the fantastic Constellation backlinks server.
phil | a month ago
aw thank you! it's a pretty fun problem to work on! re. parent post's concerns:
the production-facing instance of constellation (which indexes every backlink across all atproto lexicons) is still served by a single raspberry pi 5 with a 1TB NVMe disk, sitting across the room from me. there's a bigger box hot and ready for failover, but the pi just... keeps doing it fine on its own. it's now getting traffic from a few dozen different atproto apps. the index is just shy of 12 billion backlinks from the last almost-12-months.
constellation tries to thoroughly handle deletes, record updates that change links, account deletes, account deactivations (filters content from responses, comes back if they reactivate). it actually has to also index forward links in order to handle deletes, which means as a bonus it can do many-to-many queries and other kinds of both-directions graph traversal, for any data in atproto.
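To illustrate why a backlink index also needs forward links, here is a minimal in-memory sketch. It is not Constellation's actual design: the `BacklinkIndex` class, its method names, and the `did:example:*` URIs are all made up for illustration. The assumption is that a delete event identifies only the deleted record, so the index must remember what that record used to link to.

```python
from collections import defaultdict


class BacklinkIndex:
    """Toy in-memory backlink index (hypothetical, not Constellation's code)."""

    def __init__(self):
        # target URI -> set of source record URIs that link to it
        self.backlinks = defaultdict(set)
        # source record URI -> set of target URIs it links to (forward links)
        self.forward = defaultdict(set)

    def put(self, source, targets):
        """Create or update a record: drop its old links, then add the new ones."""
        self.delete(source)
        for target in targets:
            self.forward[source].add(target)
            self.backlinks[target].add(source)

    def delete(self, source):
        """Delete a record, using the forward index to find which
        backlink entries have to be removed."""
        for target in self.forward.pop(source, set()):
            self.backlinks[target].discard(source)
            if not self.backlinks[target]:
                del self.backlinks[target]

    def count(self, target):
        """Number of records currently linking to the target."""
        return len(self.backlinks.get(target, ()))


# Usage: two likes on one post, then one like is deleted.
index = BacklinkIndex()
post = "at://did:example:alice/app.bsky.feed.post/1"
index.put("at://did:example:bob/app.bsky.feed.like/1", [post])
index.put("at://did:example:carol/app.bsky.feed.like/1", [post])
index.delete("at://did:example:bob/app.bsky.feed.like/1")
print(index.count(post))  # 1
```

Because both directions are stored, the same structure can answer "what does this record link to" as well as "what links to this record", which is roughly the both-directions traversal mentioned above.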
throughput is one thing, but maintaining integrity is also challenging. constellation consumes the atproto firehose via jetstream, which is really nice, but is not a super reliable event delivery service. constellation's index can (and does, at a very low rate) become slightly incorrect due to missing events.
using the full firehose and its sync protocol Fixes This (TM): you get cryptographic proofs per-identity for every update, that your history of events is complete and correct, plus a means to recover at the individual identity level if things get out of sync. getting constellation onto this is a work in progress, but the really hard parts about keeping integrity in a system like this are solved by the spec :)
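The general idea of detecting a missed event per identity can be sketched with a simple hash chain. This is a heavily simplified illustration, not the atproto wire format: real commits are signed and carry more structure, and the `ChainTracker` class, the `digest` helper, and the `prev` field here are hypothetical names.

```python
import hashlib
from typing import Optional


def digest(data: bytes) -> str:
    """Content digest standing in for a real content identifier."""
    return hashlib.sha256(data).hexdigest()


class ChainTracker:
    """Tracks, per identity, the digest of the last state we applied.
    Hypothetical structure for illustration only."""

    def __init__(self):
        self.heads = {}  # identity -> digest of the last applied state

    def apply(self, identity: str, prev: Optional[str], state: bytes) -> bool:
        """Return True if the update extends the history we know about,
        False if it does not (an event was missed; this identity needs a resync)."""
        known = self.heads.get(identity)
        if known is not None and prev != known:
            return False
        self.heads[identity] = digest(state)
        return True

    def resync(self, identity: str, state: bytes) -> None:
        """Recover one identity by adopting a freshly fetched state."""
        self.heads[identity] = digest(state)


# Usage: the update from state-1 to state-2 is never delivered.
tracker = ChainTracker()
s1, s2, s3 = b"state-1", b"state-2", b"state-3"
print(tracker.apply("alice", None, s1))        # True
print(tracker.apply("alice", digest(s2), s3))  # False
```

The point of the sketch is only that a gap is detectable and repairable for a single identity, without re-indexing everyone else.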
it's all doable and really doesn't require massive hardware investment; infra costs really scale with read load. it's kind of funny to work on something whose read and write loads are so extremely uncorrelated.
(ps there's a new thing called tap which gives your backend app a really nice interface to a full-sync firehose consumer.)
ocramz | a month ago
it's event sourcing all over again
jfred | a month ago
This sort of decoupling between applications and content on the social web is underappreciated.
Many words have been spilled about how ATProto compares to ActivityPub, and I don't want to rehash all of that here, but I would like to note that the ActivityPub protocol also allows for that sort of thing. The client-to-server ActivityPub protocol has seen little implementation but, if implemented, could similarly allow us to use a single account/data store with multiple very different applications.
This is one of the things I think ATProto has done right, and I'd really like the fediverse to do the same!
mccd | a month ago
I've thought about creating an ActivityPub-based server that can be mounted with 9p as well, so I really enjoyed this. I'm a fan of filesystems as an API, especially with FUSE.
I think you could take their representation of posts even further: createdAt is set when you create a file, and can be retrieved using stat, so you don't actually need a JSON representation for a post at all. The content of the post can just be the content of the file. Metadata can also be stored as extended attributes (xattrs), which is how I represent categories and privacy settings for my notes in my vjournal-based filesystem.
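A minimal sketch of that idea in Python, assuming a local filesystem rather than any particular 9p or FUSE server. The function names and the `user.category` attribute are hypothetical. Note that `os.setxattr` is Linux-only and that a portable creation time is not always available from `stat`, so the sketch falls back to the modification time.

```python
import os
import tempfile
from datetime import datetime, timezone


def write_post(path, text, attrs=None):
    """Store the post body as the file content and metadata as user xattrs.
    Returns the attributes that were actually stored."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    stored = {}
    for key, value in (attrs or {}).items():
        try:
            os.setxattr(path, f"user.{key}", value.encode("utf-8"))
            stored[key] = value
        except (AttributeError, OSError):
            # os.setxattr is missing on some platforms, and some
            # filesystems reject user xattrs; skip metadata in that case.
            pass
    return stored


def read_post(path):
    """Rebuild a post from the file content plus stat metadata."""
    st = os.stat(path)
    # st_birthtime exists only on some platforms; st_mtime is an
    # approximation that changes whenever the file is edited.
    ts = getattr(st, "st_birthtime", st.st_mtime)
    with open(path, encoding="utf-8") as f:
        text = f.read()
    created = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    return {"text": text, "createdAt": created}


# Usage: write a post into a temporary directory and read it back.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "hello.txt")
    write_post(path, "hello from a file", {"category": "notes"})
    print(read_post(path))
```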
op | a month ago
an atproto PDS can be mounted as a FUSE filesystem!
writeup: https://oppi.li/posts/mounting_the_atmosphere/
code: https://tangled.org/oppi.li/pdsfs
an example of what this enables: counting scrobbles by artist because I publish app.rocksky.* records;
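The original example is not reproduced here. As a rough sketch of the kind of query a mounted repository makes possible, the following counts records by artist. The directory layout, the `.json` file extension, and the `artist` field name are all assumptions for illustration; the actual pdsfs layout and the app.rocksky.* record schema may differ.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_by_artist(collection_dir):
    """Count JSON records in a directory by their (assumed) 'artist' field."""
    counts = Counter()
    for path in Path(collection_dir).glob("*.json"):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed records
        if isinstance(record, dict) and record.get("artist"):
            counts[record["artist"]] += 1
    return counts


# Usage: fake a small collection directory and count it.
with tempfile.TemporaryDirectory() as d:
    for i, artist in enumerate(["Boards of Canada", "Autechre", "Boards of Canada"]):
        Path(d, f"{i}.json").write_text(json.dumps({"artist": artist}), encoding="utf-8")
    print(count_by_artist(d).most_common())
    # [('Boards of Canada', 2), ('Autechre', 1)]
```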
henrycatalinismith | a month ago
To me this all sounds like it's tickling the same nerd sensibilities as blockchain tech does. Like radical decentralisation with some interesting data modelling tricks around how identifiers work and so on.
Fun and all but I don't think any of it survives any serious consideration of human nature and business pressures. Imagine any of these ~~crypto~~atproto startups a few years in. There's a new feature in the pipeline but it's hard/impossible to deliver with ~~blockchain~~PDS due to performance/privacy/compliance difficulties inherent to the architecture of the platform. Or maybe key investors are giving the CEO a hard time about churn being too high and the ease of portability to competitors is singled out as a primary driving force behind the problem.
So "just this once" the company makes a special case one-off decision to ship a key feature using a traditional private database backend. Someone makes an angry GitHub issue about it that gets 612 thumbs up reactions and some people churn in protest and maybe even write an enshittification blog post or two. But the damage is minor and the company learns that this is a viable trade-off that can be repeated as necessary. A few years down the line and this becomes a fine-tuned plan of strategic lock-in with a veneer of openness.
Ultimately I don't believe we can design around power by being clever about identifiers and I especially don't believe architecture can substitute for governance. To me ATProto is a textbook case of a tech solution to a people problem and with age I have sadly lost my ability to enjoy the fun of the excitement around those.
eta | a month ago
My extremely cynical take on Bluesky is very similar: it's fundamentally a centralised system built atop decentralised parts, so all the nerdy types get excited about the technology while ignoring the very real social issues. (In fact, the bluesky leadership people have made a bunch of unpopular moderation decisions and retorted with "don't like it? well just run your own PDS/AppView/etc", which obviously a critical mass of people aren't going to suddenly do.)
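> So "just this once" the company makes a special case one-off decision to ship a key feature using a traditional private database backend.
This has already happened: direct messages on Bluesky go through an entirely centralised server, because there isn't a way to make things private in the AT protocol model.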
BinaryIgor | a month ago
I tend to agree; I once wrote a deep dive on the possible approaches to identity in digital systems, back when I was interested in doing something on the NOSTR protocol (an ATProto competitor): https://binaryigor.com/centralized-vs-decentralized-identity-tradeoffs.html
To me, the incentives do not add up, long-term, to sustain truly decentralized implementations of such social media & identity systems; additionally, people naturally flock to the most successful option or a few options, creating centralizing effects organically. I also don't know whether even the theoretical benefits outweigh the multiple tradeoffs that systems of this nature must make around the user experience.
A few account-based, centralized platforms competing with each other + many smaller aggregators like Hacker News or Lobsters + an infinite number of self-hosted blogs might be the best that the Internet has realistically, not ideologically, to offer. Maybe it's not that bad after all?
runxiyu | a month ago
Nice post... kinda wish there were alt tags on the images though!
ashishb | a month ago
No big social media platform wants to lock itself into such a setup.
Remember OpenSocial? It had great backing but still did not survive.
coby | a month ago
Maybe you're right, and maybe we don't need big social media platforms. To me, your comment is another way of saying that a lot of pro-social activities don't align with the profit motive. So it's an economic problem at the core, and there are a lot of motivated nerds out there with time on their hands. The hacker ethic has always been strongly anti-capitalist.
coby | a month ago
It's cool to see the dialogue around decentralized/local-first tech evolve. For a long time I felt like every other month I'd see a post that basically went "corporate ownership and vendor lock-in is a problem and we should do something about it, idk what though, something something Mastodon?" Today, I'm encouraged by the proliferation of ambitious ideas that reframe not just the techniques but the ideology of "data". A lot of these ideas are surely wrong, but some of them will probably stick, and it's great to see people trying. I would love to hear others' perspectives on what works and why.
sugaryboa | a month ago
In UNIX everything is a file.
bernhard | a month ago
I see lots of folks pointing out that this won’t survive because it isn’t attractive to Big Tech, but…isn’t users voluntarily creating cross-platform profiles of everything they do and like exactly the kind of thing a Meta wants? Has anyone thought about the privacy implications of building such a rich, multimedia profile for users? (I’m sure someone has; I just don’t follow this stuff that closely.)