|
- title: Understanding A Protocol
- url: https://aeracode.org/2022/12/05/understanding-a-protocol/
- hash_url: 4b5bae499ad13fe0f5413d8c7b77c09a
-
- <p>Yesterday I pushed out the <a href="https://docs.jointakahe.org/en/latest/releases/0.5/">0.5.0 release of Takahē</a>,
- and while there's plenty left to do, this release is somewhat of a milestone
- in its own right, as it essentially marks the point where I've implemented
- enough of ActivityPub to shift focus.</p>
- <p>With the implementation of image posting in this release, there are now only
- a few things left at a <em>protocol</em> level that I know I'm missing:</p>
- <ul>
- <li>
- <p>Custom emoji (these are custom per-server and a mapping of name-to-image
- comes with each post)</p>
- </li>
- <li>
- <p>Reply fan-out to the original author's followers</p>
- </li>
- <li>
- <p>Pinned posts on profiles (and collections in general)</p>
- </li>
- <li>
- <p>Shared inbox delivery (to reduce fan-out requests)</p>
- </li>
- </ul>
- <p>My current aim is to get Takahē to a point where a few small communities can
- run on it (including takahe.social), and while these are nice,
- they are not critical for that. The reply fan-out is probably most important,
- but is also the easiest given what we have written already.</p>
- <p>Instead, it's now time to shift and focus on stability and efficiency. My
- general tactic for big new projects like this is an initial "spike" period,
- where I am more focused on pushing out code with a roughly correct architecture
- rather than focusing on query efficiency, caching or the like, and to then
- shift gears into more of a "polish" period.</p>
- <p>Takahē is actually pretty useable for me as a daily driver for
- the <a href="https://takahe.social/@takahe@jointakahe.org/">@takahe@jointakahe.org</a>
- account - sure, I find a few bugs here or there, but it's honestly not bad.
- That means, to me, it's time to shift focus a bit more towards polishing.</p>
- <p>The other big missing feature for a community at this point is probably having
- mobile app support (which I plan to do by implementing a Mastodon-compatible
- client API) and better moderation features (reporting and user blocking,
- in addition to the existing server blocking).</p>
- <p>So, I'm going to focus on adding those, polishing, and improving efficiency; there are now
- quite a few other contributors to the project who have been helping out
- with bugfixes, efficiency, and plenty more, which is helping a great deal.</p>
- <p>I'll also be sending out a few
- invitations to <a href="https://takahe.social">takahe.social</a> to use that as a testbed
- as the first small community; nothing like dogfooding your own software to see
- what it needs (as well as asking some existing Mastodon admins for their
- thoughts, if they are gracious enough to lend me some of their time).</p>
- <p>Still, though, getting to this point is quite a big deal - I feel like I've
- learned a lot about ActivityPub and its related specifications by implementing
- them. So let's talk about it a little bit.</p>
- <h2>Fan-Out</h2>
- <p>ActivityPub is all about "fan-out" - the process of getting posts from their
- authors to their followers. At the basic level, this means one HTTP request
- per follower to deliver it to their inbox - but there are some efficiency gains
- to be made with "shared inboxes", where you can push things on a per-server
- basis rather than per-user.</p>
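- <p>As a rough sketch of that saving (not Takahē's actual code), the delivery
- targets for a post can be collapsed by preferring a follower's shared inbox
- whenever one is advertised. The <code>inbox</code> and <code>shared_inbox</code>
- dictionary keys and the example URLs are assumptions made up for this illustration.</p>

```python
def delivery_targets(followers):
    """Return the set of inbox URLs to POST to, preferring shared inboxes.

    Each follower is a dict with an "inbox" URL and an optional
    "shared_inbox" URL (hypothetical field names for this sketch).
    """
    targets = set()
    for follower in followers:
        # Fall back to the personal inbox when no shared inbox is known.
        targets.add(follower.get("shared_inbox") or follower["inbox"])
    return targets


followers = [
    {"inbox": "https://a.example/users/alice/inbox",
     "shared_inbox": "https://a.example/inbox"},
    {"inbox": "https://a.example/users/bob/inbox",
     "shared_inbox": "https://a.example/inbox"},
    {"inbox": "https://b.example/users/carol/inbox"},
]

targets = delivery_targets(followers)
print(len(followers), "followers ->", len(targets), "requests")
```

- <p>Two followers on the same server collapse into one request here; a follower
- on a server with no shared inbox still costs one request of their own.</p>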
- <p>Obviously, doing this is noisy and takes a lot of requests, and has to be done
- as background workers - especially as the server on the other end might be
- down when you try and send the message over, and you need to retry.</p>
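- <p>One common way to space out those retries is capped exponential backoff.
- The sketch below is only an illustration of the idea; the function name, the
- one-minute base delay, and the six-hour cap are assumptions, not the schedule
- Takahē or any other server actually uses.</p>

```python
def retry_delays(attempts, base_seconds=60, cap_seconds=6 * 60 * 60):
    """Return the wait (in seconds) before each retry attempt.

    The delay doubles after every failure and is clamped to cap_seconds,
    so a server that is down for a long time is not hammered.
    """
    return [min(base_seconds * 2 ** n, cap_seconds) for n in range(attempts)]


print(retry_delays(5))
```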
- <p>Plus, whenever you reply to someone, that reply is then sent to every one of
- their followers so that it can appear in reply threads. This means there's an
- increasing amplification effect as you get more and more followers, and your
- server spends a lot of its life just sending requests to and receiving requests from
- other servers.</p>
- <p>There are other aspects to fan-out, though; there was an
- <a href="https://ar.al/2022/11/09/is-the-fediverse-about-to-get-fryed-or-why-every-toot-is-also-a-potential-denial-of-service-attack/">excellent blog post</a>
- about that last month that outlines the problems with link previews. See, when
- you post a message to a server's inbox, Mastodon (at least) goes and fetches
- any image attachments, and tries to generate web previews for any links. If
- people have mobile clients, some of those will <em>also</em> try to fetch previews.
- This does not end well for unprepared servers - and for those links, they
- could just be someone's random blog.</p>
- <p>Takahē does not do this prefetching yet - we'll likely never do it for the
- link previews (but some clients connected to us will, once there's a client API).
- Post attachments and profile images are a different story - we need to at
- least proxy those for user privacy, but we can hopefully make it a caching
- proxy. If users have their timelines open, there's not a big difference between
- a caching proxy and prefetching for the source server, either.</p>
- <p>How do we solve this? Well, bundling some of this data into the original post
- is one idea; having shared caching proxies split between multiple servers is
- another potential one as well.</p>
- <p>That brings us, though, to the push-pull of scaling that's at the heart of
- ActivityPub.</p>
- <h2>Two Axes Of Scaling</h2>
- <p>In a previous post about ActivityPub and Takahē, I referred to the fact that
- there are two "axes" of scaling available in the protocol:</p>
- <ul>
- <li>Having more people per server/instance</li>
- <li>Running more servers/instances</li>
- </ul>
- <p>Both of these have their pros and cons, and it's hard to go all in on one of
- them - having a million people on one server is difficult to scale for that
- individual server (you have to start building it as its own distributed
- system), but having a million servers makes the fan-out problem even worse
- (say hello to massive prefetching loads and shared inboxes being not very
- useful).</p>
- <p>It seems to me like a bit of a separation between domain, moderator, and
- caching store is needed - Takahē already lets multiple domains be on a single
- server, but the server moderation and caching are scoped just to that one server.</p>
- <p>I do believe that sharing moderation across domains is a very important scaling
- step; this doesn't have to purely be "multiple domains on the same server",
- either - I think there's scope for a moderation API where you can have a
- team of professional (volunteer or paid) moderators look after multiple
- servers.</p>
- <p>Sharing caching and previews is also important, though; if there were just ten
- or so link preview caches around, and all servers used one of them, then we
- still avoid centralisation while massively lowering the load on the target
- of links.</p>
- <h2>That Transport Layer</h2>
- <p>I both love that ActivityPub is all over HTTP, and hate it.</p>
- <p>On the plus side, it means there's all manner of pre-existing load balancers,
- gateways, frameworks and more at our disposal. Plus, every programming language
- on Earth has some way of slinging JSON over HTTP.</p>
- <p>On the negative side, it's wildly inefficient. There's a lot of overhead for
- each individual call, <code>Accept</code> headers have to be bandied around everywhere,
- and there's a lot of HTTP implementation variation that has to be accounted for.</p>
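- <p>To give a flavour of the <code>Accept</code> juggling, here is a minimal sketch
- of deciding whether a request is asking for the ActivityPub representation of
- a URL rather than the HTML page. The two media types are the ones ActivityPub
- uses; the function name is hypothetical, and the sketch deliberately ignores
- q-values and the <code>profile</code> parameter, which a real server may need to
- check.</p>

```python
ACTIVITY_TYPES = ("application/activity+json", "application/ld+json")


def wants_activity_json(accept_header):
    """Return True if any media type in the Accept header is an ActivityPub one."""
    for part in accept_header.split(","):
        # Drop parameters such as ;q=0.9 or ;profile="..." before comparing.
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type in ACTIVITY_TYPES:
            return True
    return False


print(wants_activity_json("application/activity+json"))
print(wants_activity_json("text/html,application/xhtml+xml"))
```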
- <p>If, magically, I could change it - would I go to something like SMTP, with its
- own port and protocol? I'm not entirely sure, to be honest - I do like the ease
- of entry with HTTP, and it does mean there's a lot of framing and encoding
- already agreed. Maybe HTTP as a base protocol with an optional TCP alternate
- for high-traffic servers to talk to each other over.</p>
- <p>The one thing I would get rid of, though, is JSON-LD. If you're not aware,
- ActivityPub is not just JSON - it's JSON-LD, which has schemas, namespaces,
- expanded and compacted forms, transforms, and all manner of other stuff.
- You need to transform each message to a canonical form before you parse it!</p>
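- <p>To illustrate why that normalisation step matters, here is the same tiny
- note written by hand in a compact form and in an expanded form, plus a helper
- that has to cope with both. This is a simplified sketch: the two documents are
- hand-written rather than produced by a JSON-LD processor, and
- <code>content_of</code> is a hypothetical helper, not part of any library. A real
- implementation would run a proper JSON-LD processor instead.</p>

```python
AS = "https://www.w3.org/ns/activitystreams#"

# Compact form: short keys, with meaning supplied by the @context.
compact = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "content": "Hello",
}

# Expanded form: full IRIs as keys, values wrapped in lists and objects.
expanded = [
    {
        "@type": [AS + "Note"],
        AS + "content": [{"@value": "Hello"}],
    }
]


def content_of(doc):
    """Read the note content from either hand-written shape above."""
    if isinstance(doc, list):
        node = doc[0]
        return node[AS + "content"][0]["@value"]
    return doc["content"]


print(content_of(compact) == content_of(expanded))
```

- <p>With plain JSON and known keys, only the first shape would exist and the
- helper would be a single dictionary lookup.</p>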
- <p>I get the idea, but I was never an RDF fan (it's just JSON RDF, basically) and
- it just makes everything so much more complex. A plain
- JSON specification with known keys would have been better, I think, though
- I was not there when the spec was written, so I'm sure there's more context
- I lack.</p>
- <p>I do want to stress, though, that while I am not a huge fan of the transport
- layer, I think the object model is quite decent. If we could get
- <code>preferredDomain</code> in there along with some proper multi-size image support,
- I would be even happier than my usual buoyant self.</p>
- <h2>Difference Is Strength</h2>
- <p>A virtual Mastodon monopoly is not good for almost anyone, I think - I'm
- actually quite excited for Tumblr to implement ActivityPub, because it stands
- a chance of forcing protocol changes and improvements to be discussed,
- rather than directed almost entirely by one project.</p>
- <p>If we can get Takahē to even 5% of active users on the Fediverse, that would
- be a significant impact, too. I'm not sure we'll get there, but I do at least
- hope the attempt will also place its own bit of pressure on the protocol in
- terms of evolving and trying to fix some of the scaling issues we're all
- sailing directly towards.</p>
- <p>How to do that responsibly is another question - I would ideally like to make
- sure we have a server that is designed to easily handle things like DMCA
- requests, GDPR requests, and the awful spectre of terrorist content and CSAM.
- I'm looking into starting a fund to pay for some legal and compliance
- consultations on this front; it's the sort of work that
- individual admins should not each have to do themselves, and I'd love us to have a server
- designed to handle the requirements easily, and written guides as to how to do it.</p>
- <p>Still, for me, the focus right now is on growing Takahē and hopefully fostering some
- communities under its wing, and that means improving stability and efficiency,
- and working closely with people who want to use us to run communities. It's
- also about slowly fostering a set of people who look after it with sensible
- governance, so I'm not needed as a decision-making leader forever.</p>
- <p>I'm not yet focused on people migrating servers from Mastodon - supporting
- that is eventually on the roadmap, and I've reserved the appropriate URL
- patterns so actor/object URLs can move over seamlessly, but it's still a lot
- of work, and we're not ready for that quite yet.</p>
- <p>If you're interested in helping out with Takahē, do pop over to our
- <a href="https://discord.gg/qvQ39tAMvf">Discord</a> or email me at andrew@aeracode.org
- and mention what you'd like to help out with - there's a large
- <a href="https://docs.jointakahe.org/en/latest/contributing/">number of areas</a> we need
- help with, not just coding!</p>
|