title: Understanding A Protocol
url: https://aeracode.org/2022/12/05/understanding-a-protocol/
hash_url: 4b5bae499ad13fe0f5413d8c7b77c09a
<p>Yesterday I pushed out the <a href="https://docs.jointakahe.org/en/latest/releases/0.5/">0.5.0 release of Takahē</a>,
and while there's plenty left to do, this release is somewhat of a milestone
in its own right, as it essentially marks the point where I've implemented
enough of ActivityPub to shift focus.</p>
<p>With the implementation of image posting in this release, there are now only
a few things left at a <em>protocol</em> level that I know I'm missing:</p>
<ul>
<li>
<p>Custom emoji (these are custom per-server and a mapping of name-to-image
comes with each post)</p>
</li>
<li>
<p>Reply fan-out to the original author's followers</p>
</li>
<li>
<p>Pinned posts on profiles (and collections in general)</p>
</li>
<li>
<p>Shared inbox delivery (to reduce fan-out requests)</p>
</li>
</ul>
<p>My current aim is to get Takahē to a point where a few small communities can
run on it (including takahe.social), and while these features are nice,
they are not critical for that. The reply fan-out is probably the most important,
but it is also the easiest given what we have written already.</p>
<p>Instead, it's now time to shift and focus on stability and efficiency. My
general tactic for big new projects like this is an initial "spike" period,
where I am more focused on pushing out code with a roughly correct architecture
than on query efficiency, caching or the like, and then to
shift gears into more of a "polish" period.</p>
<p>Takahē is actually pretty usable for me as a daily driver for
the <a href="https://takahe.social/@takahe@jointakahe.org/">@takahe@jointakahe.org</a>
account - sure, I find a few bugs here or there, but it's honestly not bad.
That means, to me, it's time to shift focus a bit more towards polishing.</p>
<p>The other big missing features for a community at this point are probably
mobile app support (which I plan to do by implementing a Mastodon-compatible
client API) and better moderation features (reporting and user blocking,
in addition to the existing server blocking).</p>
<p>So, I'm going to focus on adding those, polishing, and improving efficiency; there are now
quite a few other contributors to the project who have been helping out
with bugfixes, efficiency, and plenty more, which is helping a great deal.</p>
<p>I'll also be sending out a few
invitations to <a href="https://takahe.social">takahe.social</a> to use that as a testbed
as the first small community; nothing like dogfooding your own software to see
what it needs (as well as asking some existing Mastodon admins for their
thoughts, if they are gracious enough to lend me some of their time).</p>
<p>Still, though, getting to this point is quite a big deal - I feel like I've
learned a lot about ActivityPub and its related specifications by implementing
them. So let's talk about it a little bit.</p>
<h2>Fan-Out</h2>
<p>ActivityPub is all about "fan-out" - the process of getting posts from their
authors to their followers. At the basic level, this means one HTTP request
per follower to deliver it to their inbox - but there are some efficiency gains
to be made with "shared inboxes", where you can push things on a per-server
basis rather than per-user.</p>
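<p>As a rough illustration of that difference, here is a sketch that works out where one post has to be delivered. It assumes each follower record carries its personal <code>inbox</code> and, if its server advertises one, a shared inbox (ActivityPub actors publish this as <code>endpoints.sharedInbox</code>); the <code>Follower</code> class and <code>delivery_targets</code> function are hypothetical, not Takahē's actual code.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Follower:
    # Hypothetical follower record: the actor's personal inbox, plus the
    # shared inbox its server advertises (None if it doesn't have one).
    inbox: str
    shared_inbox: Optional[str] = None


def delivery_targets(followers: list[Follower], use_shared: bool = True) -> list[str]:
    """Return the distinct inbox URLs one post has to be POSTed to."""
    targets: list[str] = []
    seen: set[str] = set()
    for follower in followers:
        # With shared inboxes, every follower on one server collapses into a
        # single URL; without them, each follower needs its own request.
        if use_shared and follower.shared_inbox:
            url = follower.shared_inbox
        else:
            url = follower.inbox
        if url not in seen:
            seen.add(url)
            targets.append(url)
    return targets


followers = [
    Follower("https://a.example/users/ann/inbox", "https://a.example/inbox"),
    Follower("https://a.example/users/bob/inbox", "https://a.example/inbox"),
    Follower("https://b.example/users/cat/inbox"),
]
print(len(delivery_targets(followers, use_shared=False)))  # 3
print(len(delivery_targets(followers)))  # 2
```

<p>With shared inboxes the request count tracks the number of distinct servers rather than the number of followers, which is also why they help less when followers are spread thinly across many small servers.</p>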
<p>Obviously, doing this is noisy, takes a lot of requests, and has to be done
by background workers - especially as the server on the other end might be
down when you try to send the message over, and you need to retry.</p>
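<p>The retry side might look something like this sketch: a hypothetical <code>Delivery</code> record that a background worker keeps re-attempting with exponential backoff until it either succeeds or gives up. The schedule (30 seconds, doubling, at most 8 attempts) is an assumed example rather than what Takahē actually uses, and the <code>send</code> callable stands in for the real signed HTTP POST.</p>

```python
from dataclasses import dataclass
from typing import Callable

BASE_DELAY = 30  # seconds before the first retry (assumed value)
MAX_ATTEMPTS = 8  # give up after this many failed attempts (assumed value)


@dataclass
class Delivery:
    """One pending POST of an activity to one remote inbox (hypothetical)."""
    inbox: str
    attempts: int = 0
    next_try_at: float = 0.0
    done: bool = False
    failed: bool = False


def backoff_delay(attempts: int) -> int:
    # 30s, 60s, 120s, ...: the wait doubles after every failed attempt.
    return BASE_DELAY * 2 ** (attempts - 1)


def process(delivery: Delivery, now: float, send: Callable[[str], bool]) -> None:
    """One worker pass: deliver if due, otherwise reschedule or give up."""
    if delivery.done or delivery.failed or now < delivery.next_try_at:
        return
    if send(delivery.inbox):
        delivery.done = True
        return
    delivery.attempts += 1
    if delivery.attempts >= MAX_ATTEMPTS:
        delivery.failed = True  # the remote server has been down for too long
    else:
        delivery.next_try_at = now + backoff_delay(delivery.attempts)


# Simulate a remote server that is down for the first two attempts.
responses = iter([False, False, True])
delivery = Delivery("https://b.example/inbox")
now = 0.0
while not (delivery.done or delivery.failed):
    process(delivery, now, lambda inbox: next(responses))
    now = delivery.next_try_at  # jump the clock to the next scheduled retry
print(delivery.attempts, delivery.done)  # 2 True
```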
<p>Plus, whenever you reply to someone, that reply is then sent to every one of
their followers so that it can appear in reply threads. This means there's an
increasing amplification effect as you get more and more followers, and your
server spends a lot of its life just sending requests to, and receiving requests
from, other servers.</p>
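<p>To put rough numbers on that amplification, here is a sketch that treats each audience as a set of inbox URLs and assumes, as described above, that a reply reaches both the replier's followers and the original author's followers. The function name and the follower counts are made up for illustration.</p>

```python
def reply_recipients(
    replier_followers: set[str],
    original_followers: set[str],
    original_author_inbox: str,
) -> set[str]:
    """Inboxes a reply ends up at if it fans out to both audiences."""
    # Set union deduplicates anyone who follows both accounts.
    return replier_followers | original_followers | {original_author_inbox}


# Hypothetical numbers: I have 200 followers, the person I'm replying to has
# 10,000, and 50 people follow both of us.
mine = {f"https://s{i}.example/users/u/inbox" for i in range(200)}
theirs = {f"https://s{i}.example/users/u/inbox" for i in range(150, 10_150)}
recipients = reply_recipients(mine, theirs, "https://big.example/users/alice/inbox")
print(len(recipients))  # 10151
```

<p>So one short reply to a popular account can cost over ten thousand deliveries, and every further reply pays that again.</p>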
<p>There are other aspects to fan-out, though; there was an
<a href="https://ar.al/2022/11/09/is-the-fediverse-about-to-get-fryed-or-why-every-toot-is-also-a-potential-denial-of-service-attack/">excellent blog post</a>
about that last month that outlines the problems with link previews. See, when
you post a message to a server's inbox, Mastodon (at least) goes and fetches
any image attachments, and tries to generate web previews for any links. If
people have mobile clients, some of those will <em>also</em> try to fetch previews.
This does not end well for unprepared servers - and the server behind a link
could just be hosting someone's random blog.</p>
<p>Takahē does not do this prefetching yet - we'll likely never do it for the
link previews (but some clients connected to us will, once there's a client API).
Post attachments and profile images are a different story - we need to at
least proxy those for user privacy, but we can hopefully make it a caching
proxy. If users have their timelines open, there's not a big difference between
a caching proxy and prefetching for the source server, either.</p>
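<p>A minimal in-memory sketch of such a caching proxy, assuming a fixed TTL and a <code>fetch</code> callable that stands in for the real HTTP request to the origin server (both hypothetical). However many local users view the same image, the origin is hit at most once per TTL window.</p>

```python
import time
from typing import Callable


class CachingProxy:
    """Tiny TTL cache in front of remote media (illustrative, not Takahē's code)."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        ttl: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch  # stand-in for the real HTTP GET to the origin
        self.ttl = ttl
        self.clock = clock
        self._cache: dict[str, tuple[float, bytes]] = {}
        self.origin_hits = 0

    def get(self, url: str) -> bytes:
        now = self.clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # served locally; the origin never sees this request
        self.origin_hits += 1
        body = self.fetch(url)
        self._cache[url] = (now, body)
        return body


proxy = CachingProxy(fetch=lambda url: b"image bytes")
for _ in range(100):  # e.g. 100 local users opening the same timeline
    proxy.get("https://origin.example/media/cat.png")
print(proxy.origin_hits)  # 1
```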
<p>How do we solve this? Well, bundling some of this data into the original post
is one idea; having shared caching proxies split between multiple servers is
another.</p>
<p>That brings us, though, to the push-pull of scaling that's at the heart of
ActivityPub.</p>
<h2>Two Axes Of Scaling</h2>
<p>In a previous post about ActivityPub and Takahē, I referred to the fact that
there are two "axes" of scaling available in the protocol:</p>
<ul>
<li>Having more people per server/instance</li>
<li>Running more servers/instances</li>
</ul>
<p>Both of these have their pros and cons, and it's hard to go all in on one of
them - having a million people on one server is difficult to scale for that
individual server (you have to start building it as its own distributed
system), but having a million servers makes the fan-out problem even worse
(say hello to massive prefetching loads and shared inboxes being not very
useful).</p>
<p>It seems to me that some separation between domain, moderator, and
caching store is needed - Takahē already lets multiple domains be on a single
server, but the server moderation and caching are scoped just to that one server.</p>
<p>I do believe that sharing moderation across domains is a very important scaling
step; this doesn't have to purely be "multiple domains on the same server",
either - I think there's scope for a moderation API where you can have a
team of professional (volunteer or paid) moderators look after multiple
servers.</p>
<p>Sharing caching and previews is also important, though; if there were just ten
or so link preview caches around, and all servers used one of them, then we
would still avoid centralisation while massively lowering the load on the target
of links.</p>
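<p>One way such a pool of shared preview caches could be used, purely as an assumption on my part: every server picks the cache for a given link with rendezvous (highest-random-weight) hashing. All servers then independently agree on the same cache for the same URL, so the linked site is fetched by one cache rather than by every federated server. The cache hostnames and function names below are made up.</p>

```python
import hashlib
from collections.abc import Sequence

# Hypothetical pool of ten shared link-preview caches.
CACHES = tuple(f"preview-cache-{n}.example" for n in range(10))


def _score(cache: str, url: str) -> int:
    # Deterministic pseudo-random weight for this (cache, url) pair.
    digest = hashlib.sha256(f"{cache}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(url: str, caches: Sequence[str] = CACHES) -> str:
    """Rendezvous hashing: the cache with the highest weight for the URL wins."""
    return max(caches, key=lambda cache: _score(cache, url))


url = "https://someones-blog.example/2022/12/a-post/"
print(pick_cache(url))  # every server running this picks the same cache
```

<p>Because the choice depends only on the URL and the list of caches, no coordination between servers is needed, and taking one cache out of the pool only remaps the URLs that were assigned to it.</p>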
<h2>That Transport Layer</h2>
<p>I both love that ActivityPub is all over HTTP, and hate it.</p>
<p>On the plus side, it means there's all manner of pre-existing load balancers,
gateways, frameworks and more at our disposal. Plus, every programming language
on Earth has some way of slinging JSON over HTTP.</p>
<p>On the negative side, it's wildly inefficient. There's a lot of overhead for
each individual call, <code>Accept</code> headers have to be bandied around everywhere,
and there's a lot of HTTP implementation variation that has to be accounted for.</p>
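<p>For example, fetching an ActivityPub object means asking for the ActivityStreams media type explicitly. The ActivityPub spec says clients must send <code>Accept: application/ld+json; profile="https://www.w3.org/ns/activitystreams"</code>, and servers commonly also understand <code>application/activity+json</code>. The sketch below only builds the request with Python's standard library; the URL and User-Agent are made up, and some servers additionally require signed fetches, which this omits.</p>

```python
import urllib.request

# Both media types, so the request works with strict and lenient servers alike.
AS_ACCEPT = (
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams", '
    "application/activity+json"
)


def activitypub_request(url: str) -> urllib.request.Request:
    """Build (but don't send) a GET for an ActivityPub object."""
    return urllib.request.Request(
        url,
        headers={"Accept": AS_ACCEPT, "User-Agent": "example-fetcher/0.1"},
        method="GET",
    )


req = activitypub_request("https://example.social/users/alice")
print(req.get_header("Accept"))
```

<p>Without that header, the same URL will often just return the HTML profile page instead of JSON.</p>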
<p>If, magically, I could change it - would I go to something like SMTP, with its
own port and protocol? I'm not entirely sure, to be honest - I do like the ease
of entry with HTTP, and it does mean there's a lot of framing and encoding
already agreed. Maybe HTTP as a base protocol with an optional TCP alternate
for high-traffic servers to talk to each other over.</p>
<p>The one thing I would get rid of, though, is JSON-LD. If you're not aware,
ActivityPub is not just JSON - it's JSON-LD, which has schemas, namespaces,
expanded and compacted forms, transforms, and all manner of other stuff.
You need to transform each message to a canonical form before you parse it!</p>
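<p>To see why, here are two documents that a JSON-LD processor treats as the same piece of Note content, but that a plain JSON parser looking for a <code>content</code> key would not. The <code>expand_keys</code> function is a deliberately tiny stand-in for real JSON-LD expansion: it only handles the ActivityStreams context URL and simple prefix maps, and it treats all ActivityStreams terms as living directly under the ActivityStreams namespace.</p>

```python
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
AS_NS = AS_CONTEXT + "#"


def expand_keys(doc: dict) -> dict:
    """Toy JSON-LD expansion of a flat object's keys to full IRIs.

    Only understands the ActivityStreams context URL and simple prefix maps;
    real expansion also handles nesting, arrays, value objects and much more.
    """
    context = doc.get("@context")
    contexts = context if isinstance(context, list) else [context]
    prefixes: dict[str, str] = {}
    vocab = None
    for ctx in contexts:
        if ctx == AS_CONTEXT:
            vocab = AS_NS  # simplification: AS terms map to AS_NS + term
        elif isinstance(ctx, dict):
            prefixes.update(ctx)  # e.g. {"as": "https://www.w3.org/ns/activitystreams#"}
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        prefix, sep, suffix = key.partition(":")
        if key.startswith("@"):
            iri = key  # JSON-LD keywords such as @id stay as they are
        elif sep and prefix in prefixes:
            iri = prefixes[prefix] + suffix  # compact IRI such as "as:content"
        elif sep:
            iri = key  # already an absolute IRI
        elif vocab is not None:
            iri = vocab + key  # plain term such as "content"
        else:
            continue  # undefined term: expansion drops it
        expanded[iri] = value
    return expanded


# Two spellings of the same Note content.
compact = {"@context": AS_CONTEXT, "content": "Hello, fediverse!"}
prefixed = {"@context": {"as": AS_NS}, "as:content": "Hello, fediverse!"}
print(expand_keys(compact) == expand_keys(prefixed))  # True
print("content" in prefixed)  # False: a plain JSON parser misses it
```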
<p>I get the idea, but I was never an RDF fan (it's just JSON RDF, basically) and
it just makes everything so much more complex. A plain
JSON specification with known keys would have been better, I think, though
I was not there when the spec was written, so I'm sure there's more context
I lack.</p>
<p>I do want to stress, though, that while I am not a huge fan of the transport
layer, I think the object model is quite decent. If we could get
<code>preferredDomain</code> in there along with some proper multi-size image support,
I would be even happier than my usual buoyant self.</p>
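<p>As one possible shape for multi-size images, assuming an image's <code>url</code> is a list of ActivityStreams <code>Link</code> objects (the Activity Vocabulary does define <code>width</code>, <code>height</code> and <code>mediaType</code> on <code>Link</code>), a client could pick the smallest variant that is big enough. The <code>pick_variant</code> function and the sample data are hypothetical; how servers would actually publish variants is exactly the open question.</p>

```python
def pick_variant(image: dict, min_width: int) -> str:
    """Return the URL of the smallest variant at least `min_width` wide,
    falling back to the widest variant if none is big enough."""
    links = image["url"] if isinstance(image["url"], list) else [image["url"]]
    # Plain string URLs carry no size information, so treat them as width 0.
    variants = [
        (link.get("width", 0), link["href"]) if isinstance(link, dict) else (0, link)
        for link in links
    ]
    big_enough = [v for v in variants if v[0] >= min_width]
    if big_enough:
        return min(big_enough)[1]
    return max(variants)[1]


avatar = {
    "type": "Image",
    "url": [
        {"type": "Link", "href": "https://cdn.example/a-64.webp",
         "mediaType": "image/webp", "width": 64, "height": 64},
        {"type": "Link", "href": "https://cdn.example/a-400.webp",
         "mediaType": "image/webp", "width": 400, "height": 400},
        {"type": "Link", "href": "https://cdn.example/a-1200.webp",
         "mediaType": "image/webp", "width": 1200, "height": 1200},
    ],
}
print(pick_variant(avatar, 48))  # https://cdn.example/a-64.webp
print(pick_variant(avatar, 2000))  # https://cdn.example/a-1200.webp
```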
<h2>Difference Is Strength</h2>
<p>A virtual Mastodon monopoly is not good for almost anyone, I think - I'm
actually quite excited for Tumblr to implement ActivityPub, because it stands
a chance of forcing protocol changes and improvements to be discussed,
rather than directed almost entirely by one project.</p>
<p>If we can get Takahē to even 5% of active users on the Fediverse, that would
be a significant impact, too. I'm not sure we'll get there, but I do at least
hope the attempt will also place its own bit of pressure on the protocol in
terms of evolving and trying to fix some of the scaling issues we're all
sailing directly towards.</p>
<p>How to do that responsibly is another question - I would ideally like to make
sure we have a server that is designed to easily handle things like DMCA
requests, GDPR requests, and the awful spectre of terrorist content and CSAM.
I'm looking into starting a fund to pay for some legal and compliance
consultations on this front; it's the sort of work that
no admin should have to do themselves, and I'd love us to have a server
designed to handle the requirements easily, and written guides as to how to do it.</p>
<p>Still, for me, the focus right now is on growing Takahē and hopefully fostering some
communities under its wing, and that means getting stability, efficiency,
and working closely with people who want to use us to run communities. It's
also about slowly fostering a set of people who look after it with sensible
governance, so I'm not needed as a decision-making leader forever.</p>
<p>I'm not yet focused on people migrating servers from Mastodon - supporting
that is eventually on the roadmap, and I've reserved the appropriate URL
patterns so actor/object URLs can move over seamlessly, but it's still a lot
of work, and we're not ready for that quite yet.</p>
<p>If you're interested in helping out with Takahē, do pop over to our
<a href="https://discord.gg/qvQ39tAMvf">Discord</a> or email me at andrew@aeracode.org
and mention what you'd like to help out with - there's a large
<a href="https://docs.jointakahe.org/en/latest/contributing/">number of areas</a> we need
help with, not just coding!</p>