Activity Pub vs. Web Frameworks (danpalmer.me)
122 points by danpalmer on March 5, 2023 | 52 comments



> When migrating between systems, this difference presents a problem.

How does it break anything? ActivityPub ids can be any URI. You don't need to migrate your original REST-style links to anything - just publish those as the ids...

And if you are going in the other direction (and why would you?), I can't think of a single app router that ENFORCES REST-style routes. It may be encouraged by many, but it's certainly not enforced.

> Can the framework function in most of the ways we’d expect from a modern web framework, but with every URI being a UUID?

URIs should already be unique. You can use those...

> These requirements suggest that the framework would likely be content based, rather than route based

ActivityPub objects contain the type, so that's ALWAYS going to be the case to reliably use 3rd-party ActivityPub objects. As for locally-sourced objects, you can still use your REST-style URIs and use those URIs as the ActivityPub ids!


> ActivityPub ids can be any URI.

I think that's the conclusion TFA reaches as well. However, the premise it starts from is that Mastodon and Mastodon-likes do not do that.

> so that's ALWAYS going to be the case to reliably use 3rd-party ActivityPub objects.

From Mastodon's point of view there are hardly any '3rd-party' objects. Mastodon wants to store the whole world locally (which is one of the pain points its admins discover sooner or later) instead of using the web as it was intended and keeping shallow copies of remote objects that get refreshed when users want to access them. :(


Say I migrate my ActivityPub server from ProductX to ProductY, but I want to keep all the usernames and external follow relationships intact.

External people who follow @me@myserver.example.com need to seamlessly keep following at the new server, which has a different codebase and URL structure. Will it "just work"?


> I want to keep all the usernames and external follow relationships intact

ActivityPub doesn't have the concept of a "username" in the sense you are thinking of. The ActivityPub "preferredUsername" field is just a preferred display name for that actor. Instead, actors have ids - which are just URIs.

Behind the scenes, Mastodon (and similar) simply treat @me@myserver.example.com as a webfinger reference. They do a lookup via webfinger, which returns the actor id, and then they connect to that URI to retrieve the respective object(s).
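
For illustration, a minimal sketch of that lookup flow (the handle, hostnames, and the requests dependency are just example choices, not anything Mastodon-specific):

    import requests

    def resolve_actor(handle: str) -> dict:
        """Resolve an @user@host handle to its ActivityPub actor document."""
        user, host = handle.lstrip("@").split("@", 1)

        # Step 1: webfinger lookup; the response lists the actor id (a URI) among its links.
        webfinger = requests.get(
            f"https://{host}/.well-known/webfinger",
            params={"resource": f"acct:{user}@{host}"},
            timeout=10,
        ).json()
        actor_id = next(link["href"] for link in webfinger["links"] if link.get("rel") == "self")

        # Step 2: dereference the actor id to get the actor object (inbox, outbox, keys, ...).
        return requests.get(
            actor_id, headers={"Accept": "application/activity+json"}, timeout=10
        ).json()

    # e.g. resolve_actor("@me@myserver.example.com")["inbox"]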

Of course, because your Mastodon instance is also likely hosting your actor AND its respective collections AND its activities, ALL the URIs are ALSO tied to its domain. In order to "migrate", it (probably - I'm not 100% sure) sends updates for your actor object (and your other activity types) to your outbox and distributes them to your followers, who will then update their own references to your actor, its aliases (such as the webfinger ref), and other associated collections and activities.

It's 100% possible to have a webfinger alias on your own domain and use separate hosts for your actor, specific collections, and each individual activity, should you wish. Mastodon can reach other ActivityPub actors and activities configured in this way, but it can't provide a similar service for its own users, due to its current implementation (it expects to be the originator of activities for its associated actors).


That will just work for any new software that follows the move mechanism built into Mastodon. Not all do yet, as it's fairly new. When I moved from mastodon.social to my own instance, about 80-90% of my followers automatically migrated. Most of the ones left were on older versions of Mastodon, or on non-Mastodon servers like Misskey.

What does not migrate currently is your old posts. If you control the target server, you do get a dump of your old posts and can import them with some effort, but it's not automated. E.g. in my case, if you go to [1] you can see my old posts and a notice from mastodon.social that I have moved. The 40 followers remaining on that account are the few who either have not moved or have manually followed my new account but not bothered to unfollow the old one.

Moving the posts has a few issues. One concern the Mastodon devs have, which I don't agree with, is that it might make it easier to "fake" your history. But you can trivially do that if you set up your own instance anyway, and the exported posts are signed, so they could check the signature on import, for what little benefit that'd give. The other is that anyone who has linked to the old posts would still link to the old site. Fixing that (e.g. with redirects) would potentially require a mapping. It's doable, but would require a new mechanism (e.g. an activity posted from the new server telling the old server the new URLs after import - the mechanism for the old server to know which new server you're moving to is already in place).

[1] https://mastodon.social/@vidarh


Not quite. You can move Mastodon server, but you can't move _software_ on the same server.


Yes, you can. You will need to have both running at the same time if you want to use the built-in migration mechanism in Mastodon (assuming the new software supports it), and so will need to ensure the URL space does not collide, e.g. by giving one of them a different hostname and adding an alias to the webfinger response.

If you are going to migrate software on the same server you'd likely be in a better place if you just generate static pages from the Mastodon export for the actual posts after doing the follower migration.

If you're outright replacing the Mastodon server you might get away with just importing the followers as well, as long as the software you're interacting with is well behaved and does webfinger lookups. You can improve on that by ensuring the inbox URL stays the same, worst case with a redirect.


Sorry, by "same server" what I actually meant was the same domain. As far as I can tell there's no support for migrating Activity Pub content between software on the same domain.

Mastodon can move accounts only, and requires different domains. Mastodon's import/export can help with content assuming you're moving to another Mastodon instance, but won't do anything to update the URIs of content objects in other instances' databases. Webfinger redirects can be used to move accounts across domains too.

However as far as I can tell there's no way to move arbitrary Activity Pub objects across domains.


> Sorry, by "same server" what I actually meant was the same domain.

There's nothing stopping it on the same domain either, as long as there are no conflicting URLs. It'll be harder to do "online", and may result in a bunch of redirect rules, and it definitely ought to be easier, but nothing prevents you from doing it.

But there's little reason to do that, because what users use to interact with your account is the webfinger address, which can point to whichever hostnames you like. So you can migrate to a server on a different hostname, and then redirect the webfinger from your main domain to the new hostname, no problem at all.

> Mastodon can move accounts only, and requires different domains.

Using the built-in move functionality, you'd run into that if you insist on never having the servers referenced by different hostnames. But if you're keeping it on the same domain, which you control, you can "just" migrate the data. It could absolutely be made more convenient, sure.

> However as far as I can tell there's no way to move arbitrary Activity Pub objects across domains.

Define "move" here. You can't cause other instances to update their references, so if you can, you'd want to leave redirects in place. You can, however, just move the objects over. It'd be nice if you could announce an update for the links as well, but since you can't avoid "external" links from places that don't federate at all, this will never be perfect.

That said, the Note activities are signed, so they can be served up from another domain and still be verified, as long as the old pub-key can be obtained from the original URL or is cached by a trusted third party. But this is not a problem for the hypothetical scenario I replied to: whether you keep it on the same hostname, as you clarified you meant, or put it on another hostname, as I suggested, nothing prevents you from maintaining the old posts as static content on the original hostname, so you don't need to move those objects. You can at least conceptually include them in your outbox so they're visible as part of the timeline (though I suspect that would either require a change to Mastodon, if you insist on running Mastodon, or including them as "boosts"/"Announce" activities instead of posts if you change the hostname).


Why does the new server need to have a new URL structure? Why do the URLs need a "structure" anyway? That's what TFA and OP are actually saying.

Mastodon - partly because it predates federating through ActivityPub - doesn't have the concept of a URL as a unique identifier for an object (in the ActivityPub sense), but uses numeric ids, or UUIDs in some places.

If you really want "a structure", I wonder why nobody is implementing ActivityPub services[1] using the simplest and dumbest way of mapping objects in storage: by their ids.

I.e.: an actor has the id https://example.com/user1, therefore its inbox/outbox should be https://example.com/user1/inbox|outbox. Then, an activity created by that user can be found at https://example.com/user1/outbox/{unique-activity-id}. If the activity is a Create (the only type whose side effect is creating an object), then this object can have the id https://example.com/user1/outbox/{unique-activity-id}/object, and so on.

[1] My own fediverse software does this for multiple types of storage: filesystem hierarchy, k/v store, database tables, but I haven't seen anything else similar.
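
A minimal sketch of the storage mapping that scheme implies (a toy illustration under my own assumptions about paths, not any particular project's layout):

    from pathlib import Path
    from urllib.parse import urlparse

    STORAGE_ROOT = Path("/var/lib/fediverse")  # hypothetical storage root

    def path_for(object_id: str) -> Path:
        """Map an id like https://example.com/user1/outbox/{activity-id}/object
        straight onto a filesystem hierarchy, so serving a GET for that id is
        just reading one file; no router table or database lookup needed."""
        parsed = urlparse(object_id)
        return STORAGE_ROOT / parsed.netloc / (parsed.path.strip("/") + ".json")

    # path_for("https://example.com/user1/outbox/abc123/object")
    # -> /var/lib/fediverse/example.com/user1/outbox/abc123/object.json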


> using the simplest and dumbest way of mapping objects in storage by their ids

> https://example.com/user1/outbox/{unique-activity-id}

This isn't actually the simplest way, and is what makes it hard to move between software. Mastodon might use /userid/outbox/activityid, GoToSocial might use /users/<id>/posts/<id>, and so on.

The "simplest" way to do this is to treat the URLs as the ids themselves, which is what the Activity Pub spec defines. This should be the most interoperable.


But it doesn't matter what each piece of software does, because the necessary endpoints are defined in your actor object, linked from your webfinger entry. Those are the only endpoints the user agent is supposed to see and use.

Now, if you're saying the ActivityPub spec is tragically underdeveloped, sometimes on purpose it seems, and doesn't define actually useful stuff but leaves it "to the particularities of implementations", then I fully agree with you.


Maybe true for the endpoints in the actor object, but clients should be able to refresh any object from its URI at any time. I believe this is a spec requirement rather than just advice, although I could be wrong.

As for Activity Pub lacking clarity, I completely agree; it's going to be years before we have a good understanding of what the spec actually necessitates in production implementations.


This would be "interoperable" in that it's easy to go from one framework/software to another, but it's still not transparent. The current way of doing things is to use the URL of the object in the search bar, and thus use the search bar as the "new" URL bar. In practice it is more interoperable because all software then behaves the same.


If you mean changing domain names, I think in theory you can set up redirects with webfinger.


You're right, you can, and it works. E.g. my Mastodon id is @vidar@galaxybound.com, but my Mastodon server is hosted on m.galaxybound.com. The webfinger on galaxybound.com is just a redirect. In addition to redirecting the webfinger query, the webfinger results themselves can point to an inbox etc. elsewhere.
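
For the curious, the redirect really can be that small. A sketch as a bare WSGI app; the hostnames match the setup above, everything else is illustrative:

    def app(environ, start_response):
        """Answer webfinger queries on the 'vanity' domain with a redirect to the
        host actually running the ActivityPub server, keeping the query string."""
        if environ.get("PATH_INFO") == "/.well-known/webfinger":
            location = "https://m.galaxybound.com/.well-known/webfinger"
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    if __name__ == "__main__":
        from wsgiref.simple_server import make_server
        make_server("", 8080, app).serve_forever()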


I think there is a simple solution to this problem. It does require some changes, but they seem to be pretty straightforward:

- When installing any app, the user must always specify an arbitrary prefix. It is applied to all pages (and resources) except the top-level one. So you never have "hostname.com/posts/1234"; instead it is "hostname.com/m1/posts/1234".

- When switching to new software, you use web spider/export functionality to convert the entire old prefix to static files. Static files are fast, and you can easily keep 100,000 of them on an average computer (and if not, you can store them in SQLite or some immutable k-v store). You then expose the static versions, either directly or via some trivial app (a sketch follows below).

- The new ActivityPub software goes to a different prefix, so new posts are "hostname.com/m2/post/1234".

- The only thing that _might_ need redirects is users, to get following to work. Those will need to be set up manually. But there are far fewer users than other objects, so this should be pretty doable.

----

Does this need some changes to be feasible? Yes. Is it easier than designing an all-new scheme? Absolutely.
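
A rough sketch of the "freeze the old prefix to static files" step under those assumptions (the /m1/ prefix, paths, and the requests dependency are illustrative):

    import json
    from pathlib import Path
    from urllib.parse import urlparse

    import requests

    STATIC_ROOT = Path("/srv/hostname.com")  # served as-is by the web server afterwards

    def freeze(object_ids):
        """Fetch each object under the old /m1/ prefix once and write it to a file
        whose path mirrors the URL, so the old ids keep resolving after the old
        software is switched off."""
        for object_id in object_ids:
            doc = requests.get(
                object_id, headers={"Accept": "application/activity+json"}, timeout=10
            ).json()
            out = STATIC_ROOT / urlparse(object_id).path.lstrip("/")  # e.g. m1/posts/1234
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_text(json.dumps(doc))

    # freeze(["https://hostname.com/m1/posts/1234", ...])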


This is a good idea that I hadn't fully considered. There are a few issues that make it tricky in the general case though – for existing implementations there may not be a prefix that can be used, and if we're talking about adding prefixes to all new systems, we could just use frameworks that have built-in support for this.

Additionally, web scraping isn't perfect. It assumes static content, which may not be possible with ACLs or other behaviour. It also assumes that there's a single link entry point to the content, or that all the necessary link entry points are known. It could also be very wasteful – storing a tweet in a database is much smaller than storing the JSON response for that tweet, or the HTML of the page the tweet is in.

This is an interesting idea to consider, and it could be simpler in some cases, but I'm not sure it's a better solution in all cases.


There's nothing that requires the post URLs to even be on the same hostname as the user address. Mine are not. So switching the location the content is served from for a new install is always possible. My Mastodon/Fediverse address is @vidar@galaxybound.com, while my posts are all hosted on m.galaxybound.com, e.g. [1]. In this case that is a result of installing Mastodon on the latter, while the former just redirects the webfinger, but AFAIK there's also nothing preventing, say, the address being @user@domain1, the profile being on domain2, and the canonical URI of the posts being on domain3.

In terms of being wasteful, that might be a consideration for large shared hosts, but in reality media storage will outpace it by a large factor for most users. E.g. when I moved my account from mastodon.social, the export had 9.6MB of media I'd posted, and the uncompressed JSON for all my posts took up just 800K. That's despite being someone who posts relatively few pictures and very rarely any video.

> Additionally, web scraping isn't perfect. It assumes static content, which may not be possible with ACLs or other behaviour. It also assumes that there's a single link entry point to the content, or that all the necessary link entry points are known.

You don't need to scrape, as Mastodon supports export (though that will not be true for every type of fediverse software), but you can scrape, because for federation to work your endpoints need to be accessible to all but instances that are defederated. You can also get at every post trivially by looking up a user in webfinger to find their outbox, retrieving the outbox, and retrieving the link to every post. This works for every piece of fediverse software that cares about ensuring federation works.

[1] https://m.galaxybound.com/@vidar/109965336245254807
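
A sketch of that walk, assuming the outbox is an OrderedCollection split into pages, as Mastodon serves it (the helper names are mine):

    import requests

    def fetch(url: str) -> dict:
        return requests.get(
            url, headers={"Accept": "application/activity+json"}, timeout=10
        ).json()

    def outbox_activities(actor_url: str):
        """Yield every activity in an actor's outbox by following the
        collection's paged first/next links."""
        outbox = fetch(fetch(actor_url)["outbox"])
        page = outbox.get("first")
        while page:
            if isinstance(page, str):  # pages may be linked rather than embedded
                page = fetch(page)
            yield from page.get("orderedItems", [])
            page = page.get("next")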


It's true there's a disconnect between the account domain and the hosting domain (I already use this), but the Activity Pub object IDs are on the _hosting domain_, and if you want to move that, you do need the IDs to continue to resolve in order to not (partially) defederate.

As far as I am aware there's no other good way to move that content between software on the same domain. Mastodon will import/export, but that data still assumes it'll be backed by Mastodon code with the Mastodon database schema, which is not correct if you're migrating to or from Mastodon.

> but you can scrape as for federation to work your endpoints need to be accessible to all but instances that are defederated

...with Activity Pub. Mostly.

This is not true for all linked data, and my post was mostly looking at the general case of linked data rather than sticking to Mastodon or Activity Pub in particular. Activity Pub can be authenticated in some cases, although I think that may be something Mastodon has implemented on top. Not sure.


> It's true there's a disconnect between the account domain and the hosting domain (I already use this), but the Activity Pub object IDs are on the _hosting domain_, and if you want to move that, you do need the IDs to continue to resolve in order to not (partially) defederate.

Yes, that is true. There's nothing you can do about that without breaking basic web expectations of URLs staying the same. The new endpoints can serve up the old content or Announce references to them, but the old URLs do need to continue resolving and at a minimum serve up a redirect if you want maximum availability.

It would be a nice improvement to have a URL scheme that allowed referencing posts relative to a webfinger lookup to reduce the impact of that.

> As far as I am aware there's no other good way to move that content between software on the same domain. Mastodon will import/export, but that data still assumes it'll be backed by Mastodon code with the Mastodon database schema, which is not correct if you're migrating to or from Mastodon.

The export is an archive of JSON representing ActivityPub activities and objects. There's very little in the export that is Mastodon-specific in any way. I have lots of complaints about Mastodon, but the export format is not one of them. If you write ActivityPub software, just support that format.

> Activity Pub can be authenticated in some cases, although I think that may be something Mastodon has implemented on top. Not sure.

None of the retrieval endpoints for the core activity data can be authenticated in Mastodon if you want federation to keep working properly, and much more data is retrievable without auth unless the user has marked specific subsets of data private. But if you're crawling your own account, with the right Accept: header, you'll get pretty much the same content in mostly the same (ActivityPub JSON) format as you get from an export.


In my opinion you already commit a big fallacy by implying that objects need to be tied to database access.

Once an object has been created on an instance - with very few exceptions - there's no need to tamper with it. Storing it as JSON on disk is perfectly acceptable.


JSON on disk is a perfectly acceptable database format ;)

More seriously, I used "database" as a proxy for a lookup in storage of some sort. Whether that's JSON files on disk, relational data in a SQL database, or anything in between is just down to the access, consistency, durability, etc. requirements of the system.


> You then expose the static versions, either directly or via some trivial app.

Most ActivityPub objects are already static and immutable; the sooner fediverse developers realize this and simplify the code path between reading a URL and returning the static, immutable object that has that URL as an id, the better the ecosystem will be.

It's sad that everyone insists on wrapping costly database accesses and logic on top of what should be very, very simple logic.


Activities are immutable but objects are not. Mutability allows posts to be edited, for instance.

The ids in ActivityPub also aren't URLs, they're random strings.

EDIT: They actually are URLs, my mistake


Are you sure? From the spec, object IDs must be:

> Publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server

(or they can be null in some cases)


All of your links breaking is the good case. Parse all the URIs out of your log, capture static copies, and then serve them alongside the new implementation before dropping to 404.


> This is in direct contrast to typical REST-ish APIs that may use arbitrary identifiers. In these systems, clients must understand the identifiers and how they compose into paths to be used as URIs to request content.

> Unfortunately almost all web frameworks are designed for the REST-ish applications where the client constructs URIs, and have the concept of a router based on path segments.

I would argue that most frameworks do not really use REST; they use a form of HTTP- and JSON-based RPC. This goes back to the discussion about HATEOAS. In this case we see a clear advantage HATEOAS has with regards to architectural flexibility and the ability to decentralize. With a HATEOAS REST API you do not construct URIs, you just follow them.
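
A toy contrast between the two styles (the URLs and field names are purely illustrative):

    import requests

    def follow(uri: str) -> dict:
        # The client only ever dereferences links the server handed it.
        return requests.get(uri, headers={"Accept": "application/activity+json"}).json()

    # REST-ish RPC: the client hard-codes the server's URL layout and builds paths itself.
    post = requests.get("https://example.com/api/users/42/posts/1234").json()

    # HATEOAS: the client starts from one known URI and follows embedded links,
    # so the server is free to change its URL layout at any time.
    actor = follow("https://example.com/users/alice")  # id normally obtained via webfinger
    outbox = follow(actor["outbox"])                   # whatever URI the server advertises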


Absolutely, this is why I used the term "REST-ish": while it's the term the industry has decided to use, it's exactly because it's not really REST that it causes these issues.


> Being content based implies database queries for every request as in the previously mentioned option (3), but by raising this functionality to the framework level, more optimisations may be implemented, and correctness ensured, in one place, likely resulting in a lower impact of those additional queries.

I'm not at all convinced that the performance overhead of a database hit for every incoming request in order to resolve the URL is significant enough in any of the widely used frameworks to justify throwing them out and doing something different.

Most database-backed web apps I've worked on in my career have had a dozen or more SQL queries executing per page rendered.

Adding an indexed lookup against a URL table of some sort is a tiny additional load - and it's also usually quite straightforward to add a cache using memcache or Redis or Varnish if it starts to become a problem.

URL lookups are a lot more likely to work with a simple TTL-based cache than most other kinds of query.
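
As a minimal sketch of that pattern, with an in-process TTL cache standing in for memcache/Redis (table and column names are made up):

    import sqlite3
    import time

    db = sqlite3.connect("app.db")
    db.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " path TEXT PRIMARY KEY,"        # the primary key gives us the indexed lookup
        " object_id INTEGER NOT NULL)"
    )

    TTL_SECONDS = 60
    _cache: dict = {}  # path -> (expiry, object_id or None)

    def resolve(path):
        """One indexed query per request, fronted by a simple TTL cache."""
        entry = _cache.get(path)
        if entry and entry[0] > time.time():
            return entry[1]
        row = db.execute("SELECT object_id FROM urls WHERE path = ?", (path,)).fetchone()
        object_id = row[0] if row else None
        _cache[path] = (time.time() + TTL_SECONDS, object_id)
        return object_id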


My perspective here was that one database query implemented by a framework (with good caching, perhaps canonicalisation to keep caches small, e.g. query parameter ordering, the right database indexes, the right response behaviour, etc.) would be fine, but implementing this well in a middleware added by developers using a framework would be hard to do well and could have a significantly higher cost.

In my experience, implementing things like access control, requiring a single database query, in a middleware that applies to every request does actually become a noticeable overhead and problem at some point. Sure, many pages might be doing 20 queries, so adding one more isn't a problem, but at some point there might be an API that needs to respond quickly, does <1 query on average with caching, and therefore has _double_ the cost with this feature. Because the URLs are supposed to be opaque in the general case, this still needs to be applied everywhere.

This can definitely all be optimised, but I think doing this at the framework level is a much better solution.


Take the link as a global id and the initial value for the URL, then allow the URL to be updated.

It means you have to trust the redirect chain.


We need good patterns for 'dereferencing' DNS-bound identifiers, like many https://{dns} URLs, into something like a multiformats/cid[1].

[1]: https://github.com/multiformats/cid


URLs/URIs/IRIs are already as federated as they can be; there's no need to add an abstraction on top of that. Why do you think differently?


I love the idea of every route on my site being just a UUID :) I've never thought of that, despite being a huge fan of UUIDs. I might go and experiment a bit with this.

Anyone done this before? Any learnings?


I wish OP had explained the issues with GoToSocial.


Bugs, lack of functionality, incompatibility with clients. Also, I realised that there wasn't any need to - Mastodon will run for a single user on pretty much anything, so why would I need something that runs twice as fast?

Also, I completely understand that maintaining open source software is hard, but GoToSocial's approach to addressing this was to entirely disable all issues and pull requests for a month over the holiday period, and I think that's the wrong approach. I didn't report any of the bugs because I was unable to do so. I would have understood not receiving a response for a few weeks, or even at all, but not even being able to post them was a strange choice. It turned me off the software a bit; what if someone wanted to report a vulnerability?


Wouldn't IPFS be a good fit for this?


On the topic of IPFS, but also slightly tangential -- how has nobody written an Activity Pub client/server in JavaScript and distributed it as a browser addon?

If an IPFS node/server can be literally hosted via a Firefox Extension[1], how has nobody done the same for an arguably easier protocol like Activity Pub?

1: https://github.com/ipfs/js-ipfs


IPFS is a pull protocol. ActivityPub is normally implemented as a push one (despite the specification having outboxes, so in theory it should be able to do either).


The ideas behind IPFS definitely seem to reflect the unchangeable URL intentions of this article. However, IPFS is also more limited, as edits and updates require special handling and interactivity is simply not possible.

Furthermore, IPFS is too slow to be usable in practice. I'd love it to become a serious alternative to the web for documents and blog posts, but the number of timeouts and 30+ second loading times you need to put up with to get any IPFS page to load makes it very difficult to tolerate.

At least with the NFT scams IPFS found a real-life purpose, something I hadn't expected to see. That's dying down now, I think, so who knows how long the most active pin servers will stick around.


IPFS would still be a good candidate for media attachments in Mastodon, potentially saving quite a bit of resources in bandwidth and storage.

The performance of IPFS gateways has improved a lot lately, so even without a native ipfs:// protocol handler that could work nicely in clients.


BitTorrent has been around for decades and can solve the issue of media attachments far more easily than IPFS.


Are there gateways to access BitTorrent resources over https://?


There are multiple libraries and pieces of software giving you a gateway, but there's no public gateway. It would fall into problematic use in less than an hour because of the insanity that IP is, and it would completely contradict the whole idea of being decentralized.


If no one is paying, content on IPFS will disappear, and very soon.


Maybe making everything a URL isn't a great idea, particularly when a lot of people on Mastodon have privacy concerns and don't want to be found?

If you want people to find your articles via links from other websites and search engines, it's better to put them on a blog.


I'm a bit confused by people using a protocol that pushes their content to a large number of other people's computers if they don't want to be found. ActivityPub is inherently very public, even more so than publishing content to a website.


Posting on Mastodon is called microblogging for a reason: because it's a public blog. One can host a single-person instance and quite literally have a Mastodon-based blog site. Some people seem to confuse it with private messaging. User confusion about what an application or service does or does not do is not a good basis for a technical design, though it obviously hints that the particular app was not exactly "self-documenting".

One can only speculate about how the confusion came about, but privacy advocates posting... publicly on Mastodon may have unwittingly contributed to a sense that this space is "different" and "better" in every possible way. Yes, the fediverse/Mastodon is different and better, but in specific ways: it doesn't use algorithmic timelines, it doesn't push ads, it doesn't use dark addiction patterns, etc. (though all of those might be reversed for particular instances or new server types).

Unbundling the different issues and educating the broad base of users about what is going on is not just healthy and honest towards users; it may also contribute to more well-defined and stable user requirements for different types of online platforms.


>people on Mastodon have privacy concerns and don't want to be found?

Maybe don't use social media or the internet? Especially with your real name and mugshot?


I used to think this. But it turns out some people have an enormously more pleasant experience if they're slightly harder to harass. Lots of harassers turn out to be lazy.


In that case, we should eliminate social media profiles entirely. I've been an advocate of this for a long time.* It should work like mailing lists, where someone's message is just there, and there isn't any way to click their name and look at a reverse-chronological feed of all their posts. This also maps to real life. There's no bubble floating above people's heads that lets you track down and pore over what a random stranger you see at the grocery store said at their friend's dinner party the night before.

* It's beyond annoying that GitHub only just now added the ability to turn off broadcasting your activity on your profile page, whether you wanted it to or not.



