What are the activity_id formats for various platforms?
from admiralpatrick@lemmy.world to fediverse@lemmy.world on 16 Sep 14:44
https://lemmy.world/post/36006992

TL;DR: Any of you who are more familiar with Fediverse platforms that aren’t Lemmy/Piefed, can you let me know what the AP_IDs look like for users, posts, comments, and, if applicable, communities?

So, I’ve rewritten the search / search boxes in Tesseract to skip the search and directly resolve activity pub URLs for users, posts, comments, and communities. I’m loving this as it makes things so much faster and easier.

To make that work, and reduce false positives/negatives, I have to do some pre-flight checks on the URL that’s submitted to the search.

Currently, it checks whether the URL’s domain belongs to a known federated instance and looks for specific paths in the URL. If it detects that the URL is an AP_ID URL, it only resolves the object and redirects you to it (skipping the lengthy search step). False negatives get passed to the regular search, which still tries a federated lookup alongside the search.

For Lemmy and Piefed, those are:

For Mbin, I think it’s the same except it uses /m/ for communities (they call them “magazines” I believe).

I think Mastodon uses /user or maybe /username/ in the AP identifiers?

Any of you who are more familiar with Fediverse platforms that aren’t Lemmy/Piefed, can you let me know what the AP_IDs look like for users, posts, comments, and, if applicable, communities?

#fediverse

Jayjader@jlai.lu on 16 Sep 16:18

From my own experience querying public mastodon timelines via API (edit: removed incorrect /api/v1s in the AP_IDs):

  • Mastodon user accounts have an ActivityPub URI of https://<instance.domain.tld>/users/<username>
  • Mastodon posts have an ActivityPub URI of https://<instance.domain.tld>/users/<post_author_username>/statuses/<post_id> (they also have a url property of https://<instance.domain.tld>/@<post_author_username>/<post_id> but that tends to serve the html view of the post)
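A rough sketch of how a client might match those two path shapes, in TypeScript. The function and constant names are made up for illustration (they are not from Tesseract or Mastodon), and the patterns only cover the two URI forms listed above:

```typescript
// Path shapes taken from the two bullets above; everything else is illustrative.
type MastodonApKind = "user" | "post";

// /users/<username>
const MASTODON_USER_PATH = /^\/users\/[^\/]+\/?$/;
// /users/<post_author_username>/statuses/<post_id>
const MASTODON_POST_PATH = /^\/users\/[^\/]+\/statuses\/[^\/]+\/?$/;

// Returns which kind of AP_ID the URL looks like, or null for anything else
// (including the /@<username>/<post_id> HTML view).
function classifyMastodonUrl(input: string): MastodonApKind | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (MASTODON_POST_PATH.test(url.pathname)) return "post";
  if (MASTODON_USER_PATH.test(url.pathname)) return "user";
  return null;
}
```

Only the path is inspected here; whether the host is actually a Mastodon instance would have to be checked separately.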

To see for yourself, pick an instance that allows viewing their public timeline without logging in (mastodon.social is perfect for this) and follow the “Playing with public data” section of the docs. That page elides most of the info you’re looking for in the example payloads they give (as the JSON payloads themselves are quite large and nested), but I can assure you that AP_IDs for user accounts and posts can be found pretty quickly from a single timeline query.

I don’t think Mastodon has any notion of community, nor does it distinguish between posts and comments (when following a lemmy community, both posts and comments show up in my masto feed as “top-level” statuses (ie posts)).

admiralpatrick@lemmy.world on 16 Sep 17:43

Cool, thanks. I was close with /user guessing from memory.

I think the /users/…/post_id will be sufficient. It just needs to know that the given URL is an AP_ID before passing it off to the API call to resolveObject. Since it already knows instance.domain.tld is a federated instance, it just needs to see if the path is an AP_ID or the HTML (or something else). Thus, I don’t have to parse the whole thing, just check that enough of it matches.

Thanks!

rglullis@communick.news on 16 Sep 17:43

So, I’ve rewritten the search / search boxes in Tesseract to skip the search and directly resolve activity pub URLs for users, posts, comments, and communities. I’m loving this as it makes things so much faster and easier.

Isn’t that the whole point of webfinger? Moreover, why would you paint yourself into a corner and hardcode the logic for all the different types of services, if ActivityPub uses JSON-LD and therefore provides a straightforward method for document dereferencing?

I’m not trying to be snarky. It’s just that I’m writing an ActivityPub server where the id of each object is just a ULID, because to the server there is zero difference between serving the information about an actor or an activity.

admiralpatrick@lemmy.world on 16 Sep 18:14

We’ve had this discussion :)

This application is written against the Lemmy API. It only speaks API. Eventually it’ll speak Piefed API as well, but right now, only Lemmy API.

Lemmy and Piefed only do server-to-server Activity Pub and not client-to-server AP. Clients have to use the API to interact with them. This is a Lemmy (and eventually Piefed) client.

rglullis@communick.news on 16 Sep 19:13

But then why do you worry about the ap_id patterns from other software?

admiralpatrick@lemmy.world on 16 Sep 20:41

I’m making an “omnisearch” box.

Paste an AP_ID into the search field, and it auto-resolves it and redirects you to your instance’s local copy (which is very fast) instead of going through the whole search process (which is slow). To prevent false positives, I’m matching the various ap_id formats and only doing the resolution on those; anything else gets passed to search.

Anything else that falls through the cracks just gets passed to search as usual (which also does a resolveObject lookup).

It’s to make life easier.

julian@activitypub.space on 16 Sep 20:08

@admiralpatrick@lemmy.world I think you would be better served by checking for the Link header. NodeBB and WordPress do it, if that gives you some idea of implementation?

julian@activitypub.space on 16 Sep 20:40

It took me a minute to find, but it is detailed in @evan@cosocial.ca's write up about HTTP Discovery of ActivityPub Objects.

This is probably exactly what you're looking for.

https://swicg.github.io/activitypub-html-discovery/

I think your current approach has merit but is limited. If you know the instance software by URL and can resolve it using path matching without the use of a pre-flight request, that's absolutely a better way forward. The downside is you have to know the URL patterns of every software. You'll never "catch 'em all"!

However, if that method fails, doing a pre-flight check to grab Link also works and is a viable way forward.
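For illustration, here is a minimal TypeScript sketch of picking an ActivityPub link out of a Link header value. It assumes the server advertises the object with rel="alternate" and type="application/activity+json", which is my reading of the report linked above. The function name is hypothetical, and the parsing is deliberately simplified:

```typescript
// Simplified Link header parsing: assumes no commas inside the <...> target and
// no semicolons inside quoted parameter values. A full RFC 8288 parser would
// handle both, so a type like application/ld+json; profile="..." is not covered.
function findActivityPubLink(linkHeader: string): string | null {
  for (const part of linkHeader.split(",")) {
    const match = part.match(/^\s*<([^>]*)>(.*)$/);
    if (!match) continue;
    const target = match[1] ?? "";
    const paramText = match[2] ?? "";

    // Collect the ;key=value parameters that follow the <target>.
    const params = new Map<string, string>();
    for (const raw of paramText.split(";")) {
      const eq = raw.indexOf("=");
      if (eq === -1) continue;
      const key = raw.slice(0, eq).trim().toLowerCase();
      const value = raw.slice(eq + 1).trim().replace(/^"|"$/g, "");
      params.set(key, value);
    }

    const rels = (params.get("rel") ?? "").toLowerCase().split(/\s+/);
    const type = (params.get("type") ?? "").toLowerCase();
    if (rels.indexOf("alternate") !== -1 && type === "application/activity+json") {
      return target;
    }
  }
  return null;
}
```

In a client, the header value would typically come from something like response.headers.get("Link") on a fetch() response, provided the server actually exposes that header to the caller.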

You can test against NodeBB users or posts.

admiralpatrick@lemmy.world on 17 Sep 14:23

I think you would be better served by checking for the Link header

Can’t really do that, client-side, in a browser application. CORS is a perpetual cockblock (though I understand why it is), and I’d rather not make an internal API endpoint to do the lookup.

The application polls Lemmy’s getFederatedInstances API endpoint at startup, so it has a list of every activity pub server your instance knows about. That’s the first and primary check for the URL that’s being searched.

The second check is just to rule out non activity pub URLs that point to a federated instance (e.g. lemmy.world/modlog, lemmy.world/pictrs/image/blah.webp, etc).

Goal isn’t to “catch 'em all” but to catch the most used ones. If there’s one I don’t account for, either by omission or because the federated platform didn’t exist when I made the patterns, then it will just fall back to a regular search which also includes trying to resolve it as a federated URL (which is the current behavior in all prior versions).

The goal is just to simply short-circuit the search behavior if the query is a known ap_id URL in order to avoid a lengthy search process and quickly redirect you to your instance’s local copy.
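The two checks and the fallback described above could be sketched roughly like this in TypeScript. This is not Tesseract's actual code: planSearch, knownDomains, and apIdPaths are hypothetical names, and the path patterns are whatever the client chooses to ship.

```typescript
type SearchPlan = "resolve" | "search";

// Hypothetical inputs: `knownDomains` stands in for the list built from
// getFederatedInstances, and `apIdPaths` for the client's known AP_ID path shapes.
// Patterns should not use the `g` flag, since test() would then keep state.
function planSearch(
  query: string,
  knownDomains: Set<string>,
  apIdPaths: RegExp[],
): SearchPlan {
  let url: URL;
  try {
    url = new URL(query.trim());
  } catch {
    return "search"; // plain text query, not a URL
  }
  // Check 1: is the host a known federated instance?
  if (!knownDomains.has(url.hostname.toLowerCase())) return "search";
  // Check 2: does the path look like an AP_ID rather than, say, /modlog?
  const looksLikeApId = apIdPaths.some((pattern) => pattern.test(url.pathname));
  return looksLikeApId ? "resolve" : "search";
}
```

Anything that comes back as "search" simply takes the existing slower path, so a missing pattern costs time but not correctness.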

julian@activitypub.space on 19 Sep 01:47

Can you not call fetch() to do a HEAD call? Maybe I'm mistaken about it but it should be ok.

CORS is indeed a wrench that gets thrown in when you least expect it...

moseschrute@lemmy.world on 19 Sep 00:32

I maintain my own Lemmy client (Blorp), and this sounds like a cool idea. How do you get your known list of federated instances?

I currently have my own threadiverse crawler I wrote, but I disregard any Lemmy/PieFed instance with <20 monthly active users. That brings the list down to about 63 Lemmy instances and 7 PieFed. I wonder if that list is extensive enough to implement the resolve object mechanism you mentioned.

admiralpatrick@lemmy.world on 19 Sep 15:59

At startup, it calls /api/v3/federated_instances and stores the result in a lookup variable. Then I’ve got a couple of helper functions that accept either an instance ID or a domain name and look the instance up in that variable.
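A minimal TypeScript sketch of that kind of lookup follows. The FederatedInstance shape is an assumption (a trimmed-down guess at what the endpoint returns), and InstanceLookup is a hypothetical name rather than anything from Tesseract:

```typescript
// Assumed shape: a trimmed-down version of one entry from
// /api/v3/federated_instances; check the field names against your Lemmy version.
interface FederatedInstance {
  id: number;
  domain: string;
}

class InstanceLookup {
  private byId = new Map<number, FederatedInstance>();
  private byDomain = new Map<string, FederatedInstance>();

  constructor(instances: FederatedInstance[]) {
    for (const instance of instances) {
      this.byId.set(instance.id, instance);
      this.byDomain.set(instance.domain.toLowerCase(), instance);
    }
  }

  // Accepts either a numeric instance ID or a domain name.
  find(key: number | string): FederatedInstance | undefined {
    return typeof key === "number"
      ? this.byId.get(key)
      : this.byDomain.get(key.toLowerCase());
  }
}
```

Building both maps once at startup keeps every later check a constant-time lookup, which matters if the list covers every server the instance knows about.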

moseschrute@lemmy.world on 19 Sep 18:07

Ahh that makes sense. I guess you couldn’t search anything your instance doesn’t federate with anyway.

admiralpatrick@lemmy.world on 19 Sep 23:56

I believe you can, yeah, and I also think that “bootstraps” that instance to yours if it doesn’t already know about it. But in that case, the way I have the search written, it’ll “fall back” to regular search which also does resolveObject. That just takes longer.

The ap_id check is just to short-circuit that behavior to avoid the lengthy, often unnecessary, search and quickly redirect you to your instance’s local copy.

Have had that working for about a week now, and it’s pretty nice. Please do steal this feature lol.

moseschrute@lemmy.world on 20 Sep 06:32

I believe it bootstraps the object, not the instance. You still won’t be able to find an object from an instance you don’t federate with.

This is based on an explanation from @MrKaplan@lemmy.world, unless I misunderstood MrKaplan?

admiralpatrick@lemmy.world on 20 Sep 12:27

Oh, I just meant that if the instance isn’t known, I thought resolving would make it “aware” of that instance. I could be wrong. But yeah, the instance would have to federate with the other one for it to be able to resolve, e.g. it won’t resolve an object from an instance that is on the current instance’s “block” list.

moseschrute@lemmy.world on 21 Sep 10:05

Sorry to bother you here, but I sent you a PM. I would love to collaborate more. I have a group chat with the other Lemmy/PieFed devs. Some of the devs are already working on shared logic/libraries between apps. I also 100% understand if you’re spread too thin and don’t want more messages.

admiralpatrick@lemmy.world on 21 Sep 12:19

but I sent you a PM

Oh, sorry. One of the new features in this dev branch is the ability to disable PMs and mentions. I’ve been running with those turned off. Seems like that feature is working lol.

I turned DMs back on and found the message - will try to join here when I’m back on desktop. Dunno how active I can be right now, but I am eventually going to start on Piefed so would be nice to have a sounding board.

Some of the devs are already working on shared logic/libraries between apps.

Nice!

moseschrute@lemmy.world on 21 Sep 12:29

One of the libraries we talked about, though @aeharding@vger.social did all the work, is an abstraction built over lemmy-js-client that automatically adds PieFed support. I really think opening communication between everyone will unlock more opportunity for collaboration.

www.npmjs.com/package/threadiverse