The current state of context resolution
from julian@community.nodebb.org to swicg-threadiverse-wg@community.nodebb.org on 18 Jul 2024 16:13
https://community.nodebb.org/post/100389

tl;dr — conversation backfill and synchronization via resolvable context; potential FEP.

This topic is an extension of an earlier discussion: How do you use context (if at all)?

We came out of May's ForumWG meeting with a sense that pursuing formalisation of the context property was a step in the right direction. I later built out a resolvable context collection as part of this effort.

Currently, if you are given a standalone ActivityPub object, you may be missing some or all of the conversation surrounding it. That's part and parcel of ActivityPub's design (content is pushed to various federated instances rather than held by one centralized authority), but it is a source of some concern: end users continually remark that different instances show different reply sets and, worse yet, even the original site may not have the entire conversation.

I can hear @evan@cosocial.ca now:

"ActivityPub is a push and pull-based API!!" — Evan Prodromou

Agreed! Although, while you can pull public objects via ActivityPub, you can't pull said objects if you don't know they exist. Here are your options for building/resolving any single object's conversational context:

New this week is a proof-of-concept implementation of a "context synchronization" mechanic. Using similar mechanics to Mastodon's FEP-8fcf (Followers collection synchronization across servers), I propose servers can compute a digest for a context collection via its object ids, and serve them using the common ETag header. Recipients may opt to calculate their own digest and begin backfill on digest mismatch. Optionally, the If-None-Match header containing that digest can be sent, allowing the origin server to respond with an even simpler 304 Not Modified.
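The digest comparison described above could be sketched roughly as follows. This is a minimal illustration, not NodeBB's actual implementation: it assumes the digest is built the way FEP-8fcf builds its followers digest (XOR of the SHA-256 hash of each id, hex-encoded), and the function names `context_digest`, `needs_backfill`, and `respond` are hypothetical.

```python
import hashlib


def context_digest(object_ids):
    """Hex digest over a context's object ids.

    Assumption: XOR of the SHA-256 hash of each id, as in FEP-8fcf.
    XOR makes the result independent of item order; ids are de-duplicated
    first because XOR-ing the same hash twice would cancel it out.
    """
    acc = bytes(32)
    for object_id in set(object_ids):
        h = hashlib.sha256(object_id.encode("utf-8")).digest()
        acc = bytes(a ^ b for a, b in zip(acc, h))
    return acc.hex()


def strip_etag(value):
    """Drop the optional weak prefix and the surrounding quotes of an ETag."""
    value = value.strip()
    if value.startswith("W/"):
        value = value[2:]
    return value.strip('"')


def needs_backfill(local_ids, remote_etag):
    """Recipient side: compare the locally computed digest with the ETag."""
    return context_digest(local_ids) != strip_etag(remote_etag)


def respond(current_ids, if_none_match=None):
    """Origin side: return (status, headers), using 304 when digests match."""
    digest = context_digest(current_ids)
    headers = {"ETag": '"' + digest + '"'}
    if if_none_match is not None and strip_etag(if_none_match) == digest:
        return 304, headers
    return 200, headers
```

A recipient that already holds every object in the context computes the same digest locally and can skip any further requests; otherwise it starts backfilling.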

Technical details re: topic synchronization.


Backfill and sync both still have limited availability; only NodeBB supports them currently. However, I'm working with Angus (building out the Discourse AP integration) to expand support, and I'd like to eventually publish an FEP and SocialCG report to make this all pseudo-official.

We intend to discuss our research at this month's ForumWG (August 1st; 1300 EDT). Join us and let's see where this goes!

#activitypub #forumwg #swicg #swicg-threadiverse-wg


jdp23@blahaj.zone on 18 Jul 2024 16:19

@julian@community.nodebb.org very interesting work, it's certainly addressing an important problem!

silverpill@mitra.social on 18 Jul 2024 17:40

@julian

>and serve them using the common ETag header

Nice. Does it eliminate the need for a custom Collection-Synchronization header?

(initially replied from SocialHub but my reply didn't federate)

julian@community.nodebb.org on 18 Jul 2024 23:49

@silverpill@mitra.social it wasn't readily apparent in FEP-8fcf why a bespoke header was used instead of ETag. If I had to conjure up a rationale, it would be that an ETag is explicitly tied to a resource, whereas follower synchronization digests differ depending on the calling user (different follower sets, etc.).

In this case, however, since topic synchronization deals only with publicly addressable content and is tied to the context, we sidestep that complication, and I opted to use the commonly seen ETag header.

Also, thanks for the heads up re: SocialHub federation (or lack thereof), yay for regressions!

silverpill@mitra.social on 02 Aug 2024 04:50

@julian

I implemented (manual) fetching of the context: my server simply reads the latest N items from the collection.
While working on that I realized that synchronization can be done differently. If the context collection contains the activities that modify it (such as Add and Remove) in reverse chronological order, the client can reconstruct the current state by fetching them and applying them one by one. There is no need to compute digests with this approach; remembering the latest activity ID would be enough.
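As a rough sketch of that idea (not taken from any existing implementation): the client walks the newest-first activity list until it reaches the last activity it remembers, then replays the unseen ones oldest-first. The function name `apply_new_activities` and the simplified activity shape (plain dicts with `id`, `type`, and an `object` id) are assumptions for illustration.

```python
def apply_new_activities(items, activities_newest_first, last_seen_id):
    """Return (updated item set, newest activity id).

    Assumptions: activities arrive newest first, each is a dict with
    "id", "type" ("Add" or "Remove") and "object" (an object id).
    """
    # Collect only the activities that arrived after the remembered one.
    pending = []
    for activity in activities_newest_first:
        if activity["id"] == last_seen_id:
            break
        pending.append(activity)

    # Replay oldest first so later changes win over earlier ones.
    updated = set(items)
    for activity in reversed(pending):
        if activity["type"] == "Add":
            updated.add(activity["object"])
        elif activity["type"] == "Remove":
            updated.discard(activity["object"])

    newest_id = pending[0]["id"] if pending else last_seen_id
    return updated, newest_id
```

The only state the client has to persist between runs is the returned activity id.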

silverpill@mitra.social on 02 Aug 2024 05:22

@julian Even if the context contains objects and not activities, synchronization can be done by requesting all activities where Activity.target == Object.context (from the outbox, perhaps?)

julian@community.nodebb.org on 02 Aug 2024 14:12

@silverpill@mitra.social @Alex-mehr You both seem to have come up with similar solutions at roughly the same time!

Some thoughts:

  • There's no guarantee that a collection would present items in chronological vs. reverse chronological order — are you checking the timestamps and reversing as needed?
  • Wouldn't you need to paginate through the entire collection anyway?
  • My context contains only objects (e.g. as:Note, as:Article, etc.) so there wouldn't be any activities for you to actually consume
    • However, this is not set in stone. @trwnh@mastodon.social advocated for objects in the context collection, but activities in the context outbox, which could also work.
  • The idea behind serving a digest in the ETag header is simply to provide a means to quickly determine whether your collection of objects is up to date. The header is served when the context is requested, and the hashing can be done locally. If the digests match, you avoid any additional network calls.

How the context is synchronized is actually implementor-dependent. So if what works for you is to look at the activities and reconstruct based on ID, then that's great (assuming activities are even provided by the context)! If you'd prefer to just re-iterate through the entire collection, that's great too.

julian@community.nodebb.org on 02 Aug 2024 14:16

@silverpill@mitra.social @Alex-mehr Admittedly I have a blind spot when it comes to activities.

NodeBB doesn't actually track the activities it receives; it only processes objects, and activity IDs are really just generated on the fly. I think that informs why I set up topic synchronization in this manner, and why my idea of a context collection contains only objects; to me, activities don't really mean much at all.

silverpill@mitra.social on 02 Aug 2024 16:33

@julian @alex-mehr @trwnh

>There's no guarantee that a collection would present items in chronological vs. reverse chronological order — are you checking the timestamps and reversing as needed?

The ordering can be specified by some property of the Collection.

>Wouldn't you need to paginate through the entire collection anyway?

The client will fetch pages until it finds an item that has already been processed.
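A minimal sketch of that early-stopping page walk, under several assumptions: pages list item ids newest first under `orderedItems`, link to the following page via `next`, and `fetch_page` is a hypothetical callable that returns a page as a dict for a given URL.

```python
def fetch_unseen(first_page_url, fetch_page, seen_ids):
    """Collect item ids, newest first, until an already-seen id turns up.

    Assumptions: `fetch_page(url)` returns a dict shaped like an
    OrderedCollectionPage whose "orderedItems" are plain id strings.
    """
    new_ids = []
    url = first_page_url
    while url is not None:
        page = fetch_page(url)
        for item_id in page.get("orderedItems", []):
            if item_id in seen_ids:
                # Everything past this point was processed on an earlier run.
                return new_ids
            new_ids.append(item_id)
        url = page.get("next")
    return new_ids
```

This only avoids a full traversal when the collection is ordered and append-only at its head; an unordered collection would still need to be read to the end.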

>I think that informs why I set up topic synchronization in this manner, and why my idea of a context collection contains only objects; to me, activities don't really mean much at all.

I'd prefer context to be a collection of objects too, as long as there's a way to retrieve activity history.

Activity-based sync seems more natural to me. I think ActivityPub can be better understood not as a protocol for social networking, but as a distributed database where nodes sync datasets by sending messages over the network. Messages are activities, datasets are collections. When I send a Follow activity and your server responds with an Accept, the followers and following collections are updated on both sides (or their equivalents if you don't store activities and collections). More generally, any activity delivery can be viewed as a synchronization of the outbox collection.

I think such a change of perspective can greatly improve DX and provide a solid foundation for further protocol extensions.

michael@newsmast.social on 19 Jul 2024 06:51

@julian This looks really interesting! Great you are addressing this. cc @newsmast