from julian@community.nodebb.org to swicg-threadiverse-wg@community.nodebb.org on 18 Jul 2024 16:13
https://community.nodebb.org/post/100389
tl;dr — conversation backfill and synchronization via resolvable context; potential FEP.
This topic is an extension of an earlier discussion: How do you use context (if at all)?
We came out of May's ForumWG meeting with a sense that pursuing formalisation of the context
property was a step in the right direction. I later built out a resolvable context
collection as part of this effort.
Currently, if you are given a standalone ActivityPub object, you might not have any or all of the conversation surrounding it. That's part-and-parcel of the design of ActivityPub (content is pushed to various federated instances, as opposed to one centralized authority), but it is a source of some concern: end-users continually remark on how various instances have different reply sets and, worse yet, even the original site may not have the entire conversation.
I can hear @evan@cosocial.ca now:
"ActivityPub is a push and pull-based API!!" — Evan Prodromou
Agreed! Although, while you can pull public objects via ActivityPub, you can't pull said objects if you don't know they exist. Here are your options for building/resolving any single object's conversational context:
- You may opt to do nothing (and the object is standalone; not ideal).
- You may traverse up the inReplyTo chain and build out one direct thread of replies (better).
  - N.B. for security, it is best to limit the traversal to an arbitrary maximum.
- New: you may query the object's context property, and if resolving to an (Ordered)Collection, build out the entire conversational context, including all conversational sub-trees, in one fell swoop.
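As a rough illustration of the second option, here is a minimal Python sketch of a bounded walk up the inReplyTo chain. The function name, the `fetch` callable (standing in for whatever HTTP client resolves an id to a JSON object) and the cap of 10 are all assumptions made for the example; none of them come from NodeBB or any spec.

```python
from typing import Callable, Optional

MAX_DEPTH = 10  # arbitrary cap, chosen only for illustration


def resolve_thread(
    start: dict,
    fetch: Callable[[str], Optional[dict]],
    max_depth: int = MAX_DEPTH,
) -> list[dict]:
    """Walk up inReplyTo from `start`; return the chain root-most first."""
    chain = [start]
    seen = {start.get("id")}
    current = start
    for _ in range(max_depth):
        parent_id = current.get("inReplyTo")
        # Stop at the root, on non-string values, or on a reply loop.
        if not isinstance(parent_id, str) or parent_id in seen:
            break
        parent = fetch(parent_id)
        if parent is None:  # parent is unreachable or was deleted
            break
        chain.append(parent)
        seen.add(parent_id)
        current = parent
    chain.reverse()
    return chain
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking, so the depth limit and loop guard can be exercised against an in-memory dictionary.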
New this week is a proof-of-concept implementation of a "context synchronization" mechanic. Using similar mechanics to Mastodon's FEP-8fcf (Followers collection synchronization across servers), I propose servers can compute a digest for a context collection via its object ids, and serve them using the common ETag header. Recipients may opt to calculate their own digest and begin backfill on digest mismatch. Optionally, the If-None-Match header containing that digest can be sent, allowing the origin server to respond with an even simpler 304 Not Modified.
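To make the digest comparison concrete, here is a hedged Python sketch. It assumes the digest is built the way FEP-8fcf builds its followers digest (XOR of the SHA-256 hash of each id, hex-encoded); the post only says the mechanics are similar, so the exact algorithm NodeBB uses may differ. The function names are hypothetical.

```python
import hashlib
from typing import Iterable, Optional


def context_digest(object_ids: Iterable[str]) -> str:
    """XOR the SHA-256 hashes of each unique id; ordering does not matter."""
    acc = bytes(32)
    for object_id in set(object_ids):  # de-duplicate: XOR would cancel repeats
        h = hashlib.sha256(object_id.encode("utf-8")).digest()
        acc = bytes(a ^ b for a, b in zip(acc, h))
    return acc.hex()


def normalize_etag(etag: str) -> str:
    """Strip an optional weak-validator prefix and the surrounding quotes."""
    etag = etag.strip()
    if etag.startswith("W/"):
        etag = etag[2:]
    return etag.strip('"')


def needs_backfill(local_ids: Iterable[str], remote_etag: Optional[str]) -> bool:
    """True when the origin's digest is missing or differs from ours."""
    if remote_etag is None:
        return True
    return context_digest(local_ids) != normalize_etag(remote_etag)
```

A recipient that already holds a digest could also send it in If-None-Match when requesting the context and treat a 304 Not Modified response as "already in sync", skipping the local comparison entirely.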
Technical details re: topic synchronization.
Backfill and sync are both still limited availability; only NodeBB supports them currently. However, I'm working with Angus (building out the Discourse AP integration) to expand support, and I'd like to eventually publish an FEP and SocialCG report to make this all pseudo-official.
We intend to discuss our research at this month's ForumWG (August 1st; 1300 EDT). Join us and let's see where this goes!
@julian@community.nodebb.org very interesting work, it's certainly addressing an important problem!
@julian
>and serve them using the common ETag header
Nice. Does it eliminate the need for a custom Collection-Synchronization header?
(initially replied from SocialHub but my reply didn't federate)
@silverpill@mitra.social it wasn't readily apparent in FEP-8fcf why a bespoke header was used instead of ETag. If I had to conjure up a rationale, it would be because an ETag is explicitly tied to a resource, but follower synchronization digests differ depending on the calling user (different follower sets, etc.).
In this case however, since topic synchronization deals only with publicly addressable content, and is tied to the context, we sidestep that complication and I opted to use the commonly seen ETag header.
Also thanks for the heads up re: socialhub federation (or lack thereof), yay for regressions!
@julian
I implemented fetching of context (manual) - my server simply reads the latest N items from the collection.
While working on that I realized that synchronization can be done differently. If the context collection contains activities that modify it (such as Add and Remove), in reverse chronological order, the client can reconstruct the current state by fetching them and applying them one by one. There is no need to compute digests with this approach; remembering the latest activity ID would be enough.
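A minimal Python sketch of the activity-replay idea described above, assuming the collection yields Add and Remove activities newest first and that each activity's object is either an id string or an embedded object with an id. The function name and the shape of the returned tuple are made up for illustration.

```python
from typing import Iterable, Optional


def sync_from_activities(
    members: set[str],
    activities_newest_first: Iterable[dict],
    last_seen_id: Optional[str],
) -> tuple[set[str], Optional[str]]:
    """Apply unseen Add/Remove activities; return (members, newest activity id)."""
    unseen = []
    for activity in activities_newest_first:
        if last_seen_id is not None and activity.get("id") == last_seen_id:
            break  # everything older has already been applied
        unseen.append(activity)

    updated = set(members)
    for activity in reversed(unseen):  # replay oldest first
        obj = activity.get("object")
        object_id = obj.get("id") if isinstance(obj, dict) else obj
        if not isinstance(object_id, str):
            continue
        if activity.get("type") == "Add":
            updated.add(object_id)
        elif activity.get("type") == "Remove":
            updated.discard(object_id)

    newest = unseen[0].get("id") if unseen else last_seen_id
    return updated, newest
```

The only state the client has to keep between runs is the member set and the id of the newest activity it has applied.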
@julian Even if context contains objects and not activities, synchronization can be done by requesting all activities where Activity.target == Object.context (from the outbox perhaps?)
@silverpill@mitra.social @Alex-mehr You both seem to have come up with similar solutions at roughly the same time!
Some thoughts:
- […] (as:Note, as:Article, etc.) so there wouldn't be any activities for you to actually consume
- […] ETag header is simply to provide a means to quickly determine whether your collection of objects is up-to-date. The header is served when the context is requested, and the hashing can be done locally. If a match is found, then you avoid any additional network calls.
How the context is synchronized is actually implementor dependent. So if what works for you is to look at the activities and re-construct based on ID, then that's great (assuming activities are even provided by the context)! If you'd prefer to just re-iterate through the entire collection, that's great too.
@silverpill@mitra.social @Alex-mehr Admittedly I have a blind spot when it comes to activities.
NodeBB doesn't actually track activities it receives, it only processes objects and activity IDs are really just generated on-the-fly. I think that informs why I set up topic synchronization in this manner, and why my idea of context collections contain only objects; to me, activities don't really mean much at all.
@julian @alex-mehr @trwnh
>There's no guarantee that a collection would present items in chronological vs. reverse chronological order — are you checking the timestamps and reversing as needed?
The ordering can be specified by some property of Collection.
>Wouldn't you need to paginate through the entire collection anyway?
The client will fetch pages until it finds an item that has already been processed.
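As a sketch of that stopping rule, the following Python assumes pages are OrderedCollectionPage-style dictionaries with orderedItems and next, served newest first. The `fetch_page` callable and the `max_pages` safety cap are hypothetical additions for the example.

```python
from typing import Callable, Optional


def fetch_new_items(
    first_page_url: str,
    fetch_page: Callable[[str], Optional[dict]],
    already_processed: set[str],
    max_pages: int = 20,
) -> list[str]:
    """Collect item ids page by page, stopping at the first known item."""
    new_items: list[str] = []
    url: Optional[str] = first_page_url
    for _ in range(max_pages):
        if url is None:
            break
        page = fetch_page(url)
        if page is None:
            break
        for item in page.get("orderedItems", []):
            item_id = item.get("id") if isinstance(item, dict) else item
            if not isinstance(item_id, str):
                continue  # skip malformed entries
            if item_id in already_processed:
                return new_items  # caught up; older pages are already known
            new_items.append(item_id)
        next_url = page.get("next")
        url = next_url if isinstance(next_url, str) else None
    return new_items
```

This only avoids a full traversal if the ordering really is newest first; with any other ordering the early return would miss items.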
> I think that informs why I set up topic synchronization in this manner, and why my idea of context collections contain only objects; to me, activities don't really mean much at all.
I'd prefer context to be a collection of objects too, as long as there's a way to retrieve activity history.
Activity-based sync seems more natural to me. I think ActivityPub can be better understood not as a protocol for social networking, but as a distributed database where nodes sync datasets by sending messages over the network. Messages are activities, datasets are collections. When I send a Follow activity and your server responds with an Accept, followers and following collections are updated on both sides (or their equivalents if you don't store activities and collections). More generally, any activity delivery can be viewed as a synchronization of outbox collection.
I think such change of perspective can greatly improve DX and provide a solid foundation for further protocol extensions
@julian This looks really interesting! Great you are addressing this. cc @newsmast