Traversing the reply chain when working with topics

from angus@socialhub.activitypub.rocks to swicg-threadiverse-wg@community.nodebb.org on 03 May 2024 07:52
https://socialhub.activitypub.rocks/ap/object/42acd2a60a73ecab2b0d0db78165fc82

I've just been cleaning up threadiverse-wg@socialhub.activitypub.rocks as it had a number of orphaned replies that had been turned into topics. The immediate fix for this is to discard a Note if I don't have the Note it's in reply to, instead of creating a new topic (PR for that is here). However, I know that some implementations try to "walk up" the reply chain.

I guess I'm thinking out loud here as to how much reply chain walking makes sense in a forum context. Ideally we would have a collection in a context property to work with, and the Discourse plugin actually checks this already, but we don't live in an ideal world.

This is what I'm currently thinking for reply chain walking:

- Go back N number of replies (perhaps 5) to see if there is a Note already associated with an existing post.
- If we find a Note (say at the 4th iteration) we import ALL of the intervening Notes, and add ALL notes as new posts in the relevant topic. So we'd end up with 5 new posts in the existing topic in this example.

Curious what others think on this.
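The bounded walk described above could be sketched roughly as follows. This is only an illustration in Python, not the Discourse plugin's code: `walk_reply_chain`, `fetch`, and `topic_for` are hypothetical names, Notes are modelled as plain dicts, and how "iterations" are counted is an assumption of this sketch.

```python
from typing import Callable, Optional

Note = dict  # a parsed Note, e.g. {"id": "...", "inReplyTo": "..."}


def walk_reply_chain(
    note: Note,
    fetch: Callable[[str], Optional[Note]],
    topic_for: Callable[[str], Optional[str]],
    limit: int = 5,
) -> Optional[tuple[str, list[Note]]]:
    """Walk up inReplyTo at most `limit` times looking for a known Note.

    `fetch` retrieves a remote Note by id (hypothetical helper).
    `topic_for` returns the local topic id for a Note we already have,
    or None if the Note is unknown (hypothetical helper).

    Returns (topic_id, notes_to_import) with notes ordered oldest first,
    or None if no known Note is found within the limit.
    """
    pending = [note]
    current = note
    for _ in range(limit):
        parent_id = current.get("inReplyTo")
        if not parent_id:
            return None  # reached a root that we do not know about
        topic_id = topic_for(parent_id)
        if topic_id is not None:
            # Found an existing post: import everything collected so far.
            return topic_id, list(reversed(pending))
        parent = fetch(parent_id)
        if parent is None:
            return None  # parent could not be retrieved
        pending.append(parent)
        current = parent
    return None  # limit reached without finding a known Note
```

When this returns None, the incoming Note would be discarded rather than turned into a new topic, in line with the immediate fix mentioned above.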

#swicg-threadiverse-wg

angus@socialhub.activitypub.rocks on 09 May 2024 16:17

I guess one of the things I'm assuming is that other services are implementing the Inbox Forwarding spec correctly (https://www.w3.org/TR/activitypub/#inbox-forwarding), which would mean that, in an ideal world, you should already have the replies you should have anyway and this is more of a "stop gap".

However, I note that Mastodon violates the spec here, which means that more replies from Mastodon might be missed than is ideal: https://github.com/mastodon/mastodon/issues/5631#issuecomment-343039649

julian@community.nodebb.org on 09 May 2024 16:26

@angus@socialhub.activitypub.rocks said in Traversing the reply chain when working with topics:

more replies from Mastodon might be missed than is ideal

You are not incorrect. In practice the following situation happens occasionally, especially in larger/busy topics:

- You post a reply to a topic/thread (branch A), but a different branch (B) of the topic occurs outside of your view (since the activities are not forwarded to you).
- Later on, someone you do follow replies in branch B, and you receive it.
- Traversal finds 20 posts in between that you missed, and they are all added at once. You receive the notification of new posts in the topic, except now all of the "new" posts are scattered throughout the linear flow.
  - Additionally, some of these new posts might appear in places higher up than where you last read.

So this violates the assumption (at least in NodeBB) that if you have a "read up to" point in a topic, that there will not be new content above that point.
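To make the "read up to" problem concrete, here is a small Python sketch. It assumes a linear topic ordered by post timestamp and a read marker stored as the timestamp of the last post read; both are assumptions for illustration and not a description of NodeBB's internals. `split_by_read_marker` is a hypothetical name.

```python
def split_by_read_marker(
    backfilled: list[float], read_up_to: float
) -> tuple[list[float], list[float]]:
    """Split backfilled post timestamps around the reader's marker.

    Posts at or before the marker land above the point the reader has
    already passed; posts after it appear as ordinary new content.
    """
    above = sorted(t for t in backfilled if t <= read_up_to)
    below = sorted(t for t in backfilled if t > read_up_to)
    return above, below


# Four posts arrive at once via traversal; the reader had read up to t=10.
above, below = split_by_read_marker([5, 12, 8, 15], read_up_to=10)
print(above)  # posts the reader would never be shown as new: [5, 8]
print(below)  # posts that appear after the marker: [12, 15]
```

Any non-empty `above` list is a case where the "no new content above the marker" assumption no longer holds.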

@angus@socialhub.activitypub.rocks said in Traversing the reply chain when working with topics:

is it still right to say that those replies are part of your topic in a coherent sense?

From a purely technical point of view, yes, they are part of the same context (at least as derived via reply chain traversal), but from a UX POV, you could make that argument.

A forum with a linear flow of posts tends to diverge less often due to the nature of the presentation of posts themselves; something threaded models don't need to contend with.

angus@socialhub.activitypub.rocks on 09 May 2024 16:10

The Discourse plugin will implement reply chain traversal for the purpose of topic detection when this is merged: https://github.com/discourse/discourse-activity-pub/pull/98

Essentially it implements the following (but with a limit of 3 instead of 5):

angus: Go back N number of replies (perhaps 5) to see if there is a Note already associated with an existing post. If we find a Note (say at the 4th iteration) we import ALL of the intervening Notes, and add ALL notes as new posts in the relevant topic. So we'd end up with 5 new posts in the existing topic in this example.

If you're curious about the detail see the ContextResolver spec: spec/lib/discourse_activity_pub/context_resolver_spec.rb

julian@community.nodebb.org on 09 May 2024 15:27

@angus@socialhub.activitypub.rocks may I ask why you add a limit to the traversal logic?

I can see an argument made against doing so if it locks up the process, but the downside is you'd still have some cases where you don't get the full context.

Either way this may be moot if an iterable context is found, so inReplyTo traversal is ideal as a fallback mechanism.

Edit: in NodeBB's case, we call an internal recursive method called getParentChain which just makes the S2S call and adds the result to a Set. The method terminates when it encounters an object that has no inReplyTo or that is unprocessable.
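For readers who want to picture the unbounded approach, a rough Python rendering of the idea might look like this. It is not NodeBB's code (NodeBB is written in JavaScript); `get_parent_chain` and `fetch` are hypothetical names, and the "already collected" check is an addition of this sketch that the description above does not mention.

```python
from typing import Callable, Optional


def get_parent_chain(
    note_id: str,
    fetch: Callable[[str], Optional[dict]],
    chain: Optional[set[str]] = None,
) -> set[str]:
    """Recursively collect the ids of a Note and all of its ancestors.

    `fetch` stands in for the S2S request and returns None when the
    object cannot be retrieved or processed.
    """
    if chain is None:
        chain = set()
    if note_id in chain:
        return chain  # assumption: stop if this id was already collected
    obj = fetch(note_id)
    if obj is None:
        return chain  # unprocessable object: terminate
    chain.add(note_id)
    parent_id = obj.get("inReplyTo")
    if parent_id:
        return get_parent_chain(parent_id, fetch, chain)
    return chain  # no inReplyTo: reached the root of the chain
```

Note that there is no depth limit here, so the number of requests is bounded only by the length of the chain (and, in Python specifically, by the interpreter's recursion limit).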

angus@socialhub.activitypub.rocks on 09 May 2024 15:38

The honest answer is that a limit makes some intuitive sense to me, but I have medium to low confidence in the cogency of my thinking on both the limit and where it's set. I've set it at 3 as that seems to be the more "conservative" (read "safer") approach while I think it through further / see how this first version works in practice.

In terms of the "risks" (to the extent they exist), I think I'm thinking a version of the following:

- You could be sent a random Note inReplyTo an unrelated Note that's part of a large chain which you end up traversing for no reason.
- Even if you eventually get to a Note in an existing topic, say 20 replies in, is it still right to say that those replies are part of your topic in a coherent sense? In what scenario would you be missing 20 odd replies? Perhaps there is one.

julian@community.nodebb.org on 09 May 2024 16:28

@angus@socialhub.activitypub.rocks said in Traversing the reply chain when working with topics:

You could be sent a random Note inReplyTo an unrelated Note that's part of a large chain which you end up traversing for no reason.

Another legitimate concern. My counter is that traversing the chain is rather inexpensive: XHR => (do other things while waiting) => inReplyTo? XHR... etc.

Actual note processing is done only once the chain is complete, and a positive relation is found.

... but I can see how this could lock up the process in other languages where processing literally stops when waiting for the XHR to complete.
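The non-blocking pattern described here can be illustrated with Python's asyncio, used purely as a stand-in for an event-loop runtime. `collect_chain`, `fetch`, and `is_known` are hypothetical names; the sketch only collects Notes and leaves all processing to the caller once a positive relation is found.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def collect_chain(
    note: dict,
    fetch: Callable[[str], Awaitable[Optional[dict]]],
    is_known: Callable[[str], bool],
) -> Optional[list[dict]]:
    """Follow inReplyTo without blocking the event loop.

    Returns the collected Notes (newest first) once a parent we already
    have is found, or None if the chain ends without a known Note.
    """
    chain = [note]
    current = note
    while current.get("inReplyTo"):
        parent_id = current["inReplyTo"]
        if is_known(parent_id):
            return chain  # positive relation found; processing can start
        # Awaiting yields control, so other tasks run during the request.
        parent = await fetch(parent_id)
        if parent is None:
            return None  # parent could not be retrieved
        chain.append(parent)
        current = parent
    return None  # reached a root that we do not know about
```

In a runtime without cooperative scheduling, each `fetch` would instead hold up the worker until the response arrives, which is the lock-up scenario mentioned above.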

FenTiger@mastodon.social on 09 May 2024 17:24

@julian @angus A bad actor with some programming skill could send you a Note that's part of an infinite inReplyTo chain.

This gets even worse if you want to look at the replies collections of individual Notes - which could form an infinitely branching tree.

None of this happens if there's a One True Collection from which the whole thread can be fetched in one gulp.

julian@community.nodebb.org on 09 May 2024 17:34

@FenTiger@mastodon.social said in Traversing the reply chain when working with topics:

infinite inReplyTo chain.

I think this could be solved in part by having the chain traversal sanity-check that the id has not already been retrieved, but I'm not naive enough to assume that this can't be circumvented.

... so yes, in that sense a limit makes sense from a security standpoint.
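A small Python sketch of why both checks are useful: the "already retrieved" check stops loops, but a hostile server can mint a fresh id for every parent, so only a hard cap bounds the work. `safe_parent_chain`, `endless_fetch`, the `evil.example` URLs, and the default cap of 20 are all hypothetical and chosen for illustration only.

```python
from typing import Callable, Optional


def safe_parent_chain(
    note_id: str,
    fetch: Callable[[str], Optional[dict]],
    max_depth: int = 20,
) -> list[str]:
    """Collect ancestor ids, stopping on a repeated id or at max_depth."""
    order: list[str] = []
    seen: set[str] = set()
    current_id: Optional[str] = note_id
    while current_id and len(order) < max_depth:
        if current_id in seen:
            break  # loop detected: this id was already retrieved
        obj = fetch(current_id)
        if obj is None:
            break  # unprocessable object
        seen.add(current_id)
        order.append(current_id)
        current_id = obj.get("inReplyTo")
    return order


def endless_fetch(note_id: str) -> dict:
    """A hostile server: every Note replies to a freshly minted parent id."""
    n = int(note_id.rsplit("/", 1)[1])
    return {"id": note_id, "inReplyTo": f"https://evil.example/notes/{n + 1}"}


# The seen-set never triggers here, so only max_depth ends the walk.
chain = safe_parent_chain("https://evil.example/notes/0", endless_fetch)
print(len(chain))  # 20
```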