@schmittlauch @lain

𝐼𝒻 𝐼 𝓂𝒶𝓎 𝒾𝓃𝓉𝑒𝓇𝒿𝑒𝒸𝓉

I really wish DHTs would be removed from the literature because they seem so good but they have so many unsolved problems that they trick people into designing systems that can't be safely implemented.

1. DoS -> If you're hashing the /64 of IPv6 space then I need to get a /48 (trivial); getting a /32 is not so trivial, but it's not really hard either. Domains are actually probably less bad than /64s.
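
To put numbers on that point, here is a minimal sketch. It assumes node IDs are derived by hashing the /64 prefix of a node's address; `node_id` and `distinct_ids` are hypothetical names for illustration, not part of any proposal in this thread.

```python
import hashlib
import ipaddress


def node_id(addr: str) -> str:
    """Hypothetical node ID: SHA-256 of the /64 prefix that contains addr."""
    net = ipaddress.ip_network(f"{addr}/64", strict=False)
    return hashlib.sha256(net.network_address.packed).hexdigest()


def distinct_ids(prefix_len: int) -> int:
    """How many different /64 prefixes (so, node IDs) one allocation contains."""
    return 2 ** (64 - prefix_len)


print(distinct_ids(48))  # 65536
print(distinct_ids(32))  # 4294967296
```

So under this assumption, a single /48 already gives an attacker 65536 IDs to choose from when trying to land next to a target key.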

@cjd @schmittlauch do you know of anything else that could solve the problem here?

@lain @schmittlauch
My instinct is to do repeated searches with dns-like TTL caches. So forwarding a search to the entire fediverse is obviously pretty horrible, but nodes who didn't recently use a tag shouldn't be bothered with the question.

So perhaps use a pubsub which allows you to subscribe to recent tag activity, for example "I have messages from the past 1 minute which use the tags" [ .... ]
Then you can limit the number of nodes you have to search...
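
A rough sketch of the "recent tag activity" idea, assuming each node announces the tags it used and a searcher only keeps those announcements for a short TTL. The class name, the 60-second default and the injectable clock are all assumptions made for illustration.

```python
import time


class RecentTagIndex:
    """Remembers which nodes announced a tag within the last ttl_seconds."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen = {}  # tag -> {node: expiry time}

    def announce(self, node, tags):
        """Record that node reported recent activity on each of tags."""
        expiry = self.clock() + self.ttl
        for tag in tags:
            self._seen.setdefault(tag, {})[node] = expiry

    def nodes_for(self, tag):
        """Return the nodes worth querying for tag, dropping expired entries."""
        now = self.clock()
        live = {n: e for n, e in self._seen.get(tag, {}).items() if e > now}
        if live:
            self._seen[tag] = live
        else:
            self._seen.pop(tag, None)
        return sorted(live)
```

A search for a tag would then go only to `nodes_for(tag)` rather than to every known node.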

@cjd @lain Searches alone don't provide any subscription functionality, as having to poll for posts will just overload the network at high interest for a tag.
Furthermore, many use cases mandate post delivery to happen at least close to real-time. This wouldn't be possible with TTL-flooding at all.

Regarding flooding, I hope you know how "well" Gnutella worked?

@schmittlauch @lain
My idea is definitely not well thought out, but the reason why I tried to make it work on top of querying/pubsub is because when someone is running a shitty server (freefedifollowers), you can just block the server and be done with it, whereas in a DHT you would have to get everybody onboard.

@cjd @lain Well, the operator of the responsible relay can still block some instances (whether it should is another question).
In my design subscribing/ querying instances just receive the post IDs and can still decide not to fetch all these posts from the originating node, based on the URI.
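
The "decide based on the URI" step could look roughly like this. It is only a sketch: `should_fetch` and the blocklist format (a set of lowercase hostnames) are assumptions, not something specified in the design.

```python
from urllib.parse import urlsplit


def should_fetch(post_uri, blocked_hosts):
    """Return True if the post's originating host is not locally blocked."""
    host = urlsplit(post_uri).hostname  # lowercased by urlsplit, None if absent
    return host is not None and host not in blocked_hosts


blocked = {"freefedifollowers.example"}
print(should_fetch("https://freefedifollowers.example/objects/1", blocked))
print(should_fetch("https://good.example/objects/2", blocked))
```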

@schmittlauch @lain
What about a fully replicated gossip net? I mean even the bittorrent DHT isn't really a DHT anymore, there are nodes which are just storing the entire dataset (and forwarding it to trackers and answering DHT requests, and acting as a sybil net)

@schmittlauch @lain
So thinking about it a bit more, I think what I'd do is the following:
1. Gossip all of the data in order to reduce the load
2. When you get an update about server X from server Y, next time you want to learn about server X, ask server Y again (unless server Y goes down, in which case you switch)
3. Publish the chain of servers through which the updates from server X reached you
This way you can do blacklisting and whitelisting which is resistant to fakery.
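
The three steps above can be sketched as follows. All names here (`Update`, `Node`, `relayed_by`) are hypothetical, and the sketch leaves out signatures entirely, so on its own it shows only the bookkeeping, not the resistance to fakery.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    origin: str                                # server X that produced the data
    payload: str
    path: list = field(default_factory=list)   # servers that relayed it so far

    def relayed_by(self, server):
        """Copy of this update with server appended to the published chain."""
        return Update(self.origin, self.payload, self.path + [server])


class Node:
    def __init__(self, name, blocked=()):
        self.name = name
        self.blocked = set(blocked)
        self.upstream = {}  # origin -> server to ask next time (step 2)

    def receive(self, update):
        """Accept the update unless the origin or any relay hop is blocked."""
        hops = [update.origin, *update.path]
        if any(hop in self.blocked for hop in hops):
            return False
        self.upstream[update.origin] = update.path[-1] if update.path else update.origin
        return True
```

Because the chain travels with the update, blocking one server also drops everything that reached you through it.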

@cjd @lain
> 1. Gossip all of the data

What do you mean by "all of the data"? Apart from the hashtag posts assigned to an instance by the DHT as a relay or storage node, instances shall only receive the posts they subscribed to or queried. Otherwise we'll get an overload-causing "all or nothing" situation for smaller nodes like with current relays.

2. & 3.: So we're building ourselves a multicast tree by learning, right? That is a possible approach, but how does it perform better in ->

@schmittlauch
Everything -> everything you proposed to put in the dht (I like your idea of a message id only)

re performance, nodes can set a preference number to indicate how much they want you to pull from them so you tend to build a tree with hub nodes that can handle it. Ofc each update from node x should be signed and time-stamped by node x so it can't be tampered with.
@lain
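
One way to read the "preference number" is as a weight when choosing whom to pull from. The sketch below assumes exactly that; `pick_upstream` is a hypothetical helper, and signing and time-stamping of updates are not shown.

```python
import random


def pick_upstream(preferences, rng=random):
    """Choose a server to pull from, weighted by its advertised preference.

    preferences maps server name -> non-negative preference number.
    Raises ValueError if every preference is zero.
    """
    servers = list(preferences)
    weights = [preferences[s] for s in servers]
    return rng.choices(servers, weights=weights, k=1)[0]


prefs = {"hub.example": 50, "small.example": 1}
print(pick_upstream(prefs))
```

With weights like these most pulls land on the hub, which is how a tree with capable hub nodes would tend to form.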

@cjd
> Everything -> everything you proposed to put in the dht

Still not sure what you mean.
The DHT part assigns responsibility of handling a bunch of hashtags to an instance. All other hashtag posts don't have to reside on that instance if it doesn't deliberately fetch them/ subscribe to them at another instance out of interest.

->
@lain

@cjd

So either instances can fetch all posts of all hashtags they can get hold of, which would benefit dissemination. Or they just take what they're interested in themselves, which'd decrease the probability of finding posts of unpopular hashtags.

@lain
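
For readers unfamiliar with the "assigns responsibility" part, here is a minimal sketch of mapping a hashtag to one responsible instance. It uses rendezvous (highest-random-weight) hashing as a stand-in; this is an assumption for illustration and not necessarily the scheme used in the proposal under discussion.

```python
import hashlib


def responsible_instance(hashtag, instances):
    """Pick the instance with the highest hash score for this hashtag."""
    def score(instance):
        key = f"{instance}|{hashtag.lower()}".encode()
        return int.from_bytes(hashlib.sha256(key).digest(), "big")

    return max(instances, key=score)


instances = ["a.example", "b.example", "c.example"]
print(responsible_instance("fediverse", instances))
```

Every node that knows the same instance list computes the same answer, and an instance leaving only moves the hashtags it was responsible for.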

@schmittlauch
I can write what I'm thinking on a pad tomorrow morning if you think it is interesting, at least then it would be properly written rather than me phone posting bits and pieces of how it could work...
@lain

@cjd
Yes, feel free to write a consistent wrap-up of your ideas. Although I'm a bit sceptical, it will widen my thoughts on other approaches and structures, I'll definitely read it.
@lain

@schmittlauch @cjd @lain Just wanted to say that I'm happy to see this discussion happening here. We effectively need to solve the same problems for Datashards, so... timely!

@cjd @schmittlauch @lain Looping in @emacsen. This is exactly the same stuff we need to figure out for how to distribute Datashards content across the fediverse. Can we collaborate on this vision?

@cwebber @schmittlauch @lain @emacsen
My suggestion is just that, a suggestion. I think it provides rich tools for dealing with bad actors but that's just me...

The only table I really want to bang my fist on is please don't slip on the DHT-banana-peel.

AFAICT the only protocol using DHT at scale is bittorrent (are there others?) and their usage is very unique. I would argue that in their usage it's a motte and bailey.

@cwebber @schmittlauch @lain @emacsen
The bailey in this case is the trackers. They work really well, they're fast, they're centrally administered so if something goes wrong, someone can deal with it.

But if the baddies threaten to take down the trackers, the bittorrent people say "ohh you are fools, you can take down the trackers all you like, we have <drumroll> The DHT", and that's true, if the trackers go down, the network will continue to function.

But then you have the DHT attacks...

@cjd @schmittlauch @lain @emacsen Do you think hosting such things over tor .onion services or I2P helps? Makes it harder to take down nodes. But OTOH, I'd also love to be able to use the fediverse servers we already have to distribute content without setting up separate daemons necessarily (I'm guessing that's where the Pleroma devs plan to take things)

@cwebber @schmittlauch @lain @emacsen

I think it would be really cool if actually everything was gossiped, so then the fediverse could cross network boundaries (some nodes in tor, some in i2p, some in Hyperboria, some in China), but that's just a dream and the bandwidth to move media around makes such a thing untenable.

@cjd @schmittlauch @lain @emacsen Also, there are two things that can be gossiped:
- who has the content
- the content

presumably we'd do the former, but occasionally as part of the system "grab" bits of interesting stuff?

@cjd @schmittlauch @lain @emacsen Another thing I'm really unsure of:

Should a node, once it has content that is "important" to it (eg, let's say my node containing this very post) continue to hold onto it and respond to queries asking for content?

On the one hand, this helps important content survive. On the other hand, it helps reveal who has the content.

I wonder if we can make progress on this without going full-freenet ;)

@cwebber @schmittlauch @lain @emacsen

Having the originating server store the content and other servers only "cache" it makes logical sense because the originating server is the one which has the direct relationship with the person who created the content (who is probably the relevant data-subject).

@cwebber @schmittlauch @lain @emacsen
Interesting question: Why not just gossip ALL public messages between nodes ?
This solves:
* Hashtags
* Groups
* Full text search

@cjd @cwebber @lain @emacsen As you're already starting the discussion before I had time to read your proposal thoroughly:
Do you intend to store-and-forward messages through all nodes on the path, or is gossiping just used for discovery, with delivery routed directly?

@schmittlauch @cwebber @lain @emacsen
Also I don't want to come off as rushing a solution.

You have a strong interest in this, evidenced by the paper you put significant time and effort into. I'm occupied by other things and only have a marginal interest in making the fediverse more flexible in how it deals with attacks.

At this point your proposal is more standards-ready than mine, yours has a champion (you), mine doesn't because I don't have the time.

@cjd @cwebber @lain @emacsen I'll try to read your proposal as soon as possible. I like your enthusiasm and how quickly you get onto things, but I'm also a bit appalled by how quickly you put together an alternative suggestion and got people discussing it.

I need to remember my considerations for *not* building gossip (I didn't know that term back then) trees 6 months(!) ago 😅

@schmittlauch @cjd @lain @emacsen At least conversations are starting and people are excited!

Both of you have made proposals; nobody has gotten to implementation yet, so it's ok, there's still plenty of time for us to unpack and discuss.

@cwebber @emacsen @cjd @schmittlauch i think having both dht and gossip wouldn't be too bad, just like bittorrent has dht and trackers. i think instances will be here for quite a while so we don't need to go full p2p all the time.

@lain @emacsen @cjd @schmittlauch I think one thing that happened at APConf is that a lot of us started to get excited about the viability of bringing Datashards to the fediverse. It seems to me that the Pleroma team is looking to take leadership here, and that's really great and increases my confidence.

We didn't have Mastodon devs at the table when these conversations occurred; eventually we want to start looping in @gargron and @nightpool and others about what we're thinking.

@cwebber @lain @emacsen @cjd

Unfortunately I mustn't get too excited as I still have exams and other projects to do /0\

Remember @lain & @cwebber, not everyone works full-time on Fediverse stuff :P

@schmittlauch @lain @emacsen @cjd That's true and full ack (and empathy) there! Your opinions and review are still valuable though :)


@cwebber @schmittlauch @lain @cjd I don't remember if @cj is on this thread. There are too many threads!

I also have a call with Arne from Freenet on Monday.

Will Discourse help keep things easy to follow? I'm finding this challenging.
