If I may interject
I really wish DHTs would be removed from the literature because they seem so good but they have so many unsolved problems that they trick people into designing systems that can't be safely implemented.
1. DoS -> If you're hashing the /64 of IPv6 space, then I need to get a /48 (trivial, and it already contains 65,536 /64s). Getting a /32 is not so trivial, but it's not really hard either. Domains are actually probably less bad than /64s.
My instinct is to do repeated searches with DNS-like TTL caches. Forwarding a search to the entire fediverse is obviously pretty horrible, but nodes that haven't recently used a tag shouldn't be bothered with the question.
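To put rough numbers on the /64 point, here is a small arithmetic sketch. It only counts how many distinct /64s fit inside a larger allocation, on the assumption that each /64 could be hashed to its own DHT node ID. The function name `count_64s` and the documentation prefixes are made up for illustration.

```python
import ipaddress

def count_64s(prefix: str) -> int:
    """Number of distinct /64 networks inside the given IPv6 prefix."""
    net = ipaddress.ip_network(prefix)
    if net.version != 6 or net.prefixlen > 64:
        raise ValueError("expected an IPv6 prefix of length /64 or shorter")
    return 2 ** (64 - net.prefixlen)

# 2001:db8::/32 is the IPv6 documentation range, used here as a placeholder.
print(count_64s("2001:db8:1::/48"))  # 65536
print(count_64s("2001:db8::/32"))    # 4294967296
```

So whoever holds a /48 gets tens of thousands of candidate node IDs to choose from, and a /32 gets billions.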
So perhaps use a pubsub which allows you to subscribe to recent tag activity, for example "I have messages from the past 1 minute which use the tags" [ .... ]
Then you can limit the number of nodes you have to search...
@cjd @lain Searches alone don't provide any subscription functionality, and having to poll for posts will just overload the network when interest in a tag is high.
Furthermore, many use cases mandate post delivery to happen at least close to real-time. This wouldn't be possible with TTL-flooding at all.
Regarding flooding, I hope you know how "well" Gnutella worked?
My idea is definitely not well thought out, but the reason why I tried to make it work on top of querying/pubsub is because when someone is running a shitty server (freefedifollowers), you can just block the server and be done with it, whereas in a DHT you would have to get everybody onboard.
So thinking about it a bit more, I think what I'd do is the following:
1. Gossip all of the data in order to reduce the load
2. When you get an update about server X from server Y, next time you want to learn about server X, ask server Y again (unless server Y goes down, in which case you switch)
3. Publish the chain of servers through which the updates from server X reached you
This way you can do blacklisting and whitelisting which is resistant to fakery.
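The three steps above can be sketched roughly as follows. This is only a sketch under assumptions: the `Update` record, `relay`, `acceptable` and `remember_upstream` are hypothetical names, and the path here is unsigned, so on its own it does not give the resistance to fakery mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Update:
    origin: str        # server X, where the update was created
    body: str
    path: tuple = ()   # servers the update passed through, in order

def relay(update: Update, via: str) -> Update:
    """Step 3: a relaying server appends itself to the published chain."""
    return Update(update.origin, update.body, update.path + (via,))

def acceptable(update: Update, blocked: set) -> bool:
    """Reject updates that originate from or passed through a blocked server."""
    return update.origin not in blocked and not any(s in blocked for s in update.path)

def remember_upstream(upstream: dict, update: Update) -> None:
    """Step 2: next time we want news about X, ask whoever last handed it to us."""
    upstream[update.origin] = update.path[-1] if update.path else update.origin

u = relay(relay(Update("x.example", "hello"), "y.example"), "z.example")
print(u.path)                         # ('y.example', 'z.example')
print(acceptable(u, {"y.example"}))   # False
```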
What do you mean by "all of the data"? Apart from the hashtag posts assigned to an instance by the DHT as a relay or storage node, instances shall only receive the posts they subscribed to or queried. Otherwise we'll get an overload-causing "all or nothing" situation for smaller nodes like with current relays.
2. & 3.: So we're building ourselves a multicast tree by learning, right? That is a possible approach, but how does it perform better in ->
Everything -> everything you proposed to put in the dht (I like your idea of a message id only)
re performance, nodes can set a preference number to indicate how much they want you to pull from them so you tend to build a tree with hub nodes that can handle it. Ofc each update from node x should be signed and time-stamped by node x so it can't be tampered with.
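A rough sketch of both ideas, with loud caveats. The HMAC below is only a standard-library stand-in for a real signature made with node X's private key (a shared-secret MAC is not a public-key signature), and treating the preference number as a selection weight is an assumption. All function names are hypothetical.

```python
import hashlib
import hmac
import json
import random

def _payload(origin: str, body: str, ts: int) -> bytes:
    return json.dumps({"origin": origin, "body": body, "ts": ts}, sort_keys=True).encode()

def sign_update(key: bytes, origin: str, body: str, ts: int) -> dict:
    """Stand-in for node X signing and time-stamping its own update."""
    sig = hmac.new(key, _payload(origin, body, ts), hashlib.sha256).hexdigest()
    return {"origin": origin, "body": body, "ts": ts, "sig": sig}

def verify_update(key: bytes, update: dict, now: int, max_age: int = 300) -> bool:
    """Accept only updates that are untampered and not older than max_age seconds."""
    expected = hmac.new(
        key, _payload(update["origin"], update["body"], update["ts"]), hashlib.sha256
    ).hexdigest()
    fresh = 0 <= now - update["ts"] <= max_age
    return hmac.compare_digest(expected, update["sig"]) and fresh

def pick_upstream(preferences: dict, rng=random) -> str:
    """Pick a node to pull from, weighted by the preference number it advertised."""
    nodes = sorted(preferences)
    return rng.choices(nodes, weights=[preferences[n] for n in nodes], k=1)[0]

u = sign_update(b"demo-key", "x.example", "hello", ts=1000)
print(verify_update(b"demo-key", u, now=1100))                     # True
print(verify_update(b"demo-key", dict(u, body="bye"), now=1100))   # False
```

Nodes advertising a higher preference get picked more often, which is what nudges the tree toward hubs that can handle the load.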
> Everything -> everything you proposed to put in the dht
Still not sure what you mean.
The DHT part assigns responsibility of handling a bunch of hashtags to an instance. All other hashtag posts don't have to reside on that instance if it doesn't deliberately fetch them/ subscribe to them at another instance out of interest.
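For readers unfamiliar with how such an assignment can work, here is a minimal consistent-hashing sketch. It is not the scheme from the proposal: it assumes a Chord-like ring where both instance names and hashtags are hashed, and the first instance at or after the tag's position is responsible. `ring_position` and `responsible_instance` are illustrative names.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Map a name onto a 64-bit ring using SHA-256."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")

def responsible_instance(tag: str, instances) -> str:
    """Return the instance whose ring position follows the tag's position."""
    ring = sorted((ring_position(i), i) for i in instances)
    idx = bisect.bisect_left(ring, (ring_position(tag), ""))
    return ring[idx % len(ring)][1]  # wrap around past the end of the ring

instances = ["a.example", "b.example", "c.example"]
owner = responsible_instance("#fediverse", instances)
print(owner in instances)  # True
```

In this sketch, only the owner has to relay or store posts for `#fediverse`; the other instances are untouched unless they subscribe or query.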
Writing up a little something here: https://cryptpad.fr/code/#/2/code/view/DjX7MWbez2OF5uXSjN1aQjL6IMoE-tXC26AG6fN5OEw/present/
Still in progress...
The only table I really want to bang my fist on is please don't slip on the DHT-banana-peel.
AFAICT the only protocol using DHT at scale is bittorrent (are there others?) and their usage is very unique. I would argue that in their usage it's a motte and bailey.
@cwebber @schmittlauch @lain @emacsen
The bailey in this case is the trackers. They work really well, they're fast, they're centrally administered so if something goes wrong, someone can deal with it.
But if the baddies threaten to take down the trackers, the bittorrent people say "ohh you are fools, you can take down the trackers all you like, we have <drumroll> The DHT", and that's true, if the trackers go down, the network will continue to function.
But then you have the DHT attacks...
@cjd @schmittlauch @lain @emacsen Do you think hosting such things over tor .onion services or I2P helps? Makes it harder to take down nodes. But OTOH, I'd also love to be able to use the fediverse servers we already have to distribute content without setting up separate daemons necessarily (I'm guessing that's where the Pleroma devs plan to take things)
I think it would be really cool if actually everything was gossiped, so then the fediverse could cross network boundaries (some nodes in tor, some in i2p, some in Hyperboria, some in China), but that's just a dream and the bandwidth to move media around makes such a thing untenable.
Should a node, once it has content that is "important" to it (eg, let's say my node containing this very post) continue to hold onto it and respond to queries asking for content?
On the one hand, this helps important content survive. On the other hand, it helps reveal who has the content.
I wonder if we can make progress on this without going full-freenet ;)
Having the originating server store the content and other servers only "cache" it makes logical sense because the originating server is the one which has the direct relationship with the person who created the content (who is probably the relevant data-subject).
You have a strong interest in this, evidenced by the paper you put significant time and effort into. I'm occupied by other things and only have a marginal interest in making the fediverse more flexible in how it deals with attacks.
At this point your proposal is more standards-ready than mine, yours has a champion (you), mine doesn't because I don't have the time.
@cjd @cwebber @lain @emacsen I'll try to read your proposal as soon as possible. I like your enthusiasm and how quickly you get onto things, but I'm also a bit appalled by how quickly you put together an alternative suggestion and got people discussing it.
I need to remember my considerations for *not* building gossip (I didn't know that term back then) trees 6 months(!) ago 😅
@lain @emacsen @cjd @schmittlauch I think one thing that happened at APConf is that a lot of us started to get excited about the viability of bringing Datashards to the fediverse. It seems to me that the Pleroma team is looking to take leadership here, and that's really great and increases my confidence.