Some thoughts on the Canonical S-Expression (CSEXP) format we currently use in Datashards:

@emacsen Maybe you should look into protobuf?
It's been implemented in a bunch of languages, so you shouldn't need to write a parser, and I think of it as basically a binary version of JSON.

@lanodan Perhaps, but protobufs require a schema. CSEXP is *technically* schema-less.
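To show what "schema-less" means in practice, here is a minimal Python sketch of the core CSEXP encoding. An atom is a byte string written as its decimal length, a colon, then the raw bytes. A list is a parenthesized sequence of expressions. Display hints and other S-expression features are left out. The `shard`/`size` fields in the usage line are made up for illustration and are not taken from Datashards. The decoder never needs outside knowledge, because every element announces its own shape.

```python
from typing import List, Tuple, Union

# An expression is either a byte-string atom or a list of expressions.
SExp = Union[bytes, List["SExp"]]


def encode(value: SExp) -> bytes:
    """Encode a bytes atom or a (nested) list of them as a csexp."""
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"(" + b"".join(encode(v) for v in value) + b")"
    raise TypeError(f"cannot encode {type(value).__name__}")


def _parse(data: bytes, pos: int) -> Tuple[SExp, int]:
    """Parse one expression starting at pos; return it and the next offset."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    if data[pos:pos + 1] == b"(":
        items: List[SExp] = []
        pos += 1
        while True:
            if pos >= len(data):
                raise ValueError("unterminated list")
            if data[pos:pos + 1] == b")":
                return items, pos + 1
            item, pos = _parse(data, pos)
            items.append(item)
    # Otherwise expect an atom: decimal digits, a colon, then the payload.
    colon = data.find(b":", pos)
    if colon == -1 or not data[pos:colon].isdigit():
        raise ValueError(f"bad atom length at offset {pos}")
    start = colon + 1
    end = start + int(data[pos:colon])
    if end > len(data):
        raise ValueError("atom runs past end of input")
    return data[start:end], end


def decode(data: bytes) -> SExp:
    """Decode exactly one csexp, rejecting trailing bytes."""
    value, end = _parse(data, 0)
    if end != len(data):
        raise ValueError("trailing data after expression")
    return value


# Usage: round-trip a small nested expression (hypothetical field names).
msg = [b"shard", [b"size", b"4096"], b"\x00\x01"]
wire = encode(msg)
print(wire)  # b'(5:shard(4:size4:4096)2:\x00\x01)'
assert decode(wire) == msg
```

Atoms stay opaque bytes here. Deciding that `4096` is a number is left to the application, which is exactly the part a protobuf schema would otherwise pin down.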

@emacsen I like it. Another variant that could be useful in some contexts is length-prefixing the lists as well. That way if you somehow know a particular list isn't of interest you can skip over it faster. I suppose that feature would be more interesting if there were a schema to inform these kinds of seeks.
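Here is one way the length-prefixed-list idea could look. It is a hypothetical extension and not part of the CSEXP format. A list is written as `<n>(...)`, where `n` is the byte length of the list body between the parentheses, so a reader can jump over a list it doesn't care about. Digits followed by `:` still mean an atom, so the two forms stay unambiguous. The encoding and the `encode_sized`/`skip` names are assumptions made for this sketch.

```python
from typing import List, Union

SExp = Union[bytes, List["SExp"]]


def encode_sized(value: SExp) -> bytes:
    """Hypothetical variant: atoms as usual, lists as '<body length>(' body ')'."""
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    body = b"".join(encode_sized(v) for v in value)
    return str(len(body)).encode("ascii") + b"(" + body + b")"


def skip(data: bytes, pos: int) -> int:
    """Return the offset just past the expression at pos without
    descending into it (works for atoms and length-prefixed lists)."""
    end = pos
    while data[end:end + 1].isdigit():
        end += 1
    if end == pos:
        raise ValueError(f"expected a length prefix at offset {pos}")
    n = int(data[pos:end])
    marker = data[end:end + 1]
    if marker == b":":                 # atom: skip over its payload
        after = end + 1 + n
        if after > len(data):
            raise ValueError("atom runs past end of input")
        return after
    if marker == b"(":                 # sized list: skip body, expect ')'
        close = end + 1 + n
        if data[close:close + 1] != b")":
            raise ValueError(f"list length mismatch at offset {pos}")
        return close + 1
    raise ValueError(f"unexpected byte {marker!r} at offset {end}")


# Usage: reach the second element without reading the 1000-byte atom
# tucked inside the first one.
msg = [[b"blob", b"x" * 1000], b"tag"]
wire = encode_sized(msg)
first = wire.index(b"(") + 1           # start of the outer list's body
second = skip(wire, first)             # jumps straight over the inner list
print(wire[second:skip(wire, second)])  # b'3:tag'
```

The cost is a few extra digits per list, and the writer must know each body's size before emitting its opening parenthesis. Without a schema, the reader still has to know *which* positions it wants to skip.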

@emacsen Another tradeoff is that the added size overhead would probably make full reads slower. A header with a span index table and an offset to the start of the expression may be a good compromise. Unless the expressions get really large, there’s probably no need for these extensions.
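Here is a rough sketch of the header-plus-span-index compromise, using an invented layout. The header holds a 4-byte big-endian offset to the start of the expression, a 4-byte element count, and then one (start, end) pair of 4-byte offsets per element of the top-level list. The expression itself stays plain CSEXP, so a reader that ignores the header can still parse it. The layout and the names `skip_plain`, `span_table`, `add_index`, and `read_element` are hypothetical.

```python
import struct
from typing import List, Tuple


def skip_plain(data: bytes, pos: int) -> int:
    """Return the offset just past the plain csexp at pos. Unlike a
    length-prefixed list, a plain list must be scanned to its ')'."""
    depth = 0
    while True:
        c = data[pos:pos + 1]
        if c == b"(":
            depth += 1
            pos += 1
        elif c == b")":
            depth -= 1
            pos += 1
            if depth < 0:
                raise ValueError(f"unbalanced ')' at offset {pos - 1}")
        else:
            colon = data.index(b":", pos)      # ValueError if no colon
            digits = data[pos:colon]
            if not digits.isdigit():
                raise ValueError(f"bad atom length at offset {pos}")
            pos = colon + 1 + int(digits)
            if pos > len(data):
                raise ValueError("atom runs past end of input")
        if depth == 0:
            return pos


def span_table(expr: bytes) -> List[Tuple[int, int]]:
    """(start, end) offsets of each element of a top-level list."""
    if expr[:1] != b"(":
        raise ValueError("expected a top-level list")
    spans: List[Tuple[int, int]] = []
    pos = 1
    while expr[pos:pos + 1] != b")":
        end = skip_plain(expr, pos)
        spans.append((pos, end))
        pos = end
    return spans


def add_index(expr: bytes) -> bytes:
    """Prepend the header: [expr offset][count][(start, end) * count]."""
    spans = span_table(expr)
    table = b"".join(struct.pack(">II", s, e) for s, e in spans)
    expr_offset = 8 + len(table)
    return struct.pack(">II", expr_offset, len(spans)) + table + expr


def read_element(blob: bytes, i: int) -> bytes:
    """Fetch element i using only the header, without scanning the others."""
    expr_offset, count = struct.unpack_from(">II", blob, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    start, end = struct.unpack_from(">II", blob, 8 + 8 * i)
    return blob[expr_offset + start:expr_offset + end]


blob = add_index(b"(5:shard(4:size4:4096)2:ok)")
print(read_element(blob, 1))  # b'(4:size4:4096)'
```

The header costs 8 fixed bytes plus 8 bytes per top-level element, which is the size overhead mentioned above. The 32-bit offsets also cap an expression at 4 GiB. Building the table still takes one full scan at write time. After that, each lookup is two `struct.unpack_from` calls and a slice.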

@jmitchell I think if your data is this complex, you're already better off with another serialization format.

@jmitchell Interesting. The file sizes in Datashards are so small that this isn't an issue for us, and we just read the raw data into memory first.
