Some thoughts on the Canonical S-Expressions format we currently use in Datashards:
@lanodan Perhaps, but protobufs require a schema. CSEXP is *technically* schema-less.
@emacsen I like it. Another variant that could be useful in some contexts is length-prefixing the lists as well. That way if you somehow know a particular list isn't of interest you can skip over it faster. I suppose that feature would be more interesting if there were a schema to inform these kinds of seeks.
@emacsen Another tradeoff: the added size overhead would probably make full reads slower. A header with a span index table and an offset to the start of the expression may be a good compromise. Unless the expressions get really large there’s probably no need for these extensions.
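For illustration, here is a rough sketch of the length-prefixed list idea from the messages above. It is not part of CSEXP or Datashards: the encoding (a byte count written before the opening parenthesis) and the names `encode` and `skip` are assumptions made up for this example. Atoms keep the usual `<length>:<bytes>` form.

```python
# Hypothetical variant of canonical S-expressions: lists are written as
# <n>(body) where n is the byte length of body. Not a real standard.

def encode(value) -> bytes:
    """Encode bytes as an atom, or a list/tuple as a length-prefixed list."""
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    body = b"".join(encode(item) for item in value)
    return str(len(body)).encode("ascii") + b"(" + body + b")"


def skip(data: bytes, pos: int) -> int:
    """Return the offset just past the expression at pos, without parsing it."""
    i = pos
    while data[i:i + 1].isdigit():
        i += 1
    length = int(data[pos:i].decode("ascii"))  # raises ValueError if no digits
    marker = data[i:i + 1]
    if marker == b":":
        return i + 1 + length        # atom: payload only
    if marker == b"(":
        return i + 1 + length + 1    # list: body plus the closing ")"
    raise ValueError(f"unexpected byte at offset {i}")


data = encode([b"skip", [b"a", b"bc"], b"keep"])
pos = skip(data, 3)    # step over the first atom inside the outer list
pos = skip(data, pos)  # jump over the nested list in one step
rest = data[pos:]
```

With plain CSEXP the second `skip` would have to walk every child of the nested list to find its closing parenthesis; here it is a single addition, at the cost of a few extra bytes per list.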
@jmitchell I think if your data is this complex, you're already better off with another serialization format.
@jmitchell Interesting. The file sizes in Datashards are so small that this isn't an issue for us, and we just read the raw data into memory first.