The claim of sub-millisecond writes with data in S3 is false and impossible. If you look at the benchmark, the fsync is not timed, so this is just the latency of either the network or in-kernel file operations, depending on the mount settings.
I hate it when databases celebrate their performance without synchronous flushing. You should be clear about the data loss window (which should be zero for committed transactions by default!) and the flushing interval to persistent storage.
I'm okay if you batch writes, I'm okay if you offer a low-latency mode with less durability, but by being unclear about this it just feels like a scam.
Yeah, in this case the footnote to the write latency specifically says “at rest in S3”, which is what caused me to go look at the source. To be clear, I have no problem with ZeroFS only flushing on fsync.
I am very excited for object-storage-first systems like this to leverage low-latency zonal storage for write-ahead logs, keeping the storage disaggregated while greatly reducing write latency. That ends up being more expensive, but it is likely a good tradeoff in lots of the cases I have seen.
ZeroFS aims to be a POSIX filesystem, the semantics here are the standard ones (ext4, xfs behave the same): write() is buffered (that's the batching) and "committed" maps to fsync(), which returns only once data is durable.
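To make the buffered-versus-durable distinction concrete, here is a minimal sketch that times a write() on its own and then a write() followed by fsync(). It only uses the Python standard library; `time_write` and the temporary path are made up for illustration, and to learn anything about ZeroFS you would have to point the path at an actual mount, which this sketch does not assume.

```python
import os
import tempfile
import time


def time_write(path: str, payload: bytes, durable: bool) -> float:
    """Return the seconds taken to write payload, optionally including fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        os.write(fd, payload)  # may return while the data is still only buffered
        if durable:
            os.fsync(fd)  # returns once the filesystem reports the data as durable
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    payload = b"x" * 4096
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.bin")  # hypothetical file name
        buffered = time_write(path, payload, durable=False)
        synced = time_write(path, payload, durable=True)
    print(f"write() only:      {buffered * 1e6:.0f} us")
    print(f"write() + fsync(): {synced * 1e6:.0f} us")
```

A benchmark that reports only the first number is measuring the buffered path, which is the point being argued in this thread.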
Nothing wrong with that, but you should remove the “at rest in S3” footnote from the write latency on the front page of the website, because that is not what is measured.
> These are asciinema recordings of real terminal sessions, rendered as text rather than video. Playback caps idle pauses at two seconds and changes nothing else.
Thanks? This sounds like it's the LLM's response to the prompter, not something you should display on the page itself...
I see this all the time in code reviews at work. Extremely verbose comments that teach the clueless author how things work but have no place in the final code: aside from the codebase not being a coding tutorial, they are also incredibly specific and would become stale and incorrect in a matter of weeks.
I feel bad for actually liking that part now. Capping pauses at 2 seconds would show you where it hung 2+ seconds without wasting your time. Smart I thought.
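For anyone wondering what "caps idle pauses at two seconds" amounts to, here is a minimal sketch, assuming each recorded event carries an absolute timestamp in seconds. `cap_idle` is a made-up name for illustration; this is not the site's or asciinema's actual code.

```python
def cap_idle(timestamps: list[float], limit: float = 2.0) -> list[float]:
    """Shift event times so that no gap between consecutive events exceeds limit."""
    capped: list[float] = []
    previous_original = None
    for t in timestamps:
        if previous_original is None:
            capped.append(t)  # the first event keeps its original time
        else:
            gap = min(t - previous_original, limit)  # shorten long pauses only
            capped.append(capped[-1] + gap)
        previous_original = t
    return capped


# A 5-second pause between the second and third event is shortened to 2 seconds.
print(cap_idle([0.0, 1.0, 6.0, 6.5]))  # [0.0, 1.0, 3.0, 3.5]
```

Gaps shorter than the limit are left untouched, so a pause shown at exactly two seconds marks a place where the original session waited at least that long.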
How does this compare to JuiceFS or SeaweedFS in terms of metadata latency? The LSM tree approach is interesting but compaction pauses on a remote-backed store seem like they could be painful.
I believe the first version of this required the metadata to be stored on the ZeroFS server, making HA kinda hard.
Has this changed, so that if I stop the server and create a new instance with the same configuration file, it'll pick up the existing metadata from the bucket?
NFSv4 is a hard beast to implement correctly, with a lot of protocol surface (state, compound ops, delegations) for benefits ZeroFS mostly gets through 9P with extensions, over a much simpler protocol: https://www.zerofs.net/docs/9p-extensions. NFSv3 stayed in ZeroFS mostly for client compatibility.
We have done loads of research into using object storage wherever we can (given how cheap it is compared to SSDs), and so far it seems like making your application object store-aware is a far surer bet than abstracting S3 behind the file system. The behavior is just too different.
I'm more interested in applications that cleverly use object storage, e.g. AutoMQ, which is quite compatible with Kafka APIs but needs no HDDs.
> ZeroFS fetches object data in 128 KiB parts
Read/write operations in object storage are _far more_ expensive than stored bytes. I'm always afraid of anything that abstracts over S3/GCS access specifically for that reason.
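A rough back-of-the-envelope sketch of why request counts matter with the 128 KiB part size quoted above. It assumes every part is its own GET with no caching or coalescing, which is a worst case and may not match what ZeroFS actually does; `price_per_1000` is a placeholder you have to fill in from your provider's price list, not a real figure.

```python
PART_SIZE = 128 * 1024  # 128 KiB, the part size quoted above


def parts_for_read(num_bytes: int, part_size: int = PART_SIZE) -> int:
    """Upper bound on part fetches for a contiguous, uncached read."""
    return (num_bytes + part_size - 1) // part_size  # integer ceiling division


def request_cost(requests: int, price_per_1000: float) -> float:
    """Cost of a number of requests at a placeholder price per 1000 requests."""
    return requests / 1000 * price_per_1000


one_gib = 1024 ** 3
print(parts_for_read(one_gib))  # 8192 part fetches to read 1 GiB once, uncached
```

Multiply that by how often the same data is re-read without a cache hit and the request bill can outgrow the storage bill quickly.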
Since you are harnessing the sorcery of AI, have it write really good benchmarks, run tests and comparisons against competing products (and publish them), look up common pitfalls and often-requested features, and run security analysis.
Also, with marketing texts, write them yourself first; then you can ask AI to hone them or give you feedback. AI-slopped marketing text is visible from miles away and really, really puts people off. Even if the product itself would be fine, there is so much slop sloshing around in the pipes at the moment.
I really like this project and want to see it succeed! Don't let naysayers wear you down.
Ask me anything!
If one of your goals is to get others to adopt the software, I recommend you redo the marketing page and readme from scratch. Delete them without looking at them again, then hand-write the content for them. Once you have the content, you can tell an LLM to format it into a nice landing page, but strictly keep your wording without changes.
> Each card links to the CI pipeline.
Thanks for being explicit, AI-written marketing site. Wouldn't have been able to figure that out! Every currently maintained and reasonably popular open source project either runs CI in public or makes the tests extremely easy to run.
Has this changed, so that if I stop the server and create a new instance with the same configuration file, it'll pick up the existing metadata from the bucket?
Metadata has always been in the bucket itself.
For HA, there's now a "replicated mode" if you want automatic failover:
https://www.zerofs.net/docs/high-availability
there was slop with ai jesus but now gpt image is just a photo with hidden watermark
(Not POSIX-compliant because it doesn't need to be.)