• Developing high-performance storage and I/O primitives
• Benchmarking existing I/O library functions to identify bottlenecks
• Participating in peer code review of all PRs related to file storage and interaction with different filesystems
• Analyzing I/O throughput in a massively parallel, distributed query engine to identify inefficiencies and crafting solutions to address them
• Ensuring that everything related to storage is built to the highest possible quality, balancing performance, usability, and maintainability across the Voltron Data and Apache Arrow ecosystems
• Developing a comprehensive set of low-level benchmarks for I/O functions targeting various local, networked, and cloud storage technologies to enable monitoring for performance regressions
• Identifying and building reusable software components to ensure a high-quality, maintainable codebase


• Strong experience developing in C++, especially modern C++
• Experience developing with and using data lake storage technologies such as S3, Google Cloud Storage, and Azure Blob Storage
• Experience building and using distributed networked file systems such as HDFS or Ceph
• Experience working with technologies such as io_uring, DMA, RDMA, or GPUDirect Storage
• Experience with data lake table formats such as Iceberg, Delta Lake, and Hudi
• Experience with data lake file formats such as Parquet and Avro
• Strong experience with C++ development tools such as Apache Arrow


• Unlimited PTO
• Medical, Dental, and Vision
• Retirement [USA Only]
• Home Office Budget
• Continuing Education Budget

