April 25
• Build data pipelines, text analysis algorithms, query engines, and decision-making engines
• Apply robust and fault-tolerant approaches to create scalable ingestion and data-processing systems
• Debug, profile, and optimize distributed data-intensive applications, improving their latency, accuracy, resource consumption, and throughput
• Work with existing applications built with Spark, S3, Timescale, Python, and Rust
• Directly implement services and features that leverage the results of your data pipelines
• Implement and improve machine learning and data pipelines
• 5+ years of experience as an engineer with a strong understanding of key concepts in distributed systems
• 3+ years of extensive experience in building and deploying data applications
• Fluency in at least one, and ideally more than one, of these languages: Java/Scala/Kotlin, Python, Go, Rust, or C++
• Good understanding of the following concepts: partitioning, replication, map-reduce, indexing, and CAP
• Experience with distributed storage systems (S3, HDFS, Hive, ClickHouse, Elastic, etc.), distributed processing engines (Spark, etc.), and message queues (Kafka, SQS, etc.)
• Passion for building large-scale ML applications and improving software engineers' productivity
• Some understanding of key concepts in natural language processing, machine learning, or statistical analysis
• Some experience with the machine learning stack (pandas, PyTorch, NumPy, scikit-learn, transformers, etc.)
• Unlimited PTO
• Competitive salary and equity
• Work-life balance
• Flexibility to be fully or partly remote
• Few meetings, so you can ship fast and focus on building
• One Medical membership on us!
• Top-notch medical, dental, vision, short-term disability, long-term disability, and life insurance
• All insurance is 100% company-paid ($0 premiums) for employees and highly subsidized for dependents
• FSA, HSA with company contributions, and pre-tax commuter benefits
• 401(k) plan
• Paid parental leave (up to 12 weeks)
Apply Now