Research Crawling Engineer

🕒 April 28

🗣️🇺🇸🇬🇧 English required


MLabs

51 - 200 employees

MLabs Consulting helps set up project specifications and implement, manage, and maintain technical projects for AI, Fintech, Information Technology, and more. We specialize in functional programming, compilers, AI, DevOps, and full-stack development.

Description

• Construct and maintain large-scale web crawlers across diverse domains.
• Design high-throughput, fault-tolerant systems for data collection, managing volumes ranging from millions to billions of URLs per day.
• Navigate anti-bot systems, rate limits, and dynamic, JavaScript-heavy websites.
• Develop robust pipelines for data cleaning, deduplication, filtering, and normalization.
• Build and maintain datasets specifically structured for research and machine learning model training.
• Monitor and optimize crawl performance, coverage, and data quality through rapid iteration.
• Collaborate with research teams to ensure data collection efforts align with modeling requirements.
• Optimize infrastructure to ensure cost-efficiency, low latency, and reliability.

🎯 Requirements

• Extensive programming experience in one or more of the following: Go, Rust, Python, Java, or C++.
• Proven experience in building web crawlers or large-scale data pipelines.
• Solid understanding of HTTP, networking protocols, and browser behavior.
• Familiarity with distributed systems and parallel processing techniques.
• Experience handling large datasets, ideally at the terabyte to petabyte scale.
• Demonstrated ability to debug and maintain systems within unstable or adversarial environments.

Preferred Qualifications:

• Experience with NLP pipelines or dataset curation for machine learning.
• Familiarity with LLM pre-training data or retrieval systems.
• Practical experience with headless browsers (e.g., Playwright, Puppeteer, or Chrome DevTools Protocol).
• Knowledge of proxy systems, IP rotation, and large-scale request orchestration.
• Background in data quality evaluation or benchmarking.
• Experience running workloads on cloud or bare-metal infrastructure.

🏖️ Benefits

• Impactful Opportunity: Contribute to the development of a web-scale crawler and knowledge graph at the forefront of AI data accessibility.
• High-Performance Culture: Join a lean, low-ego team that prioritizes high output and professional growth.
• Remote Work: This position is part of a fully remote team, offering flexibility and autonomy.
• Competitive Compensation: A package including a competitive salary, comprehensive benefits, and equity, commensurate with experience and the ability to operate at scale.

