Senior Data Scientist – Big Data R&D, Identity Graph, KYC

🕒 April 24

🏄 California – Remote

💵 $140k - $170k / year

⏰ Full Time

🟠 Senior

📊 Data Scientist

🦅 H1B Visa Sponsor

Socure

501 - 1000 employees

Founded 2012

🤖 Artificial Intelligence

🔐 Security

💸 Finance

💰 $450M Series E on 2021-11

Socure is a leading platform for digital identity verification and trust. Using predictive analytics, artificial intelligence, and machine learning, Socure draws on online and offline data, including email, phone, address, IP, and device information, to verify identities in real time. Its solutions address challenges in onboarding, login authentication, account takeover prevention, and contact center operations. Socure's AI-powered platform is built to combat identity fraud, support compliance, and improve user experience across industries such as financial services, eCommerce, online gaming, and crypto.

📋 Description

• Own the design, development, and evaluation of machine learning, statistical, and graph-based algorithms for entity-resolution, identity trust scoring, and anomaly detection on massive datasets.
• Architect and optimize graph-based identity representations (identity graph structure, linkage rules, clustering) to improve match rates, reduce false positives/negatives, and support downstream fraud and KYC models.
• Build and maintain scalable data pipelines and feature stores in Spark/PySpark (or Scala), including data normalization, deduplication, and feature computation across large PII datasets in AWS/Databricks environments.
• Lead A/B tests and offline/online experimentation for new models, features, and data sources; define success metrics, design experiments, and ensure rigorous validation before rollout.
• Evaluate new internal and external data sources: explore signal quality, design backtests, quantify incremental value, and provide clear recommendations on vendor selection and integration.
• Partner closely with product managers and engineers to translate ambiguous business and regulatory requirements (e.g., KYC coverage, watchlist matching) into concrete modeling and data roadmaps.
• Provide deep analytical support to Socure’s compliance and regulatory product suite, including investigative analyses, root‑cause analysis for anomalies, and clear narratives for internal and external stakeholders.
• Contribute to model governance and documentation: clearly explain model logic, data dependencies, limitations, and monitoring plans to internal risk/compliance stakeholders.
• Mentor junior data scientists and engineers on best practices in data exploration, feature engineering, experimentation, and code quality.
• Communicate complex technical concepts and trade‑offs in a concise, structured way to both technical and non‑technical audiences (e.g., product reviews, customer meetings, internal briefings).

🎯 Requirements

• Master’s degree with 3+ years of relevant industry experience, or Ph.D. with 1+ years of experience in applied ML / data science roles; background in Computer Science, Statistics, Mathematics, or related quantitative fields preferred.
• Strong proficiency in Python (preferred) or Scala, including experience with ML libraries such as scikit‑learn, XGBoost, TensorFlow or PyTorch.
• Extensive experience with Spark or PySpark and distributed data systems (e.g., AWS EMR, Databricks) working on very large, messy datasets.
• Deep understanding of supervised and unsupervised learning, feature engineering, model evaluation, and experiment design (A/B testing, holdout strategies, stratification).
• Experience developing production-quality data pipelines and automated workflows using Airflow or similar orchestration tools.
• Practical familiarity with graph databases and/or graph frameworks (Neo4j, AWS Neptune, GraphFrames, DGL, PyTorch Geometric) and graph algorithms for clustering, link prediction, and community detection is strongly preferred.
• Solid SQL skills and experience working with large-scale analytical data stores.
• Experience in at least one of: identity verification, fraud detection, credit risk, or adjacent high‑stakes domains is a plus.
• Demonstrated ability to lead medium‑to‑large projects end‑to‑end, make sound trade‑off decisions under ambiguity, and influence cross‑functional stakeholders with data and clear reasoning.

🏖️ Benefits

• Offers Equity
• Offers Bonus
