Data Lead
Newbury, Ohio
Direct Hire
Remote
$160,000-$210,000
The Data Lead will be responsible for designing and maintaining our data infrastructure, including ETL pipelines, vector databases, and retrieval systems for RAG-based applications. They will guide data quality, governance, and performance optimization efforts, ensuring our platform delivers accurate, scalable, and cost-efficient data-driven experiences.
Data Lead Responsibilities
- Data Engineering: Strong SQL and Python, ETL pipeline design, and data normalization/cleaning.
- Vector Databases & Retrieval: Hands-on with Pinecone, Weaviate, Milvus, or pgvector. Knowledge of index strategies (HNSW, IVF, PQ).
- RAG (Retrieval-Augmented Generation): Designing retrieval strategies (chunking, embedding selection, reranking).
- Embedding Models: Understanding how to choose and evaluate embedding models for domain-specific tasks.
- Data Modeling & Knowledge Graphs (nice-to-have): For linking structured/unstructured data.
- Data Quality & Governance: Setting standards for metadata, access controls, lineage, and freshness.
- Performance Optimization: Benchmark and tune latency, recall/precision, and cost/performance trade-offs.
Data Lead Requirements
- 6+ years in data engineering, data platform, or ML data roles.
- Strong SQL and Python skills for ETL and data workflows.
- Experience with vector databases (Pinecone, Weaviate, Milvus, pgvector).
- Proven ability to design retrieval pipelines for RAG.
- Deep understanding of embedding models and their evaluation.
- Familiarity with data quality and governance frameworks.
- Ability to optimize systems for latency, accuracy, and cost-efficiency.