
Sr. Data Engineer (Neo4j, Graph Databases) (Pune)

Pune, India
Posted: a week ago

Description

Senior Data Engineer
Data Engineering • Streaming Pipelines • Graph Databases • Entity Resolution

Role at a Glance
• Level: Lead / Senior (Individual Contributor / Team Lead Track)
• Experience: 7 to 10 years of relevant professional experience
• Location: Remote (Pune-based preferred)
• Employment Type: Contract
• Industry Preference: Any (Healthcare preferred; Payer / Provider experience strongly preferred)

About the Role
We are looking for a Senior Data Engineer with deep expertise in streaming architectures, graph database platforms, and large-scale data pipeline engineering. This is a high-ownership, hands-on role that sits at the intersection of real-time data infrastructure, entity resolution, and multi-database system design.

You will architect and build pipelines that drive a complex, multi-layered data platform: ingesting from diverse upstream sources, resolving entities at scale, and keeping graph, relational, search, and caching layers in sync. You will work closely with data architects, AI engineers, and product teams to deliver reliable, high-performance data infrastructure that powers downstream analytics and intelligent applications across any domain.

Key Responsibilities

Streaming & Ingestion Architecture
• Design and build production-grade CDC (Change Data Capture) pipelines using Apache Kafka, consuming events from PostgreSQL, SQL Server, and other RDBMS sources into a centralised knowledge graph.
• Architect multi-source ingestion connectors supporting schema evolution, backpressure handling, and at-least-once delivery guarantees across heterogeneous data sources.
• Configure and govern Confluent Schema Registry with Avro / Protobuf schemas across all Kafka topics; enforce backward and forward compatibility standards.
• Design micro-batch and streaming ETL/ELT workflows using Apache Spark or equivalent frameworks for bulk initial loads and ongoing incremental refresh patterns.
• Manage messaging workflows where required; define routing, dead-letter, and retry strategies appropriate to each integration pattern.

Graph Database Engineering
• Design, build, and optimise graph data models on a production graph database platform; Neo4j is preferred but experience with Amazon Neptune, ArangoDB, TigerGraph, or equivalent graph databases is valued.
• Author complex graph queries and traversal patterns in Cypher (Neo4j), Gremlin (Neptune/TinkerPop), or SPARQL for both operational and analytical use cases.
• Own ingestion-side write strategies for the graph layer: batch import patterns, upsert logic, index management, and performance tuning under high write throughput.
• Collaborate with senior architects to ensure graph data models honour defined schema constraints and governance standards; apply constraint validation frameworks where applicable.
• Engineer reliable data flows across complementary relational (PostgreSQL), search (Elasticsearch), caching (Redis), and time-series (TimescaleDB) stores with consistent transaction semantics.

Entity Resolution & Data Quality
• Build probabilistic entity resolution engines for large-scale deduplication across master data domains (customers, products, entities, or records), leveraging record linkage concepts (Fellegi-Sunter model, blocking strategies, confidence thresholds) and libraries such as Splink, Zingg, or Dedupe.io.
• Define and enforce data quality validation rules at ingestion time; implement automated alerting for schema violations, volume anomalies, and SLA breaches.
• Design master data management patterns for cross-system entity matching and golden record creation; ensure consistency across all downstream consumers.

Data Platform & Lakehouse
• Design and implement data lakehouse patterns (Iceberg / Parquet on S3-compatible or Azure storage) for historical data retention, cost-efficient storage, and analytical workloads.
• Build and maintain ETL/ELT pipelines using Apache Spark or dbt; define transformation logic, partitioning strategies, and incremental processing patterns.
• Ensure data lineage, audit trail, and observability are built into pipeline design from the outset using OpenTelemetry or equivalent tooling.

Technical Leadership & Collaboration
• Contribute to a sub-team of data engineers; participate in sprint planning, design reviews, and on-call rotations for critical pipelines.
• Define and enforce coding standards, pipeline patterns, and infrastructure-as-code practices using Terraform, Docker, and Kubernetes.
• Drive proof-of-concept evaluations for recent ingestion technologies, graph platforms, and data tooling relevant to the engagement.

Required Qualifications

Experience
• 7 to 10 years of progressive experience in data engineering or a closely related discipline.
• Demonstrated track record of delivering production-grade streaming and CDC pipeline systems in enterprise environments across any industry vertical.
• Hands-on experience with graph database platforms in production: Neo4j preferred; Amazon Neptune, ArangoDB, TigerGraph, or equivalent is acceptable.
• Practical experience with entity resolution, fuzzy matching, or master data management at scale (500K+ records).
• Solid experience with multi-database architectures combining graph, relational, and search layers.
• Candidates from any industry are welcome; experience in regulated or data-intensive domains (financial services, retail, logistics, telecoms, healthcare) is advantageous.

Technical Skills
• Streaming & CDC: Apache Kafka
• Graph Databases: Production experience with at least one graph database platform, namely Neo4j (preferred), Amazon Neptune, ArangoDB, or TigerGraph; proficiency in the associated query language (Cypher, Gremlin, or SPARQL).
• Supporting Databases: PostgreSQL (relational), Elasticsearch (search), Redis (caching).
• Programming: Python (advanced) for pipeline automation, data workflow scripting, and testing; SQL at expert level for complex transformations and query optimisation.
• Entity Resolution: Probabilistic record linkage concepts; practical experience with Splink, Zingg, Dedupe.io, or a comparable library.
• Data Engineering: High-volume ETL/ELT pipeline design; Apache Spark for distributed processing; data lakehouse patterns (Iceberg, Parquet, Delta Lake).
• Cloud & Infrastructure: AWS or Azure, with production delivery on at least one platform; Docker, Kubernetes, Terraform.
• Familiarity with semantic or schema standards (OWL 2, RDF, SHACL, JSON-LD) sufficient to write conformant graph data models against a defined schema.
• Experience with OpenTelemetry, distributed tracing, or observability tooling for pipeline monitoring and incident response.
• Prior work in compliance-driven data environments with audit trail, data masking, or access control requirements.
• Exposure to graph analytics and visualisation tooling such as Neo4j Bloom, Gephi, or equivalent.
• Experience with data governance platforms such as Microsoft Purview, Collibra, or Alation.

Preferred Qualifications
• Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related engineering discipline.

Skills: Apache Kafka, Neo4j, Issue resolution, Python, Data engineering, Apache Spark, Graph Databases, Docker, Kubernetes, Amazon Web Services (AWS), PostgreSQL, and Windows Azure

Apply on Kit Job: kitjob.in/job/4m2t80

Highlights

Company name

Vivanet
Job position

Sr. Data Engineer (Neo4j, Graph Databases) (Pune)

Ad ID:

8784178997