Data Engineer



Software Engineering, Data Science
San Francisco, CA, USA · Remote
Posted on Friday, June 30, 2023

We're looking for a forward-thinking, structured problem solver and technical specialist who is passionate about building systems at scale. You will be among the first to tap into massive blockchain datasets and construct data infrastructure that enables analytics, data science, machine learning, and AI workloads.

As the data domain specialist, you will partner with a cross-functional team of product engineers, analytics specialists, and machine learning engineers to unify data infrastructure across Yakoa's product suite. Requirements may be vague, but the iterations will be rapid, and you must take thoughtful and calculated risks. Your work will take place at the interface of the AI, blockchain, and intellectual property domains, so you must be a quick learner with a thirst for many types of knowledge.

Responsibilities:


  • Design, build, test, and maintain scalable data pipelines and microservices that source both first-party and third-party datasets, and deploy distributed (cloud) structures and other applicable storage forms, such as vector and relational databases.
  • Index multiple blockchain data standards into responsive data environments, and tune those environments to power real-time query infrastructure.
  • Design and optimize data storage schemas to make terabytes of data readily accessible to our API.
  • Build utilities, user-defined functions, libraries, and frameworks to better enable data flow patterns.
  • Utilize and advance continuous integration and deployment frameworks.
  • Research, evaluate, and utilize new technologies, tools, and frameworks centered around high-volume data processing.
  • Mentor other engineers while serving as technical lead, contributing to and directing the execution of complex projects.

Qualifications:


  • 4+ years working as a data engineer.
  • Proficient in database schema design, and analytical and operational data modeling.
  • Proven experience working with large datasets and big data ecosystems for computing (Spark, Kafka, Hive, or similar), orchestration tools (Dagster, Airflow, Oozie, Luigi), and storage (S3, Hadoop, DBFS).
  • Experience with modern databases (PostgreSQL, Redshift, DynamoDB, MongoDB, or similar).
  • Proficient in one or more programming languages such as Python, Java, or Scala, with rock-solid SQL skills.
  • Experience building CI/CD pipelines with services like Bitbucket Pipelines or GitHub Actions.
  • Proven analytical, communication, and organizational skills and the ability to prioritize multiple tasks at a given time.
  • An open mind to try solutions that may seem astonishing at first.
  • An MS in Computer Science or equivalent experience.

Exceptional candidates also have:

  • Experience with Web3 tooling.
  • Experience with artificial intelligence, machine learning, and other big data techniques.
  • B2B software design experience.

No crypto or Web3 experience? No problem! We’ll help coach you and cover any costs for educational materials for your growth.

Benefits:


  • Unlimited PTO.
  • Competitive compensation packages.
  • Remote-friendly with flexible hours.
  • Wellness packages for mental and physical health.