Data Engineer - AI & ML

TEPHRA
San Francisco, CA, United States
Job type
  • Full-time
Job description

Description:

Location: San Francisco, CA

Responsibilities:

1. Design and Build Data Pipelines:

  • Develop, construct, test, and maintain data pipelines to extract, transform, and load (ETL) data from various sources to data warehouses or data lakes.
  • Ensure data pipelines are efficient, scalable, and maintainable, enabling seamless data flow for downstream analysis and modeling.
  • Work with stakeholders to identify data requirements and implement effective data processing solutions.

2. Data Integration:

  • Integrate data from multiple sources such as internal databases, external APIs, third-party vendors, and flat files.
  • Collaborate with business teams to understand data needs and ensure data is structured properly for reporting and analytics.
  • Build and optimize data ingestion systems to handle both real-time and batch data processing.

3. Data Storage and Management:

  • Design and manage data storage solutions (e.g., relational databases, NoSQL databases, data lakes, cloud storage) that support large-scale data processing.
  • Implement best practices for data security, backup, and disaster recovery, ensuring that data is safe, recoverable, and complies with relevant regulations.
  • Manage and optimize storage systems for scalability and cost efficiency.

4. Data Transformation:

  • Develop data transformation logic to clean, enrich, and standardize raw data, ensuring it is suitable for analysis.
  • Implement data transformation frameworks and tools, ensuring they work seamlessly across different data formats and sources.
  • Ensure the accuracy and integrity of data as it is processed and stored.

5. Automation and Optimization:

  • Automate repetitive tasks such as data extraction, transformation, and loading to improve pipeline efficiency.
  • Optimize data processing workflows for performance, reducing processing time and resource consumption.
  • Troubleshoot and resolve performance bottlenecks in data pipelines.

6. Collaboration with Data Teams:

  • Work closely with Data Scientists, Analysts, and business teams to understand data requirements and ensure the correct data is available and accessible.
  • Assist Data Scientists with preparing datasets for model training and deployment.
  • Provide technical expertise and support to ensure the integrity and consistency of data across all projects.

7. Data Quality Assurance:

  • Implement data validation checks to ensure data accuracy, completeness, and consistency throughout the pipeline.
  • Develop and enforce data quality standards to detect and resolve data issues before they affect analysis or reporting.
  • Monitor data quality continuously, identifying problem areas and implementing solutions.

8. Monitoring and Maintenance:

  • Set up monitoring and logging for data pipelines to detect and alert on issues such as failures, data mismatches, or delays.
  • Perform regular maintenance of data pipelines and storage systems to ensure optimal performance.
  • Update and improve data systems as required, keeping up with evolving technology and business needs.

9. Documentation and Reporting:

  • Document data pipeline designs, ETL processes, data schemas, and transformation logic for transparency and future reference.
  • Create reports on the performance and status of data pipelines, identifying areas of improvement or potential issues.
  • Provide guidance to other teams regarding the usage and structure of data systems.

10. Stay Updated with Technology Trends:

  • Continuously evaluate and adopt new tools, technologies, and best practices in data engineering and big data systems.
  • Participate in industry conferences, webinars, and training to stay current with emerging trends in data engineering and cloud computing.

Requirements:

  • Minimum of 7 years of total experience.

1. Educational Background:

  • Bachelor's or Master's degree in Computer Science, Information Technology, Data Engineering, or a related field.

2. Technical Skills:

  • Proficiency in programming languages such as Python, Java, or Scala for data processing.
  • Strong knowledge of SQL and relational databases (e.g., MySQL, PostgreSQL, MS SQL Server).
  • Experience with NoSQL databases (e.g., MongoDB, Cassandra, HBase).
  • Familiarity with data warehousing solutions (e.g., Amazon Redshift, Google BigQuery, Snowflake).
  • Hands-on experience with ETL frameworks and tools (e.g., Apache NiFi, Talend, Informatica, Airflow).
  • Knowledge of big data technologies (e.g., Hadoop, Apache Spark, Kafka).
  • Experience with cloud platforms (AWS, Azure, Google Cloud) and related services for data storage and processing.
  • Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes) for building scalable data systems.
  • Knowledge of version control systems (e.g., Git) and collaboration tools (e.g., Jira, Confluence).
  • Understanding of data modeling concepts (e.g., star schema, snowflake schema) and how they relate to data warehousing and analytics.
  • Knowledge of data lakes, data warehousing architecture, and how to design efficient and scalable storage solutions.

3. Soft Skills:

  • Strong problem-solving skills with an ability to troubleshoot complex data issues.
  • Excellent communication skills, with the ability to explain technical concepts to both technical and non-technical stakeholders.
  • Strong attention to detail and a commitment to maintaining data accuracy and integrity.
  • Ability to work effectively in a collaborative, team-based environment.

4. Experience:

  • 3+ years of experience in data engineering, with hands-on experience in building and maintaining data pipelines and systems.
  • Proven track record of implementing data engineering solutions at scale, preferably in large or complex environments.
  • Experience working with data governance, compliance, and security protocols.

5. Preferred Qualifications:

  • Experience with machine learning and preparing data for AI/ML model training.
  • Familiarity with stream processing frameworks (e.g., Apache Kafka, Apache Flink).
  • Certification in cloud platforms (e.g., AWS Certified Big Data - Specialty, Google Cloud Professional Data Engineer).
  • Experience with DevOps practices and CI/CD pipelines for data systems.
  • Experience with automation and orchestration tools (e.g., Apache Airflow, Luigi).
  • Familiarity with data visualization and reporting tools (e.g., Tableau, Power BI) to support analytics teams.

6. Work Environment:

  • Collaborative and fast-paced work environment.
  • Opportunity to work with state-of-the-art technologies.
  • Supportive and dynamic team culture.