Data Engineer

OneSource Regulatory
McKinney, TX, US
Full-time

Job Description

Company Introduction

OneSource Regulatory Technology offers a number of innovative solutions that enhance job performance in the pharmaceutical space.

OSR Technology is looking for an experienced and dedicated data engineer to join our product solutions team!

Job Description

OneSource Regulatory is seeking a full-time contractor with at least 4 years of experience to assist us with ongoing R&D projects.

We are looking for a data engineer to pull data from various sources and perform all the steps needed to clean, normalize, possibly annotate, and finally load the data into databases.

The candidate should be able to develop and implement a strategy for testing the integrity of the collected data. This role requires extreme attention to detail to ensure that data quality remains the top priority.

Responsibilities

  • Well versed in parsing and synthesizing XML and / or JSON documents
  • Curating data, which can involve intermediate to advanced web scraping (data may need to be fetched from locations on the Internet via SFTP, FTP, wget, curl, REST APIs, or GraphQL queries)
  • Proficiency with the Linux command line and simple tools such as grep, wc, sed, awk, find, ls, cat, and piped commands, and possibly some very light Bash shell scripting and setting up crontab schedules and programs
  • Must have basic knowledge of SQL with the following databases: PostgreSQL, MySQL, Google BigQuery
  • Must have basic knowledge of NoSQL databases such as MongoDB or similar
  • Familiarity with basic Cloud technology such as storage buckets, cloud serverless functions
  • Must have experience extracting text and images from PDF files
  • Knowledge of Puppeteer or other automatable web client technologies
  • Understanding JavaScript, HTML / CSS and HTTP methods (for understanding page structure for web scraping)

Skills

  • Solid experience with Python and Python libraries such as pandas and requests
  • Skill set should match up with required responsibilities listed above
  • Strong English skills (e.g. grammatical analysis and rhetorical structure)
  • Team Player
  • Great communication skills

Bonus Skills

  • Experience within the Pharmaceutical Space
  • Ability to expose data via C# / .NET Core and / or GraphQL
  • Google Cloud Platform (Cloud Buckets, Google Cloud Functions (.NET, Python, Node.js))
  • Ability to parallelize data manipulation and scraping via Python multi-threading, etc.
  • Python BeautifulSoup
  • Scrapy
  • Docker (setting up Kubernetes-style processing if warranted for data scraping / data ingestion / normalization)
  • Multithreading concepts

Posted 30+ days ago