
Data Pipeline Engineer (PHP)
About Alphamatician
Alphamatician is an alternative data platform specializing in collecting, cleaning, and structuring public web data into investment-ready datasets for institutional investors. Established in 2012, we offer point-in-time datasets covering over 40,000 global companies, with up to 15 years of historical data. Our clients leverage this data for systematic trading strategies, investment research, and quantitative analysis.
The Role: Data Pipeline Engineer (PHP)
As our first dedicated engineering hire, you will own the critical data collection pipeline that drives our entire business. The core platform, built in PHP (CodeIgniter 4), has been in production for over 14 years. This hands-on role involves monitoring, maintaining, and enhancing a production system that collects data daily from numerous web sources, cleans and maps it to company identifiers, and delivers it to institutional clients via API and SFTP. You will work primarily in PHP, with Python and Node.js supporting tools and sub-projects.
This position focuses on direct interaction with scrapers, parsers, databases, and server infrastructure, rather than building dashboards or running queries. You will diagnose and resolve issues at data sources, taking end-to-end ownership of multiple collection processes and progressively expanding your responsibilities.
What You Will Do Day to Day
- Monitor the data pipeline and error logs across various collection processes.
- Diagnose and resolve data quality issues by working across the codebase, database, and underlying infrastructure.
- Own the full lifecycle of individual data collection processes, from source scraping through cleaning, mapping, and loading.
- Progressively take ownership of additional datasets as you become familiar with the system.
- Maintain and improve scraping and parsing logic as source websites evolve.
- Work with a production MySQL database (RDS) and manage data integrity for large-scale datasets.
- Collaborate directly with the founder to troubleshoot complex or novel issues.
- Contribute to documentation and build institutional knowledge of the pipeline.
Tech Stack
- Core application: PHP (CodeIgniter 4) – primary daily language.
- Supporting tools and sub-projects: Python and Node.js.
- Database: MySQL on AWS RDS.
- Infrastructure: Local servers and AWS (EC2, RDS).
- Delivery: API and SFTP.
What We Are Looking For
- 3–6 years of experience with data pipelines, ETL processes, or web scraping infrastructure in a production environment.
- Strong working knowledge of PHP is essential, as it is the language of the core platform. Experience with Python or Node.js is a plus.
- Solid MySQL skills, including query optimization, performance troubleshooting, and large-scale data management.
- Experience with AWS services, specifically RDS and EC2.
- Comfort with web scraping or web data collection, including managing unpredictable external data sources.
- Ability to diagnose problems across code, database consoles, and server logs.
- Self-directed work style, suitable for a small, remote team environment.
Nice to Have
- Experience in financial data, alternative data, or fintech.
- Experience with CodeIgniter or MVC PHP frameworks.
- Experience in a small company or startup, demonstrating versatility.
- Understanding of data mapping, entity resolution, or securities identifiers (tickers, ISINs).
What This Role Is Not
This is not a data science or analytics position. You will not be building models, writing reports, or directly engaging with clients. This is an engineering and operations role focused on ensuring the reliability and continuous improvement of a complex data collection system. If you thrive on making systems function and enjoy debugging real-world data challenges, this opportunity is for you.
Compensation and Structure
- Salary range: $120,000–$140,000 per year, depending on experience.
- Initial engagement as a 6-month W-2 contract-to-hire, with a clear path to full-time conversion.
- Fully remote, US-based position.
- Direct collaboration with the founder from day one.
How to Apply
Submit your resume via alphamatician.com/careers, along with a brief note detailing your experience with data pipelines or web scraping. Include links to relevant work, open-source contributions, or a portfolio if available. We prioritize practical experience.
Application Question(s):
- How many years of experience do you have working with PHP in a production environment?
- Do you have experience with data pipelines, ETL, or web scraping infrastructure?
- Are you authorized to work in the United States?
