About Cato
- Python & PySpark: Extensive use of PySpark for building distributed data pipelines. Solid understanding of Spark’s lazy evaluation model and performance best practices (e.g., avoiding premature collect/write operations).
- Spark: Knowledgeable in Spark's core concepts and optimization strategies; recent experience is Python-based.
- Big Data & Data Lakes: Hands-on experience with HDFS, Hive (structured queries), and managing silver/golden layer transformations in large-scale systems.
- Cloud Platforms: Worked with Google Cloud (BigQuery, Cloud Storage) and Azure (Data Factory, Postgres) for data integration and reporting pipelines.
- Infrastructure & CI/CD: Practical knowledge of GitLab CI/CD, Docker, containerized PostgreSQL, and Django applications. Applied version control strategies, branching policies, and regression testing workflows.
- Monitoring & Debugging: Experienced in pipeline reliability, error detection, and data quality checks (e.g., malformed data parsing, separator mismatch, and alerting via logs).
- Security: Follows the principle of least privilege for role-based access management. Advocates for cost-aware, secure deployments.
- ETL/ELT Pipelines: Design, implement, and monitor data pipelines from ingestion to delivery, with quality and performance in mind.
- Reliable Systems: Strong focus on scalable architecture, fault isolation, and production-ready code.
- Clear Communication: Transparent updates and collaborative approach across stakeholders and teams.
French
Full professional proficiency
Spanish
Bilingual or native
English
Full professional proficiency
Experience
- Catobyte, Freelance Data Engineer & NLP enthusiast. Outsourcing agencies. March 2025 - Present (1 year and 3 months)
  I help teams build practical data workflows using Python, Spark, and cloud tools. I have hands-on experience with batch data pipelines, file format conversion (CSV/Excel to Parquet/ORC), and cloud storage and warehouse systems like AWS S3 and BigQuery. I've worked with both on-premise clusters and cloud environments to prepare data for analysis and reporting.
  My strength lies in simplifying complex problems and delivering clean, efficient solutions. I’m also diving into NLP and deep learning, currently exploring real-world applications using tools like HuggingFace and PyTorch.
  I'm looking for freelance projects, especially ones with an NLP or AI angle where I can contribute while continuing to learn.
- Sabbatical leave, Independent projects. Outsourcing agencies. January 2024 - March 2025 (1 year and 2 months)
  Design and Development of a Technology Blog: Created and managed a blog focused on data engineering, artificial intelligence, and software development. Wrote in-depth articles for a non-technical audience on advanced concepts in data engineering and AI.
  Python / HTML / CSS / AWS (Route 53 & S3)
- Corum l'Épargne, Data Engineer. January 2023 - January 2024 (1 year). Paris, France
  Developed and optimized Azure Data Factory scripts to create and maintain data flows. I worked with SQL and relational databases to maintain and develop the company's data warehouse, focusing on verifying and transforming financial data.
  Azure Data Factory / Python
Education
- Master, Université Paris-Saclay, Télécom ParisTech, 2017
- M1 in Software Engineering (Ingénierie Logicielle), Université de Rennes 1, 2014