Description

My Toolkit & Experience

Python & PySpark: Extensive use of PySpark for building distributed data pipelines. Solid understanding of Spark’s lazy evaluation model and performance best practices (e.g., avoiding premature collect/write operations).
Spark: Knowledgeable in Spark's core concepts and optimization strategies, though recent experience is Python-based.
Big Data & Data Lakes: Hands-on experience with HDFS, Hive (structured queries), and managing silver/gold layer transformations in large-scale systems.
Cloud Platforms: Worked with Google Cloud (BigQuery, Cloud Storage) and Azure (Data Factory, Postgres) for data integration and reporting pipelines.
Infrastructure & CI/CD: Practical knowledge of GitLab CI/CD, Docker, containerized PostgreSQL, and Django applications. Applied version control strategies, branching policies, and regression testing workflows.
Monitoring & Debugging: Experienced in pipeline reliability, error detection, and data quality checks (e.g., malformed data parsing, separator mismatch, and alerting via logs).
Security: Apply the principle of least privilege to role-based access management. Advocate for cost-aware, secure deployments.
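
As an illustration of the data quality checks mentioned under Monitoring & Debugging, here is a minimal sketch of how a separator mismatch and malformed rows might be caught before a file enters a pipeline. The function names `guess_delimiter` and `find_malformed_rows`, the candidate separators, and the sample data are all hypothetical and chosen for this example only; a production check would likely be stricter.

```python
import csv
import io
import logging

logger = logging.getLogger("quality_checks")  # hypothetical logger name


def guess_delimiter(header_line, candidates=(",", ";", "\t", "|")):
    """Return the candidate separator that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def find_malformed_rows(text, delimiter=","):
    """Return (line_number, field_count) for each row whose width differs from the header.

    The first row is treated as the header and defines the expected width.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected_fields = None
    malformed = []
    for row in reader:
        if expected_fields is None:
            expected_fields = len(row)
            continue
        if len(row) != expected_fields:
            malformed.append((reader.line_num, len(row)))
            # Alert via logs so the problem is visible without stopping the run.
            logger.warning(
                "line %d has %d fields, expected %d",
                reader.line_num, len(row), expected_fields,
            )
    return malformed
```

With a sample such as `"id;name;amount\n1;alice;10\n2;bob\n"`, calling `guess_delimiter` on the header suggests `;`, and `find_malformed_rows` with that separator flags line 3 as having too few fields. Parsing the same text with `,` would report nothing, because every line collapses to a single field, which is why the separator is guessed first.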

Knowledge Sharing & Community

I run a trilingual blog (EN/ES/FR) focused on Data Engineering and AI, hosted on AWS. I use it to demystify complex topics for a broader audience and sharpen my own understanding by teaching others.

What I Offer

ETL/ELT Pipelines: Design, implement, and monitor data pipelines from ingestion to delivery, with quality and performance in mind.

Reliable Systems: Strong focus on scalable architecture, fault isolation, and production-ready code.
Clear Communication: Transparent updates and collaborative approach across stakeholders and teams.

Languages

French
Full professional proficiency
Spanish
Native or bilingual
English
Full professional proficiency

Workplace preferences

Open to on-site work

Barcelona (up to 50 km)

Catobyte
Freelance Data Engineer & NLP enthusiast
OUTSOURCING AGENCIES
March 2025 - Present (1 year and 3 months)
I help teams build practical data workflows using Python, Spark, and cloud tools. I have hands-on experience with batch data pipelines, file format conversion (CSV/Excel to Parquet/ORC), and cloud services like AWS S3 and BigQuery. I've worked with both on-premise clusters and cloud environments to prepare data for analysis and reporting.

My strength lies in simplifying complex problems and delivering clean, efficient solutions. I’m also diving into NLP and deep learning, currently exploring real-world applications using tools like HuggingFace and PyTorch.

I'm looking for freelance projects—especially ones with an NLP or AI angle where I can contribute while continuing to learn.
Sabbatical leave
Independent projects
OUTSOURCING AGENCIES
January 2024 - March 2025 (1 year and 2 months)
Design and Development of a Technology Blog: Created and managed a blog focused on data engineering, artificial intelligence, and software development. Wrote in-depth articles for a non-technical audience on advanced concepts in data engineering and AI. Python / HTML / CSS / AWS (Route 53 & S3)
Corum l'Épargne
Data Engineer
January 2023 - January 2024 (1 year)
Paris, France
Developed and optimized Azure Data Factory scripts to create and maintain data flows. I worked with SQL and relational databases to maintain and develop the company's data warehouse, focusing on the verification and transformation of financial data. Azure Data Factory / Python