Ahmed – Terraform, AWS, Kubernetes
Ahmed started his career as a System Administrator and then switched to SRE/DevOps. He is proficient in AWS, Terraform, and Kubernetes, and has also worked on AI and Machine Learning model setup. Ahmed can work as an independent engineer, but sharing knowledge with teammates is what drives his productivity most. Being result-oriented, Ahmed continuously seeks effective ways to get the job done and maximize customer satisfaction. In addition, he's well acquainted with the startup environment and enjoys the vibe.
11 years of commercial experience in
Main technologies
Additional skills
Direct hire
Possible
Experience Highlights
Senior SRE
Europe’s fastest-growing online curated marketplace for special and hard-to-find objects, hosting tens of thousands of weekly auctions across dozens of categories and attracting over 10 million global visitors monthly.
To support heavy, localized spikes during auction closures and handle complex transactional logic across bidding, payments, curation, and fulfillment, the marketplace relies on a highly scalable, distributed microservices architecture. The backend services (predominantly built on Ruby on Rails and TypeScript) are fully orchestrated using Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP). The cloud infrastructure leverages GCP’s advanced managed services (including native GKE autoscaling mechanisms, Cloud SQL/Spanner, and advanced networking frameworks) to maintain low-latency, secure, and highly isolated multi-tenant domains across autonomous product engineering squads.
- Architecting, securing, and maintaining highly available GKE clusters, optimizing Node Pools, HPA (Horizontal Pod Autoscalers), and VPA (Vertical Pod Autoscalers) to efficiently handle traffic surges during peak auction hours;
- Standardizing and managing multi-environment GCP infrastructure with Terraform, ensuring secure workload isolation, well-defined IAM roles, and seamless network routing across VPCs;
- Designing, enhancing, and streamlining CI/CD deployment pipelines to safely deliver scalable microservices, promoting a strong engineering culture of "you build it, you run it";
- Configuring and scaling a unified GCP-native or open-source monitoring and logging stack (Grafana, Prometheus, Google Cloud Monitoring/Logging), ensuring real-world data isolation and granular visibility into individual product domains;
- Performing proactive resource scaling audits using GKE cost-allocation tools to optimize compute and database spending on GCP without sacrificing marketplace throughput or low-latency SLAs;
- Acting as an escalation lead for critical backend incidents, leading blameless post-mortems and driving systemic self-healing initiatives to continuously minimize Mean Time to Resolution (MTTR).
Senior SRE
An enterprise-grade, open-source multi-cluster management platform designed to automate the deployment, scaling, and full lifecycle operations of thousands of Kubernetes clusters across hybrid-cloud, multi-cloud, on-premises, and edge environments. It runs the control planes of user clusters as deployments inside a central management (master/seed) cluster. This design provides unparalleled multi-tenancy, maximum resource density, and minimized operational overhead. The platform provides out-of-the-box infrastructure abstraction for major cloud providers (AWS, GCP, Azure, OpenStack, vSphere, Hetzner) alongside integrated open-source tooling like KubeOne (single-cluster lifecycle management), KubeLB (cloud-native load balancing), and advanced Monitoring, Logging, and Alerting (MLA) stacks powered by Prometheus, Grafana, Cortex, and Loki.
- Designed, maintained, and scaled the infrastructure and deployment pipelines utilizing Terraform, Go, and GitOps workflows to manage multi-tenant master and seed cluster environments;
- Oversaw the reliability and performance of highly dense master/seed architectures managing thousands of tenant control planes (etcd, API servers, controllers);
- Architected and optimized multi-tenant Monitoring, Logging, and Alerting (MLA) frameworks across master and user clusters using Prometheus, Cortex/Thanos, Grafana, and Loki;
- Acted as a critical tier for high-severity incidents, leading root-cause analysis (RCA) and driving post-mortems to transition reactive fixes into proactive platform self-healing capabilities;
- Collaborated closely with product engineering and open-source maintainers to translate production performance bottlenecks into upstream platform enhancements in KKP, KubeOne, or the Machine Controller;
- Maintained, tested, and troubleshot cluster lifecycle automation across highly disparate environments including AWS, Azure, Google Kubernetes Engine (GKE), and Bare-Metal/Edge nodes.
Senior Site Reliability Engineer
It's an app that provides jobseekers with temporary work fitting their lifestyles.
- Worked with cutting-edge technologies such as Golang, Docker, Kubernetes, and Prometheus to help customers modernize their IT systems;
- Focused on designing, building, and improving the core infrastructure;
- Developed and implemented internal systems, processes, and best practices to increase productivity;
- Troubleshot Cloud and Linux issues and responded to after-hours escalations.
Senior Site Reliability Engineer
It's a German web portal focused on cooking.
- Implemented the tooling needed to automatically build, test, deploy, and monitor the microservices across environments;
- Managed CI resources to bootstrap the setup under different cloud providers;
- Monitored system services for request tracing and logging.
DevOps Engineer
It's an Arabic platform for open online courses.
- Prepared and maintained Linux servers;
- Performed daily system monitoring;
- Reviewed system and application logs;
- Performed database administration and scripting/programming tasks;
- Managed deployment operations and prepared automation scripts;
- Developed and maintained installation and configuration procedures.