Ahmed

From Germany (UTC+2)

Site Reliability Engineer|Senior

DevOps|Senior

Skills and seniority verified on Dec 29, 2022

Ahmed – Terraform, AWS, Kubernetes

Ahmed started his career as a System Administrator and then moved into SRE/DevOps. He is proficient in AWS, Terraform, and Kubernetes, and has also worked on AI and machine learning model setup. Ahmed can work as an independent engineer, but sharing knowledge with teammates is what best drives his productivity. Being result-oriented, he continuously seeks effective ways to get the job done and maximize customer satisfaction. In addition, he is familiar with startup environments and enjoys the vibe.

11 years of commercial experience in

Cloud computing

E-learning

Edtech

Food and beverages

Information services

Job and career services

Marketplace

Platforms

Main technologies

Terraform

5 years

AWS

8 years

Kubernetes

9 years

Additional skills

Python

Grafana

Prometheus

GCP

Ansible

Microservices

Jenkins

GitHub Actions

Docker

Bash

Apache

PostgreSQL

Redis

CircleCI

Golang

Django

Microsoft Azure

ElastiCache

Redshift

DynamoDB

Lambda

Heroku

Datadog

Node.js

Typescript

Nvidia GPU

Kubeflow

Direct hire

Possible

Experience Highlights

Senior SRE

May 2025 - Ongoing (1 year 1 month)

Project Overview

Europe’s fastest-growing online curated marketplace for special and hard-to-find objects, hosting tens of thousands of weekly auctions across dozens of categories and attracting over 10 million global visitors monthly.

To support heavy, localized spikes during auction closures and handle complex transactional logic across bidding, payments, curation, and fulfillment, it relies on a highly scalable, distributed microservices architecture. The backend services (predominantly built on Ruby on Rails and TypeScript) are fully orchestrated using Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP). The cloud infrastructure leverages GCP’s advanced managed services—including native GKE autoscaling mechanisms, Cloud SQL/Spanner, and advanced networking frameworks—to maintain low-latency, secure, and highly isolated multi-tenant domains across autonomous product engineering squads.

Responsibilities:

Architecting, securing, and maintaining highly available GKE clusters, optimizing Node Pools, HPA (Horizontal Pod Autoscalers), and VPA (Vertical Pod Autoscalers) to efficiently handle traffic surges during peak auction hours;
Standardizing and managing multi-environment GCP infrastructure with Terraform, ensuring secure workload isolation, well-defined IAM roles, and seamless network routing across VPCs;
Designing, enhancing, and streamlining CI/CD deployment pipelines to safely deliver scalable microservices, promoting a strong engineering culture of "you build it, you run it";
Configuring and scaling a unified GCP-native or open-source monitoring and logging stack (Grafana, Prometheus, Google Cloud Monitoring/Logging), ensuring real-world data isolation and granular visibility into individual product domains;
Performing proactive resource scaling audits using GKE cost-allocation tools to optimize compute and database spending on GCP without sacrificing marketplace throughput or low-latency SLAs;
Acting as an escalation lead for critical backend incidents, leading blameless post-mortems and driving systemic self-healing initiatives to continuously minimize Mean Time to Resolution (MTTR).

Project Tech stack:

Ruby on Rails

Kubernetes

Kubeflow

Ansible

Vault

Terraform

Prometheus

Grafana

GitHub Actions

CircleCI

Senior SRE

Mar 2023 - May 2025 (2 years 2 months)

Project Overview

An enterprise-grade, open-source multi-cluster management platform designed to automate the deployment, scaling, and full lifecycle operations of thousands of Kubernetes clusters across hybrid-cloud, multi-cloud, on-premises, and edge environments. It runs the control planes of user clusters as deployments inside a central management (master/seed) cluster. This design provides unparalleled multi-tenancy, maximum resource density, and minimized operational overhead. The platform provides out-of-the-box infrastructure abstraction for major cloud providers (AWS, GCP, Azure, OpenStack, vSphere, Hetzner) alongside integrated open-source tooling like KubeOne (single-cluster lifecycle management), KubeLB (cloud-native load balancing), and advanced Monitoring, Logging, and Alerting (MLA) stacks powered by Prometheus, Grafana, Cortex, and Loki.

Responsibilities:

Designed, maintained, and scaled the infrastructure and deployment pipelines utilizing Terraform, Go, and GitOps workflows to manage multi-tenant master and seed cluster environments;
Oversaw the reliability and performance of highly dense master/seed architectures managing thousands of tenant control planes (etcd, API servers, controllers);
Architected and optimized multi-tenant Monitoring, Logging, and Alerting (MLA) frameworks across master and user clusters using Prometheus, Cortex/Thanos, Grafana, and Loki;
Acted as a critical tier for high-severity incidents, leading root-cause analysis (RCA) and driving post-mortems to transition reactive fixes into proactive platform self-healing capabilities;
Collaborated closely with product engineering and open-source maintainers to translate production performance bottlenecks into upstream platform enhancements in KKP, KubeOne, or the Machine Controller;
Maintained, tested, and troubleshot cluster lifecycle automation across highly disparate environments including AWS, Azure, Google Kubernetes Engine (GKE), and Bare-Metal/Edge nodes.

Project Tech stack:

Kubernetes

Terraform

Ansible

Jenkins

GitHub Actions

Nvidia GPU

Golang

AWS

Vault

Python

Senior Site Reliability Engineer

Oct 2021 - Apr 2023 (1 year 6 months)

Project Overview

It's an app that provides jobseekers with temporary work fitting their lifestyles.

Responsibilities:

Worked with cutting-edge technologies such as Golang, Docker, Kubernetes, and Prometheus to help customers modernize their IT systems;
Focused on designing, building, and improving the core infrastructure;
Developed and implemented internal systems, processes, and best practices to increase productivity;
Troubleshot Cloud and Linux issues and responded to after-hours escalations.

Project Tech stack:

Ruby on Rails

Python

Prometheus

AWS

Kubernetes

Terraform

Senior Site Reliability Engineer

Mar 2019 - Jul 2021 (2 years 4 months)

Project Overview

It's a German web portal focused on cooking.

Responsibilities:

Implemented the tools needed to develop, test, deploy, and monitor the microservices automatically across environments;
Managed CI resources to bootstrap the setup under different cloud providers;
Monitored system services for request tracing and logging.

Project Tech stack:

Kubernetes

Terraform

AWS

GitHub Actions

GitHub

Jenkins

Prometheus

Grafana

GCP

Python

Microservices

Microsoft Azure

Docker

Helm

Flutter

DevOps Engineer

Apr 2018 - Feb 2019 (10 months)

Project Overview

It's an Arabic open online courses platform.

Responsibilities:

Prepared and maintained Linux servers;
Performed daily system monitoring;
Reviewed system and application logs;
Performed database administration and scripting/programming tasks;
Managed deployment operations and prepared the automation scripts;
Developed and maintained installation and configuration procedures.

Project Tech stack:

Ruby

Python

Ruby on Rails

Nagios

Jenkins

AWS

Docker

Kubernetes

Ansible

Rancher

Vagrant

Bash

PostgreSQL

Redis

Apache

Nginx

Keep in mind, the experience summary might exclude non-relevant projects

Education

2015

Computer Science

Bachelor's Degree

Languages

English

Advanced
