Matt
From United Kingdom (UTC+1)
Matt – AWS, Kubernetes, Terraform
Matt is a Staff-level Platform and SRE Engineer with extensive experience designing cloud-native platforms on Kubernetes, AWS, and GCP using Terraform and Go. He has led large-scale infrastructure migrations, developed Kubernetes operators, and built internal developer platforms that improve scalability and engineering productivity. Matt combines hands-on technical expertise with strong leadership, mentoring engineers and collaborating effectively with senior stakeholders across engineering and product. He excels at modernizing legacy infrastructure, driving reliability initiatives, and solving complex distributed systems challenges.
16 years of commercial experience
Direct hire: Possible
Experience Highlights
Staff Site Reliability Engineer
Worked on internal tooling, management, and reliability of multi-petabyte ClickHouse clusters, including migrating Kafka-fed ingestion clusters processing hundreds of thousands of events per second from EC2 to Kubernetes.
- Led the migration of EC2-based deployments to EKS;
- Served as the primary developer of an internal ClickHouse query proxy and routing layer;
- Participated in the on-call rotation, supporting all ClickHouse clusters and ingestion pipelines.
Staff Platform Engineer
Served as European Tech Lead for the Platform Infrastructure team, leading the development of a Go-based Kubernetes operator for automated management of Neo4j clusters, databases, users, and infrastructure capacity.
- Served as the European Tech Lead for the Platform Infrastructure team;
- Designed and led the development of a production-grade Kubernetes operator for managing Neo4j clusters and databases;
- Led the re-architecture of Kubernetes cluster deployments and management using Argo CD;
- Defined platform engineering strategy in collaboration with other tech leads, product management, and senior stakeholders;
- Partnered with product management, senior stakeholders, and engineering leadership to strengthen team trust, psychological safety, and cross-functional collaboration;
- Mentored senior engineers.
Lead Platform Engineer
Led the development of a Kubernetes-native internal developer platform supporting on-premises and cloud deployments, including secrets management, IAM, observability, developer tooling, GitHub-based authentication, and application containerization.
- Spearheaded the introduction of Platform Engineering as the founding engineer and lead of the Platform team;
- Owned the team roadmap and identified key pain points and areas for Platform support;
- Established mTLS-encrypted network meshing between cloud-based Kubernetes clusters and on-prem/colocated bare metal hardware;
- Led the implementation and delivery of new platform solutions as the project owner;
- Managed and mentored engineers on the Platform team;
- Deployed secrets management solutions via OpenBao, including supporting tooling and integrations;
- Centralized identity management via Keycloak as a primary auth solution for teams and services;
- Delivered Kubernetes deployments as-a-service;
- Created supporting tooling projects primarily in Go, including a templating engine for configuration files and secrets setups.
Platform Tech Lead
Architected and delivered a customer-facing control plane for dedicated PostgreSQL deployments while leading the migration of microservices from AWS ECS to Kubernetes as part of a cell-based platform rearchitecture.
- Owned the roadmap for the Platform team;
- Liaised with other teams to promote best practices and solve workflow pain points;
- Spearheaded a year-long project developing a control plane for dedicated database hardware on Kubernetes;
- Created a custom Kubernetes operator handling custom event emission and routing;
- Established dynamic network and routing flows with Istio and supporting tooling;
- Supported the migration of existing services from ECS to Kubernetes;
- Contributed to mentoring, cost optimization, SLI/SLOs, alerting, on-call, CI/CD assessment and expansion, and hiring roadmaps;
- Delivered infrastructure as code with Pulumi and Terraform.
Staff Platform Engineer
Led the migration of Ruby-based clinical services from AWS ECS to EKS and implemented Argo CD-based continuous delivery, environment promotion, and developer tooling for secrets management.
- Served as the technical lead for the Platform Engineering team;
- Helped define and establish the Platform Engineering technical roadmap;
- Addressed key pain points in the SDLC, infrastructure management, and reliability architectural review;
- Led secrets management improvements;
- Set the technical direction and standards of excellence for the team;
- Created a proposal for migration to Kubernetes;
- Executed and led the Kubernetes migration as the primary SME;
- Improved CI/CD strategy and execution;
- Rearchitected secrets management;
- Moved the engineering organization towards a GitOps model of operation.
Senior SRE, Delivery:Orchestration
Enhanced and optimized delivery pipelines for self-managed and SaaS platform offerings, improving the safety, efficiency, and scalability of feature delivery.
- Created workflows, frameworks, architecture, and automation for Engineering teams to reach production effectively and efficiently;
- Enabled developer groups to adopt self-serve deployments and releases through tooling creation and product contribution;
- Collaborated extensively with multiple internal teams in asynchronous and synchronous forms;
- Used Kubernetes, Chef, Terraform, Prometheus, Thanos, Grafana, and GitLab CI/CD capabilities in depth.
Director of Core Engineering
Served as Founding SRE and later Director at an early-stage startup, building the observability platform, establishing SLIs/SLOs for critical user journeys, and leading engineering organization growth and team transformation.
- Joined as the founding member of a new SRE team responsible for reliability, operational excellence, and production readiness of platform services and infrastructure;
- Expanded the team to include principal engineers across stack disciplines;
- Introduced Honeycomb as the primary observability platform;
- Led workshops on Honeycomb and worked with cross-functional teams to design dashboards and representative SLOs;
- Created a proof-of-concept Cortex-backed Prometheus and Grafana deployment on Kubernetes;
- Oversaw the management and administration of Kubernetes clusters running in GKE;
- Introduced Jsonnet and Tanka for Kubernetes resource declaration and management;
- Brought GCP configuration and resources under Terraform management from scratch;
- Wrote and introduced a production readiness review stage for new services and larger feature sets;
- Oversaw the Core Engineering team across SRE, development, and security after promotion.
Principal SRE
Led platform engineering initiatives across observability, security, deployments, performance, tooling, and infrastructure during SaaS expansion from 7 to over 50 cloud regions across four cloud providers. Established the foundation of the observability team and led an internal guild of over 30 engineers driving reliability improvements.
- Worked on performance testing, CDN migrations, and extensive Terraform work;
- Served as a founding member of the Cloud Observability Team;
- Took responsibility for internal logging and metrics ingest pipelines, APM integrations, and dashboard and alerting creation and curation;
- Led the ZooKeeper Guild, a cross-functional group of over 30 engineers;
- Created a metrics, monitoring, and alerting strategy that became the foundation of the Cloud Observability team;
- Became a subject-matter expert on ZooKeeper and drove platform stability improvements;
- Helped move from a single centralised logging and metrics ingest solution to over 30 Elasticsearch deployments across multiple regions using Beats and Logstash;
- Planned and executed a migration from Akamai to Google Cloud CDN;
- Co-developed a dedicated standalone testing environment for a primary datastore using Terraform, Java, and Clojure;
- Mentored less senior engineers and collaborated closely on roadmap planning and cross-team efforts.
Systems Engineering Lead
Served as Lead Engineer, managing a team of 6+ Systems Engineers developing reusable Terraform and Puppet modules in collaboration with Architecture for a heavily regulated healthtech environment.
- Managed and mentored team members;
- Provided technical leadership for a team of 5+ systems engineers and supported hiring and team growth;
- Created and expanded team standards for software quality, security, and shared responsibility;
- Introduced infrastructure as code and orchestration using Terraform for AWS infrastructure and account configuration;
- Completed extensive security reviews of infrastructure, systems, and software;
- Architected and executed large migration projects, including the introduction of microservices architectures;
- Introduced CI through Jenkins and static analysis with CheckMarx;
- Re-architected existing software and systems for improved reliability and stability;
- Migrated a large development team to Vagrant-managed VMs on personal laptops;
- Created standardized log aggregation and monitoring with a central Elasticsearch deployment running on Docker;
- Improved Solr architecture by moving to a ZooKeeper-managed multi-node deployment;
- Optimized AWS account performance and cost;
- Performed advanced system administration across Linux and Windows hosts;
- Automated infrastructure lifecycles, deployments, support tasks, and security operations across AWS accounts and products.