$ kubectl describe engineer/zeljko

resource spec

Staff Platform Engineer
@ Consensys - Linea
Apr 2024 – Current
  • Architected multi-AZ Linea Edge EKS clusters with strict network segmentation between core infrastructure and public-facing P2P Execution/Consensus Layer nodes, ensuring operational isolation and resilience.
  • Established a production-grade GitOps delivery pipeline using ArgoCD with the Apps-of-Apps pattern, GitHub Actions CI, and AWS IAM federation — providing fully auditable, declarative Kubernetes deployments.
  • Designed and enforced fine-grained RBAC governance across Kubernetes and ArgoCD, integrating Okta SSO and AWS IAM for centralised identity and least-privilege access control.
  • Deployed a hardened Web3Signer REST service within an isolated AWS account, backed by HSM-based key management for tamper-proof signing of core protocol transactions.
  • Contributed to Infura's Kubernetes Operator, implementing controllers for automated blockchain node lifecycle management — covering scaling, state synchronisation, snapshot backups, and peer discovery.
  • Migrated all EKS clusters from AWS Auto Scaling Groups to Karpenter, achieving a 60% reduction in infrastructure costs through intelligent Spot instance provisioning and bin-packing.
  • Drove continuous cloud cost optimisation by engineering Karpenter NodePool topologies that classify workloads by criticality: fault-tolerant, stateless services run on Spot instances, while On-Demand capacity is reserved for stateful and latency-sensitive components. Combined with right-sized resource requests and consolidation policies, this sustained the cost savings without compromising SLO adherence.
  • Delivered comprehensive Kubernetes security and resilience hardening: NetworkPolicy enforcement, Istio service mesh with mTLS, Velero disaster recovery, VolumeSnapshot CronJobs, and External Secrets Operator for secrets hygiene.
  • Engineered a custom Go-based pod autodiscovery service that multiplexes Consensus Layer Engine API requests across all pods, so that out-of-sync nodes transparently resume chain synchronisation without manual intervention.
  • Maintained a 100% uptime SLO for public-facing RPC and P2P infrastructure, owning on-call rotations and incident response for a production blockchain network serving millions of transactions daily.
  • Instituted a rigorous post-incident culture through structured Root Cause Analysis (RCA) and blameless post-mortems, systematically eliminating classes of failure and driving measurable improvements in mean time to recovery (MTTR).
  • Defined and tracked service-level indicators (SLIs) and service-level objectives (SLOs) across the platform stack, using error-budget burn alerts to balance reliability investment with feature velocity.
  • Championed proactive reliability engineering by conducting regular game days and chaos experiments against EKS workloads, validating failover paths and strengthening the platform's resilience posture ahead of incidents.
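
As an illustration of the error-budget burn alerts mentioned above, here is a minimal Python sketch of a multi-window burn-rate check. The function names, the 99.9% target, and the 14.4 threshold are assumptions borrowed from common SRE practice, not details of the platform described here.

```python
def burn_rate(errors: int, total: int, slo: float) -> float:
    """Return how fast the error budget is being spent (1.0 = exactly on budget)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    error_budget = 1.0 - slo           # allowed fraction of failed requests
    return (errors / total) / error_budget


def should_page(long_window: tuple[int, int],
                short_window: tuple[int, int],
                slo: float = 0.999,          # hypothetical target
                threshold: float = 14.4) -> bool:
    """Page only if both windows burn faster than the threshold.

    Each window is an (errors, total) pair. Requiring the short window
    as well avoids paging for a burst that has already stopped.
    """
    return (burn_rate(*long_window, slo) >= threshold
            and burn_rate(*short_window, slo) >= threshold)


if __name__ == "__main__":
    # 2% errors over the long window and the short window, against a 99.9% SLO
    print(should_page((200, 10_000), (20, 1_000)))
```

A burn rate of 14.4 sustained for one hour spends roughly 2% of a 30-day budget (14.4 × 1 h / 720 h), which is why that value is often paired with a one-hour long window.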
Head of DevOps
@ Route3
May 2023 – Jan 2024
  • Designed and implemented a secure GitOps framework using GitHub Actions with AWS OIDC and IAM role federation, provisioned end-to-end via Terraform — eliminating static credentials from all deployment pipelines.
  • Architected and delivered a full-stack observability platform spanning metrics, logs, traces, and continuous profiling, built on Grafana, Prometheus, Telegraf, InfluxDB, Loki, and Grafana Pyroscope.
  • Developed a Go-based internal platform service integrating the AWS SDK for dynamic resource orchestration and go-ansible for idempotent service provisioning across EC2 fleets.
Senior DevOps Engineer
@ Polygon Labs
Sep 2021 – May 2023
  • Ranked among the top 5 core contributors to the Polygon Edge open-source project, driving protocol tooling, infrastructure automation, and developer experience improvements across the release lifecycle.
  • Architected a network stress-testing and TPS benchmarking framework capable of targeting any EVM-compatible chain, with automated per-deployment Slack reporting for real-time performance visibility.
  • Led the infrastructure-as-code effort for the Polygon Edge AWS Quick Start, authoring Terraform modules that enabled one-click production-grade deployments on AWS.
  • Spearheaded the creation of official Helm Charts for Polygon Edge, standardising Kubernetes-native deployment patterns for node operators across the ecosystem.
Senior Network Engineer
@ IPHouse d.o.o.
Apr 2017 – Aug 2021
  • Designed and operated core on-premises infrastructure on a VMware vSphere cluster, encompassing BGP/OSPF routing, stateful firewall policies, site-to-site VPN, and enterprise Veeam Backup & Replication with tested DR runbooks.
  • Deployed and maintained IPAM, IDP, and IDS solutions to enforce network governance and threat detection across a multi-tenant environment.
  • Led and mentored a team of five engineers.

open-source workloads

  • job/gombak: Go-based automation service for MikroTik router backup management. Supports single-device and fleet-wide discovery via L2TP, SSH-based access, configurable retention policies, and system service integration for scheduled unattended backups.
  • deploy/tsbc: Containerised Session Border Controller that bridges SIP/UDP-based PBX systems with Microsoft Teams Direct Routing. Orchestrates Kamailio, RTPEngine, and Let's Encrypt TLS to handle signalling and media translation without dedicated SBC hardware.
  • cronjob/aws-commander: CLI tool for fleet-wide remote execution on EC2 instances via AWS SSM Run Command. Supports ad-hoc shell commands, script files, and Ansible playbooks, targeting instances by ID or tag without requiring inbound SSH access or open security group rules.
  • exporter/govein: Metrics exporter that queries Veeam Backup & Replication 12+ via its REST API and sends structured job telemetry to InfluxDB 2.0. Ships with a Grafana dashboard template and supports standalone binary, Docker Compose, and Kubernetes Helm deployment.
  • tool/tpser: EVM chain performance testing toolkit with two operating modes: a block-range analyser for historical TPS and gas utilisation reporting, and a sustained load generator for stress-testing nodes at configurable transaction rates over extended durations.
  • cli/vmex: CLI utility that queries VMware vCenter via the vSphere API and exports filtered VM inventory data to formatted Excel workbooks, addressing the limitations of vCenter's native CSV-only export for operational reporting and auditing workflows.
  • cli/kmon: Kubernetes administrative CLI and k9s plugin that automates common storage operations: spins up debug pods from live PVCs, restores volumes from VolumeSnapshots, and creates on-demand or CronJob-scheduled snapshots with configurable snapshot class support.