Sr. Site Reliability Engineer Job at Talent Groups, McKinney, TX

  • Talent Groups
  • McKinney, TX

Job Description

Senior Site Reliability Engineer (Contract to Hire)

Location: McKinney, TX (Hybrid, 2–3 days onsite)

Must be authorized to work in the U.S.

Overview:

Our client is seeking a Senior Site Reliability Engineer to lead platform reliability and traffic enforcement in a Kubernetes-hosted SASE (Secure Access Service Edge) environment. This role ensures high availability, observability, and fair multi-tenant traffic handling across distributed systems.

Key Responsibilities:

Platform Reliability & Operations

  • Own uptime (target: 99.99%) and stability of multi-region Kubernetes environments.
  • Architect resilient, scalable infrastructure with proactive capacity planning and automated remediation.
  • Lead incident response, root cause analysis, disaster recovery, and change management.

Observability & Monitoring

  • Build a full-stack observability pipeline (Prometheus, OpenTelemetry, Grafana, etc.).
  • Implement golden signals, tracing, and alerting to drive real-time performance insights.
  • Develop automation for issue detection and resolution.

Kubernetes & Infrastructure

  • Manage the full Kubernetes lifecycle (upgrades, autoscaling, GitOps automation).
  • Integrate and optimize OpenStack-based infrastructure beneath Kubernetes.
  • Enforce security compliance, resource efficiency, and FinOps best practices.

Traffic Enforcement & Networking

  • Design a Kubernetes-native traffic control layer for per-tenant/session enforcement.
  • Implement CRDs, custom controllers, and service mesh (e.g., Istio, Linkerd) for dynamic policy management.
  • Operate SDN telemetry agents (Cilium Hubble, WireGuard) and integrate them with the observability stack.

Leadership & Strategy

  • Contribute to infrastructure architecture and reliability strategy.
  • Mentor team members and promote Kubernetes best practices.
  • Partner cross-functionally across engineering, security, and product teams.

Required Skills:

  • Kubernetes in production across multi-region architectures.
  • Observability tools: Prometheus, OpenTelemetry, Grafana, Jaeger, Loki.
  • Strong Linux networking (tc, nftables, WireGuard, iptables).
  • Infrastructure automation: Helm, Terraform, Argo CD/Flux (GitOps).
  • Programming: Go (preferred), Python/Bash scripting.
  • Familiarity with OpenStack (Nova, Neutron, Ceph) and CNI (Cilium preferred).

Preferred Experience:

  • Service mesh deployment (Istio, Linkerd), multi-cluster tools (Fleet, Rancher).
  • Chaos engineering frameworks (Chaos Mesh, Litmus).
  • Developer platform abstraction on Kubernetes.
  • FinOps cost optimization practices.
  • Edge Kubernetes and NFV/SDN background.
  • Active participation in the Kubernetes community.

Job Tags

Contract work
