Department: Technology
Location: Calgary, AB (In-Office, Hybrid) or Remote (Canada)
Employment Type: Full-Time
ABOUT US
Ziing was founded in 2018 by supply chain and technology experts.
We think of mobility as a breeding ground for collective change. We saw an opportunity to be the first logistics company that didn’t just integrate digital technology into our operations—we founded our entire business upon it. Harnessing the power of data, integrated systems, clean technology, and AI-driven automation, we help our clients deliver on their promise of excellent customer service.
At Ziing, we are setting new standards with our transformative enterprise platform that builds and integrates web, iOS, and Android applications. Our approach combines leading modern technologies, including generative AI and comprehensive Microsoft ecosystems, within a culture that merges the agility of a startup with the reliability of enterprise frameworks. We’re committed to innovation, smart risk-taking, and impactful technology solutions.
Our core values of humble, hungry, and mindful are at the forefront of everything we do.
ABOUT THE OPPORTUNITY
We are looking for a Staff Platform Reliability Engineer to own reliability engineering for the Ziing.ai platform.
This is a senior individual contributor role for an engineer who can build reliable systems, not just operate them. You will work across cloud infrastructure, backend services, APIs, integrations, data pipelines, event streams, observability, incident response, deployment reliability, performance, and platform automation.
Ziing.ai powers high-stakes logistics workflows where reliability directly affects delivery performance, customer trust, driver productivity, operational control, SLA adherence, and financial accuracy. The platform coordinates routing, dispatch, driver execution, real-time visibility, compliance workflows, partner handoffs, analytics, and reconciliation across complex delivery networks.
You will help ensure those systems are observable, scalable, resilient, secure, and easy for engineering teams to operate. You will participate in on-call, lead reliability improvements, reduce operational toil, build automation, define SLOs, strengthen incident response, and partner with engineering and product teams to make reliability part of how Ziing.ai is designed and shipped.
This is not a people-management role. It is a high-impact IC role for someone who can lead through technical depth, ownership, influence, and execution.
WHAT YOU'LL OWN
Platform Reliability and Operational Excellence
- Own reliability engineering for critical Ziing.ai platform services across routing, dispatch, driver workflows, visibility, compliance, integrations, analytics, and financial workflows.
- Build and improve the systems, standards, tooling, and practices that allow Ziing.ai to run safely and predictably in production.
- Partner with engineering teams to identify reliability risks early in design, implementation, deployment, and operation.
- Define reliability patterns for services, APIs, queues, databases, event pipelines, mobile backend workflows, and third-party integrations.
- Ensure reliability improvements are tied to business-critical workflows, not only infrastructure-level metrics.
- Support production readiness reviews for major platform changes, customer-impacting launches, and high-risk integrations.
- Help ensure Ziing.ai can operate reliably in regulated, audit-ready, and high-expectation delivery environments.
- Lead or support incident response for high-severity issues affecting Ziing.ai platform availability, data correctness, delivery execution, integrations, or customer-facing workflows.
Infrastructure and Automation
- Identify repetitive operational work and replace it with reliable automation.
- Automate routine checks for service health, integration health, data freshness, queue backlogs, failed jobs, latency spikes, and abnormal workflow behavior.
- Explore practical AI-assisted reliability workflows, including incident summarization, anomaly detection, log analysis, alert correlation, runbook recommendations, and change-impact analysis.
- Partner with platform and backend engineering to improve cloud architecture, infrastructure as code, deployment pipelines, environment consistency, and release safety.
- Improve deployment reliability through progressive delivery, health checks, automated rollback, smoke tests, canary releases, and release observability.
- Build observability across logs, metrics, traces, events, queues, APIs, mobile backend services, integrations, and data pipelines.
- Ensure the team can quickly answer what is broken, who is affected, how severe it is, what changed, and what action is needed.
- Participate in on-call rotations for production services.
Integration, Data, and Financial Workflow Reliability
- Own reliability practices for APIs, customer integrations, partner integrations, order ingestion, status events, proof-of-delivery flows, billing events, reconciliation workflows, and downstream analytics.
- Build observability and alerts for integration failures, schema mismatches, delayed events, duplicate events, missing status updates, failed webhooks, and data-quality anomalies.
- Build automated checks that detect abnormal delivery lifecycle patterns before they create customer or operational impact.
- Support safe recovery procedures for integration failures, event replay, data correction, and reconciliation exceptions.
- Ensure reliability systems support compliance, auditability, and operational governance.
- Improve backup, restore, disaster recovery, and business continuity practices.
Cross-Functional Technical Leadership
- Work closely with engineering, product, QA, operations, customer success, support, data, security, and leadership.
- Influence architecture and implementation choices that affect reliability, scalability, performance, observability, and operability.
- Mentor engineers on reliability patterns, incident response, observability, and production ownership.
- Translate production risk into clear technical priorities and business impact.
- Communicate trade-offs clearly across technical and non-technical audiences.
WHAT YOU BRING
- Bachelor’s degree (or higher) in Computer Science, Engineering, Information Systems, Software Engineering, or a related technical field, or equivalent practical experience.
- Equivalent hands-on experience building and operating production-grade cloud platforms, distributed systems, infrastructure automation, observability, and incident response practices will be strongly considered.
- Cloud, Kubernetes, infrastructure, reliability engineering, security, or DevOps certifications are an asset.
- 6+ years of experience in Site Reliability Engineering, Platform Engineering, Infrastructure Engineering, Backend Engineering, DevOps Engineering, or Production Engineering.
- Strong software engineering ability in at least one language such as Python, Go, Java, TypeScript, C#, or similar.
- Strong understanding of distributed systems, microservices, APIs, asynchronous processing, queues, databases, caching, retries, idempotency, and failure modes.
- Experience with cloud infrastructure on AWS, Azure, or GCP.
- Experience with Kubernetes, containers, Terraform or similar IaC tooling, CI/CD pipelines, and Linux-based systems.
- Experience with observability tools such as Datadog, Prometheus, Grafana, OpenTelemetry, CloudWatch, New Relic, Splunk, Sentry, or similar.
- Experience defining or operating SLOs, SLIs, error budgets, alerting standards, dashboards, runbooks, and incident response practices.
- Experience participating in on-call rotations and responding to production incidents.
- Strong communication skills, especially during incidents, technical reviews, postmortems, and cross-functional prioritization.
- Experience in logistics, transportation, final-mile delivery, field-service software, routing, dispatch, fleet operations, marketplace operations, or operational SaaS.
WHY JOIN ZIING?
You'll be joining a company that's redefining final-mile logistics through technology. We move quickly, solve meaningful problems, and give our people the opportunity to make a measurable impact on both our platform and our business.
WHAT WE OFFER
- Competitive total compensation, including a $1,200 annual cell phone allowance
- 25+ days of paid time off
- Comprehensive group benefits program which includes a Health & Wellness Account
- Employee Assistance Program
- Complimentary access to our onsite gym in downtown Calgary
- Underground parking
Our headquarters are in Calgary, where we offer in-office and hybrid work models. We don’t believe in mandates, so our Flex with Purpose philosophy also opens the opportunity to remote candidates who fit our technical requirements.
APPLICATION PROCESS
Qualified individuals should submit:
- Cover Letter (maximum one page)
- Resume
This integral role is being sourced by Ziing's Talent Team. We are not accepting agency or recruiter submissions at this time.
We thank all applicants for their interest. Only those selected for an interview will be contacted.
The successful candidate will be required to complete background screening, which may include employment references, education verification, and criminal record checks.