How to Monitor SSL Certificates Across Multi-Cloud Infrastructure
Monitoring TLS certificates across AWS, Azure, GCP, Cloudflare, and on-prem is harder than it looks. Here's a practical playbook for centralized visibility in multi-cloud environments.
If your infrastructure lives in one cloud, you can probably get by with that cloud’s native certificate tooling. AWS Certificate Manager, Google Cloud’s managed certificates, Azure Key Vault — each one handles the happy path well. The moment you span two clouds, or mix cloud with on-prem, or use a CDN in front of origin servers, native tooling stops being enough.
This post is a practical playbook for teams who discovered, usually the hard way, that certificates are a cross-cutting concern.
Why native tooling breaks down
Each platform optimizes for certificates it issues itself:
- AWS ACM auto-renews certs it issued for use with ALB, CloudFront, API Gateway, and friends. It does not auto-renew imported certs. It does not tell you about certs running on EC2 instances unless you wire that up yourself.
- Cloudflare manages the edge cert for proxied traffic. If you have a grey-clouded record, or a direct-to-origin service, Cloudflare has no visibility.
- Google-managed SSL certificates work for GCLB but not for self-managed GKE ingresses, GCE instances, or anything outside the load balancer.
- Azure Key Vault stores certs but doesn’t know which services are actually using them. Renewing in Key Vault doesn’t propagate to App Services or AKS ingresses without additional automation.
The common failure mode: a team adopts a second cloud, assumes the first cloud’s tooling still covers them, and discovers a blind spot only when an expired cert triggers an outage.
The visibility layer: what you actually need
Forget the tooling for a moment. Think about what questions you need to answer in a 3 AM incident:
- What certificates exist for our domains — anywhere?
- Which endpoints serve which certificates?
- Which certificates are expiring in the next 30 / 14 / 7 days?
- Is the live chain on each endpoint complete and trusted?
- Did any certificate change unexpectedly in the last 24 hours?
No single cloud provider’s UI answers all five of those. You need a layer above the clouds.
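One way to picture that layer is as a flat inventory of live observations that you can query. The sketch below is a minimal illustration, not a schema from any particular tool: the `Observation` fields and both helper names are assumptions made up for this example. It covers the "which endpoints serve which certificates" and "did anything change" questions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One live-handshake result for one endpoint (hypothetical schema)."""
    endpoint: str        # "host:port" as probed
    fingerprint: str     # SHA-256 of the leaf certificate, hex
    not_after: datetime  # expiry of the leaf certificate
    chain_ok: bool       # did the chain validate against the trust store?


def endpoints_by_cert(observations):
    """Map each certificate fingerprint to the endpoints serving it."""
    mapping = {}
    for obs in observations:
        mapping.setdefault(obs.fingerprint, []).append(obs.endpoint)
    return mapping


def changed_endpoints(previous, current):
    """Return endpoints whose fingerprint differs between two snapshots.

    Endpoints present in only one snapshot are ignored here; a real
    system would probably report those separately.
    """
    before = {obs.endpoint: obs.fingerprint for obs in previous}
    return sorted(
        obs.endpoint
        for obs in current
        if obs.endpoint in before and before[obs.endpoint] != obs.fingerprint
    )
```

Comparing yesterday's snapshot with today's gives you the 24-hour change question; whether a change was expected is something you would have to cross-check against your deployment history.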
Building the visibility layer
You have three options, roughly in order of effort and reliability.
Option 1: Scripts and spreadsheets (don’t)
A cron job that runs openssl s_client against a list of domains and emails you when something looks wrong. This is where every team starts. It’s fine for fewer than 20 certs in a single environment. It breaks the moment you have:
- DNS load balancing returning different answers per PoP
- Cloudflare in front of origins (you check the edge, miss the origin)
- Cert rotation happening mid-scan
- Ops turnover that lets the domain list go stale
You will find missing endpoints only after an outage.
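For reference, here is roughly what that starter script looks like in Python instead of shell. It is a sketch using only the standard library; the function names and the 30-day default are illustrative assumptions. It inherits every limitation listed above: one vantage point, whatever DNS answer it happens to get, and a hand-maintained host list.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=5.0):
    """Handshake with host:port and return the leaf cert's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' to days left."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def check(hosts, warn_days=30):
    """Return (host, message) pairs for hosts that fail or expire soon."""
    problems = []
    for host in hosts:
        try:
            left = days_until_expiry(fetch_not_after(host))
        except OSError as exc:  # includes ssl.SSLError and timeouts
            problems.append((host, f"check failed: {exc}"))
            continue
        if left < warn_days:
            problems.append((host, f"expires in {left:.0f} days"))
    return problems
```

Because the default context verifies the certificate, an already expired or untrusted cert shows up as a failed check, not as a negative day count.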
Option 2: Certificate Transparency + synthetic monitoring
A much stronger approach:
- Discover via CT logs. Major browsers require publicly trusted certificates to be logged to Certificate Transparency, so nearly every such certificate shows up in CT logs, typically within a day of issuance. Subscribing to CT logs (or querying services like crt.sh or CertSpotter) tells you about certs without needing to know the endpoint in advance, which catches shadow IT, forgotten subdomains, and certs issued by other teams. Certs from private CAs do not appear there, so those still need an inventory of their own.
- Verify via live TLS handshake. For each endpoint you care about, connect and read the actual cert chain. This catches the “cert rotated but deployment forgot” failure mode.
- Monitor from multiple vantage points. If you’re globally distributed, check from multiple regions. DNS-based load balancing can hand different clients different certs.
This works but requires real engineering time. You’re building a small SaaS inside your company.
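To give a feel for the discovery half, the sketch below queries crt.sh's JSON output and reduces it to a list of hostnames to probe. Treat the details as assumptions: the `output=json` query and the `name_value` field reflect how crt.sh responds at the time of writing and are not a guaranteed API, and both function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request


def fetch_crtsh(apex, timeout=30.0):
    """Query crt.sh for certificates issued under apex (network call)."""
    # "%" is crt.sh's wildcard; urlencode escapes it for the URL
    query = urllib.parse.urlencode({"q": f"%.{apex}", "output": "json"})
    url = f"https://crt.sh/?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def extract_hostnames(entries, apex):
    """Collect the unique DNS names under apex from crt.sh-style entries."""
    names = set()
    for entry in entries:
        # name_value may hold several names separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name == apex or name.endswith("." + apex):
                names.add(name)
    return sorted(names)
```

Calling `extract_hostnames(fetch_crtsh("example.com"), "example.com")` would give you the candidate list for the live-handshake step. Wildcard names such as `*.example.com` are kept in the output because they tell you a wildcard cert exists, even though you cannot connect to them directly.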
Option 3: A dedicated tool
At some point, the ROI of building your own stops making sense. This is the category CertShield lives in: take the CT-log discovery, the live TLS handshake verification, and the alerting pipeline, and hand it to you as a service. Other tools in this space include SSL Labs (great for ad-hoc audits, not monitoring), Datadog (if you already use them, their SSL integration is decent), and Nagios plugins (if you enjoy configuring Nagios).
The decision criterion: how much time will your team spend building and maintaining the visibility layer versus paying for it? If you have fewer than 50 certs, scripts can suffice, provided they all live in a single environment (the limits listed under Option 1 still apply). Between 50 and 500, the scripts become a meaningful maintenance burden. Above 500, you really need a purpose-built tool.
The practical checklist
Whatever you use, here’s what your monitoring must do:
- Discovery is continuous, not one-time. New services spin up daily. New certs get issued without anyone filing a ticket. Your discovery pipeline needs to keep running forever.
- You monitor the live chain, not the file on disk. A renewed cert on disk is useless if the web server is still serving the old one.
- Alerts go somewhere humans actually read. Email alone is a black hole. Wire it into PagerDuty, Slack, or your incident channel with proper severity thresholds.
- You alert on unexpected changes, not just expiration. A cert rotating unexpectedly can indicate a compromise, a misconfiguration, or a rogue deployment. All three are worth knowing about.
- Expiry alerts fire with enough runway. 30 days minimum for non-automated certs; 7 days for ACME-automated ones.
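The runway rule in the last item is simple enough to write down directly. This is a minimal sketch: the function name and the "critical" / "warning" labels are placeholders for whatever your paging tool expects, and the 30-day and 7-day values are the ones from the checklist above.

```python
def alert_severity(days_left, automated):
    """Classify an expiry reading using the runway from the checklist.

    Returns "critical", "warning", or None (no alert). The labels are
    placeholders, not values from any specific alerting product.
    """
    runway = 7 if automated else 30
    if days_left <= 0:
        return "critical"
    if days_left <= runway:
        return "warning"
    return None
```

For example, `alert_severity(20, automated=False)` returns `"warning"`, while the same reading for an ACME-managed cert returns `None`. ACME clients usually renew well before the 7-day mark, so an automated cert that reaches it probably means renewal is failing, and you may want to route that case to a louder channel.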
What to do this week
- List your domains. Every apex and subdomain your organization uses.
- Run a CT log discovery against each apex domain (crt.sh is free and fast for small batches). Count the results. If the number surprises you, that’s your blind spot.
- For the top 20 most critical endpoints, pick two third-party vantage points (cron VM in a different cloud, GitHub Actions runner, etc.) and verify the chain from outside your infrastructure.
- Write down the answer to “who gets paged if a cert expires on a Sunday at 2 AM?” If the answer is “nobody, we’d find out Monday,” fix that.
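For the outside-in chain check, something like the following can run from a cron VM or a CI runner. It is a sketch built on Python's standard `ssl` module; the `probe` name and the shape of the result dict are assumptions for this example. The default context validates the chain and hostname against the system trust store, and to my knowledge it does not fetch missing intermediates on its own, so an incomplete chain that some browsers quietly repair should fail here.

```python
import hashlib
import socket
import ssl


def probe(host, port=443, timeout=5.0):
    """Handshake with full verification; return a small result dict."""
    context = ssl.create_default_context()  # verifies chain and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
                return {
                    "ok": True,
                    "fingerprint": hashlib.sha256(der).hexdigest(),
                    "not_after": tls.getpeercert()["notAfter"],
                }
    except ssl.SSLCertVerificationError as exc:
        reason = f"chain or hostname invalid: {exc.verify_message}"
        return {"ok": False, "error": reason}
    except OSError as exc:  # refused, timed out, or other TLS failure
        return {"ok": False, "error": f"unreachable: {exc}"}
```

Run the same probe from both vantage points and compare the fingerprints: a mismatch between locations is exactly the per-region difference that DNS-based load balancing can hide.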
Multi-cloud certificate monitoring isn’t glamorous work, but the alternative is learning about your blind spots during an outage. Pick one of the three options above and start this week.
CertShield is the managed visibility layer for multi-cloud TLS.