Test emails are still emails: The inbox is not your sandbox
Testing your email setup is smart. Testing it by blasting real mailbox providers with fake traffic? Less smart. Here’s how to validate your sending workflow without creating bounces, throttling, spam trap hits, or reputation problems.
Developers love a test. Marketers love a preview. Product teams love a staging environment.
Mailbox providers, however, do not love receiving 248,000 “password reset” emails for one lonely Gmail address because someone wanted to see what would happen.
And that’s the thing about email testing: something always happens.
Misconceptions about test emails
A test email doesn’t disappear into a harmless little QA fog. If you send it through a real mailstream, it behaves like real email. It gets handed off, evaluated, throttled, deferred, bounced, retried, logged, and, depending on how spicy your test gets, remembered. In other words: your test email did not poof into the ether. It got a little backpack and went looking for a home.
Your intent may be “just testing.”
Your traffic may be saying, “Hello, I am a malfunctioning robot with boundary issues.”
Your test emails are still emails
Testing is good. Please test. Test your password resets. Test your signup flows. Test your templates. Test your API integration. Test your SMTP submission. Test your webhooks. Test your links. Test your rendering. Test your whole beautiful little machine before it starts emailing actual humans.
But testing email is different from testing something that stays inside your own application.
When your test triggers a real outbound message, you are no longer just testing your app. You are generating real mailstream activity. And testing receivers’ patience.
That means mailbox providers are seeing:
- repeated messages to the same recipient
- high volumes to one domain
- mail to nonexistent addresses
- mail to typo domains
- repeated retries
- unusual spikes
- deferrals, throttling, and blocks
- bounces that count against your sending reputation
- alerting functions that send one email for every success, failure, or retry
- production systems accidentally using dev/staging notification logic
- unpaced alerts to one Gmail address, company inbox, or shared alias
- email notifications used as logs instead of actual alerting
- traffic that looks less like QA and more like abuse looking for a backdoor
Gmail is not sitting there saying, “Aw, they’re probably QA-ing something, bless their hearts.”
Mailbox providers cannot see your Jira ticket. They do see your traffic.
“Fake” addresses are not always fake
One of the most common testing mistakes is assuming a domain is fake because you made it up.
Unfortunately, the internet does not check with your imagination before registering domains.
That random domain you “invented” could belong to someone else. It could route to Google Workspace. It could use Microsoft 365. It could be parked, monitored, expired, repurchased, sinkholed, or configured in a way that still creates real delivery attempts and real bad reputation signals.
Even when a domain truly does not accept mail, your messages may still generate DNS lookups, connection attempts, retries, bounces, and log noise. Multiply that by thousands (or hundreds of thousands) and suddenly your harmless test has become everyone else’s problem with a sketchy subject line. Believe it or not, 100,000 emails to one address that just say “hi” do not look as innocuous as you’d think.
And then there are typo domains.
Spam traps are email addresses used to identify senders with poor acquisition, validation, or list hygiene practices. Spamhaus describes typo domain traps as traps on domains that look similar to common domains, like misspelled versions of major mailbox providers. Mail sent to those traps can suggest the sender is trying to reach real people but is using bad address data.
So if you think you are the first Poindexter to type gmial.com, well, you’re wrong, and Spamhaus has a little surprise for you.
Please stop load testing mailbox providers
Load testing your application is responsible.
Load testing Gmail, Outlook, Yahoo, or someone’s corporate email domain by accident is demonstrably not that.
If you need to test whether your system can handle high traffic, isolate the parts of the system you’re actually testing. Your app may need to simulate message generation, queueing, retries, template rendering, or event handling. But that does not always require attempting to deliver real email.
A good email testing plan separates:
- application performance testing
- API integration testing
- message generation testing
- template/content testing
- address validation
- live inbox testing
- deliverability testing
Those are related, but they are not the same thing. Cousins, but not the identical kind.
If your application test creates real outbound mail every time it runs, you need guardrails. Rate-limit the test. Use test mode. Use a sandbox. Use internal suppression. Use fake data safely. Make sure one broken loop cannot become 300,000 password resets to Steve in Accounting.
Steve has suffered enough.
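For Steve's sake, here is a minimal sketch of one such guardrail: a hard cap plus a simple pace limit that a test harness checks before every send. The class name `SendBudget` and its parameters are hypothetical, not part of any Mailgun library, and a real setup would likely enforce this somewhere the test code cannot bypass.

```python
import time


class SendBudget:
    """Hard cap plus a simple per-second pace for outbound test mail.

    Hypothetical helper for illustration; not a Mailgun API.
    """

    def __init__(self, max_total, max_per_second, clock=time.monotonic):
        self.max_total = max_total
        self.min_interval = 1.0 / max_per_second
        self.clock = clock  # injectable so the guard itself can be tested
        self.sent = 0
        self.last_sent_at = None

    def allow(self):
        """Return True if one more message may go out right now."""
        if self.sent >= self.max_total:
            return False  # hard cap reached: this run is done sending
        now = self.clock()
        if self.last_sent_at is not None and now - self.last_sent_at < self.min_interval:
            return False  # too soon after the previous send
        self.sent += 1
        self.last_sent_at = now
        return True
```

A test would call `budget.allow()` before each send and skip (or fail loudly) when it returns False. The point of the hard cap is that a broken loop runs out of budget long before it runs out of enthusiasm.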
Email is not your alerting system
There’s another common way “just testing” turns into real deliverability trouble: alerting emails.
Maybe a developer sets up a quick function to email themselves every time an API call fails. Maybe a staging alert accidentally follows the code into production. Maybe every successful event sends a confirmation email to one address, because at some point someone needed to prove the thing worked and then everyone moved on with their lives.
Individually, these choices can look harmless. One alert email. One monitored inbox. One Gmail address. One alias that posts to Slack. One “temporary” setup everyone promises they’ll replace later.
Then production happens.
If your alerting logic says “send an email immediately for every event,” a noisy system can become a downright cacophonous sender. Suddenly one inbox is receiving thousands (or millions) of nearly identical alerts. The receiving mailbox provider starts deferring or blocking the traffic. Your sender reputation takes the hit. And worst of all, your alerting system becomes less useful at the exact moment you need it most.
Because an inbox full of alerts does not tell you what’s wrong.
It tells you that a lot of things happened.
Which is less observability, and more fog machine with timestamps.
When alerts are too frequent, people stop reading them. When every error gets its own email, the important failure looks exactly like the unimportant one. When a mailbox provider starts throttling your alerts, your internal view of the problem may get distorted too. You may think you sent 50,000 alerts because the issue happened 50,000 times, when what you actually built was an unpaced notification bazooka that turned one problem into two.
And if the truly critical alert is trying to arrive behind a mountain of repetitive noise? Good luck, tiny backpack-carrying email. Godspeed.
Email can be useful for occasional notifications, summaries, and human-readable updates. But it should not be your logging system, your primary incident monitor, or the first domino in a mission-critical alert chain.
If you must use email for alerts:
- Aggregate events. One email saying “15,000 errors occurred in the last minute” is more useful than 15,000 emails.
- Add cooldowns and rate limits. Do not let one broken loop generate unlimited mail.
- Escalate only what matters. Email should be reserved for important alerts, not every burp and hiccup in your system.
- Use proper monitoring tools where possible. Purpose-built alerting systems exist for a reason, and that reason is “please don’t make Gmail your incident response platform.”
- Send to controlled destinations. Avoid routing massive alert volume to personal Gmail accounts, shared aliases, or company domains that may have their own rate limits.
- Review old alerts. If nobody reads a recurring alert, kill it, aggregate it, or move it somewhere more appropriate.
- Separate logging from alerting. Logs belong in logging tools. Alerts should tell humans when action is needed.
A good alert should make a problem clearer. If your alerting setup creates deferrals, blocks, noise, confusion, and a reputation problem, congratulations! Your smoke alarm is now also on fire.
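The aggregation and cooldown advice above can be sketched in a few lines. This is an illustration under assumptions, not a recommended alerting stack: `AlertDigest` is a hypothetical name, and the returned summary string stands in for whatever you would actually email or post.

```python
import time
from collections import Counter


class AlertDigest:
    """Collect events and emit at most one summary per cooldown window.

    Hypothetical helper for illustration only.
    """

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable for testing
        self.counts = Counter()
        self.last_flush_at = None

    def record(self, event_name):
        """Count an event; return a summary string if one is due, else None."""
        self.counts[event_name] += 1
        now = self.clock()
        if self.last_flush_at is not None and now - self.last_flush_at < self.cooldown_seconds:
            return None  # still cooling down: keep counting, send nothing
        return self._flush(now)

    def _flush(self, now):
        """Build one summary of everything counted so far and reset."""
        total = sum(self.counts.values())
        parts = ", ".join(f"{name}: {n}" for name, n in sorted(self.counts.items()))
        self.counts.clear()
        self.last_flush_at = now
        return f"{total} event(s) since last summary ({parts})"
```

With a 60-second cooldown, 15,000 errors in a minute produce one or two summaries instead of 15,000 messages. One known gap in this sketch: a summary is only emitted when a new event arrives, so counts gathered during a cooldown wait for the next event. A real implementation would probably flush on a timer as well.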
Testing does not get a reputation hall pass
Mailbox providers are not grading your intent. They are evaluating behavior.
From the receiving side, a broken test loop and an abusive sending pattern can look very similar:
- Why is this sender hammering one mailbox?
- Why are they sending huge volumes to nonexistent users?
- Why are they retrying so much bad traffic?
- Why are they generating password resets nobody asked for?
- Why are they sending to domains that look like typos?
- Why are they wasting receiver resources on this bizarre side quest?
- Why is every email labeled “Urgent Alert” and yet no one opens them?
Testing can still consume mailbox provider resources. It can still create negative engagement patterns. It can still hit traps. It can still produce bounces. It can still trigger throttling. It can still make your domain or IP look sloppy.
And sloppy is not a great brand identity. Unless your name is Joe.
Use the right test for the job
The safest testing strategy starts with one question:
What are you actually trying to prove?
Because “send a live email and see what happens” is not always the right answer. Sometimes you need to know whether your application can call the API. Sometimes you need to know whether a template renders. Sometimes you need to validate an address. Sometimes you need to confirm that a real inbox receives mail.
Those are different tests. They deserve different tools.
If you’re testing your Mailgun API integration, use Test Mode
Mailgun’s Test Mode lets you submit a message with o:testmode set to yes or true. Mailgun accepts the message but does not actually send it, which makes it useful for testing API calls and message generation without delivering real email.
That means you can confirm your application is successfully creating and submitting messages without turning Gmail into your staging environment.
Tiny note from the billing fairy: test mode messages may still be charged, so check your plan and usage expectations before turning your test suite into a spam cannon.
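As a rough sketch of what a Test Mode submission might look like, the function below builds a request to the Mailgun messages endpoint with o:testmode set, using only the Python standard library. The function name and the example values are hypothetical, and the base URL shown is assumed to be the US-region one; check it against your own account before relying on it.

```python
import base64
import urllib.parse
import urllib.request


def build_test_mode_request(api_key, domain, sender, recipient, subject, text):
    """Build (but do not send) a Mailgun messages request with test mode on."""
    fields = {
        "from": sender,
        "to": recipient,
        "subject": subject,
        "text": text,
        "o:testmode": "yes",  # Mailgun accepts the message but does not deliver it
    }
    # Mailgun uses HTTP basic auth with the literal username "api".
    token = base64.b64encode(f"api:{api_key}".encode()).decode()
    return urllib.request.Request(
        url=f"https://api.mailgun.net/v3/{domain}/messages",  # assumed US-region base URL
        data=urllib.parse.urlencode(fields).encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit it. Keeping the build step separate from the send step means your unit tests can assert that o:testmode is present without touching the network at all.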
If you’re testing SMTP submission, use the drop-message header
For SMTP testing, Mailgun supports X-Mailgun-Drop-Message, which allows Mailgun to accept the message without delivering it. That is much cleaner than sending live test messages to a mailbox provider just to confirm your SMTP setup works.
Translation: you can test the pipe without spraying water all over the carpet. Your neighbors appreciate it.
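A minimal sketch of the SMTP version, using Python's standard library. The helper names are hypothetical, the header value "yes" and the host and port defaults are assumptions to confirm against your Mailgun SMTP settings, and the credentials are whatever your domain's SMTP user is.

```python
import smtplib
from email.message import EmailMessage


def build_dropped_message(sender, recipient, subject, body):
    """Build a message that asks Mailgun to accept it without delivering it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["X-Mailgun-Drop-Message"] = "yes"  # assumed value; accept, then drop
    msg.set_content(body)
    return msg


def submit(msg, username, password, host="smtp.mailgun.org", port=587):
    """Submit over SMTP with STARTTLS; host and port are assumed defaults."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)
```

If `submit` returns without raising, your credentials, TLS, and submission path all work, and nobody's inbox had to find out.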
If you’re doing small live tests, use a sandbox domain
Mailgun sandbox domains are designed for controlled testing. They only deliver to authorized recipients, which you add from the domain setup page, so your early tests stay limited to addresses you’ve explicitly approved.
Use this for “does the message arrive?” testing.
Do not use this as a loophole for load testing. The sandbox is for play, not the neighborhood cats’ litterbox.
If you’re testing rendering, links, or accessibility, use Inspect
If your real question is “does this email look right?” then sending thousands of live messages is wildly overqualified for the job.
Mailgun Inspect includes email previews, link validation, image checks, and accessibility testing so you can catch quality issues before sending.
In other words: don’t make mailbox providers intern as your unwitting QA department. They are already busy judging all of us.
If you’re testing whether an address is valid, use Validate
If your question is “does this address appear deliverable?” the answer is not “send it something and see what screams.”
Mailgun Validate supports address validation through the API, including single-address validation.
This matters because sending to bad addresses wastes resources, creates bounces, and can hurt your sender reputation. Mailgun’s Validate product page explicitly connects bad addresses with wasted effort, reputation harm, and blocklist risk.
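Here is a hedged sketch of a single-address check. The endpoint path and the `result` field reflect one reading of Mailgun's v4 validation API and should be verified against the current API reference; the function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_validation_request(api_key, address):
    """Build (but do not send) a single-address validation request."""
    query = urllib.parse.urlencode({"address": address})
    token = base64.b64encode(f"api:{api_key}".encode()).decode()
    return urllib.request.Request(
        # Assumed endpoint path; confirm in the Mailgun Validate docs.
        url=f"https://api.mailgun.net/v4/address/validate?{query}",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def is_safe_to_send(response_body):
    """Interpret a validation response; anything unexpected counts as unsafe."""
    data = json.loads(response_body)
    # Assumed field name and value; a missing field fails closed.
    return data.get("result") == "deliverable"
```

The design choice worth copying, whatever the exact response shape turns out to be, is failing closed: if the response is missing or surprising, don't send.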
A handy little testing map
| What you’re testing | Better approach |
| --- | --- |
| Can my app call Mailgun successfully? | Use Mailgun Test Mode |
| Can I submit through SMTP? | Use X-Mailgun-Drop-Message |
| Can I send a small real test? | Use a sandbox domain with authorized recipients |
| Does the template render correctly? | Use Mailgun Inspect |
| Are the links broken? | Use Mailgun Inspect |
| Is this email address valid? | Use Mailgun Validate |
| Can my app handle high request volume? | Load test your application without generating live outbound email |
| Do my event handlers work? | Use controlled test sends, tags, and non-delivering test paths where possible |
| Did an event happen? | Log it in a logging/monitoring tool |
| Did a lot of errors happen? | Send an aggregated alert or dashboard notification |
| Is something actually on fire? | Use a purpose-built alerting system with escalation rules |
| Do I need email alerts? | Add aggregation, cooldowns, rate limits, and clear severity thresholds |
When you really do need live test emails
Sometimes, yes, you need to send a real email to a real inbox.
That’s fine. Just be deliberate.
Use inboxes you control. Keep the volume small. Avoid sending the same message repeatedly to one recipient. Watch your logs. Stop when you see deferrals or throttling. Don’t test with random addresses, even if you think you made them up. Don’t use typo domains. Don’t use scraped addresses. Don’t use old customer records. Don’t assume “internal” means “safe” if your internal domain is hosted by a major mailbox provider.
A corporate Google Workspace inbox is still connected to Google infrastructure. A Microsoft 365 domain is still connected to Microsoft infrastructure. No, business domains and consumer mailbox domains are not identical in every backend detail. But from a sender reputation perspective, it is deeply unwise to assume the mailbox provider’s left hand will pretend it did not see what the right hand just deferred.
The inbox remembers.
Or at least it writes things down in a way that should make us behave like it remembers.
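One way to make “use inboxes you control” hard to forget is to enforce it in code. The sketch below filters recipients against an explicit allowlist whenever the environment is anything other than production. The addresses, the environment names, and the function name are all hypothetical.

```python
# Hypothetical allowlist: inboxes the team owns and actively monitors.
ALLOWED_TEST_RECIPIENTS = {"qa-inbox@example.org", "deploy-check@example.org"}


def filter_recipients(recipients, environment, allowed=ALLOWED_TEST_RECIPIENTS):
    """Outside production, keep only explicitly approved recipients.

    Any environment name other than "production" is treated as a test
    environment, so an unset or misspelled value fails closed.
    """
    if environment == "production":
        return list(recipients)
    kept = []
    for address in recipients:
        if address.strip().lower() in allowed:
            kept.append(address)
    return kept
```

For example, `filter_recipients(["steve@gmial.com"], "staging")` returns an empty list, so the typo domain never gets a delivery attempt. It does not replace keeping the volume small, but it does keep old customer records and made-up addresses out of your test traffic.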
Better testing habits, fewer deliverability jump scares
Before you run your next test, ask:
Does this need to be delivered, or just accepted?
If it only needs to be accepted, use Test Mode or SMTP drop-message behavior.
Does this need a real recipient?
If yes, use a controlled address you own and keep volume low.
Am I testing address quality?
Use validation instead of sending.
Am I testing rendering or links?
Use pre-send inspection tools.
Could this loop run out of control?
Add rate limits, caps, and monitoring before you test.
Could these “fake” addresses belong to someone else or hit traps?
Assume yes. The internet is haunted.
Am I using email as a logging system?
Don’t. Log it somewhere appropriate; email a human only when action is needed.
Does my alerting workflow send one email per event in production?
Add aggregation, cooldowns, and rate limits before something breaks and you find out the hard way.
Could dev/staging alerts quietly follow me into production?
Assume yes. Make sure they can’t.
Final thought: test like the email is real
Testing is part of responsible sending. We want you to test. We beg you to test. Untested production email is how chaos gets a calendar invite.
But responsible testing means knowing which parts of your system need a real email and which parts just need a successful API response, rendered template, validated address, simulated event, or controlled sandbox send.
Because once a message enters the mailstream, it is no longer “just a test.”
It is an email.
And email leaves a trail (and carries a tiny backpack).