Disaster recovery: what to do when a ban wave hits 30% of stack
you wake up to a flood of notification emails. account suspended. account suspended. account suspended. before your morning coffee is brewed, you’ve lost 30% of your operating stack overnight. no warning, no grace period, no clear explanation from the platform. this is not a hypothetical scenario. it happened to me in q1 2024, and the way you respond in the first six hours determines whether you recover at all.
ban waves are not random. platforms roll out large-scale enforcement actions in coordinated batches, usually triggered by a new detection model being pushed to production, a manual review initiative, or an upstream signal like a fraud report from a payment processor. when one wave catches 30% of your accounts, it almost always means the remaining 70% are either on a watchlist or about to be. how you triage, quarantine, and rebuild in the hours and days that follow is the difference between a temporary setback and a full wipeout.
this article is for operators who already understand account warming, proxy hygiene, and antidetect browser basics. i’m not going to explain what a residential proxy is. what i am going to do is walk through the tactical playbook i use after a major enforcement event, including the mistakes i made the first time and what i do differently now.
background and prior art
ban waves as a platform enforcement mechanism have been documented across amazon seller central, facebook business manager, google ads, and marketplace platforms like etsy and ebay going back at least to 2016. the mechanics have been studied in fraud detection literature. the core idea is that platforms batch their enforcement actions deliberately. real-time bans create a feedback loop where operators can observe which specific behavior triggered the action and immediately adjust. batched enforcement denies that signal. you can’t tell which account behavior caused the ban because the ban comes weeks after the behavior.
the academic framing for this is “strategic obfuscation in adversarial classification systems.” platforms don’t just want to ban bad actors, they want to degrade the ability of bad actors to learn from enforcement. google published work internally on adversarial ML for ad fraud that informed much of the industry’s thinking, and security researcher Clarence Chio documented similar patterns in his 2018 O’Reilly book on machine learning and security. the EFF’s browser fingerprinting research at coveryourtracks.eff.org is essential reading if you want to understand what detection surface you’re actually presenting to platforms.
the practical implication: when a ban wave hits, your forensics window is narrow. every hour you spend in reactive mode rather than analytical mode increases the probability that the remaining accounts get swept up in the next wave.
the core mechanism
understanding what a ban wave actually is changes how you respond to one. platforms don’t maintain a single “ban queue.” they maintain risk scores, behavioral clusters, and relational graphs. when a new detection model deploys, it scores every account on the platform against the new criteria. accounts above a threshold get flagged. those flags get queued for automated action or manual review. the queue processes over hours or days, which is why ban waves feel like they “roll out” rather than happening all at once.
the relational graph piece is what kills most operators. your accounts don’t exist in isolation. they share infrastructure: ip ranges, device fingerprints, payment methods, email domains, phone number prefixes, behavioral timing patterns, shipping addresses, referral chains. when the platform’s graph model identifies account A as bad, it traverses the graph and scores every account connected to A. a 30% hit rate often means the platform found one or two high-confidence bad actors in your stack and let the graph walk the rest.
this means your immediate priority after a ban wave is graph isolation, not account recovery. trying to recover banned accounts before you’ve isolated surviving accounts is like trying to bail out a boat while you’re still running it into the rocks.
the practical steps, in order:
1. stop all active operations immediately. every login, every action, every transaction on surviving accounts adds behavioral data to the platform’s model. if the detection model is actively scoring accounts, fresh activity gives it more signal. go dark for at least 12-24 hours.
2. audit your infrastructure map. build a full dependency graph of what your surviving accounts share with your banned accounts (banned accounts -> shared infrastructure). this includes:
   - proxy pool overlap: which surviving accounts used any IPs from the same /24 blocks?
   - browser profile overlap: shared canvas fingerprints, webgl hashes, font lists
   - payment method links: same card BINs, same billing address prefixes
   - email infrastructure: same domain, same MX records, same sending IP history
   - device fingerprint: matching hardware signatures across profiles
   - behavioral timing: accounts created the same day, warmed on the same schedule
3. quarantine anything with graph overlap. accounts that share infrastructure with banned accounts are compromised from a signal perspective even if they haven’t been banned yet. quarantine means: no logins, no actions, treated as write-off pending further analysis.
4. identify your clean cohort. the accounts with no graph overlap to the banned set are your recovery nucleus. these are the only accounts you operate during the recovery window.
5. begin forensics on the banned accounts. what did they have in common that the clean cohort didn’t? this is the only way to understand what the detection model actually caught.
worked examples
example 1: amazon seller stack, march 2024
i was running 22 amazon seller accounts across three categories. one ban wave took out 8 accounts in 36 hours. the initial read was “lost 36% of stack.” the actual damage was worse because 6 of the 14 surviving accounts shared proxy infrastructure with at least one banned account.
forensics revealed the common thread: all 8 banned accounts had used the same residential proxy provider’s sg exit nodes during a 3-day window in february when that provider apparently had their ip range flagged by amazon’s payment processor fraud team. the 6 contaminated survivors had used different exit nodes from the same provider but the /16 subnet was now on amazon’s watchlist.
recovery path: quarantined the 6 compromised accounts for 60 days with zero logins. rebuilt on a completely separate proxy provider (switched from provider A at ~$3/gb to provider B at ~$4.50/gb, accepting the cost increase for cleaner ip history). the 8 clean accounts continued operating. net loss at 90 days: 3 accounts that didn’t survive the quarantine period when we tested logins. 5 accounts successfully reactivated.
cost of the incident: approximately $4,200 in lost gmv during the 90-day recovery window, plus $800 in new account infrastructure costs.
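the gap between the headline loss and the real exposure in this example is plain arithmetic, and it is worth checking rather than eyeballing. a minimal sketch using only the figures quoted above; the variable names are mine and purely illustrative.

```python
# figures quoted in example 1; variable names are illustrative
total_accounts = 22
banned = 8
contaminated_survivors = 6  # survivors sharing proxy infrastructure with a banned account

survivors = total_accounts - banned
clean_survivors = survivors - contaminated_survivors

# what the ban notifications alone suggest
headline_loss = banned / total_accounts
# banned accounts plus survivors that had to be pulled from operation
effective_exposure = (banned + contaminated_survivors) / total_accounts

# direct incident cost: lost gmv plus new infrastructure spend (approximate figures)
direct_cost = 4200 + 800

print(f"headline loss: {headline_loss:.0%}")
print(f"effective exposure: {effective_exposure:.0%}")
print(f"clean survivors: {clean_survivors}")
print(f"direct cost: ${direct_cost}")
```

the point of separating the two ratios is that the second one, not the first, describes how much of the stack was actually out of operation during the recovery window.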
example 2: facebook business manager stack, q3 2023
this was a referral farming operation across 34 accounts. ban wave took out 11 in one night. the graph analysis here was harder because facebook’s detection surface includes behavioral signals that are difficult to enumerate, not just infrastructure.
what forensics eventually found: the 11 banned accounts had all interacted with the same external landing page domain within a 2-week window. the domain had been flagged by facebook’s link shim (they document this lightly in their platform policies), and any account that had loaded the domain, even in a different context, got scored negatively. this is a good example of “external signal contamination” - the ban wasn’t caused by the accounts themselves but by a third-party asset they touched.
the surviving 23 accounts had either not visited the domain or had visited it earlier when it wasn’t flagged. graph overlap wasn’t the issue here. the issue was shared behavioral history with a toxic external asset.
recovery: the banned accounts were write-offs. we appealed 4 of the 11 through facebook’s standard appeals process. zero were recovered. the surviving accounts got scrubbed of any links or saved creatives pointing to the flagged domain, then went dormant for 14 days before resuming operations.
example 3: e-commerce marketplace, ongoing monitoring
this example is more recent and still partially live so i’ll keep it generic. stack of 40+ accounts across two platforms. rolling ban waves over 6 months had been taking 2-4 accounts per month. individually each loss looked random. when i finally built a proper relational graph and looked at the full 6-month picture, a pattern emerged: every banned account had been created within 48 hours of a specific batch of phone verification numbers from one sms verification service.
the phone number provider had apparently sold the same number blocks to multiple buyers. some of those buyers were operating low-quality spam operations that got flagged fast, and the phone numbers became toxic signals. any account verified with those numbers was eventually scored negative, regardless of what the account itself did.
fix: audited all accounts against the phone number batches, quarantined anything from the toxic batches, rebuilt verification infrastructure using a different provider, and started buying numbers in smaller batches across multiple providers to reduce single-point-of-failure risk.
edge cases and failure modes
failure mode 1: over-indexing on the wrong variable
the first time you run forensics after a ban wave, you will find correlations that are coincidental. 8 banned accounts all used the same browser version? that’s probably not it: that version was the current stable release, used by half your stack. be skeptical of signals that would implicate your entire stack, not just the banned cohort. good forensics looks for things that are different about the banned accounts relative to the survivors, not just things they all have in common.
failure mode 2: premature account recovery attempts
the number one mistake i see operators make is trying to appeal or recover banned accounts immediately. appeals generate additional scrutiny. they confirm to the platform that the account is actively managed. they often trigger deeper graph traversal that sweeps in connected accounts. unless you have a specific, strong reason to believe the ban was a false positive (and it almost never is), treat banned accounts as write-offs for at least 30 days and focus entirely on protecting surviving accounts.
failure mode 3: rebuilding on the same infrastructure
the infrastructure that got you banned is compromised. this seems obvious, but operators routinely spin up replacement accounts on the same proxy pool, the same browser fingerprint templates, and the same email domains, because those are the assets they already have and they’re trying to move fast. fast rebuilds on compromised infrastructure just accelerate the next ban wave. the cost of new clean infrastructure is always lower than the cost of another 30% loss event.
failure mode 4: not documenting the incident
you will not remember the specifics of this ban wave in 6 months. write a post-mortem while the details are fresh. what got banned, when, what they had in common, what survived, what you changed. this documentation becomes the playbook for the next incident and the input for your ongoing infrastructure risk assessment. operators who don’t document incidents are condemned to repeat them.
failure mode 5: treating the 70% as safe
after a 30% ban wave, there’s a strong psychological pull toward assuming the 70% that survived are clean. they’re not necessarily clean. they survived this wave. that means either they’re genuinely below the detection threshold, or they’re in the queue for the next wave. run the graph analysis on the survivors against the banned cohort before you conclude they’re safe. accounts that share significant infrastructure with the banned set but weren’t caught yet are not “safe.” they’re “not yet caught.”
what we learned in production
the biggest operational lesson from running a stack through multiple ban waves is that recovery speed is not the primary metric you should optimize for. survival rate of the unaffected stack is the primary metric. i used to measure recovery success by how quickly i could get back to the same number of active accounts. the correct measure is: what percentage of the accounts that were operating before the ban wave are still operating 90 days later, including the replacements.
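the metric described above is a single ratio. a minimal sketch, assuming the denominator is the account count operating before the wave and the numerator is everything operating at day 90, replacements included; the function name and the sample inputs are hypothetical.

```python
def survival_rate_90d(active_before: int, active_at_90d: int) -> float:
    """Accounts operating at day 90 as a share of the pre-incident count.

    active_at_90d includes replacement accounts, per the definition above.
    """
    if active_before <= 0:
        raise ValueError("active_before must be positive")
    if active_at_90d < 0:
        raise ValueError("active_at_90d cannot be negative")
    return active_at_90d / active_before


# hypothetical inputs: 20 accounts before the wave, 15 operating at day 90
rate = survival_rate_90d(active_before=20, active_at_90d=15)
print(f"90-day survival rate: {rate:.0%}")
```

tracking this one number per incident in the post-mortem makes successive ban waves comparable, which a raw "time to get back to full count" figure does not.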
by that metric, the difference between operators who recover well and operators who spiral into successive ban waves is almost entirely a function of whether they went dark and did the forensics before rebuilding. the operators who stay in business long-term have pre-built playbooks: a documented process for the first 6 hours, clear criteria for quarantine decisions, pre-established relationships with alternative infrastructure providers so they’re not scrambling to set up new proxy or fingerprint infrastructure under pressure. i also use a simple health dashboard that tracks each account’s risk score based on infrastructure overlap. you can build a basic version of this in a spreadsheet. the proxyscraping.org blog has useful material on monitoring proxy health metrics that feed into this kind of dashboard.
the antidetect browser ecosystem has also matured meaningfully in how it handles compartmentalization. if you’re not already using profile-level infrastructure isolation, meaning each profile has its own dedicated proxy endpoint and fingerprint profile with no shared components, you’re building in single points of failure. the antidetectreview.org blog covers the current state of browser isolation tools in reasonable depth if you need to audit your current setup against best practices.
one thing that is consistently underestimated is the recovery window for phone verification assets. email domains can be replaced in hours. proxy pools can be switched in a day. phone numbers, especially in jurisdictions where platforms require real carrier verification, have a warmup requirement that can run 30-90 days before an account verified on them will pass elevated scrutiny checks. if your ban wave hits during a period when you need to rebuild quickly, and you don’t have a reserve stock of pre-aged verified accounts or pre-verified numbers, you will be slower than you want to be. the airdrop and farming operator community has documented some of this at airdropfarming.org/blog in the context of wallet and account aging, which maps closely to the verification asset aging problem.
the other consistent lesson: the platforms are getting better faster than most operators adapt. the detection models that caught your stack in 2023 are not the same ones running in 2026. what worked as camouflage 18 months ago may be a positive signal for the current model. staying current on platform policy changes, processor network-level fraud signals, and device fingerprinting research is not optional overhead. it’s core to the operation.
references and further reading
- EFF Cover Your Tracks - Browser Fingerprinting Research - the electronic frontier foundation’s live tool for measuring browser fingerprint uniqueness, plus their underlying research documentation. essential for understanding what detection surface your profiles present.
- Facebook Ad Standards and Transparency Report - facebook’s documented policies on account and content enforcement. sparse on operational detail but useful for understanding the stated policy basis for enforcement actions.
- Amazon Seller Central - Account Health Overview - amazon’s official documentation on account health metrics and enforcement criteria. reading this alongside observed enforcement patterns gives you the delta between stated and actual enforcement behavior.
- USENIX Security - Research on Adversarial Machine Learning in Fraud Detection - the USENIX security conference proceedings contain several years of peer-reviewed research on adversarial classification in platform fraud systems. the 2021-2023 proceedings are particularly relevant to current platform detection architectures.
- Mozilla Developer Network - Browser Fingerprinting Technical Reference - MDN’s technical documentation on canvas API, webgl, and other browser APIs used in fingerprinting. understanding what data these APIs expose is prerequisite knowledge for evaluating antidetect browser effectiveness.
Written by Xavier Fok
disclosure: this article may contain affiliate links. if you buy through them we may earn a commission at no extra cost to you. verdicts are independent of payouts. last reviewed by Xavier Fok on 2026-05-19.