WAF and SEO: Allowlisting Crawlers on IDX Pages
Starting in version 3.16, the Flexmls IDX plugin sends a header on IDX pages so you can allowlist verified search engine bots in your firewall and preserve SEO during attacks.
Preserving SEO During Attacks: Search Engine Bot Allowlisting
If your site uses a Web Application Firewall (WAF) or "Under Attack" mode (e.g. Cloudflare, SiteGround, or your host's security tools), aggressive bot traffic to your IDX listings can cause the firewall to block or challenge all visitors, including legitimate search engine crawlers. When crawlers are blocked, search engines can't index your listings, which can hurt your SEO.
This article explains how to allowlist verified search engine bots for your IDX pages so that indexing continues to work even when your firewall is protecting the site from bad traffic.
What the Flexmls IDX Plugin Does
The plugin helps your firewall tell "IDX page requests" from the rest of your site:
- Custom header: When someone visits an IDX page (e.g. your listing search or a listing detail URL), the plugin sends a response header: `X-Flexmls-IDX: idx`. Your firewall can use this to apply different rules only to IDX traffic.
- Typical IDX paths: IDX content usually lives under a path like `/idx/` or whatever you set as the "Permalink base" in Flexmls IDX → Settings → Behavior (e.g. `/listings/`).
You do not need to change any plugin settings for this. The header is sent automatically on IDX pages.
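To confirm the header is present on your own site, a small check like the following may help. It is only a sketch: the function names are made up for this example, and the URL in the usage comment is a placeholder you would replace with one of your real IDX pages.

```python
from urllib.request import Request, urlopen

# Header name and value as described in this article.
IDX_HEADER = "X-Flexmls-IDX"
IDX_VALUE = "idx"


def has_idx_header(headers):
    """Return True if a mapping of response headers carries the IDX marker.

    Header names are compared case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() == IDX_HEADER.lower() and value.strip() == IDX_VALUE:
            return True
    return False


def page_is_tagged(url, timeout=10):
    """Fetch a page and report whether the response includes the IDX header."""
    request = Request(url, headers={"User-Agent": "idx-header-check"})
    with urlopen(request, timeout=timeout) as response:
        return has_idx_header(response.headers)


# Usage (placeholder URL, replace with one of your IDX pages):
# print(page_is_tagged("https://www.example.com/idx/"))
```

If the check returns False on an IDX page, a cache or proxy in front of the site may be stripping the header, or the plugin may be older than 3.16.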
Why Allowlist Search Engine Bots?
- Google, Bing, and other search engines send crawlers (e.g. Googlebot, Bingbot) to your site to index your listings. If the firewall blocks or heavily challenges these crawlers, your IDX pages may be dropped from search results or updated slowly.
- Allowlisting means telling your firewall: "For requests that look like they're from a verified search engine bot, allow them (or challenge them less) when they're visiting IDX pages." That way you can keep strong protection for everyone else while preserving SEO.
The actual list of "who is a real crawler" is maintained by your firewall or host. The plugin only gives the firewall a way to know "this is an IDX page" so you can write rules like "allow verified bots on IDX."
What You Need to Do (Overview)
- Confirm your firewall/WAF supports custom rules (e.g. by request header or URL path).
- Add a rule that allows (or skips strict challenge for) verified search engine crawlers when they request pages that are part of your IDX (e.g. URL contains your IDX path, or the response/page is tagged with the plugin's header).
- Use your provider's official documentation to allowlist only verified crawlers (by user-agent and/or IP), not just any claim of "Googlebot."
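The difference between a verified crawler and one that merely claims to be Googlebot is usually established with a reverse DNS lookup followed by a forward lookup, which is the approach Google's verification page describes. The sketch below illustrates that idea; the function name and the injectable lookup parameters are choices made for this example, and the hostname suffixes should be checked against the current official documentation before use.

```python
import socket

# Hostname suffixes assumed for Google's crawlers; confirm them against the
# official "Verify Googlebot" page, and pass other suffixes for other engines.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, suffixes=GOOGLE_SUFFIXES,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Return True if `ip` passes a reverse-then-forward DNS check.

    The default forward lookup only resolves IPv4 addresses.
    """
    try:
        # Step 1: reverse lookup, IP address to hostname.
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False

    # Step 2: the hostname must end with one of the crawler's domains.
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False

    try:
        # Step 3: forward lookup, hostname back to its IP addresses.
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # Step 4: the original IP must be among the resolved addresses.
    return ip in addresses
```

A user-agent string alone is never enough, since anyone can send "Googlebot" in a request. Most managed WAFs run an equivalent check for you, so this is mainly useful when you maintain your own rules.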
Example: Cloudflare
If you use Cloudflare:
- Go to Security → WAF → Custom rules (or Firewall Rules in the older dashboard).
- Create a rule that allows or skips challenge for requests that:
  - Target your IDX path (e.g. URI Path contains `/idx/` or your custom permalink base), and
  - Are from a verified search engine crawler (e.g. Cloudflare's "Known Bots" or a rule that matches Google's and Bing's verified crawler IPs/user-agents).
Cloudflare provides Verified Bots (e.g. Googlebot, Bingbot) that you can reference in rules. Example idea (adjust to your Cloudflare UI):
- If: `(http.request.uri.path contains "/idx/") and (cf.client.bot)`, that is, the request targets your IDX path and Cloudflare classifies it as a verified good bot (e.g. Googlebot, Bingbot),
- Then: Allow / Skip.
Exact field names and options depend on your Cloudflare plan and dashboard. See [Cloudflare's bot management and WAF docs](https://developers.cloudflare.com/waf/).
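If your permalink base is not the default, the path in the expression has to change with it. As a small illustration, the hypothetical helper below builds the expression string from a permalink base; it only formats text using the two fields shown in this article and does not call any Cloudflare API.

```python
def cloudflare_idx_bot_expression(permalink_base="idx"):
    """Build a rule expression matching verified bots on the IDX path.

    `permalink_base` is the value set in the plugin's Behavior settings,
    with or without surrounding slashes.
    """
    base = permalink_base.strip("/")
    if not base:
        raise ValueError("permalink_base must not be empty")

    path = "/" + base + "/"
    # Escape backslashes and double quotes so the string literal stays valid.
    escaped = path.replace("\\", "\\\\").replace('"', '\\"')
    return f'(http.request.uri.path contains "{escaped}") and (cf.client.bot)'


print(cloudflare_idx_bot_expression("listings"))
# (http.request.uri.path contains "/listings/") and (cf.client.bot)
```

Paste the resulting string into the rule's expression editor and pick the action there. The match is case-sensitive, so use the same capitalization as your actual URLs.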
Example: Other WAFs or Hosts
- SiteGround, other hosts: Many hosts offer "allowlist" or "whitelist" options by IP or user-agent. Ask support: "How do I allow only verified Googlebot/Bingbot to access a specific path (e.g. /idx/) so they can index my listings during high traffic or Under Attack mode?"
- nginx / Apache: Your host or server admin can add rules that allow requests when:
  - The request path matches your IDX path, and
  - The user-agent (and optionally IP) matches the provider's verified crawler list (see Google and Bing's official docs for crawler IP ranges and user-agents).
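The two conditions above can be sketched as a single decision: the path must be under the IDX base, and the client IP must fall inside a published crawler range. The code below is an illustration only. The function names are invented for this example, the entry shape (`ipv4Prefix` / `ipv6Prefix` keys) is an assumption about the published range files that you should check against the official sources, and the ranges in the usage lines are reserved documentation addresses, not real crawler IPs.

```python
import ipaddress


def load_networks(prefixes):
    """Convert a list of prefix entries into network objects.

    Each entry is assumed to be a dict with an "ipv4Prefix" or
    "ipv6Prefix" key holding a CIDR string.
    """
    networks = []
    for entry in prefixes:
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def qualifies_for_allowlist(path, client_ip, networks, idx_base="/idx/"):
    """Return True only for IDX requests from an IP in a crawler range.

    False means "apply the normal firewall rules", not "block".
    """
    if not path.startswith(idx_base):
        return False
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


# Placeholder ranges reserved for documentation, not real crawler ranges.
sample = load_networks([{"ipv4Prefix": "192.0.2.0/24"},
                        {"ipv6Prefix": "2001:db8::/32"}])
print(qualifies_for_allowlist("/idx/listing/123", "192.0.2.10", sample))  # True
print(qualifies_for_allowlist("/blog/", "192.0.2.10", sample))            # False
```

An nginx or Apache rule would express the same logic in its own configuration syntax; the point is that both conditions must hold before a request bypasses the stricter checks.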
Official Crawler Information (for your firewall rules)
Use the official sources when building allowlists so you don't allow impersonators:
- Google: [Google crawler (user agent) overview](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) and [Verify Googlebot](https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot).
- Bing: [Bingbot](https://www.bing.com/webmasters/help/bingbot-2955537a) and how to verify it.
Many WAFs (e.g. Cloudflare) already incorporate verified bot lists so you can reference "Known Bots" or "Verified Bots" instead of maintaining IP lists yourself.
If you're not sure how to add these rules, contact your hosting or security provider and share this page; they can help you configure allowlisting for verified bots on your IDX path.