HubSpot Domain Crawler: Auto-Crawl URLs in Customer Agent

What This Update Actually Is

Customer Agent now has a domain crawler baked into its URL source configuration. When you add a public URL as a knowledge source, you'll see a toggle called "Import related URLs." Turn it on, and HubSpot's crawler discovers every page linked from that root URL, up to 5,000 pages total.

You don't have to feed it a sitemap or a spreadsheet of links. The crawler walks your site, finds the pages, and pulls them in. Every page it imports refreshes automatically on a weekly cadence.
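If it helps to picture what "walks your site" means, here's a minimal sketch of link discovery with a page cap. It is not HubSpot's implementation. It assumes a breadth-first walk, and it uses a hypothetical in-memory link map (`SITE`) instead of fetching real pages; the function name `crawl` and the example URLs are made up for illustration.

```python
from collections import deque


def crawl(root, links, max_pages=5000):
    """Breadth-first discovery over an in-memory link graph.

    `links` maps each URL to the URLs it links to (a stand-in for
    fetching and parsing real pages). Returns URLs in discovery order,
    never more than `max_pages` of them.
    """
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue and len(order) < max_pages:
        page = queue.popleft()
        for target in links.get(page, []):
            if target in seen:
                continue  # already discovered, skip duplicates and loops
            seen.add(target)
            order.append(target)
            queue.append(target)
            if len(order) >= max_pages:
                break  # cap reached, stop discovering
    return order


# Hypothetical site: each key links to the pages in its list.
SITE = {
    "https://example.com/support/": [
        "https://example.com/support/billing",
        "https://example.com/support/login",
    ],
    "https://example.com/support/billing": [
        "https://example.com/support/refunds",
        "https://example.com/support/",  # link back to the root
    ],
    "https://example.com/support/login": [],
}

pages = crawl("https://example.com/support/", SITE)
capped = crawl("https://example.com/support/", SITE, max_pages=2)
```

The point of the sketch is the shape of the problem: pages only get found if something links to them, and discovery stops once the cap is hit.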

Path-level controls let you get surgical. You can tell the crawler to import only pages under /support/ or skip everything under /blog/. That means you're feeding your agent exactly the knowledge it needs, not noise it doesn't.

Why HubSpot Shipped This

Before this update, adding website knowledge to Customer Agent meant importing URLs one at a time. For a company with a 300-page support site, that's a setup problem. For a company whose docs get updated every week, it's an ongoing maintenance nightmare.

The frustration inside teams is real. Whoever owned the agent would update a help article, then forget to re-import the URL, and the agent would keep answering with stale information. That gap erodes trust in the tool fast.

HubSpot's answer is to remove the human from the maintenance loop entirely. Set the crawler once, define your path rules, and the agent's knowledge stays current automatically. This is a foundational shift in how Customer Agent is meant to scale.

How to Use It Step by Step

1. Navigate to Customer Agent settings inside your HubSpot portal.
2. Open the Knowledge Sources section and choose to add a public URL source.
3. Enter your root domain URL (for example, your support subdomain or docs site).
4. Toggle on "Import related URLs" to activate the domain crawler.
5. Choose your scope: import all pages on the domain, or filter by path. Use include filters to pull in only specific sections. Use exclude filters to block sections like /blog/ or /events/ that aren't relevant to support.
6. Save your configuration. The crawler runs its first pass and then refreshes all crawled pages every week on its own.
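The include and exclude choice in the steps above is easier to reason about with a small model. The sketch below is a guess at the semantics, not HubSpot's documented behavior. It assumes filters match on path prefix and that an exclude rule wins over an include rule. The function `path_allowed` and its parameters are hypothetical names.

```python
from urllib.parse import urlparse


def path_allowed(url, include=(), exclude=()):
    """Decide whether a URL passes the path filters.

    Assumptions (not confirmed by HubSpot): filters are path prefixes,
    excludes take priority, and an empty include list means "everything".
    """
    path = urlparse(url).path or "/"
    if any(path.startswith(prefix) for prefix in exclude):
        return False
    if include:
        return any(path.startswith(prefix) for prefix in include)
    return True


# Include-only setup: keep just the support section.
support_only = path_allowed(
    "https://example.com/support/billing", include=("/support/",)
)
blog_blocked = path_allowed(
    "https://example.com/blog/launch-post", include=("/support/",)
)

# Exclude-only setup: keep everything except /blog/ and /events/.
pricing_kept = path_allowed(
    "https://example.com/pricing", exclude=("/blog/", "/events/")
)
```

Running your own URL list through a model like this before you save the configuration is a cheap way to catch a filter that's too broad or too narrow.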

One caveat worth naming: the 5,000-page cap is a hard limit per domain source. If your domain has more than 5,000 pages, use path filters to prioritize your most important sections. Don't try to crawl your entire marketing site if only /support/ is what the agent actually needs.
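To plan filters against the 5,000-page cap, it helps to know how many pages each section of your site has. Here's a rough sketch, assuming you already have a list of your site's URLs (from a sitemap export, for example). The helper names `pages_per_section` and `sections_within_cap`, and the sample URLs, are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

PAGE_CAP = 5000  # the cap described in this article


def pages_per_section(urls):
    """Count URLs by their first path segment, e.g. /support/ or /blog/."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        section = f"/{segments[0]}/" if segments else "/"
        counts[section] += 1
    return counts


def sections_within_cap(counts, priority, cap=PAGE_CAP):
    """Walk sections in priority order, keeping each one that still fits."""
    chosen, total = [], 0
    for section in priority:
        size = counts.get(section, 0)
        if total + size <= cap:
            chosen.append(section)
            total += size
    return chosen, total


urls = [
    "https://example.com/support/billing",
    "https://example.com/support/login",
    "https://example.com/support/refunds",
    "https://example.com/docs/api",
    "https://example.com/blog/launch-post",
    "https://example.com/blog/year-in-review",
]
counts = pages_per_section(urls)

# A tiny cap of 4 stands in for 5,000 so the example is easy to follow.
chosen, total = sections_within_cap(
    counts, ["/support/", "/docs/", "/blog/"], cap=4
)
```

The sections that make the cut are your include filters. Anything that doesn't fit is a candidate for an exclude filter or a separate source.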

What It Touches in Your HubSpot Strategy

This update lives inside Customer Agent, but its ripple effects touch several layers of how your portal is set up.

Service Hub is the obvious home. If you're running a support operation, Customer Agent is your frontline deflection layer. Better knowledge means fewer tickets escalating to humans and faster resolution for the ones that do.

Key Takeaway

If your team has built a structured knowledge base or help center inside Content Hub, the domain crawler can ingest that entire site with a single configuration. That makes Content Hub the backbone of your agent's intelligence, not just a publishing tool.

Marketing Hub portals running informational sites or resource hubs can also benefit. If your agent handles pre-sales questions, crawling your product pages and FAQ sections keeps it aligned with your current messaging, even as campaigns change.

The weekly refresh cadence matters for RevOps teams thinking about data integrity. Stale agent knowledge is a data quality problem. This feature automates a quality check that most teams were skipping because it was too tedious to do manually.

Key Takeaway

Path-level controls are the most underrated part of this feature. They let you separate your public support content from internal-facing or marketing-only pages, so the agent answers from the right knowledge every time.

If you're thinking about how AI tools in HubSpot interact with your broader data layer, it's worth reading about HubSpot Agent CLI and how AI agents operate on your CRM data. The two features aren't directly connected, but they reflect the same design direction: let AI do the maintenance work so humans can focus on judgment calls.

Who Should Care Most

This feature is highest-value for a specific set of portal configurations. Here's who should move on it now:

Service Hub admins running Customer Agent deflection who have a support site or documentation hub with more than a handful of pages.
Content Hub teams who publish and update help content regularly and can't keep agent knowledge current through manual imports.
Marketing Hub operators using Customer Agent for pre-sales chat on product or pricing pages where content changes with campaigns.
RevOps and portal admins at companies with large, frequently updated websites who need a hands-off maintenance model.
Smaller teams without dedicated content ops resources who've avoided Customer Agent because the setup felt too labor-intensive to sustain.

If you're not running Customer Agent yet but you have Service Hub Professional or Enterprise, this is a good moment to revisit the decision. The barrier to a well-informed agent just dropped significantly.

George's Take

I've seen this pattern play out in portal after portal: a team gets excited about Customer Agent, spends a day importing URLs, and then six months later the agent is answering questions based on content that was rewritten in March. The humans interacting with it notice the gaps before the admins do. That's a trust problem, and it's almost always a maintenance problem in disguise. This crawler removes that failure mode from the equation. It's not a flashy AI feature. It's the plumbing that makes the flashy AI feature actually work.

“The best AI features in HubSpot aren't the ones that impress you in a demo. They're the ones that quietly remove the maintenance debt that was making the whole system unreliable.”

— George B. Thomas

If your portal already has Customer Agent enabled, go configure this today. It's a single toggle with significant downstream impact. And if you're still mapping out how your HubSpot content strategy feeds your AI tools, check out how Marketing Hub functions as a full operating system for your team, not just a collection of features.

Want help configuring Customer Agent with the right knowledge sources for your specific portal setup? Let's talk. A Sidekick strategy session can get your agent trained on the right content in one focused conversation.

Frequently Asked Questions

How many pages can HubSpot's domain crawler import into Customer Agent?

The domain crawler imports up to 5,000 pages per domain source. If your site has more than 5,000 pages, use path-level include filters to prioritize the sections most relevant to your agent, such as your support or documentation subdirectory, rather than crawling the entire domain.

How often does the domain crawler refresh imported pages in Customer Agent?

HubSpot refreshes all pages imported through the domain crawler on a weekly cadence automatically. You don't need to trigger a manual re-import. Any updates you publish to your support site or documentation hub will be reflected in the agent's knowledge within seven days.

What are path filters in HubSpot Customer Agent's domain crawler?

Path filters let admins control which sections of a domain the crawler imports. You can include only specific paths, such as /support/ or /docs/, or exclude paths you don't want in the agent's knowledge base, such as /blog/ or /events/. This keeps the agent focused on relevant content.

Which HubSpot plans include the domain crawler for Customer Agent?

The domain crawler is available on Service Hub, Marketing Hub, Sales Hub, Content Hub, Data Hub, and Smart CRM, all at Professional and Enterprise tiers. It's also available to portals with HubSpot Credits. Free and Starter plans don't have access to this feature.

Can I use the domain crawler with an external website, or does it only work with HubSpot-hosted pages?

The domain crawler works with any publicly accessible URL, not just HubSpot-hosted pages. You can point it at an externally hosted documentation site, a third-party knowledge base, or any public-facing support hub. The only requirement is that the pages are publicly reachable without authentication.

What's the difference between the old URL import method and the new domain crawler in Customer Agent?

The old method required admins to add each URL individually and manually re-import pages when content changed. The domain crawler automatically discovers all linked pages from a root URL, imports up to 5,000 at once, and refreshes them weekly. It replaces a labor-intensive manual process with a single, self-maintaining configuration.
