When monitoring a website, Conductor Website Monitoring finds and indexes URLs through many types of relations, such as incoming links or sitemap references. For more on how Conductor Website Monitoring finds URLs, see URL finding.
Once Conductor Website Monitoring finds and indexes a URL, it keeps monitoring it. Conductor Website Monitoring doesn’t automatically remove URLs from its index, even if the page’s content is removed, the page starts returning a 404, or the URL no longer has any incoming relations.
However, there are two ways in which you can remove URLs from Conductor Website Monitoring’s index:
- The URL Exclusion List
- The Purge orphan pages feature
When you use the methods below to remove certain URLs from monitoring, all of the existing data Conductor Website Monitoring has collected related to these URLs will be deleted.
URL Exclusion List
The URL Exclusion List lets you exclude parts of the website from monitoring based on URL patterns. It essentially works as a virtual robots.txt file for Conductor Website Monitoring.
The URL Exclusion List lets you disallow practically any file or page. The exceptions are the robots.txt file, the homepage, and sitemaps in the default locations (/sitemap.xml and /sitemap_index.xml).
Conductor Website Monitoring does not follow the directives in the actual robots.txt file, so that it can access the whole website and report indexability and relations data accurately.
However, you can easily import the directives from your robots.txt file to the URL Exclusion List in Conductor Website Monitoring.
Setting up the URL Exclusion List
You can import directives to your exclusion list and add your own exclusion rules.
Import directives from the robots.txt file to the URL Exclusion List
- Go to the website's Settings.
- On the left side, click Set up URL Exclusion List.
- Import the directives from the website’s robots.txt file (if there is one) to the URL Exclusion List: click Import exclusions. If you don’t want to import any directives from the robots.txt file, click Skip.
- If you have imported existing robots.txt directives, they are shown here.
- The URL Exclusion List follows the robots.txt format and supports both Disallow and Allow directives. The order of the directives doesn’t matter in the URL Exclusion List.
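As a rough illustration of what an import involves, the Python sketch below pulls the Disallow and Allow values out of a robots.txt file. The function name parse_robots_directives is hypothetical, and ignoring User-agent groups is an assumption made for brevity. It is not a description of how Conductor Website Monitoring performs the import.

```python
def parse_robots_directives(robots_txt):
    """Collect Disallow and Allow values from robots.txt text.

    Assumption: directives are gathered from every User-agent group.
    """
    disallow, allow = [], []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if not value:
            continue  # an empty Disallow blocks nothing
        if field == "disallow":
            disallow.append(value)
        elif field == "allow":
            allow.append(value)
    return disallow, allow


sample = """User-agent: *
Disallow: /admin/  # back office
Allow: /admin/public
Sitemap: https://example.com/sitemap.xml
"""
disallow_rules, allow_rules = parse_robots_directives(sample)
```

Running this on your own robots.txt before importing can give a quick preview of the rules you would expect to see in the list.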
Add rules to the URL Exclusion List
As the next step, you can also add your own URL patterns to exclude from monitoring. If you have imported existing robots.txt directives, they are shown here as well.
When adding rules to the URL Exclusion List, the Disallow directive is added to the patterns by default.
Here are the pattern basics and a few common examples of custom exclusion rules:
- An asterisk (*) matches any sequence of characters.
- /admin/ excludes all URLs starting with /admin/
- *?filter= excludes all URLs containing ?filter=
If you want Conductor Website Monitoring to keep monitoring a specific subdirectory inside an excluded part of the website, use the Allow directive.
The Allow directive overrides the Disallow directive and lets Conductor Website Monitoring monitor specific paths within excluded subdirectories:
/media/
Allow: /media/press
Allow: /media/blog
The example above excludes the /media/ subdirectory except for /media/press and /media/blog.
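To see how such rules might be evaluated, here is a minimal Python sketch. It assumes that patterns are matched from the start of the URL path (including the query string), that an asterisk matches any sequence of characters, and that any matching Allow rule overrides any matching Disallow rule regardless of order. The function names are hypothetical, and the actual matching logic in Conductor Website Monitoring may differ.

```python
import re


def pattern_to_regex(pattern):
    """Turn a robots.txt-style pattern into a regex; '*' becomes '.*'."""
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body)


def is_excluded(path, disallow, allow):
    """Return True if path matches a Disallow rule and no Allow rule.

    Assumption: Allow always wins, so rule order does not matter.
    """
    def matches_any(patterns):
        # re.match anchors at the start of the path (prefix matching).
        return any(pattern_to_regex(p).match(path) for p in patterns)

    return matches_any(disallow) and not matches_any(allow)


disallow = ["/media/"]
allow = ["/media/press", "/media/blog"]

print(is_excluded("/media/videos/intro.mp4", disallow, allow))  # True
print(is_excluded("/media/press/2024-launch", disallow, allow))  # False
```

With the rules from the example above, everything under /media/ is excluded except paths starting with /media/press or /media/blog.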
Once everything is set up as you wish, click Apply changes. The exclusion rules will take effect in the next few minutes:
- All monitored URLs matching the exclusion rules will be immediately removed from Conductor Website Monitoring’s index.
- Conductor Website Monitoring will not crawl URLs matching the patterns for as long as the exclusion rules remain in the URL Exclusion List.
Purging orphan pages
The Purge orphan pages feature removes all URLs without any incoming relations from Conductor Website Monitoring's index.
This feature is useful, for example, when you have removed certain pages from your website and their URLs are no longer linked within the website.
If you don't want or need Conductor Website Monitoring to monitor such orphaned URLs anymore, you can use this feature to remove them from Conductor Website Monitoring's index.
What is an orphaned page
A URL is considered orphaned in Conductor Website Monitoring if:
- It doesn't have any incoming links.
- It doesn't have any incoming redirects.
- It doesn't have any incoming canonicals.
- It isn't referenced in an XML sitemap.
If a URL has any incoming relations pointing to it, or is referenced in an XML sitemap, it is not considered orphaned and the Purge orphan pages feature doesn’t apply to it.
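The definition above can be expressed as a simple check. The Python sketch below uses a hypothetical UrlRelations record with one field per relation type. The names are illustrative only and do not reflect Conductor Website Monitoring's internal data model.

```python
from dataclasses import dataclass


@dataclass
class UrlRelations:
    """Hypothetical summary of the relations pointing at one URL."""
    incoming_links: int = 0
    incoming_redirects: int = 0
    incoming_canonicals: int = 0
    in_xml_sitemap: bool = False


def is_orphaned(relations):
    """A URL is orphaned only if all four conditions hold at once."""
    return (
        relations.incoming_links == 0
        and relations.incoming_redirects == 0
        and relations.incoming_canonicals == 0
        and not relations.in_xml_sitemap
    )


print(is_orphaned(UrlRelations()))  # True
print(is_orphaned(UrlRelations(in_xml_sitemap=True)))  # False
```

A single remaining relation of any type, such as one sitemap reference, is enough to keep a URL out of the purge.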
Using the Purge orphan pages feature
To purge orphaned pages:
- Go to the website’s Settings.
- On the left side, click Purge orphan pages. A pop-up window will appear.
- Select both checkboxes to confirm that you are aware that orphaned URLs will be purged and no longer monitored by Conductor Website Monitoring.
- Click Purge orphans.