A robots.txt file is a text file that tells crawlers (or web robots) which URLs they can access on your site. This file is mainly used to manage requests from search engines, but it can also be configured to block other crawlers from your site.
This means that in some cases, a robots.txt file can prevent our technology from properly checking your site. If you have tried clearing your cache and checking your firewall rules but your dashboard still says the ad script, ads.txt, or privacy policy is missing, the cause could be a configuration issue with your robots.txt file.
Since we don’t have a comprehensive list of user agents for all monitoring services and advertisers, it is best practice to keep your robots.txt file from being too restrictive.
Why Does a Site Need a Robots.txt File?
A robots.txt file can serve several purposes, such as keeping AI bots off your site and preventing certain user agents from overloading it with requests.
In its basic form, the crawl instructions in a robots.txt file include allow or disallow commands for certain user agents:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
For example, the simple robots.txt file below blocks the entire /wp-admin folder (and all of its subfolders) and the /wp-includes folder from being crawled by every user agent:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Note: The above robots.txt file is just an example; your site’s robots.txt file may not include the same commands.
A good crawler will try to access the robots.txt file and follow the instructions listed in the file. The file is usually accessed at the root of your site in this format: www.rootdomain.com/robots.txt.
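To make this concrete, here is a minimal Python sketch of what a rule-following crawler does, using the standard library's urllib.robotparser. The rules are the illustrative /wp-admin and /wp-includes example described earlier, and the "ExampleBot" user agent and the helper names robots_url and is_allowed are assumptions made up for this sketch, not part of any real crawler.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Illustrative rules only; they are not taken from any real site's file.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def robots_url(page_url: str) -> str:
    """Return the robots.txt location at the root of the page's host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Check whether the given rules let user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "https://www.rootdomain.com"
    print(robots_url(site + "/blog/post"))
    # "ExampleBot" is a hypothetical user agent name.
    for path in ("/wp-admin/options.php", "/ads.txt"):
        print(path, is_allowed(EXAMPLE_RULES, "ExampleBot", site + path))
```

Real crawlers may interpret edge cases differently from Python's parser, so treat the result as a rough check rather than a guarantee of how any particular crawler behaves.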
If a robots.txt file is written too aggressively with disallow rules, it can accidentally prevent valid crawlers (like Journey’s crawlers) from accessing your site. Overly aggressive robots.txt files can also block advertisers from checking your site and ads.txt file.
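The difference between a blanket rule and a targeted one can be sketched as follows. This is a rough illustration using Python's urllib.robotparser; the user agent names "ExampleAIBot" and "ExampleAdsChecker", the /privacy-policy/ path, and the blocked_paths helper are all hypothetical and do not correspond to Journey's or any advertiser's actual crawler.

```python
from urllib.robotparser import RobotFileParser

# Blocks every crawler from the whole site, including /ads.txt.
BLANKET_RULES = """\
User-agent: *
Disallow: /
"""

# Blocks only one (hypothetical) bot; everyone else loses just /wp-admin/.
TARGETED_RULES = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

# Paths a monitoring crawler might need; /privacy-policy/ is an assumed URL.
CHECKED_PATHS = ["/", "/ads.txt", "/privacy-policy/"]


def blocked_paths(rules: str, user_agent: str, paths: list[str]) -> list[str]:
    """Return the subset of paths that user_agent may not crawl."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    for name, rules in (("blanket", BLANKET_RULES), ("targeted", TARGETED_RULES)):
        blocked = blocked_paths(rules, "ExampleAdsChecker", CHECKED_PATHS)
        print(f"{name}: blocked for ExampleAdsChecker -> {blocked}")
```

Under these assumptions, the blanket file shuts out the hypothetical ads checker along with everything else, while the targeted file blocks only the named bot and leaves ads.txt reachable for other crawlers.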
Do I Need to Edit My Robots.txt File?
Note: If you are unfamiliar with how your site files work, and you don’t normally make changes to them, we don’t suggest editing them without the help of your host or a developer.
If you are using WordPress, your robots.txt file is stored with your site’s root-level files and can be edited in the following ways:
- FTP
- cPanel
- With a plugin
If you are using another CMS, you’ll need to reach out to your host for more information on how to access and edit the robots.txt file.