Documentation
Everything you need to know about using Crawlinker
Introduction
Crawlinker is a powerful website crawler that helps you find and fix broken links, analyze redirects, and identify SEO issues across your entire website. Whether you're managing a small blog or a large enterprise site, Crawlinker provides the insights you need to maintain a healthy web presence.
What Crawlinker Does
- Broken Link Detection: Identifies all 404, 500, and other HTTP error responses
- Redirect Analysis: Tracks redirect chains and identifies unnecessary redirects
- SEO Auditing: Analyzes meta tags, titles, H1 tags, and page load times
- Comprehensive Reports: Provides detailed, exportable reports for your entire site
Quick Start
Get started with Crawlinker in less than 30 seconds:
1. Enter Your URL
Navigate to the homepage and enter your website URL in the scan form.
2. Configure Options (Optional)
Click "Advanced Options" to specify allowed or excluded paths:
- Allowed Paths: Only scan these specific paths (e.g., /blog/)
- Excluded Paths: Skip these paths during scanning (e.g., /admin/)
3. Start Scanning
Click the "Start Scan" button. Crawlinker will begin analyzing your website immediately.
4. View Results
Once the scan completes, you'll be taken to the dashboard where you can view all findings organized by category.
Features Overview
Broken Link Detection
Crawlinker finds every broken link on your site, including:
- 404 Not Found errors
- 500 Server errors
- 403 Forbidden errors
- Timeout errors
- DNS resolution failures
For each broken link, you'll see:
- Source page (where the link appears)
- Target URL (the broken link)
- HTTP status code
- Link type (internal or external)
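The report fields above can be sketched as a small record type. This is an illustrative model, not Crawlinker's actual implementation: `LinkResult` is a hypothetical name, and the internal/external rule assumed here is a simple host comparison.

```python
# Sketch of one broken-link report entry (hypothetical; not Crawlinker's code).
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

@dataclass
class LinkResult:
    source_page: str       # page where the link appears
    target_url: str        # the link being checked
    status: Optional[int]  # HTTP status code, or None for timeouts/DNS failures

    @property
    def link_type(self) -> str:
        # Assumed rule: internal if the target shares the source page's host.
        same_host = urlparse(self.target_url).netloc == urlparse(self.source_page).netloc
        return "internal" if same_host else "external"

    @property
    def is_broken(self) -> bool:
        # 4xx/5xx responses and failed requests (timeouts, DNS errors) count as broken.
        return self.status is None or self.status >= 400

r = LinkResult("https://example.com/blog/", "https://example.com/missing", 404)
# r.is_broken is True, r.link_type is "internal"
```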
Redirect Analysis
Track and optimize your redirects:
- 301 (Permanent) redirects
- 302 (Temporary) redirects
- Redirect chains (multiple redirects in sequence)
- Redirect loops
SEO Insights
Identify common SEO issues:
- Missing or duplicate title tags
- Missing or duplicate meta descriptions
- Missing H1 tags
- Images without alt text
- Slow page load times
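A minimal sketch of how checks like these can be run against static HTML, using Python's built-in parser. This is illustrative only (the `SEOChecker` class is hypothetical, not Crawlinker's source), and it assumes server-rendered markup:

```python
# Hypothetical helper: flags a missing <title>, a missing or duplicate <h1>,
# and <img> tags without alt text in a page's HTML.
from html.parser import HTMLParser

class SEOChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.has_title = False
        self.h1_count = 0
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.has_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not dict(attrs).get("alt"):
            self.images_missing_alt += 1

checker = SEOChecker()
checker.feed("<html><head></head><body><h1>Hi</h1><img src='a.png'></body></html>")

issues = []
if not checker.has_title:
    issues.append("missing title tag")
if checker.h1_count != 1:
    issues.append("missing or duplicate H1")
if checker.images_missing_alt:
    issues.append(f"{checker.images_missing_alt} image(s) without alt text")
```

The sample page yields two issues: no title tag and one image without alt text.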
Running a Scan
Basic Scan
The simplest way to scan your website:
- Enter your website URL (e.g., https://example.com)
- Click "Start Scan"
- Wait for the scan to complete
What Gets Scanned
By default, Crawlinker will:
- Start from your homepage
- Follow all internal links
- Check all external links
- Analyze all HTML pages
- Check images, stylesheets, and scripts
Advanced Options
Allowed Paths
Limit scanning to specific sections of your website. Enter one path per line.
When allowed paths are specified, Crawlinker will only scan URLs that start with these paths.
Excluded Paths
Skip specific sections during scanning. Enter one path per line.
Common use cases:
- Skip admin areas
- Exclude authentication pages
- Ignore tag/category pages (for blogs)
- Skip search results pages
Tip: Exclude /tag/ and /category/ on WordPress sites to avoid scanning duplicate content.
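The allowed/excluded path rules described above can be sketched as a simple prefix check on the URL path. This is an assumed model of the behavior (prefix matching, exclusions taking precedence), not Crawlinker's actual source:

```python
# Hypothetical sketch of allowed/excluded path filtering.
from urllib.parse import urlparse

def should_scan(url, allowed=None, excluded=None):
    path = urlparse(url).path
    # Exclusions are checked first, so an excluded path always wins.
    if excluded and any(path.startswith(p) for p in excluded):
        return False
    # If an allow-list is given, the path must match one of its prefixes.
    if allowed:
        return any(path.startswith(p) for p in allowed)
    return True

should_scan("https://example.com/blog/post-1", allowed=["/blog/"])    # True
should_scan("https://example.com/admin/login", excluded=["/admin/"])  # False
```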
Reading Results
Dashboard Overview
After scanning, you'll see four main statistics:
- Broken Links: Total number of broken links found
- Redirects: Total redirect chains detected
- Pages Crawled: Total number of pages analyzed
- SEO Issues: Total SEO problems identified
Sorting and Filtering
Use the table controls to find specific issues:
- Search: Filter by URL or keyword
- Sort: Click column headers to sort by URL, status code, etc.
- Pagination: Navigate through large result sets
Exporting Data
Download your results in CSV format for further analysis or to share with your team.
Broken Links Report
Understanding Status Codes
| Status Code | Meaning | Action |
|---|---|---|
| 404 | Not Found | Remove the link or update to correct URL |
| 500 | Server Error | Contact site administrator or remove link |
| 403 | Forbidden | Check permissions or remove link |
| Timeout | Request Timeout | Check if server is slow or down |
Fixing Broken Links
- Identify the source page (where the link appears)
- Determine if the target page has moved or been deleted
- Update the link to the correct URL, or remove it
- Re-scan to verify the fix
Redirects Report
Types of Redirects
- 301 (Permanent): Content has permanently moved to a new location
- 302 (Temporary): Content temporarily at a different location
Redirect Chains
A redirect chain occurs when a URL redirects multiple times before reaching the final destination:
Redirect chains slow down page load times and can hurt SEO. Update links to point directly to the final destination.
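Chain and loop detection can be sketched in a few lines of pure logic. Here `redirects` is a hypothetical URL-to-Location mapping standing in for real 301/302 responses; this is not Crawlinker's implementation:

```python
# Follow a hypothetical redirect map, flagging chains and loops.
def redirect_chain(start, redirects, max_hops=10):
    chain = [start]
    seen = {start}
    url = start
    while url in redirects:
        url = redirects[url]
        if url in seen:
            # Revisiting a URL means the redirects never terminate.
            chain.append(url)
            return chain, "loop"
        chain.append(url)
        seen.add(url)
        if len(chain) > max_hops:
            return chain, "too-long"
    return chain, "ok"

hops = {"http://a": "https://a", "https://a": "https://a/home"}
chain, status = redirect_chain("http://a", hops)
# chain has 3 entries (a two-hop chain); status is "ok"
```

Any chain longer than two entries is a candidate for fixing: update the original link to point straight at the last URL in the chain.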
SEO Issues Report
Common SEO Problems
Crawlinker identifies these SEO issues:
Missing Title Tags
Every page should have a unique title tag (55-60 characters).
Missing Meta Descriptions
Meta descriptions should be 150-160 characters and unique per page.
Missing H1 Tags
Each page should have exactly one H1 tag that describes the page content.
Images Without Alt Text
All images should have descriptive alt text for accessibility and SEO.
Slow Load Times
Pages taking longer than 3 seconds to load may hurt rankings and user experience.
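The length guidelines above translate directly into a check you can apply to your own pages. A hypothetical helper, not part of Crawlinker:

```python
# Apply the title (55-60 chars) and meta description (150-160 chars) guidelines.
def check_lengths(title, meta_description):
    issues = []
    if not 55 <= len(title) <= 60:
        issues.append(f"title is {len(title)} chars (aim for 55-60)")
    if not 150 <= len(meta_description) <= 160:
        issues.append(f"meta description is {len(meta_description)} chars (aim for 150-160)")
    return issues

check_lengths("Short", "x" * 155)  # flags only the title
```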
Rate Limits
To ensure fair usage, the following rate limits apply:
- Scans: 10 scans per hour per IP address
- API Requests: 100 requests per hour per IP address
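If you script against the API, you can stay under these limits with a client-side sliding-window counter. This is a sketch of one way to do it (the server enforces the real per-IP limits; the class below is hypothetical):

```python
# Client-side sliding-window rate limiter sketch.
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = deque()  # timestamps of recent calls

    def allow(self, now):
        # Drop timestamps that have aged out of the window, then check capacity.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

# Matches the documented scan limit: 10 scans per rolling hour.
scans = SlidingWindowLimiter(max_calls=10, window_seconds=3600)
```

Timestamps are passed in explicitly (e.g. from `time.time()`) so the logic stays deterministic and easy to test.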
Frequently Asked Questions
How long does a scan take?
Scan time depends on your website size. Small sites (under 100 pages) typically complete in 1-2 minutes. Larger sites may take longer.
Is Crawlinker really free?
Yes! Crawlinker is completely free to use with no signup required.
Can I scan password-protected sites?
Currently, Crawlinker can only scan publicly accessible pages.
How often should I scan my site?
We recommend scanning monthly, or after major content updates.
Does Crawlinker respect robots.txt?
Yes, Crawlinker respects your robots.txt directives.
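You can preview what a robots.txt-respecting crawler will and won't fetch using Python's built-in parser. An illustrative sketch, not Crawlinker's code:

```python
# Check URLs against robots.txt rules with the standard-library parser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Parse an example robots.txt body (normally fetched from /robots.txt).
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

rp.can_fetch("*", "https://example.com/admin/settings")  # False -- disallowed
rp.can_fetch("*", "https://example.com/blog/post")       # True
```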
Troubleshooting
Scan Not Starting
- Verify your URL includes the protocol (http:// or https://)
- Check that your website is publicly accessible
- Ensure you haven't hit the rate limit
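The protocol check in the first step above can be done before submitting, using the standard library. A quick sketch:

```python
# Verify a URL includes the http:// or https:// protocol.
from urllib.parse import urlparse

def has_protocol(url):
    return urlparse(url).scheme in ("http", "https")

has_protocol("example.com")          # False -- missing protocol
has_protocol("https://example.com")  # True
```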
Missing Pages in Results
- Check your excluded paths settings
- Verify pages are linked from your homepage
- Check your robots.txt file
False Positives
If Crawlinker reports errors that don't exist:
- Verify the issue manually in a browser
- Check if your server has rate limiting enabled
- Ensure JavaScript-rendered content is server-side rendered
Support
Need help? We're here to assist:
- Email: support@crawlinker.com
- GitHub: Report issues at github.com/crawlinker/issues
- Response Time: We typically respond within 24 hours