Documentation

Everything you need to know about using Crawlinker

Introduction

Crawlinker is a powerful website crawler that helps you find and fix broken links, analyze redirects, and identify SEO issues across your entire website. Whether you're managing a small blog or a large enterprise site, Crawlinker provides the insights you need to maintain a healthy web presence.

What Crawlinker Does

  • Broken Link Detection: Identifies all 404, 500, and other HTTP error responses
  • Redirect Analysis: Tracks redirect chains and identifies unnecessary redirects
  • SEO Auditing: Analyzes meta tags, titles, H1 tags, and page load times
  • Comprehensive Reports: Provides detailed, exportable reports for your entire site

No Signup Required
Crawlinker is completely free to use. Simply enter your website URL and start scanning. No account or credit card required.

Quick Start

Get started with Crawlinker in less than 30 seconds:

1. Enter Your URL

Navigate to the homepage and enter your website URL in the scan form:

https://example.com

2. Configure Options (Optional)

Click "Advanced Options" to specify allowed or excluded paths:

  • Allowed Paths: Only scan these specific paths (e.g., /blog/)
  • Excluded Paths: Skip these paths during scanning (e.g., /admin/)

3. Start Scanning

Click the "Start Scan" button. Crawlinker will begin analyzing your website immediately.

4. View Results

Once the scan completes, you'll be taken to the dashboard where you can view all findings organized by category.

Features Overview

Broken Link Detection

Crawlinker finds every broken link on your site, including:

  • 404 Not Found errors
  • 500 Server errors
  • 403 Forbidden errors
  • Timeout errors
  • DNS resolution failures

For each broken link, you'll see:

  • Source page (where the link appears)
  • Target URL (the broken link)
  • HTTP status code
  • Link type (internal or external)

Redirect Analysis

Track and optimize your redirects:

  • 301 (Permanent) redirects
  • 302 (Temporary) redirects
  • Redirect chains (multiple redirects in sequence)
  • Redirect loops

SEO Insights

Identify common SEO issues:

  • Missing or duplicate title tags
  • Missing or duplicate meta descriptions
  • Missing H1 tags
  • Images without alt text
  • Slow page load times

Running a Scan

Basic Scan

The simplest way to scan your website:

  1. Enter your website URL (e.g., https://example.com)
  2. Click "Start Scan"
  3. Wait for the scan to complete

Scan Time
Scan time depends on your website size. Small sites (under 100 pages) typically complete in 1-2 minutes. Larger sites may take 5-10 minutes or longer.

What Gets Scanned

By default, Crawlinker will:

  • Start from your homepage
  • Follow all internal links
  • Check all external links
  • Analyze all HTML pages
  • Check images, stylesheets, and scripts

Advanced Options

Allowed Paths

Limit scanning to specific sections of your website. Enter one path per line:

/blog/
/docs/
/products/

When allowed paths are specified, Crawlinker will only scan URLs that start with these paths.

Excluded Paths

Skip specific sections during scanning. Enter one path per line:

/admin/
/login/
/private/
/category/
/tag/

Common use cases:

  • Skip admin areas
  • Exclude authentication pages
  • Ignore tag/category pages (for blogs)
  • Skip search results pages

Pro Tip
Use excluded paths to speed up scans and focus on the content that matters. For example, exclude /tag/ and /category/ on WordPress sites to avoid scanning duplicate content.
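The allowed/excluded path rules above amount to a prefix check on the URL path. Here is a minimal sketch of that logic, assuming simple prefix matching (which matches the "URLs that start with these paths" behavior described for allowed paths) and that exclusions take priority; the `should_scan` function name is illustrative.

```python
from urllib.parse import urlparse

def should_scan(url: str, allowed=(), excluded=()) -> bool:
    """Decide whether a URL falls inside the configured scan scope.

    Assumes prefix matching on the URL path, per the allowed-paths
    behavior described above; excluded paths always win.
    """
    path = urlparse(url).path
    if any(path.startswith(p) for p in excluded):
        return False
    if allowed:  # when allowed paths are set, only those prefixes are scanned
        return any(path.startswith(p) for p in allowed)
    return True  # no restrictions: scan everything

print(should_scan("https://example.com/blog/post-1", allowed=["/blog/"]))    # True
print(should_scan("https://example.com/admin/users", excluded=["/admin/"]))  # False
```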

Reading Results

Dashboard Overview

After scanning, you'll see four main statistics:

  • Broken Links: Total number of broken links found
  • Redirects: Total redirect chains detected
  • Pages Crawled: Total number of pages analyzed
  • SEO Issues: Total SEO problems identified

Sorting and Filtering

Use the table controls to find specific issues:

  • Search: Filter by URL or keyword
  • Sort: Click column headers to sort by URL, status code, etc.
  • Pagination: Navigate through large result sets

Exporting Data

Download your results in CSV format for further analysis or to share with your team.
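An exported CSV can be filtered with any standard tooling. The sketch below uses Python's `csv` module; note that the column names (`source_page`, `target_url`, `status_code`, `link_type`) are assumptions based on the broken-link fields described above, and the real export headers may differ.

```python
import csv
import io

# Hypothetical export contents; the actual column headers may differ.
sample = io.StringIO(
    "source_page,target_url,status_code,link_type\n"
    "https://example.com/blog/,https://example.com/old-post,404,internal\n"
    "https://example.com/,https://partner.example.net/,500,external\n"
)

# Keep only internal 404s, e.g. the links you can fix yourself.
internal_404s = [
    row for row in csv.DictReader(sample)
    if row["status_code"] == "404" and row["link_type"] == "internal"
]
print(len(internal_404s))  # 1
```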

Redirects Report

Types of Redirects

  • 301 (Permanent): Content has permanently moved to a new location
  • 302 (Temporary): Content temporarily at a different location

Redirect Chains

A redirect chain occurs when a URL redirects multiple times before reaching the final destination:

example.com → www.example.com → example.com/home → example.com/home/

Redirect chains slow down page load times and can hurt SEO. Update links to point directly to the final destination.

Performance Impact
Each redirect adds 200-500ms to page load time. Chains of 3+ redirects significantly impact user experience and SEO rankings.
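Chain and loop detection can be sketched by walking a URL-to-target mapping until no further redirect applies. This is a simplified model (real crawlers follow live HTTP responses); the `follow_redirects` helper and the mapping format are assumptions for this example.

```python
def follow_redirects(start: str, redirects: dict[str, str], max_hops: int = 10):
    """Follow a chain of redirects (given as a URL -> target mapping).

    Returns the full chain of URLs visited and a flag indicating
    whether a redirect loop was detected.
    """
    chain = [start]
    seen = {start}
    url = start
    while url in redirects and len(chain) <= max_hops:
        url = redirects[url]
        if url in seen:
            return chain, True  # loop: we redirected back to an earlier URL
        chain.append(url)
        seen.add(url)
    return chain, False

# The chain from the example above:
hops = {
    "example.com": "www.example.com",
    "www.example.com": "example.com/home",
    "example.com/home": "example.com/home/",
}
chain, loop = follow_redirects("example.com", hops)
print(len(chain) - 1)  # 3 redirects before reaching the final destination
print(loop)            # False
```

Pointing the original link straight at `example.com/home/` would collapse all three hops into zero.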

SEO Issues Report

Common SEO Problems

Crawlinker identifies these SEO issues:

Missing Title Tags

Every page should have a unique title tag (55-60 characters).

Missing Meta Descriptions

Meta descriptions should be 150-160 characters and unique per page.

Missing H1 Tags

Each page should have exactly one H1 tag that describes the page content.

Images Without Alt Text

All images should have descriptive alt text for accessibility and SEO.

Slow Load Times

Pages taking longer than 3 seconds to load may hurt rankings and user experience.
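The on-page checks above (title length, meta description, H1 count, image alt text) can be sketched with Python's standard-library HTML parser. This is a rough illustration of the kinds of checks described, not Crawlinker's implementation; the thresholds come from the guidance above, and the `audit` helper is an assumption.

```python
from html.parser import HTMLParser

class SEOChecker(HTMLParser):
    """Collect the elements the checks above look at:
    <title>, <h1>, meta description, and <img alt="...">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self.h1_count = 0
        self.meta_description = None
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and attrs.get("name") == "description":
            self.meta_description = attrs.get("content", "")
        elif tag == "img" and attrs.get("alt") is None:
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def audit(html: str) -> list[str]:
    """Run the checks described above and return a list of issues."""
    checker = SEOChecker()
    checker.feed(html)
    issues = []
    if not (55 <= len(checker.title) <= 60):
        issues.append("title length outside 55-60 characters")
    if checker.meta_description is None:
        issues.append("missing meta description")
    if checker.h1_count != 1:
        issues.append(f"expected exactly one H1, found {checker.h1_count}")
    if checker.images_missing_alt:
        issues.append(f"{checker.images_missing_alt} image(s) without alt text")
    return issues

page = "<html><head><title>Short</title></head><body><img src='a.png'></body></html>"
print(audit(page))  # flags the short title, missing meta, missing H1, and missing alt
```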

Rate Limits

To ensure fair usage, the following rate limits apply:

  • Scans: 10 scans per hour per IP address
  • API Requests: 100 requests per hour per IP address

Need Higher Limits?
Contact us if you need higher rate limits for your use case.
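If you script against the service, a client-side sliding-window counter can keep you under the limits above. This is a generic throttling sketch, not part of Crawlinker's API; the service enforces its own limits server-side, and the `RateLimiter` class here is an assumption for illustration.

```python
from collections import deque

class RateLimiter:
    """Sliding-window counter: allow at most `limit` calls per
    `window` seconds. Client-side guard only; the server still
    enforces the real limits."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of recent allowed calls

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False

limiter = RateLimiter(limit=100, window=3600)  # 100 API requests per hour
print(limiter.allow(now=0.0))  # True: first request goes through
```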

Frequently Asked Questions

How long does a scan take?

Scan time depends on your website size. Small sites (under 100 pages) typically complete in 1-2 minutes. Larger sites may take longer.

Is Crawlinker really free?

Yes! Crawlinker is completely free to use with no signup required.

Can I scan password-protected sites?

Currently, Crawlinker can only scan publicly accessible pages.

How often should I scan my site?

We recommend scanning monthly, or after major content updates.

Does Crawlinker respect robots.txt?

Yes, Crawlinker respects your robots.txt directives.

Troubleshooting

Scan Not Starting

  • Verify your URL includes the protocol (http:// or https://)
  • Check that your website is publicly accessible
  • Ensure you haven't hit the rate limit

Missing Pages in Results

  • Check your excluded paths settings
  • Verify pages are linked from your homepage
  • Check your robots.txt file

False Positives

If Crawlinker reports errors that don't exist:

  • Verify the issue manually in a browser
  • Check if your server has rate limiting enabled
  • Ensure JavaScript-rendered content is server-side rendered

Support

Need help? We're here to assist:

  • Email: support@crawlinker.com
  • GitHub: Report issues at github.com/crawlinker/issues
  • Response Time: We typically respond within 24 hours

Found a Bug?
We appreciate bug reports! Please include your URL, steps to reproduce, and expected vs actual behavior.