I. Core Commands & Setup
openclaw init --project my_project
Initializes a new OpenClaw project directory, creating necessary configuration files and a default scraping template.
Always start with `init` to ensure proper project structure and dependency management for your scraping tasks.
openclaw help [command]
Displays detailed help for a specific OpenClaw command, including its available options and arguments.
Run `openclaw help` without a command name for a general overview of all available commands.
openclaw config set proxy.url http://myproxy:8080
Sets a global or project-specific configuration parameter, such as proxy settings, timeout values, or default output formats.
Configuration changes can be scoped to the current project or applied globally using appropriate flags.
openclaw update
Checks for and installs the latest version of the OpenClaw CLI tool and its core dependencies.
Regularly updating OpenClaw ensures access to new features, performance improvements, and security patches.
II. Basic Scraping & Extraction
openclaw fetch --url https://example.com/products --output products.html
Fetches the content of a specified URL and saves it to a local file. Several output formats are supported.
Combine with `--wait` to give dynamically loaded content time to render before the page is saved.
openclaw extract --url https://example.com --selector 'h1.title' --output title.txt
Extracts data from a given URL using CSS selectors or XPath expressions and outputs the result.
Use multiple `--selector` flags to extract different data points in a single command, often combined with JSON output.
openclaw browse --url https://example.com/login --interactive
Opens an interactive browser session for debugging selectors, performing manual actions, or handling complex CAPTCHAs.
This mode is invaluable for visually inspecting the DOM and testing selectors in real-time.
openclaw scrape-list --url https://example.com/articles --item-selector '.article-card' --field 'title:.article-title' --field 'link:a@href'
Scrapes a list of items from a page, defining a common item selector and then specific fields within each item.
This command streamlines the extraction of structured data from article lists, product pages, or search results.
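The item-selector-plus-fields pattern that `scrape-list` implements can be sketched in plain Python with the standard library's `html.parser`. The class names (`article-card`, `article-title`) and sample markup below are illustrative, not tied to any real site, and the parser is deliberately naive (it assumes flat, non-nested cards):

```python
from html.parser import HTMLParser

class ListScraper(HTMLParser):
    """Collects title/link pairs from repeated '.article-card' items."""
    def __init__(self):
        super().__init__()
        self.items = []        # one dict per scraped card
        self.in_card = False
        self.in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "article-card" in classes:
            self.in_card = True
            self.items.append({"title": "", "link": ""})
        elif self.in_card and "article-title" in classes:
            self.in_title = True
        elif self.in_card and tag == "a" and "href" in attrs:
            self.items[-1]["link"] = attrs["href"]

    def handle_endtag(self, tag):
        if tag == "div":       # naive: a closing div ends the card
            self.in_card = False
        self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.items[-1]["title"] += data.strip()

html = """
<div class="article-card"><h2 class="article-title">First post</h2>
  <a href="/posts/1">read</a></div>
<div class="article-card"><h2 class="article-title">Second post</h2>
  <a href="/posts/2">read</a></div>
"""
parser = ListScraper()
parser.feed(html)
print(parser.items)
```

A production scraper would use a real selector engine, but the structure is the same: match each container once, then resolve per-field selectors relative to it.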
III. Advanced Data Processing with AI
openclaw process-ai --input raw_text.txt --model 'summarizer' --output summary.txt
Applies an AI model to process input text, such as summarization, sentiment analysis, or entity recognition.
Explore available AI models with `openclaw models list` to find the best fit for your data processing needs.
openclaw extract-ai --url https://invoice.com/doc123 --template 'invoice_parser' --output invoice_data.json
Utilizes AI to intelligently extract structured data from semi-structured or unstructured documents (e.g., invoices, reports) using predefined templates.
Custom AI templates can be trained for highly specific document types to achieve superior extraction accuracy.
openclaw clean-data --input messy_data.csv --rules 'trim,deduplicate' --output clean_data.csv
Applies data cleaning rules to an input dataset, from simple trimming and deduplication to AI-assisted formatting and error correction.
Define custom cleaning rules in a configuration file for complex transformations and consistency across multiple datasets.
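The simple rules in the example (`trim,deduplicate`) are deterministic and easy to sketch in Python. The function below is a minimal stand-in for that part of `clean-data`, not the tool's actual implementation; the rule names mirror the CLI flag values:

```python
import csv
import io

def clean_rows(rows, rules=("trim", "deduplicate")):
    """Apply simple cleaning rules analogous to --rules 'trim,deduplicate'."""
    if "trim" in rules:
        # Strip leading/trailing whitespace from every cell.
        rows = [[cell.strip() for cell in row] for row in rows]
    if "deduplicate" in rules:
        # Keep the first occurrence of each identical row, preserving order.
        seen, unique = set(), []
        for row in rows:
            key = tuple(row)
            if key not in seen:
                seen.add(key)
                unique.append(row)
        rows = unique
    return rows

messy = io.StringIO("name,city\n Ada , London\nAda,London\n Bob ,Paris\n")
cleaned = clean_rows(list(csv.reader(messy)))
print(cleaned)
```

Note that trimming must run before deduplication here, otherwise ` Ada , London` and `Ada,London` would not be recognized as duplicates.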
openclaw classify-text --input reviews.txt --model 'sentiment_analyzer' --output classified_reviews.json
Classifies text data based on a specified AI model, useful for categorizing content, identifying spam, or gauging sentiment.
Integrating this into a scraping pipeline allows for immediate categorization of extracted text content.
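To show the shape of such a classification step without a real model, here is a toy keyword-based classifier. It is purely illustrative: an actual `sentiment_analyzer` model would be far more robust, and the keyword lists below are invented for the example:

```python
def classify_sentiment(text,
                       positive=("great", "love", "excellent"),
                       negative=("bad", "awful", "broken")):
    """Toy keyword classifier standing in for a real sentiment model."""
    text = text.lower()
    score = sum(w in text for w in positive) - sum(w in text for w in negative)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

reviews = ["Great product, love it", "Awful quality, arrived broken", "It works"]
labels = [classify_sentiment(r) for r in reviews]
print(labels)
```

In a pipeline, each extracted text record would be mapped through the classifier like this, with the label stored alongside the source record.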
IV. Authentication & Session Management
openclaw auth login --url https://myportal.com/login --user ENV_USER --pass ENV_PASS
Performs an automated login to a website using provided credentials, storing session cookies for subsequent requests.
Always use environment variables for sensitive information like usernames and passwords, never hardcode them.
openclaw session save my_session.json
Saves the current scraping session's cookies and local storage state to a file for later reuse.
Saving sessions is crucial for resuming interrupted scrapes or accessing authenticated content without logging in again each time.
openclaw session load my_session.json --url https://myportal.com/dashboard
Loads a previously saved session, restoring cookies and state, and navigates to a specified URL.
Combine with `fetch` or `extract` commands to seamlessly continue authenticated scraping tasks.
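The save/load cycle above amounts to persisting and restoring a cookie jar. Python's standard library can do this directly with `http.cookiejar`; the sketch below assumes a single hypothetical `sessionid` cookie for `myportal.com` and uses the LWP on-disk format, which is unrelated to whatever file format OpenClaw itself uses:

```python
import http.cookiejar
import os
import tempfile

# Build a jar containing one session cookie, as a login step would.
jar = http.cookiejar.LWPCookieJar()
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name="sessionid", value="abc123",
    port=None, port_specified=False,
    domain="myportal.com", domain_specified=True, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=False,
    comment=None, comment_url=None, rest={},
))

# "session save": write cookies to disk, keeping session-only cookies too.
path = os.path.join(tempfile.mkdtemp(), "my_session.lwp")
jar.save(path, ignore_discard=True, ignore_expires=True)

# "session load": restore the state into a fresh jar for a new process.
restored = http.cookiejar.LWPCookieJar()
restored.load(path, ignore_discard=True, ignore_expires=True)
print([(c.name, c.value) for c in restored])
```

`ignore_discard=True` matters on both save and load: session cookies (no expiry) are otherwise skipped, which is exactly the kind of cookie a login usually sets.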
openclaw auth logout --url https://myportal.com/logout
Performs an automated logout from a website, clearing session cookies and terminating the active session.
It is good practice to explicitly log out when a scraping task is complete, especially for sensitive accounts.
V. Output & Reporting
openclaw export --input data.json --format csv --output final_data.csv
Converts extracted data from one format (e.g., JSON) to another (e.g., CSV, Excel, XML).
Use `--header` and `--delimiter` flags for fine-grained control over CSV output.
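The JSON-to-CSV conversion with header and delimiter control maps directly onto Python's `csv.DictWriter`. This is a minimal sketch of the idea, not OpenClaw's converter; the sample records are invented:

```python
import csv
import io
import json

def export_csv(json_text, delimiter=",", header=True):
    """Convert a JSON array of flat objects to CSV, mirroring
    the --format csv / --header / --delimiter options."""
    records = json.loads(json_text)
    fields = list(records[0])          # column order from the first record
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter=delimiter)
    if header:
        writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

data = '[{"title": "Widget", "price": "9.99"}, {"title": "Gadget", "price": "14.50"}]'
print(export_csv(data, delimiter=";"))
```

Taking the column order from the first record is a simplification; a real exporter would union the keys across all records and decide how to handle missing fields.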
openclaw report generate --template summary_report.md --data results.json
Generates a custom report based on extracted data and a predefined template, supporting various output formats.
Templates can be written in Markdown or HTML and may embed templating placeholders for dynamic report generation.
openclaw pipeline run my_pipeline.yml --output combined_results.json
Executes a predefined data processing pipeline, combining multiple scraping, extraction, and transformation steps.
Pipelines are ideal for complex workflows, ensuring data consistency and automated processing.
openclaw visualize --data metrics.json --type chart --output dashboard.html
Creates interactive data visualizations (charts, graphs) from extracted metrics and saves them as HTML.
This command helps in quickly understanding and presenting the insights derived from scraped data.
VI. Error Handling & Debugging
openclaw fetch --url https://broken.example.com --retries 5 --delay 3
Configures the command to retry failed requests a specified number of times with a delay between attempts.
Essential for handling transient network issues, server rate limits, or temporary website outages.
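The retry-with-delay behaviour those flags configure is a small loop. The sketch below shows the pattern under the assumption that transient failures surface as exceptions; the `flaky` function simulates a server that fails twice before succeeding:

```python
import time

def fetch_with_retries(fetch, retries=5, delay=3):
    """Call fetch(), retrying up to `retries` extra times and sleeping
    `delay` seconds between attempts, like --retries 5 --delay 3."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError as exc:          # treat as a transient network failure
            last_error = exc
            if attempt < retries:
                time.sleep(delay)
    raise last_error                    # all attempts exhausted

# Simulate a flaky endpoint: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("connection reset")
    return "<html>ok</html>"

result = fetch_with_retries(flaky, retries=5, delay=0)
print(result)
```

Production retry logic usually adds exponential backoff and jitter rather than a fixed delay, to avoid hammering a rate-limited server in lockstep.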
openclaw log level debug
Sets the logging level for OpenClaw, enabling more verbose output for debugging purposes.
Use `info` for general operations, `warn` for potential issues, and `error` for critical failures.
openclaw validate-selectors --url https://example.com --selector 'h1.title' 'p.content'
Tests and validates CSS selectors or XPath expressions against a live URL, reporting if elements are found.
Run this command before a full scrape to quickly debug selector issues and avoid failed runs.
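The core of such a validation step is checking whether each selector matches at least one element. A minimal stdlib sketch, limited to simple `tag.class` selectors (real validators handle full CSS and XPath) and run against invented sample markup:

```python
from html.parser import HTMLParser

class SelectorChecker(HTMLParser):
    """Reports which simple 'tag.class' selectors match at least one element."""
    def __init__(self, selectors):
        super().__init__()
        self.found = {sel: False for sel in selectors}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for sel in self.found:
            want_tag, _, want_class = sel.partition(".")
            if tag == want_tag and (not want_class or want_class in classes):
                self.found[sel] = True

page = '<h1 class="title">Hi</h1><p class="content">Body</p>'
checker = SelectorChecker(["h1.title", "p.content", "div.sidebar"])
checker.feed(page)
print(checker.found)
```

A selector that reports `False` here is the early warning the tip describes: fix it before launching the full scrape rather than discovering empty output afterwards.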
openclaw debug-session --url https://problematic.site --headless=false
Launches a browser in non-headless mode for visual inspection of page loading, JavaScript execution, and network requests.
Use this to diagnose issues where content is not loading correctly or selectors are failing due to dynamic rendering.