OpenClaw CLI Cheat Sheet: Ultimate Reference

Accelerate your data extraction with powerful AI-driven commands. From basic fetches to advanced automation, OpenClaw empowers your web scraping workflows.

I. Core Commands & Setup

openclaw init --project my_project

Initializes a new OpenClaw project directory, creating necessary configuration files and a default scraping template.

Always start with 'init' to ensure proper project structure and dependency management for your scraping tasks.

openclaw help

Displays help information for OpenClaw; pass a command name to see that command's available options and arguments in detail.

Use 'openclaw help' without a command for a general overview of all available commands.

openclaw config set proxy.url http://myproxy:8080

Sets a global or project-specific configuration parameter, such as proxy settings, timeout values, or default output formats.

Configuration changes can be scoped to the current project or applied globally using appropriate flags.

openclaw update

Checks for and installs the latest version of the OpenClaw CLI tool and its core dependencies.

Regularly updating OpenClaw ensures access to new features, performance improvements, and security patches.

II. Basic Scraping & Extraction

openclaw fetch https://example.com/products --output products.html

Fetches the content of a specified URL and saves it to a local file. Supports various output formats.

Combine with `--wait` to give dynamically loaded content time to render before the page is saved.

openclaw extract --url https://example.com --selector 'h1.title' --output title.txt

Extracts data from a given URL using CSS selectors or XPath expressions and outputs the result.

Use multiple `--selector` flags to extract different data points in a single command, often combined with JSON output.

openclaw browse --url https://example.com/login --interactive

Opens an interactive browser session for debugging selectors, performing manual actions, or handling complex CAPTCHAs.

This mode is invaluable for visually inspecting the DOM and testing selectors in real-time.

openclaw scrape-list --url https://example.com/articles --item-selector '.article-card' --field 'title:.article-title' --field 'link:a@href'

Scrapes a list of items from a page, defining a common item selector and then specific fields within each item.

This command streamlines the extraction of structured data from article lists, product pages, or search results.
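Judging from the example above, each `--field` value follows a name:selector[@attribute] pattern. A quick pure-shell sketch of how such a spec decomposes (an illustrative helper inferred from the example, not part of OpenClaw):

```shell
# Split a field spec like 'link:a@href' into its three parts
# using POSIX parameter expansion.
spec='link:a@href'
name=${spec%%:*}        # 'link'   -- text before the first ':'
rest=${spec#*:}         # 'a@href' -- everything after the name
selector=${rest%%@*}    # 'a'      -- selector before the optional '@'
attr=${rest#*@}         # 'href'   -- attribute after '@', if present
echo "$name -> $selector [$attr]"
```

For fields without an `@attribute` part (like `title:.article-title`), the extracted value would presumably be the element's text content.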

III. Advanced Data Processing with AI

openclaw process-ai --input raw_text.txt --model 'summarizer' --output summary.txt

Applies an AI model to process input text, such as summarization, sentiment analysis, or entity recognition.

Explore available AI models with `openclaw models list` to find the best fit for your data processing needs.

openclaw extract-ai --url https://invoice.com/doc123 --template 'invoice_parser' --output invoice_data.json

Utilizes AI to intelligently extract structured data from semi-structured or unstructured documents (e.g., invoices, reports) using predefined templates.

Custom AI templates can be trained for highly specific document types to achieve superior extraction accuracy.

openclaw clean-data --input messy_data.csv --rules 'trim,deduplicate' --output clean_data.csv

Applies AI-powered data cleaning rules to an input dataset, handling tasks like formatting, deduplication, and error correction.

Define custom cleaning rules in a configuration file for complex transformations and consistency across multiple datasets.
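For more involved transformations, the rules could live in such a configuration file. A hypothetical rules file (the schema and key names here are illustrative assumptions, not OpenClaw's documented format):

```yaml
# clean_rules.yml -- illustrative schema, for sketching purposes only
rules:
  - name: trim
    fields: [title, description]
  - name: deduplicate
    key: product_id
  - name: normalize_dates
    field: published_at
    format: "%Y-%m-%d"
```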

openclaw classify-text --input reviews.txt --model 'sentiment_analyzer' --output classified_reviews.json

Classifies text data based on a specified AI model, useful for categorizing content, identifying spam, or gauging sentiment.

Integrating this into a scraping pipeline allows for immediate categorization of extracted text content.

IV. Authentication & Session Management

openclaw auth login --url https://myportal.com/login --user ENV_USER --pass ENV_PASS

Performs an automated login to a website using provided credentials, storing session cookies for subsequent requests.

Always use environment variables for sensitive information like usernames and passwords, never hardcode them.
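A short sketch of wiring credentials through the environment before invoking the login command. The values below are placeholders; in practice, source them from a secrets manager rather than a script:

```shell
# Export placeholder credentials; the login example above passes the
# variable *names* ENV_USER / ENV_PASS on the command line.
export ENV_USER="scraper_bot"           # placeholder value
export ENV_PASS="s3cret"                # placeholder -- never commit real secrets
: "${ENV_USER:?ENV_USER must be set}"   # fail fast if either is missing
: "${ENV_PASS:?ENV_PASS must be set}"
echo "credentials loaded for $ENV_USER"
```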

openclaw session save my_session.json

Saves the current scraping session's cookies and local storage state to a file for later reuse.

Saving sessions is crucial for resuming interrupted scrapes or accessing authenticated content without re-logging in repeatedly.

openclaw session load my_session.json --url https://myportal.com/dashboard

Loads a previously saved session, restoring cookies and state, and navigates to a specified URL.

Combine with `fetch` or `extract` commands to seamlessly continue authenticated scraping tasks.

openclaw auth logout --url https://myportal.com/logout

Performs an automated logout from a website, clearing session cookies and terminating the active session.

It is good practice to explicitly log out when a scraping task is complete, especially for sensitive accounts.

V. Output & Reporting

openclaw export --input data.json --format csv --output final_data.csv

Converts extracted data from one format (e.g., JSON) to another (e.g., CSV, Excel, XML).

Use `--header` and `--delimiter` flags for fine-grained control over CSV output.

openclaw report generate --template summary_report.md --data results.json

Generates a custom report based on extracted data and a predefined template, supporting various output formats.

Templates can be written in Markdown, HTML, or other templating languages for dynamic report generation.
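A minimal summary_report.md template might look like the following; mustache-style `{{...}}` placeholders are an assumption here, and the field names are illustrative:

```markdown
# Scrape Summary — {{run_date}}

- Pages fetched: {{stats.pages}}
- Items extracted: {{stats.items}}
- Errors: {{stats.errors}}

## Top results
{{#results}}
- [{{title}}]({{link}})
{{/results}}
```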

openclaw pipeline run my_pipeline.yml --output combined_results.json

Executes a predefined data processing pipeline, combining multiple scraping, extraction, and transformation steps.

Pipelines are ideal for complex workflows, ensuring data consistency and automated processing.
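A hypothetical my_pipeline.yml combining commands from earlier sections might be structured like this (the step names and keys are illustrative, not OpenClaw's documented schema):

```yaml
# my_pipeline.yml -- illustrative structure only
steps:
  - fetch:
      url: https://example.com/articles
  - scrape-list:
      item-selector: ".article-card"
      fields:
        title: ".article-title"
        link: "a@href"
  - clean-data:
      rules: [trim, deduplicate]
  - export:
      format: json
```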

openclaw visualize --data metrics.json --type chart --output dashboard.html

Creates interactive data visualizations (charts, graphs) from extracted metrics and saves them as HTML.

This command helps in quickly understanding and presenting the insights derived from scraped data.

VI. Error Handling & Debugging

openclaw fetch --url https://broken.example.com --retries 5 --delay 3

Configures the command to retry failed requests a specified number of times with a delay between attempts.

Essential for handling transient network issues, server rate limits, or temporary website outages.
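The same retry-with-delay pattern is easy to reproduce when wrapping OpenClaw in scripts of your own. A minimal POSIX-shell sketch (the `retry` helper is an illustration, not an OpenClaw feature):

```shell
# retry TRIES DELAY CMD... -- run CMD up to TRIES times,
# sleeping DELAY seconds between failed attempts.
retry() {
  tries=$1; delay=$2; shift 2
  n=1
  while ! "$@"; do
    [ "$n" -ge "$tries" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

retry 3 0 false || echo "all 3 attempts failed"
```

In a real script, `false` would be replaced by the fetch command, e.g. `retry 5 3 openclaw fetch --url https://broken.example.com`.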

openclaw log level debug

Sets the logging level for OpenClaw, enabling more verbose output for debugging purposes.

Use 'info' for general operations, 'warn' for potential issues, and 'error' for critical failures.

openclaw validate-selectors --url https://example.com --selector 'h1.title' 'p.content'

Tests and validates CSS selectors or XPath expressions against a live URL, reporting if elements are found.

Run this command before a full scrape to quickly debug selector issues and avoid failed runs.

openclaw debug-session --url https://problematic.site --headless=false

Launches a browser in non-headless mode for visual inspection of page loading, JavaScript execution, and network requests.

Use this to diagnose issues where content is not loading correctly or selectors are failing due to dynamic rendering.

Frequently Asked Questions about OpenClaw

What makes OpenClaw different from other scraping tools?

OpenClaw integrates advanced AI models directly into the CLI, enabling intelligent data extraction from unstructured content, automated data cleaning, and sophisticated content classification, beyond traditional selector-based scraping.

Is OpenClaw suitable for large-scale web scraping projects?

Yes, OpenClaw is designed for scalability. It includes features like proxy rotation, session management, rate limiting, and pipeline automation, making it robust for large and complex data extraction tasks.

How does OpenClaw handle JavaScript-rendered content?

OpenClaw utilizes an integrated headless browser environment, allowing it to fully render JavaScript-heavy pages before extraction. You can also specify wait times or interact with elements to trigger dynamic content.

Can I integrate OpenClaw with my existing data workflows?

Absolutely. OpenClaw supports various input/output formats (JSON, CSV, XML, Excel) and can be easily scripted or integrated into CI/CD pipelines, cron jobs, or other automation frameworks.
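As one concrete example of cron integration, a crontab entry could run a nightly pipeline unattended (the project path and pipeline name below are illustrative):

```shell
# Hypothetical crontab entry: run the pipeline at 02:00 daily,
# appending stdout and stderr to a log file.
0 2 * * * cd /srv/scrapers/my_project && openclaw pipeline run my_pipeline.yml --output nightly.json >> scrape.log 2>&1
```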

What kind of AI models does OpenClaw support?

OpenClaw supports a range of pre-trained AI models for tasks like summarization, sentiment analysis, entity recognition, document parsing (e.g., invoices), and text classification. Users can also integrate custom or fine-tuned models.