
OpenClaw CLI: The Ultimate Web Scraping & Data Automation Cheat Sheet

Harness the power of OpenClaw for efficient data extraction, parsing, and workflow automation.

Core Scraping & Extraction

openclaw scrape <url> --selector "<css-or-xpath>" --output <results.json>

Initiate a scraping job on a specified URL, extracting data based on a CSS or XPath selector and saving it to a JSON file.

Use specific selectors for precise data targeting. Combine with `--ai-parse` for smart content recognition.

openclaw extract --ai-model "smart_parser" --schema <schema.json>

Perform advanced data extraction using an AI model to intelligently identify and structure data according to a predefined JSON schema.

Define a robust schema to guide the AI for accurate and consistent data output.

openclaw fetch <url> --proxy "http://proxy.example.com:8080" --user-agent "Mozilla/5.0"

Fetch a webpage using custom proxy settings and a specific user agent to mimic different browser environments or bypass IP blocks.

Rotate proxies and user agents for large-scale scraping to avoid detection and rate limiting.
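A proxied fetch with a spoofed user agent can be sketched with the Python standard library alone; this only shows the mechanism behind such a fetch, reusing the sample proxy address and UA string from the command above, and stops short of making a real request.

```python
import urllib.request

# Route HTTP traffic through a proxy and present a custom user agent;
# the proxy URL and UA string are the sample values from the command.
proxy = urllib.request.ProxyHandler({"http": "http://proxy.example.com:8080"})
opener = urllib.request.build_opener(proxy)
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# opener.open("http://example.com/")  # uncomment to perform a real request
```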

openclaw headless <url> --wait-for-selector ".content" --screenshot

Open a URL in a headless browser, wait for a specific element to load, and then capture a full-page screenshot.

Crucial for scraping dynamic, JavaScript-rendered websites. Adjust wait times for complex pages.

openclaw parse-html <file.html> --pattern "regex_pattern" --field "data"

Parse local HTML files using regular expressions to extract specific data fields without making a new web request.

Ideal for reprocessing previously saved HTML or for complex pattern matching on local content.
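Regex extraction from saved HTML boils down to pattern matching over a string; a minimal sketch, where the HTML snippet and pattern are made-up sample values:

```python
import re

# Pull every value out of <span class="data">...</span> elements in a
# locally stored HTML string; no network request is involved.
html = '<span class="data">42</span><span class="data">7</span>'
values = re.findall(r'<span class="data">(\d+)</span>', html)
```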

Data Transformation & Cleaning

openclaw transform <input.json> --jq-query ".items[] | {title: .name, price: .cost}" --output <output.json>

Transform JSON data using a JQ query to reshape, filter, or reformat the output structure.

Master JQ syntax for powerful and flexible data manipulation directly from the command line.
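The jq filter `.items[] | {title: .name, price: .cost}` emits one reshaped object per array element; the same transformation in plain Python, with sample data assumed:

```python
# Iterate over the "items" array and build a new object per element,
# renaming "name" to "title" and "cost" to "price".
raw = {"items": [{"name": "Widget", "cost": 9.99},
                 {"name": "Gadget", "cost": 24.50}]}

transformed = [{"title": item["name"], "price": item["cost"]}
               for item in raw["items"]]
```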

openclaw clean <input.csv> --remove-duplicates "product_id" --output <cleaned.csv>

Clean a CSV file by removing duplicate rows based on a specified unique identifier field.

Apply cleaning steps early in your data pipeline to ensure data quality before further processing.
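Key-based deduplication keeps the first row seen for each identifier and drops the rest; a sketch with a toy CSV (sample data assumed):

```python
import csv, io

# Toy CSV containing a duplicate product_id.
raw = "product_id,name\n1,Widget\n2,Gadget\n1,Widget (repeat)\n"

seen, unique_rows = set(), []
for row in csv.DictReader(io.StringIO(raw)):
    if row["product_id"] not in seen:   # keep only the first occurrence
        seen.add(row["product_id"])
        unique_rows.append(row)
```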

openclaw standardize <data.json> --field "price" --currency "USD"

Standardize data in a specified field, for example, converting all currency values to a common format.

Use this command to ensure consistency across diverse datasets, improving analysis reliability.
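Currency standardization typically means stripping symbols and thousands separators so every value parses as the same numeric format; a hypothetical normalizer (input strings are illustrative):

```python
import re

# Drop everything except digits and the decimal point, then parse as a
# float; assumes USD-style "1,299.00" formatting.
def to_usd(value: str) -> float:
    return float(re.sub(r"[^\d.]", "", value))

prices = ["$1,299.00", "USD 49.95", "5.00"]
normalized = [to_usd(p) for p in prices]
```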

openclaw merge <first.json> <second.json> --on "id" --output <merged.json>

Merge two JSON files based on a common key, combining their data into a single output file.

Ensure the 'on' key exists and is consistent across both files for successful merging.
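A key-based merge of two JSON record lists works like a join: index one side by the key, then combine matching records (sample data assumed):

```python
# Two record lists sharing an "id" key.
products = [{"id": 1, "name": "Widget"}, {"id": 2, "name": "Gadget"}]
prices   = [{"id": 1, "price": 9.99},   {"id": 2, "price": 24.50}]

# Index prices by id, then fold each price record into its product.
by_id = {p["id"]: p for p in prices}
merged = [{**prod, **by_id.get(prod["id"], {})} for prod in products]
```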

Automation & Workflow

openclaw schedule "0 0 * * *" "scrape_job.yml"

Schedule a scraping job defined in a YAML configuration file to run at a specific time using cron syntax.

Use a dedicated configuration file for complex jobs to manage parameters and targets efficiently.
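The cron expression "0 0 * * *" runs daily at midnight (fields: minute, hour, day of month, month, day of week). A job file for the schedule above might look like the following sketch; the field names are assumptions, not documented OpenClaw configuration keys.

```yaml
# scrape_job.yml -- illustrative only; keys are hypothetical.
url: https://example.com/products
selector: ".product-card"
output: products.json
```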

openclaw monitor <url> --interval "1h" --notify "[email protected]"

Set up continuous monitoring for changes on a webpage, triggering notifications if significant updates are detected.

Configure notification channels (email, webhook, Slack) in OpenClaw settings for seamless alerts.

openclaw pipeline run <pipeline.yml>

Execute a predefined data pipeline, orchestrating multiple scraping, transformation, and storage steps.

Pipelines are essential for complex workflows, ensuring steps run in the correct sequence with error handling.

openclaw export <data.json> --format "csv" --delimiter ","

Export extracted data from a JSON file into various formats like CSV, specifying delimiters as needed.

OpenClaw supports multiple export formats, including XML and Excel, for versatile data integration.
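The JSON-to-CSV conversion an export step performs can be sketched with the standard library; records, field names, and delimiter are sample values:

```python
import csv, io, json

# Parse extracted JSON records, then write them out as delimited CSV.
records = json.loads('[{"title": "Widget", "price": 9.99}]')

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "price"], delimiter=",")
writer.writeheader()
writer.writerows(records)
```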

openclaw integrate --service "webhook" --endpoint "https://api.example.com/data"

Integrate OpenClaw with external services by sending extracted data to a specified webhook endpoint.

Automate data delivery to dashboards, databases, or other applications using custom webhooks.

AI & Advanced Features

openclaw classify <data.json> --field "description" --model "sentiment"

Apply an AI model to classify text data within a specified field, such as sentiment analysis or topic categorization.

Leverage OpenClaw's built-in AI models for quick insights or integrate custom models for specialized tasks.

openclaw summarize <article.txt> --ai-model "gpt4-turbo" --max-tokens 200

Generate a concise summary of long text content using a specified large language model, limiting the output tokens.

Useful for condensing lengthy articles or reviews into actionable summaries post-extraction.

openclaw detect-changes <old.html> <new.html> --threshold 0.1

Compare two HTML files to detect structural or content changes, reporting differences above a set threshold.

Ideal for monitoring competitor price changes or content updates on critical web pages.
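One way to score "how much changed" between two snapshots, sketched with Python's difflib (the HTML snippets are toy values, and this similarity-ratio metric is an illustration, not necessarily the metric OpenClaw uses):

```python
import difflib

# Change score = 1 - similarity ratio; compare against a 0.1 threshold.
old = "<h1>Price: $10</h1>"
new = "<h1>Price: $12</h1>"

change = 1 - difflib.SequenceMatcher(None, old, new).ratio()
significant = change > 0.1   # one-character edit: small change, below threshold
```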

openclaw generate-schema <sample.html> --output <schema.json>

Automatically infer a JSON schema from a sample HTML page, aiding in rapid schema development for extraction.

Start with an inferred schema and refine it for more precise and robust data extraction.
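Schema inference at its simplest maps each field in a sample record to a JSON-schema type; a minimal sketch whose mapping rules are assumptions:

```python
# Map Python types of a sample record's values to JSON-schema type names.
TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def infer_schema(sample: dict) -> dict:
    return {
        "type": "object",
        "properties": {k: {"type": TYPES.get(type(v), "string")}
                       for k, v in sample.items()},
    }

schema = infer_schema({"title": "Widget", "price": 9.99, "in_stock": True})
```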

openclaw semantic-search "<query>" --data <corpus.json> --top-k 5

Perform a semantic search within a local JSON data corpus, returning the top K most relevant results based on meaning.

Pre-process your data corpus for optimal semantic search performance and relevance.
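The heart of a semantic search step is ranking corpus entries by cosine similarity between embedding vectors; the 3-d vectors below are toy stand-ins for real model embeddings:

```python
import math

# Cosine similarity between two vectors of equal length.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

corpus = {"doc1": [1.0, 0.0, 0.0],
          "doc2": [0.9, 0.1, 0.0],
          "doc3": [0.0, 1.0, 0.0]}
query = [1.0, 0.05, 0.0]

# Rank all documents by similarity to the query and keep the top 2.
top_k = sorted(corpus, key=lambda d: cosine(query, corpus[d]), reverse=True)[:2]
```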

Configuration & Management

openclaw config set proxy.default "http://localhost:8888"

Set a global configuration parameter, for example, defining a default proxy server for all operations.

Manage frequently used settings centrally to streamline command execution and maintain consistency.

openclaw config get log.level

Retrieve the current value of a specific configuration parameter, such as the logging level.

Review configuration settings to troubleshoot behavior or confirm active parameters.

openclaw profiles create "e-commerce" --headless --user-agent "Chrome"

Create a named profile with predefined settings like headless mode and user agent for specific scraping scenarios.

Use profiles to quickly switch between different scraping configurations without retyping parameters.

openclaw profiles use "e-commerce"

Activate a previously created named profile, applying its settings to subsequent OpenClaw commands.

Simplify complex command lines by encapsulating common options within profiles.

openclaw auth login --provider "api_key" --key "YOUR_API_KEY"

Authenticate OpenClaw with external services or APIs using various authentication providers like API keys.

Securely manage your credentials using OpenClaw's built-in authentication system.

Debugging & Troubleshooting

openclaw debug <job.yml> --verbose --log-file debug.log

Run a scraping operation in debug mode, providing verbose output and logging all activities to a specified file.

Essential for diagnosing issues with selectors, network requests, or AI model behavior.

openclaw dry-run <job.yml> --show-plan

Execute a job configuration in dry-run mode to preview the actions OpenClaw would take without actually performing them.

Verify complex pipeline configurations and command sequences before live execution.

openclaw validate-schema <data.json> --schema <schema.json>

Validate a JSON data file against a specified JSON schema to ensure data integrity and conformity.

Integrate schema validation into your pipelines to catch data formatting errors early.
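Schema validation checks each record for required keys and field types; a stripped-down sketch covering only "required" and primitive "type" names (the schema and record are sample values):

```python
# Minimal validator: report missing required fields and type mismatches.
PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}

def validate(record: dict, schema: dict) -> list:
    errors = []
    for key in schema.get("required", []):
        if key not in record:
            errors.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in record and not isinstance(record[key], PY_TYPES[spec["type"]]):
            errors.append(f"wrong type for field: {key}")
    return errors

schema = {"required": ["title"],
          "properties": {"title": {"type": "string"},
                         "price": {"type": "number"}}}
errors = validate({"price": "9.99"}, schema)   # missing title, price is a string
```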

openclaw inspect --url <url> --selector "<selector>"

Inspect a specific HTML element on a live URL, revealing its attributes, text content, and computed styles.

Use this to quickly test and refine CSS or XPath selectors directly from the CLI.

openclaw logs tail --follow

Display the OpenClaw system logs in real-time, following new entries as they are generated.

Monitor background processes and scheduled jobs for status updates and error messages.

Frequently Asked Questions about OpenClaw

What is OpenClaw CLI?

OpenClaw CLI is an advanced command-line interface tool designed for efficient web scraping, data extraction, and automation using intelligent AI models.

Is OpenClaw suitable for dynamic websites?

Yes, OpenClaw supports headless browser modes and can wait for dynamic content to load, making it highly effective for JavaScript-rendered websites.

How does OpenClaw handle rate limiting?

OpenClaw provides built-in features for managing request delays, rotating proxies, and user agents to mitigate rate limiting and IP bans.
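Request-delay management commonly follows an exponential backoff pattern, where the wait doubles after each retry up to a cap; a sketch with illustrative base and cap values (this shows the general technique, not OpenClaw's internal scheduling):

```python
# Exponential backoff: delay doubles per retry, capped at `cap` seconds.
def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0):
    return [min(cap, base * 2 ** i) for i in range(retries)]

delays = backoff_delays(5)   # 1, 2, 4, 8, 16 seconds between attempts
```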

Can OpenClaw integrate with other tools?

Absolutely. OpenClaw supports various output formats like JSON, CSV, and XML, and can integrate with webhooks or custom scripts for seamless workflow automation.