I. Core Commands & Setup
openclaw init --project my_project
Initializes a new OpenClaw project directory, creating necessary configuration files and a default scraping template.
Always start with `init` to ensure proper project structure and dependency management for your scraping tasks.
openclaw help [command]
Displays detailed help for a specific OpenClaw command, including its available options and arguments.
Run `openclaw help` without a command name for a general overview of all available commands.
openclaw config set proxy.url http://myproxy:8080
Sets a global or project-specific configuration parameter, such as proxy settings, timeout values, or default output formats.
Configuration changes can be scoped to the current project or applied globally using appropriate flags.
openclaw update
Checks for and installs the latest version of the OpenClaw CLI tool and its core dependencies.
Regularly updating OpenClaw ensures access to new features, performance improvements, and security patches.
II. Basic Scraping & Extraction
openclaw fetch --url https://example.com/products --output products.html
Fetches the content of a specified URL and saves it to a local file. Several output formats are supported.
Combine with `--wait` to give dynamically loaded content time to render before the page is saved.
openclaw extract --url https://example.com --selector 'h1.title' --output title.txt
Extracts data from a given URL using CSS selectors or XPath expressions and outputs the result.
Use multiple `--selector` flags to extract different data points in a single command, often combined with JSON output.
openclaw browse --url https://example.com/login --interactive
Opens an interactive browser session for debugging selectors, performing manual actions, or handling complex CAPTCHAs.
This mode is invaluable for visually inspecting the DOM and testing selectors in real-time.
openclaw scrape-list --url https://example.com/articles --item-selector '.article-card' --field 'title:.article-title' --field 'link:a@href'
Scrapes a list of items from a page, defining a common item selector and then specific fields within each item.
This command streamlines the extraction of structured data from article lists, product pages, or search results.
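The item-selector-plus-fields pattern that `scrape-list` implements can be sketched in plain Python with the standard library's `html.parser`. The class names (`article-card`, `article-title`) and sample markup below are illustrative, not tied to any real site, and the parser is deliberately naive (it assumes flat, non-nested cards):

```python
from html.parser import HTMLParser

class ListScraper(HTMLParser):
    """Collects title/link pairs from repeated '.article-card' items."""
    def __init__(self):
        super().__init__()
        self.items = []        # one dict per scraped card
        self.in_card = False
        self.in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "article-card" in classes:
            self.in_card = True
            self.items.append({"title": "", "link": ""})
        elif self.in_card and "article-title" in classes:
            self.in_title = True
        elif self.in_card and tag == "a" and "href" in attrs:
            self.items[-1]["link"] = attrs["href"]

    def handle_endtag(self, tag):
        if tag == "div":       # naive: a closing div ends the card
            self.in_card = False
        self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.items[-1]["title"] += data.strip()

html = """
<div class="article-card"><h2 class="article-title">First post</h2>
  <a href="/posts/1">read</a></div>
<div class="article-card"><h2 class="article-title">Second post</h2>
  <a href="/posts/2">read</a></div>
"""
parser = ListScraper()
parser.feed(html)
print(parser.items)
```

A production scraper would use a real selector engine, but the structure is the same: match each container once, then resolve per-field selectors relative to it.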
III. Advanced Data Processing with AI
openclaw process-ai --input raw_text.txt --model 'summarizer' --output summary.txt
Applies an AI model to process input text, such as summarization, sentiment analysis, or entity recognition.
Explore available AI models with `openclaw models list` to find the best fit for your data processing needs.
openclaw extract-ai --url https://invoice.com/doc123 --template 'invoice_parser' --output invoice_data.json
Utilizes AI to intelligently extract structured data from semi-structured or unstructured documents (e.g., invoices, reports) using predefined templates.
Custom AI templates can be trained for highly specific document types to achieve superior extraction accuracy.
openclaw clean-data --input messy_data.csv --rules 'trim,deduplicate' --output clean_data.csv
Applies data cleaning rules to an input dataset, from simple trimming and deduplication to AI-assisted formatting and error correction.
Define custom cleaning rules in a configuration file for complex transformations and consistency across multiple datasets.
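The simple rules in the example (`trim,deduplicate`) are deterministic and easy to sketch in Python. The function below is a minimal stand-in for that part of `clean-data`, not the tool's actual implementation; the rule names mirror the CLI flag values:

```python
import csv
import io

def clean_rows(rows, rules=("trim", "deduplicate")):
    """Apply simple cleaning rules analogous to --rules 'trim,deduplicate'."""
    if "trim" in rules:
        # Strip leading/trailing whitespace from every cell.
        rows = [[cell.strip() for cell in row] for row in rows]
    if "deduplicate" in rules:
        # Keep the first occurrence of each identical row, preserving order.
        seen, unique = set(), []
        for row in rows:
            key = tuple(row)
            if key not in seen:
                seen.add(key)
                unique.append(row)
        rows = unique
    return rows

messy = io.StringIO("name,city\n Ada , London\nAda,London\n Bob ,Paris\n")
cleaned = clean_rows(list(csv.reader(messy)))
print(cleaned)
```

Note that trimming must run before deduplication here, otherwise ` Ada , London` and `Ada,London` would not be recognized as duplicates.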
openclaw classify-text --input reviews.txt --model 'sentiment_analyzer' --output classified_reviews.json
Classifies text data based on a specified AI model, useful for categorizing content, identifying spam, or gauging sentiment.
Integrating this into a scraping pipeline allows for immediate categorization of extracted text content.
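To show the shape of such a classification step without a real model, here is a toy keyword-based classifier. It is purely illustrative: an actual `sentiment_analyzer` model would be far more robust, and the keyword lists below are invented for the example:

```python
def classify_sentiment(text,
                       positive=("great", "love", "excellent"),
                       negative=("bad", "awful", "broken")):
    """Toy keyword classifier standing in for a real sentiment model."""
    text = text.lower()
    score = sum(w in text for w in positive) - sum(w in text for w in negative)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

reviews = ["Great product, love it", "Awful quality, arrived broken", "It works"]
labels = [classify_sentiment(r) for r in reviews]
print(labels)
```

In a pipeline, each extracted text record would be mapped through the classifier like this, with the label stored alongside the source record.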
IV. Authentication & Session Management
openclaw auth login --url https://myportal.com/login --user ENV_USER --pass ENV_PASS
Performs an automated login to a website using provided credentials, storing session cookies for subsequent requests.
Always use environment variables for sensitive information like usernames and passwords, never hardcode them.
openclaw session save my_session.json
Saves the current scraping session's cookies and local storage state to a file for later reuse.
Saving sessions is crucial for resuming interrupted scrapes or accessing authenticated content without logging in again each time.
openclaw session load my_session.json --url https://myportal.com/dashboard
Loads a previously saved session, restoring cookies and state, and navigates to a specified URL.
Combine with `fetch` or `extract` commands to seamlessly continue authenticated scraping tasks.
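The save/load cycle above amounts to persisting and restoring a cookie jar. Python's standard library can do this directly with `http.cookiejar`; the sketch below assumes a single hypothetical `sessionid` cookie for `myportal.com` and uses the LWP on-disk format, which is unrelated to whatever file format OpenClaw itself uses:

```python
import http.cookiejar
import os
import tempfile

# Build a jar containing one session cookie, as a login step would.
jar = http.cookiejar.LWPCookieJar()
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name="sessionid", value="abc123",
    port=None, port_specified=False,
    domain="myportal.com", domain_specified=True, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=False,
    comment=None, comment_url=None, rest={},
))

# "session save": write cookies to disk, keeping session-only cookies too.
path = os.path.join(tempfile.mkdtemp(), "my_session.lwp")
jar.save(path, ignore_discard=True, ignore_expires=True)

# "session load": restore the state into a fresh jar for a new process.
restored = http.cookiejar.LWPCookieJar()
restored.load(path, ignore_discard=True, ignore_expires=True)
print([(c.name, c.value) for c in restored])
```

`ignore_discard=True` matters on both save and load: session cookies (no expiry) are otherwise skipped, which is exactly the kind of cookie a login usually sets.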
openclaw auth logout --url https://myportal.com/logout
Performs an automated logout from a website, clearing session cookies and terminating the active session.
It is good practice to explicitly log out when a scraping task is complete, especially for sensitive accounts.
V. Output & Reporting
openclaw export --input data.json --format csv --output final_data.csv
Converts extracted data from one format (e.g., JSON) to another (e.g., CSV, Excel, XML).
Use `--header` and `--delimiter` flags for fine-grained control over CSV output.
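The JSON-to-CSV conversion with header and delimiter control maps directly onto Python's `csv.DictWriter`. This is a minimal sketch of the idea, not OpenClaw's converter; the sample records are invented:

```python
import csv
import io
import json

def export_csv(json_text, delimiter=",", header=True):
    """Convert a JSON array of flat objects to CSV, mirroring
    the --format csv / --header / --delimiter options."""
    records = json.loads(json_text)
    fields = list(records[0])          # column order from the first record
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter=delimiter)
    if header:
        writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

data = '[{"title": "Widget", "price": "9.99"}, {"title": "Gadget", "price": "14.50"}]'
print(export_csv(data, delimiter=";"))
```

Taking the column order from the first record is a simplification; a real exporter would union the keys across all records and decide how to handle missing fields.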
openclaw report generate --template summary_report.md --data results.json
Generates a custom report based on extracted data and a predefined template, supporting various output formats.
Templates can be written in Markdown or HTML and may embed templating placeholders for dynamic report generation.
openclaw pipeline run my_pipeline.yml --output combined_results.json
Executes a predefined data processing pipeline, combining multiple scraping, extraction, and transformation steps.
Pipelines are ideal for complex workflows, ensuring data consistency and automated processing.
openclaw visualize --data metrics.json --type chart --output dashboard.html
Creates interactive data visualizations (charts, graphs) from extracted metrics and saves them as HTML.
This command helps in quickly understanding and presenting the insights derived from scraped data.
VI. Error Handling & Debugging
openclaw fetch --url https://broken.example.com --retries 5 --delay 3
Configures the command to retry failed requests a specified number of times with a delay between attempts.
Essential for handling transient network issues, server rate limits, or temporary website outages.
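The retry-with-delay behaviour those flags configure is a small loop. The sketch below shows the pattern under the assumption that transient failures surface as exceptions; the `flaky` function simulates a server that fails twice before succeeding:

```python
import time

def fetch_with_retries(fetch, retries=5, delay=3):
    """Call fetch(), retrying up to `retries` extra times and sleeping
    `delay` seconds between attempts, like --retries 5 --delay 3."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError as exc:          # treat as a transient network failure
            last_error = exc
            if attempt < retries:
                time.sleep(delay)
    raise last_error                    # all attempts exhausted

# Simulate a flaky endpoint: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("connection reset")
    return "<html>ok</html>"

result = fetch_with_retries(flaky, retries=5, delay=0)
print(result)
```

Production retry logic usually adds exponential backoff and jitter rather than a fixed delay, to avoid hammering a rate-limited server in lockstep.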
openclaw log level debug
Sets the logging level for OpenClaw, enabling more verbose output for debugging purposes.
Use `info` for general operations, `warn` for potential issues, and `error` for critical failures.
openclaw validate-selectors --url https://example.com --selector 'h1.title' 'p.content'
Tests and validates CSS selectors or XPath expressions against a live URL, reporting if elements are found.
Run this command before a full scrape to quickly debug selector issues and avoid failed runs.
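The core of such a validation step is checking whether each selector matches at least one element. A minimal stdlib sketch, limited to simple `tag.class` selectors (real validators handle full CSS and XPath) and run against invented sample markup:

```python
from html.parser import HTMLParser

class SelectorChecker(HTMLParser):
    """Reports which simple 'tag.class' selectors match at least one element."""
    def __init__(self, selectors):
        super().__init__()
        self.found = {sel: False for sel in selectors}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for sel in self.found:
            want_tag, _, want_class = sel.partition(".")
            if tag == want_tag and (not want_class or want_class in classes):
                self.found[sel] = True

page = '<h1 class="title">Hi</h1><p class="content">Body</p>'
checker = SelectorChecker(["h1.title", "p.content", "div.sidebar"])
checker.feed(page)
print(checker.found)
```

A selector that reports `False` here is the early warning the tip describes: fix it before launching the full scrape rather than discovering empty output afterwards.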
openclaw debug-session --url https://problematic.site --headless=false
Launches a browser in non-headless mode for visual inspection of page loading, JavaScript execution, and network requests.
Use this to diagnose issues where content is not loading correctly or selectors are failing due to dynamic rendering.