Getting Started & Configuration
openclaw init --project
Initialises a new OpenClaw project in the current directory, setting up necessary configuration files and folders.
Always start with `init` to keep project files organised and to configure project-specific settings.
openclaw config set
Sets a global or project-specific configuration parameter, such as API keys, proxy settings, or default output formats.
Use `openclaw config list` to view all current settings. Remember to quote string values.
openclaw login --token
Authenticates your OpenClaw CLI session using a provided API token, granting access to advanced features and cloud services.
Keep your API token secure. Consider using environment variables for sensitive data rather than directly in commands.
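One common approach is to read the token from the environment at runtime rather than typing it into a command (where it would land in shell history). The variable name `OPENCLAW_TOKEN` below is hypothetical, not something the CLI mandates:

```python
import os

# OPENCLAW_TOKEN is a hypothetical variable name; in practice you would
# set it once in your shell profile:  export OPENCLAW_TOKEN="..."
os.environ["OPENCLAW_TOKEN"] = "example-token"  # stand-in for a real export

token = os.environ.get("OPENCLAW_TOKEN")
if not token:
    raise SystemExit("OPENCLAW_TOKEN is not set")
print(f"token loaded ({len(token)} chars)")
```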
openclaw help
Displays detailed help information and available options for a specific OpenClaw command.
This is your quickest way to understand command syntax and discover less common flags or arguments.
Basic Scraping & Extraction
openclaw scrape url --selector 'div.product-name'
Scrapes data from a specified URL using CSS selectors or XPath expressions to target specific elements.
For complex selections, combine multiple selectors or use the `--xpath` flag for more flexibility.
openclaw extract --from-file --pattern '<h1[^>]*>(.*?)</h1>'
Extracts data from a local HTML file using a regular expression pattern, useful for post-processing downloaded content.
Test your regex patterns rigorously before applying them to large datasets to avoid unexpected results.
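One way to test a pattern before a large run is to exercise it against a small HTML sample in any regex-capable language. A Python sketch (the sample markup and the `h1` tag are illustrative):

```python
import re

sample = '<h1 class="title">Widget Pro</h1><h1>Widget Mini</h1>'
# Capture text inside a tag, tolerating attributes; the non-greedy (.*?)
# keeps multiple matches on one line separate.
pattern = re.compile(r'<h1[^>]*>(.*?)</h1>')

matches = pattern.findall(sample)
print(matches)
```

Note that regex on HTML is inherently brittle; for messy real-world markup a proper parser is usually safer.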
openclaw get url --output
Fetches the full HTML content of a given URL and saves it to a specified file, typically for later offline processing.
This command is ideal for archiving web pages or when you need to inspect the raw HTML before extraction.
openclaw follow --depth 2 --selector 'a.next-page'
Navigates and scrapes through multiple pages linked from a starting URL, respecting a specified maximum depth.
Always set a reasonable `--depth` to prevent infinite loops and manage resource consumption on large sites.
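Conceptually, the depth cap works like a breadth-first walk that stops expanding once the limit is reached. A self-contained sketch over a toy link graph (no network; the page names are made up):

```python
from collections import deque

# Toy link graph standing in for 'a.next-page' links on real pages.
links = {
    "page1": ["page2"],
    "page2": ["page3"],
    "page3": ["page4"],
}

def follow(start, max_depth):
    """Visit pages breadth-first, never going deeper than max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        page, depth = queue.popleft()
        visited.append(page)
        if depth == max_depth:
            continue  # depth cap reached: do not expand this page's links
        for nxt in links.get(page, []):
            if nxt not in seen:  # the seen-set also guards against link cycles
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited

print(follow("page1", 2))
```

The seen-set is what actually prevents infinite loops on cyclic link structures; the depth cap bounds work on very deep sites.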
Advanced AI-Driven Extraction
openclaw smart-extract url --data-type 'product_info'
Utilises AI to intelligently identify and extract structured data, such as product details, articles, or contact information, without explicit selectors.
Specify the `--data-type` for better accuracy; OpenClaw's AI learns from common web data structures.
openclaw learn-pattern url --example-data '{"price": "£19.99"}'
Trains OpenClaw's AI to recognise and extract specific data patterns from a URL based on provided examples.
Provide diverse examples for robust pattern recognition across varied page layouts.
openclaw classify url --model 'news_article'
Applies a pre-trained AI model to classify the content type of a web page, useful for filtering or categorising scraped data.
Custom models can be trained for highly specific classification tasks to improve relevance.
openclaw summarise url --length 'short'
Generates a concise summary of the main content on a web page using natural language processing (NLP).
Adjust `--length` to 'medium' or 'long' for more detailed summaries, depending on your analysis needs.
openclaw identify-elements url --role 'main_content'
Uses AI to pinpoint and return selectors for elements on a page that fulfil a specified role, like 'main_content' or 'navigation'.
This command is invaluable for dynamically generating selectors, especially on sites with inconsistent markup.
Data Processing & Transformation
openclaw transform --map 'old_field:new_field, price:price_gbp'
Transforms fields within a JSON or CSV file, renaming, reformatting, or combining data points.
Use advanced mapping expressions for complex transformations, including nested object manipulation.
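A mapping like `old_field:new_field, price:price_gbp` amounts to renaming keys record by record. A minimal Python equivalent (field names taken from the example above):

```python
def rename_fields(records, mapping):
    """Return new records with keys renamed per mapping; unmapped keys pass through."""
    return [
        {mapping.get(key, key): value for key, value in record.items()}
        for record in records
    ]

rows = [{"old_field": "x", "price": "19.99"}]
result = rename_fields(rows, {"old_field": "new_field", "price": "price_gbp"})
print(result)
```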
openclaw filter --query 'price > 100 AND category = "electronics"'
Filters data records from an input file based on a specified query expression, supporting logical operators.
Ensure your query syntax is correct for the data type of the fields you are filtering.
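A query such as `price > 100 AND category = "electronics"` translates to a per-record predicate. In Python (the records are illustrative; note that the numeric comparison only works because `price` is stored as a number, which is exactly the data-type point above):

```python
records = [
    {"price": 150, "category": "electronics"},
    {"price": 80,  "category": "electronics"},
    {"price": 200, "category": "garden"},
]

# Equivalent of: price > 100 AND category = "electronics"
matches = [r for r in records
           if r["price"] > 100 and r["category"] == "electronics"]
print(matches)
```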
openclaw merge --on 'product_id'
Combines two data files (JSON or CSV) based on a common key, akin to a database join operation.
Specify the `--output` flag to save the merged data to a new file, otherwise it prints to stdout.
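Joining `--on 'product_id'` behaves like an inner join keyed on that field. A stdlib sketch (the two record lists are illustrative):

```python
def inner_join(left, right, key):
    """Join two record lists on a shared key, like a SQL inner join."""
    index = {row[key]: row for row in right}   # build a lookup on the join key
    return [
        {**l, **index[l[key]]}                  # merge matching rows
        for l in left
        if l[key] in index                      # drop rows with no partner
    ]

products = [{"product_id": 1, "name": "Widget"}]
prices = [{"product_id": 1, "price_gbp": "19.99"},
          {"product_id": 2, "price_gbp": "5.00"}]
merged = inner_join(products, prices, "product_id")
print(merged)
```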
openclaw clean --remove-duplicates --trim-whitespace
Performs various data cleaning operations on an input file, such as removing duplicates or trimming whitespace.
Chain multiple cleaning flags for comprehensive data hygiene, like `--convert-case upper`.
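The two flags above correspond to two simple passes: strip whitespace from string values, then drop exact duplicates. A Python sketch of that pipeline:

```python
def clean(records):
    """Trim whitespace from string values, then drop exact duplicates (order kept)."""
    seen = set()
    out = []
    for record in records:
        trimmed = {k: v.strip() if isinstance(v, str) else v
                   for k, v in record.items()}
        fingerprint = tuple(sorted(trimmed.items()))  # hashable identity for the record
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(trimmed)
    return out

rows = [{"name": " Widget "}, {"name": "Widget"}, {"name": "Gadget"}]
print(clean(rows))
```

Trimming before deduplicating matters: `" Widget "` and `"Widget"` only collapse into one record once the whitespace is gone.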
Automation & Scheduling
openclaw schedule '0 9 * * 1-5' --task 'scrape_daily_deals'
Schedules a predefined OpenClaw task to run automatically at specified intervals using cron-like syntax.
Ensure your task script or command is executable and correctly configured within the OpenClaw environment.
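The five cron fields are minute, hour, day-of-month, month, day-of-week, so `0 9 * * 1-5` means 09:00 on Monday to Friday. A simplified matcher (it handles only `*`, plain numbers, and `lo-hi` ranges, not lists or step values):

```python
from datetime import datetime

def cron_field_matches(field, value):
    """Match one cron field: '*', a number, or a 'lo-hi' range."""
    if field == "*":
        return True
    if "-" in field:
        lo, hi = map(int, field.split("-"))
        return lo <= value <= hi
    return int(field) == value

def cron_matches(expr, when):
    """Check a five-field cron expression (min hour dom month dow) against a datetime."""
    minute, hour, dom, month, dow = expr.split()
    return (cron_field_matches(minute, when.minute)
            and cron_field_matches(hour, when.hour)
            and cron_field_matches(dom, when.day)
            and cron_field_matches(month, when.month)
            and cron_field_matches(dow, when.isoweekday() % 7))  # cron: Sun=0 .. Sat=6

# '0 9 * * 1-5' = 09:00, Monday-Friday (2024-01-01 was a Monday)
print(cron_matches("0 9 * * 1-5", datetime(2024, 1, 1, 9, 0)))
```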
openclaw monitor url --changes-only --notify-email
Monitors a URL for content changes and sends notifications only when updates are detected.
Combine with `--selector` to monitor specific parts of a page, reducing noise from irrelevant changes.
openclaw workflow run
Executes a predefined sequence of OpenClaw commands and scripts, orchestrating complex data pipelines.
Define workflows in YAML files for clear, version-controlled automation sequences.
openclaw trigger event 'new_post' --action 'process_article'
Sets up event-driven automation, where a specified action is performed upon a detected event.
Integrate with webhooks to trigger actions from external systems or services.
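Under the hood, event-driven wiring of this kind boils down to a dispatch table mapping event names to handlers. A toy sketch (the event name and handler are taken from the example above; the payload shape is assumed):

```python
handlers = {}

def on(event):
    """Register a handler function for an event name."""
    def register(fn):
        handlers.setdefault(event, []).append(fn)
        return fn
    return register

def trigger(event, payload):
    """Run every handler registered for the event; return their results."""
    return [fn(payload) for fn in handlers.get(event, [])]

@on("new_post")
def process_article(payload):
    return f"processing {payload['url']}"

results = trigger("new_post", {"url": "https://example.com/post/1"})
print(results)
```

A webhook endpoint would simply call `trigger` with the event name and payload it receives.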
Error Handling & Debugging
openclaw log view --level 'error'
Displays system and command logs, filtered by severity level, to help diagnose issues.
Use `--follow` to stream logs in real-time, especially useful during long-running tasks or debugging.
openclaw debug url --show-dom --screenshot
Launches a debugging session for a URL, showing the rendered DOM, network requests, and optionally a screenshot.
The `--headless-browser` flag can be set to false to open a visible browser for interactive debugging.
openclaw retry --last-command --max-attempts 3 --delay 5s
Retries the last executed command a specified number of times with an optional delay between attempts.
Essential for handling transient network issues or temporary server unavailability during scraping.
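Retry-with-delay logic is easy to reason about as a loop that swallows failures until the attempt budget runs out. A sketch (the flaky function simulates a transient network error that clears on the third try):

```python
import time

def retry(fn, max_attempts=3, delay=0.01):
    """Call fn, retrying on exception up to max_attempts with a fixed delay between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise          # budget exhausted: surface the last error
            time.sleep(delay)

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = retry(flaky, max_attempts=3, delay=0.01)
print(result, calls["n"])
```

Real scrapers often use an exponential backoff instead of a fixed delay, so repeated retries do not hammer a struggling server.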
openclaw validate --schema
Validates the structure and data types of an output file against a predefined JSON schema.
Ensures data consistency and quality, crucial for integration into downstream systems.
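The idea behind schema validation can be sketched as a required-keys-and-types check; real JSON Schema validation (for example via the `jsonschema` library) covers far more, but the shape of the check is the same. The schema and record below are illustrative:

```python
def validate(record, schema):
    """Check required keys and basic types; a toy stand-in for JSON Schema validation."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type: {field}")
    return errors

schema = {"product_id": int, "price_gbp": float}
print(validate({"product_id": 1, "price_gbp": 19.99}, schema))
print(validate({"product_id": "1"}, schema))
```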