PicoScraper is an easy-to-use, JSON-defined scraper.

> picoscraper my-scraper.json my-output-db.sqlite3

Define with JSON

  • Simple: One JSON file for everything, both configuration and elements.
  • CSS Selectors: Anything from a selector pasted straight from your browser inspector to complex selectors and pseudo-classes.
  • Properties: Text, href, classes, ids, sources, custom attributes...
      
{
  "start_url": "file:url-list.txt",
  "random_sleep": 3,
  "rows_selector": "table[id*=\"post\"]",
  "columns_selector": [
    {
      "field": "thread",
      "css_selector": "a[href*=\"thread\"]",
      "default": "",
      "property": "href"
    },
    {
      "field": "date",
      "css_selector": "span:contains(\"AM\"),span:contains(\"PM\")",
      "default": "no-date-defined",
      "property": "text"
    }
  ]
}
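
Selectors that contain double quotes must be escaped inside JSON, which is easy to get wrong by hand. As a rough sketch (not part of PicoScraper itself), the config can be generated from Python so that the standard `json` module handles the escaping. The field names are copied from the example above; the helper name `render_config` is purely illustrative.

```python
import json

# Field names mirror the example config above; nothing here is a PicoScraper API.
config = {
    "start_url": "file:url-list.txt",
    "random_sleep": 3,
    "rows_selector": 'table[id*="post"]',
    "columns_selector": [
        {
            "field": "thread",
            "css_selector": 'a[href*="thread"]',
            "default": "",
            "property": "href",
        },
        {
            "field": "date",
            "css_selector": 'span:contains("AM"),span:contains("PM")',
            "default": "no-date-defined",
            "property": "text",
        },
    ],
}


def render_config(cfg: dict) -> str:
    """Serialize the config; json.dumps escapes the inner double quotes."""
    return json.dumps(cfg, indent=2)


config_text = render_config(config)
```

Writing `config_text` to `my-scraper.json` should give a file equivalent to the hand-written example.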
      
    

Multiple output choices

  • Files: Excel, CSV & JSON.
  • Databases: SQLite, and basically anything with an ODBC driver.
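
The layout of the SQLite output (table and column names) is not documented here, so the following sketch avoids assuming one: it only lists whatever tables the output file contains, with their row counts, using Python's standard `sqlite3` module. The function name `summarize_sqlite` is hypothetical.

```python
import sqlite3


def summarize_sqlite(path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    conn = sqlite3.connect(path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `summarize_sqlite("my-output-db.sqlite3")` after a run is a quick way to check that rows were actually saved.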

Loading URLs can't get any lazier

Load from lists

"start_url": "file:url-list.txt"
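
To illustrate the idea, here is a minimal sketch of how a `file:` start URL might be resolved into individual URLs. The list file format is an assumption (one URL per line, blank lines ignored); it is not confirmed above. The helper `resolve_start_urls` is hypothetical as well.

```python
from pathlib import Path

FILE_PREFIX = "file:"


def resolve_start_urls(start_url: str, base_dir: str = ".") -> list[str]:
    """Return the URLs that a start_url value stands for.

    Assumption: a list file holds one URL per line; blank lines are skipped.
    """
    if not start_url.startswith(FILE_PREFIX):
        # A plain URL stands for itself.
        return [start_url]
    list_path = Path(base_dir) / start_url[len(FILE_PREFIX):]
    lines = list_path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```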

Load from ranges

"start_url": "http://example.com/sub/[5-15:5]/page/parameter"

expands to:

http://example.com/sub/5/page/parameter
http://example.com/sub/10/page/parameter
http://example.com/sub/15/page/parameter
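
The range example suggests a `[start-end:step]` syntax with an inclusive end. Under that assumption, the expansion can be sketched in a few lines of Python. The function `expand_range` is illustrative only, and it handles just the first range found in a URL.

```python
import re

# Matches a placeholder such as [5-15:5]: start, end, step.
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+):(\d+)\]")


def expand_range(url: str) -> list[str]:
    """Expand the first [start-end:step] placeholder in url (end inclusive)."""
    match = RANGE_PATTERN.search(url)
    if match is None:
        return [url]
    start, end, step = (int(group) for group in match.groups())
    if step <= 0:
        raise ValueError("step must be a positive integer")
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1, step)]
```

With the URL from the example, `expand_range("http://example.com/sub/[5-15:5]/page/parameter")` yields the three URLs ending in `/sub/5/...`, `/sub/10/...` and `/sub/15/...`.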

Crawl & Scrape

      
{
  "start_url": "http://example.com/",
  "crawl_levels": 5,
  "scrape_selector": "a[href*=\"thread_\"]",
  "random_sleep": 3,
  "rows_selector": "table[id*=\"post\"]",
  "columns_selector": []
}
      
    
Gets all links from http://example.com/, uses every link that matches a[href*="thread_"] as the start URL for a scrape, saves the output, and repeats the process for the next 5 levels.
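
The crawl described here can be pictured as a level-by-level link walk. The sketch below is one possible reading, not PicoScraper's actual implementation: it follows every link for a fixed number of levels and collects the ones whose `href` contains a given substring (which is what `a[href*="thread_"]` tests). The `fetch` callable, the `crawl` function and the `LinkCollector` class are all hypothetical, and only Python's standard library is used.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(
    start_url: str,
    fetch: Callable[[str], str],
    levels: int,
    href_contains: str,
) -> list[str]:
    """Return scrape targets found within `levels` link hops of start_url."""
    seen = {start_url}
    frontier = [start_url]
    targets: list[str] = []
    for _ in range(levels):
        next_frontier: list[str] = []
        for page_url in frontier:
            collector = LinkCollector()
            collector.feed(fetch(page_url))
            for href in collector.hrefs:
                link = urljoin(page_url, href)  # resolve relative links
                if link in seen:
                    continue
                seen.add(link)
                if href_contains in href:
                    targets.append(link)  # would be handed to the scraper
                next_frontier.append(link)
        frontier = next_frontier
    return targets
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real run would supply a function that downloads the page and pauses between requests, as `random_sleep` does.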

Demo vs. Full License

License                                  Demo                   Full
Price                                    Free, unlimited time   $10/month or $100/year
Basic      Row Limit                     10K                    None
Basic      Pagination                    Selector, Scroll       Selector, Scroll
Export     SQLite                        ✔️                     ✔️
Export     Excel                         ✔️                     ✔️
Export     CSV                           ✔️                     ✔️
Export     JSON                          ✔️                     ✔️
Export     ODBC                          -                      ✔️
Load       URL List                      ✔️                     ✔️
Load       Ranges                        ✔️                     ✔️
Load       Crawl & Scrape                -                      ✔️
Advanced   JS Rendering                  -                      ✔️
Advanced   Rotating Proxies              -                      ✔️
Advanced   Browser Fingerprinting        -                      ✔️
Advanced   Captcha Solver Integration    -                      ✔️
Advanced   User / Password Handling      -                      ✔️

Current roadmap