WaterCrawl
WaterCrawl is a powerful web application that uses Python, Django, Scrapy, and Celery to crawl web pages and extract relevant data.
Features
- Advanced Web Crawling & Scraping - Crawl websites with highly customizable options for depth, speed, and targeting specific content
- Powerful Search Engine - Find relevant content across the web with multiple search depths (basic, advanced, ultimate)
- Multi-language Support - Search and crawl content in different languages with country-specific targeting
- Asynchronous Processing - Monitor real-time progress of crawls and searches via Server-Sent Events (SSE)
- REST API with OpenAPI - Comprehensive API with detailed documentation and client libraries
- Rich Ecosystem - Integrations with Dify, N8N, and other AI/automation platforms
- Self-hosted & Open Source - Full control over your data with easy deployment options
- Advanced Results Handling - Download and process search results with customizable parameters
Check our API Overview to learn more about these features.
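As a rough illustration of the Server-Sent Events feature listed above, the sketch below parses a `text/event-stream` body into events using only the Python standard library. It follows the generic SSE wire format (`event:` and `data:` fields, a blank line ending each event), not a confirmed WaterCrawl schema: the `state` event name and the `status` / `pages_crawled` payload keys are hypothetical placeholders, so check the API documentation for the real event types and fields.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Group raw text/event-stream lines into one dict per event."""
    event_type, data_parts = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield {"event": event_type, "data": "\n".join(data_parts)}
            event_type, data_parts = "message", []
        elif line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format allows one optional leading space
            if field == "event":
                event_type = value
            elif field == "data":
                data_parts.append(value)


# Hypothetical stream: event name and payload keys are assumptions,
# not the documented WaterCrawl format.
sample_stream = [
    ": keep-alive\n",
    "event: state\n",
    'data: {"status": "running", "pages_crawled": 3}\n',
    "\n",
    "event: state\n",
    'data: {"status": "finished", "pages_crawled": 10}\n',
    "\n",
]

events = list(iter_sse_events(sample_stream))
for event in events:
    payload = json.loads(event["data"])
    print(event["event"], payload["status"], payload["pages_crawled"])
```

In practice the `lines` argument would come from a streaming HTTP response rather than a hard-coded list; the parsing logic stays the same.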
Client SDKs
- Python Client - Full-featured SDK with support for all API endpoints
- Node.js Client - Complete JavaScript/TypeScript integration
- Go Client - Full-featured SDK with support for all API endpoints
- PHP Client - Full-featured SDK with support for all API endpoints
- Rust Client - Coming soon
Integrations
- Dify Plugin (source code)
- N8N workflow node (source code)
- Dify Knowledge Base
- Langflow (pull request not yet merged)
- Flowise (coming soon)