Exotic Amazon (Chinese mirror: exotic-amazon) is a complete, out-of-the-box solution for crawling the entire amazon.com website. It covers most of Amazon's data types and will remain free and open source permanently.
Data collection on other e-commerce platforms follows much the same methods and processes. You can adapt the business logic of this project to them, and its infrastructure handles the difficulties of large-scale data collection.
Thanks to the comprehensive web data management infrastructure provided by PulsarRPA, the entire solution consists of no more than 3500 lines of Kotlin code and fewer than 700 lines of X-SQL, which together extract more than 650 fields.
git clone https://github.com/platonai/exotic-amazon.git
cd exotic-amazon && mvn
java -jar target/exotic-amazon*.jar
# Or on Windows:
java -jar target/exotic-amazon-{the-actual-version}.jar
Open System Glances to get a clear view of the system status.
All extraction rules (Chinese mirror: exotic-amazon) are written in X-SQL. Data type conversion and data cleaning are also handled by powerful X-SQL inline processing, which is an important reason why we developed X-SQL. A good example of X-SQL is x-asin.sql (Chinese mirror: exotic-amazon), which extracts more than 70 fields from each product page.
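To give a feel for what such a rule looks like, here is a minimal X-SQL sketch in the style of the PulsarRPA examples. It is not taken from x-asin.sql: the product URL is a placeholder and the CSS selectors are assumptions about the page layout, so adjust both before relying on it.

```sql
-- Illustrative sketch only: the URL and selectors below are assumptions,
-- not the actual rules shipped in x-asin.sql.
select
    -- plain text of the first element matching each selector
    dom_first_text(dom, '#productTitle') as title,
    dom_first_text(dom, '#bylineInfo') as brand,
    -- inline cleaning: take the first floating-point number found in the
    -- extracted text, falling back to 0.0 when none is present
    str_first_float(dom_first_text(dom, '#acrPopover'), 0.0) as score
from load_and_select('https://www.amazon.com/dp/B0C1H26C46', 'body');
```

Each selected column becomes one extracted field, and wrapping an extraction call in a conversion function such as `str_first_float` is how type conversion and cleaning happen inline rather than in a separate post-processing step.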
By default, results are written in JSON format to the local file system.
There are several ways to save results to the database: