Newprosoft

web scraping software

Web Content Extractor Command Line Options

Web Content Extractor commands can be run from the command line. Keys can be prefixed with either "-" or "/".

Syntax:

WCExtractor.exe [project_filename] [-dr] [-dt] [-rt] [-rft] [-ret] [-at"filename"] [-gt"filename"] [-aiv"control_name,control_id,filename"] [-proxy"filename"] [-s] [-ddr] [-fr] [-fr"column_name{Criteria}value{Condition}[;column_name{Criteria}value{Condition}]"] [-qe] [-qe"filename"] [-eft"filename"] [-ex] [-min] [-background] [-stop"projectname"]
Key Description
project_filename File name of the project (*.wcepr) to open.
-dr Delete all results
-dt Delete all URLs
-rt Reset all URLs
-rft Reset failed URLs
-ret Reset empty URLs
-at"filename" Add new URLs from a file; filename - name of the CSV or TXT file that contains the URLs, one per line.
-gt"filename" Generate new URLs using a script (JScript or VBScript); filename - name of the script file, which must contain a 'Main' function.
-aiv"control_name,control_id,filename" Add new input values from a file; control_name - name of the input element, control_id - id of the input element, filename - name of the CSV or TXT file that contains the input values, one per line.
-proxy"filename" Set proxy addresses from a file; filename - name of the CSV or TXT file that contains the proxy addresses, one per line. Write each proxy address as <ip_address>:<port>, where <ip_address> is the IP address of the proxy server and <port> is the port number assigned to the proxy server.
-s Start project
-ddr Delete duplicate records
-fr Filter results using the results filter that was most recently used in the project. If no results filter has ever been used, this option is not available.
-fr"column_name{Criteria}value{Condition}[;column_name{Criteria}value{Condition}]" Filter results by one or more column expressions separated by semicolons. Criteria: 0 - contains, 1 - does not contain, 2 - equals, 3 - does not equal, 4 - begins with, 5 - does not begin with, 6 - ends with, 7 - does not end with, 8 - is larger than, 9 - is less than. Condition: 0 - AND, 1 - OR.
-qe Perform a quick export using the export configuration most recently executed in the project. If the project data has never been exported, this option is not available.
-qe"filename" Perform a quick export to a new output file; filename - name of the new output file. The export uses the configuration most recently executed in the project. If the project data has never been exported, this option is not available.
-eft"filename" Export failed URLs; filename - name of the output file (HTML, CSV or TXT).
-ex Exit when all URLs are downloaded.
-min Run the program in the minimized state.
-background Run the program in the background.
-stop"projectname" Stop the specified project, save it and close the program.
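The -fr and -proxy keys take arguments with an internal format, so it can help to build or check them in a script before calling WCExtractor.exe. The Python sketch below does both. The helper names build_filter_arg and parse_proxy_line are hypothetical, and the sketch assumes that the braces in the -fr syntax are literal delimiters around the numeric codes (for example Price{8}100{0}) and that proxy addresses are IPv4; the reference above does not state either point explicitly.

```python
import ipaddress

# Criteria and condition codes, copied from the -fr key description.
CRITERIA = {
    "contains": 0,
    "does not contain": 1,
    "equals": 2,
    "does not equal": 3,
    "begins with": 4,
    "does not begin with": 5,
    "ends with": 6,
    "does not end with": 7,
    "is larger than": 8,
    "is less than": 9,
}
CONDITIONS = {"AND": 0, "OR": 1}


def build_filter_arg(rules):
    """Build a -fr"..." argument from (column, criterion, value, condition) tuples.

    ASSUMPTION: braces are literal delimiters around the numeric codes,
    e.g. Price{8}100{0}; several expressions are joined with ';'.
    """
    parts = [
        f"{column}{{{CRITERIA[criterion]}}}{value}{{{CONDITIONS[condition]}}}"
        for column, criterion, value, condition in rules
    ]
    return '-fr"' + ";".join(parts) + '"'


def parse_proxy_line(line):
    """Split '<ip_address>:<port>' into (ip, port); raise ValueError if malformed.

    ASSUMPTION: only IPv4 addresses are accepted.
    """
    host, sep, port = line.strip().rpartition(":")
    if not sep:
        raise ValueError(f"missing ':<port>' in {line!r}")
    ip = ipaddress.IPv4Address(host)  # raises AddressValueError (a ValueError subclass)
    port_num = int(port)  # raises ValueError if the port is not an integer
    if not 0 < port_num < 65536:
        raise ValueError(f"port out of range in {line!r}")
    return str(ip), port_num


print(build_filter_arg([("Price", "is larger than", "100", "AND")]))  # prints: -fr"Price{8}100{0}"
```

Running every line of a proxy file through parse_proxy_line before passing it to -proxy catches missing ports or mistyped addresses early, instead of leaving them to surface during extraction.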

Examples

The following command launches the program, opens the "myproject.wcepr" project file, deletes all previous results, resets all URLs, starts the extraction process, exports the data and closes the program:

"C:\Program Files (x86)\Web Content Extractor\WCExtractor.exe" "C:\Users\User1\Documents\Web Content Extractor Projects\myproject.wcepr" -dr -rt -s -qe -ex
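When this command is launched from a scheduler or another script, building the string in code helps avoid quoting mistakes. The following is a minimal Python sketch that assumes the install and project paths from the example above; build_command is a hypothetical helper. On Windows, a string passed to subprocess.run is handed to the operating system unchanged, so the quoting stays exactly as documented.

```python
import subprocess
import sys

# Paths taken from the example above; adjust them to your installation.
EXE = r"C:\Program Files (x86)\Web Content Extractor\WCExtractor.exe"
PROJECT = r"C:\Users\User1\Documents\Web Content Extractor Projects\myproject.wcepr"


def build_command(exe, project, *keys):
    """Return the full command line as one string, quoting the exe and project paths.

    Keys are appended verbatim, so forms such as -at"file" keep their quotes.
    """
    return " ".join([f'"{exe}"', f'"{project}"', *keys])


cmd = build_command(EXE, PROJECT, "-dr", "-rt", "-s", "-qe", "-ex")

if __name__ == "__main__" and sys.platform == "win32":
    # Blocks until WCExtractor.exe exits; -ex makes it exit once all URLs are downloaded.
    subprocess.run(cmd)
```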

The following command launches the program, opens the "myproject.wcepr" project file, deletes all previous URLs, adds new URLs from the "urls.csv" file, starts the extraction process, exports the data and closes the program:

"C:\Program Files (x86)\Web Content Extractor\WCExtractor.exe" "C:\Users\User1\Documents\Web Content Extractor Projects\myproject.wcepr" -dt -at"C:\Program Files (x86)\Web Content Extractor\urls.csv" -s -qe -ex
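The file given to -at is a plain list of URLs separated by newlines. The sketch below writes such a file and assembles the command from this example. write_url_file and build_add_urls_command are hypothetical helper names, and the UTF-8 encoding is an assumption rather than a documented requirement.

```python
from pathlib import Path

# Paths from the example above; adjust them to your installation.
EXE = r"C:\Program Files (x86)\Web Content Extractor\WCExtractor.exe"
PROJECT = r"C:\Users\User1\Documents\Web Content Extractor Projects\myproject.wcepr"
URL_FILE = r"C:\Program Files (x86)\Web Content Extractor\urls.csv"


def write_url_file(path, urls):
    """Write the URLs separated by newlines, the format the -at key expects.

    ASSUMPTION: UTF-8 encoding; plain ASCII URLs are unaffected by this choice.
    """
    Path(path).write_text("\n".join(urls), encoding="utf-8")


def build_add_urls_command(exe, project, url_file):
    """Mirror the example: delete old URLs, add new ones, start, export, exit."""
    return f'"{exe}" "{project}" -dt -at"{url_file}" -s -qe -ex'
```

Writing to a folder under Program Files usually requires administrator rights, so a user-writable location for urls.csv may be more practical. The -at argument only needs to point at that file.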

Note: The -qe key exports data using the export configuration most recently executed in the project. If the project has never been exported, this option is not available.