You can open the Settings window by clicking "Tools->Settings" or by pressing Alt+F7. This window allows you to change the following settings:
"Reload last project at startup" - if this option is enabled, the program will automatically open the last project when you start the program.
"Restart the program when memory reached" - if this option is enabled and memory reached the limit, then it will be restarted automatically to release the memory.
"Hide the results view after" - if this option is enabled, then the program will hide the results view after x minutes.
"Enable logging" - enable this option if you want to log web scraper events. The logging directory is the directory where the Web Content Extarctor stores log files, the default logging directory is a current directory.
"Internet Connection Settings" - the program uses the Internet connection settings from Internet Explorer. You can change these settings by clicking the "Change"button.
"Enable to open JSON documents in Internet Explorer browser" - to change this option you have to run Web Content Extractor as an administrator.
"User Agent String " - the string attached to the request header (if you use Google Chrome you need to restart the program to have this change take effect). This is a global setting and applies to all projects.
"Delay between download and parsing data" - the delay that is necessary to execute all scripts on a page.
"Time-out to receive a response to a request " - the maximum time the program will wait for a response from the server after requesting a page.
"Enable images" - enable this option if you want to see images in the web browser.
"Convert json content to html" - if this option is enabled, the program will convert json data to html.
"Enable authentication dialog" - if this option is enabled, the program will display a Basic Authentication dialog to ask the user for a username and password.
"Enable a pop-up window" - if this option is enabled, the program will open popup windows in the main window.
"Split merged table cells" - if this option is enabled, the program will separate all merged cells in the webpage table into individual cells.
"Delay between requests" - the delay necessary to prevent the server from being overloaded by multiple requests from the program. We recommend that you set the delay to at least 1-2 seconds.
"Maximum number of download threads" - the number of simultaneous connections to a server.
"Maximum crawling time" - limits the maximum crawling time. Set the number of minutes a project is allowed to run. If this is reached, the program stops the project. If set to zero, no time limit is imposed.
"Crawl only unique URLs" - if this option is enabled, the program will add only new links to the project, i.e. links that are not in the task list yet.
"Extract only unique data" - if this option is enabled, the program will add only new data to the project, i.e. data that are not in the database yet.
"Resolve redirect URLs" - if this option is enabled, the program will update the URLs of redirected links.
"Remove hash from URLs" - if this option is enabled, the program will remove hash string from the URLs. A hash string is the part of the URL that appears after the '#' sign.
"Use separate thread for parsing" - if this option is enabled, the program will use a separate thread for the parsing process.
"Use Proxy Server " - if this option is enabled, the program will use proxy server to internet connection. Use the following syntax for the proxy address: <ip_address>:<port> where <ip_address> is the Ip address of the proxy server, and <port> is the port number that is assigned to the proxy server. If your proxy server requires authentication, you have to use: <username>:<password>@<ip_address>:<port>
"Change browser proxy every x requests" - the program will change the browser proxy every x requests.
"Detect Captcha Page" - if this option is enabled, the program will scan pages for captcha page patterns. You can specify four types of patterns (used "contain" function, not "equal"):
If the program detects a captcha page, it stops the extraction process and shows the browser window so that you can enter the captcha text.
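The difference between "contain" and "equal" matching can be sketched in Python as follows. This is an illustration only: the function name looks_like_captcha, the sample patterns, and the choice to test the whole page text case-sensitively are assumptions, and the four pattern types the program supports are not modeled here.

```python
from typing import Iterable


def looks_like_captcha(page_text: str, patterns: Iterable[str]) -> bool:
    """Return True if any non-empty pattern occurs somewhere in the page text."""
    # "contain" semantics: the pattern only has to appear inside the text,
    # it does not have to equal the whole text.
    return any(pattern in page_text for pattern in patterns if pattern)


# Hypothetical patterns and page content, for illustration only.
patterns = ["Enter the characters you see", "unusual traffic"]
blocked_page = "<html><body>We detected unusual traffic from your network.</body></html>"
normal_page = "<html><body>Product list</body></html>"

blocked = looks_like_captcha(blocked_page, patterns)
normal = looks_like_captcha(normal_page, patterns)
```

With "equal" matching the blocked page above would not be detected, because the pattern is only a fragment of the page text.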