Crawling Rules are rules that determine which links the program should follow.
First, you should select the level of pages the rules will be applied to. A page's level is one more than the number of links the program has to follow to get from a start page to that page. All start pages have level 1. All pages that can be opened from a start page have level 2. All pages that can be opened from a level 2 page have level 3, and so on.
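To make the level definition concrete, here is a minimal Python sketch that assigns levels with a breadth-first walk over a link graph. The graph is a hypothetical, pre-fetched dictionary used only for illustration, and the sketch assumes that a page reachable by several chains of links gets the level of the shortest chain; the program itself may treat such pages differently.

```python
from collections import deque


def page_levels(start_pages, links):
    """Return {url: level} for every page reachable from the start pages.

    start_pages: URLs of the start pages (level 1).
    links: hypothetical dict mapping a page URL to the URLs it links to.
    """
    levels = {url: 1 for url in start_pages}
    queue = deque(start_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # Breadth-first order means the first time a page is reached
            # is along the shortest chain of links from a start page.
            if target not in levels:
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels
```

For example, page_levels(["start"], {"start": ["a", "b"], "a": ["c"], "b": ["c"]}) gives the start page level 1, pages "a" and "b" level 2, and page "c" level 3.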
There are two ways to add a level: by clicking the "Add" button or by clicking the "Duplicate" button. A click on the "Add" button adds a new level that contains no rules. A click on the "Duplicate" button adds a new level and copies all rules from the current level to the new one. You can delete a level by selecting it and clicking the "Delete" button (only the last level can be deleted).
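The behavior of the three level buttons can be modeled as operations on a list of levels, each holding its own list of rules. This is a hypothetical Python model, not the program's internals: the class name, the assumption that one empty level exists from the start, and the 0-based indexing (index 0 is level 1) are all illustrative choices.

```python
import copy


class CrawlingLevels:
    """Hypothetical model: self.levels[0] is level 1, each entry a list of rules."""

    def __init__(self):
        # Assumption: one empty level exists when the wizard starts.
        self.levels = [[]]

    def add(self):
        # "Add": append a new level that contains no rules.
        self.levels.append([])

    def duplicate(self, index):
        # "Duplicate": append a new level with independent copies of the
        # rules of the selected level, so later edits do not affect the original.
        self.levels.append(copy.deepcopy(self.levels[index]))

    def delete(self, index):
        # "Delete": only the last level may be removed.
        if index != len(self.levels) - 1:
            raise ValueError("only the last level can be deleted")
        self.levels.pop()
```

Deep-copying on duplicate reflects the manual's description that rules are copied to the new level, so editing the copy leaves the original level unchanged.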
The "Page URL" field contains the URL of the page that will be used to create rules. You can type the URL manually, or you can click the button and open the necessary page in the built-in browser. For level 1, the wizard sets the initial value of this field to the URL of the start page. You can leave this field empty.
There are two types of rules: Basic Rules and Advanced Rules:
Basic Rules allow you to specify the positions of the necessary links on a page. On every page of the corresponding level, the program checks these positions; if there is a link in a position, the program extracts its URL and adds it to the task list to be downloaded later.
To specify link positions, click the "Add" button, wait until the page is loaded, and click the links the program should follow. The positions of these links will be automatically selected in the right window. If you click a link by mistake, click it again to clear the selection. Click the "OK" button to save all the selected positions to the list. You can edit this list later by clicking the "Edit" button.
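One way to picture a stored link position is as a path of child indices from the root of the page's element tree. The Python sketch below assumes exactly that representation, which is hypothetical (the manual does not say how positions are stored), and uses xml.etree.ElementTree, so it only handles well-formed XHTML rather than arbitrary real-world HTML.

```python
import xml.etree.ElementTree as ET


def element_at(root, position):
    """Follow a tuple of child indices from the root; return None if missing."""
    node = root
    for index in position:
        children = list(node)
        if index >= len(children):
            return None
        node = children[index]
    return node


def extract_links(page_xml, positions):
    """Return the URLs of links found at the given (hypothetical) positions."""
    root = ET.fromstring(page_xml)
    urls = []
    for position in positions:
        node = element_at(root, position)
        # Only a link element with an href yields a URL; anything else,
        # or a position that does not exist on this page, is skipped.
        if node is not None and node.tag == "a" and "href" in node.attrib:
            urls.append(node.attrib["href"])
    return urls
```

For a page like "<html><body><div><a href='/p1'>One</a><a href='/p2'>Two</a></div><p>No link</p></body></html>", the position (0, 0, 1) points at the second link and yields "/p2", while (0, 1) points at the paragraph and yields nothing. The returned URLs would then be added to the task list.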
You can also specify a script that changes link URLs. This parameter is optional and you can skip it. It improves the performance of the program when a link URL is written in JavaScript, because the script can turn it into the URL of the target page without the program having to execute it. For example, if a link contains the script "javascript:goto('http://www.domain.com')" and executing it takes the program to the page "http://www.domain.com", you can specify a script that turns "javascript:goto('http://www.domain.com')" into "http://www.domain.com". The program then loads the target page directly instead of wasting time loading the page and executing the script. To specify the script, click the "URL Transformation Script" button. The script is applied to all links.
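The manual does not describe the language the program uses for URL Transformation Scripts, so the Python sketch below only illustrates the transformation logic for the goto example above. The regular expression assumes links of the exact form javascript:goto('...') or javascript:goto("..."); any other URL is returned unchanged, which matters because the script is applied to all links.

```python
import re

# Assumed link form: javascript:goto('URL') or javascript:goto("URL"),
# optionally followed by a semicolon.
GOTO_PATTERN = re.compile(r"""^javascript:\s*goto\(\s*(['"])(.*?)\1\s*\)\s*;?\s*$""")


def transform_url(url):
    """Return the target URL of a goto() link, or the URL unchanged."""
    match = GOTO_PATTERN.match(url)
    if match:
        # Group 2 is the text between the matching quotes.
        return match.group(2)
    return url
```

With this sketch, transform_url("javascript:goto('http://www.domain.com')") returns "http://www.domain.com", so the target page can be loaded directly.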
The "Delete" button deletes the selected position. The "Up" and "Down" buttons move the selected position up or down the list respectively.
Advanced Rules allow you to specify the patterns of the links the program should or should not follow. If a page contains many links, or the links are located in different positions on different pages, Basic Rules are difficult or impossible to use, so Advanced Rules are needed instead. You can specify six types of rules:
It is recommended to use the first two types of rules for links whose text remains the same on different pages, for example, "Next", "More", or "Details". It is recommended to use the other two types of rules if the necessary links have a common part in their URLs (to see the URL of a link, move the mouse pointer over it and the URL will appear in the status bar at the bottom of the browser window). For example, for these links:
you can specify the following rule: "Follow links if URL contains: /Category.php; /ProductDetails.php".
For these links:
you can specify the following rule: "Follow links if URL contains: /artist/".
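A "Follow links if URL contains" rule with several semicolon-separated patterns can be read as "follow a link if its URL contains at least one of the patterns". The Python sketch below assumes that reading, assumes case-sensitive substring matching, and uses hypothetical example URLs; it is not the program's actual matching code.

```python
def parse_patterns(text):
    """Split a semicolon-separated pattern list, ignoring blank entries."""
    return [p.strip() for p in text.split(";") if p.strip()]


def follow_if_url_contains(links, patterns_text):
    """Keep the links whose URL contains at least one of the patterns."""
    patterns = parse_patterns(patterns_text)
    return [url for url in links if any(p in url for p in patterns)]
```

For instance, with the pattern text "/Category.php; /ProductDetails.php", the hypothetical URLs "http://shop.example/Category.php?id=3" and "http://shop.example/ProductDetails.php?id=7" are followed, while "http://shop.example/about.html" is not.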
You can type the patterns manually (separating them with semicolons) or select them from a list generated by the program. Note that this list does not include all possible patterns, only the most obvious ones. To select patterns from the list, click the "Edit" button, wait until the page is loaded in the new window, and click one of its links. If the list contains a matching pattern, its checkbox becomes selected and all links matching that pattern are highlighted automatically. If the list does not contain a matching pattern, add it manually by clicking the "Add" button.
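The manual does not say how the program builds its list of suggested patterns. As a purely hypothetical illustration of what "the most obvious" patterns for a clicked link might look like, the Python sketch below proposes each directory-style prefix of the link's path plus its last path segment, and then highlights the links that contain a chosen pattern.

```python
from urllib.parse import urlsplit


def candidate_patterns(link_url):
    """Hypothetical: suggest 'URL contains' patterns from one clicked link."""
    path = urlsplit(link_url).path
    segments = [s for s in path.split("/") if s]
    candidates = []
    # Directory-style prefixes such as "/artist/" or "/artist/123/".
    for i in range(1, len(segments)):
        candidates.append("/" + "/".join(segments[:i]) + "/")
    # The last segment on its own, such as "/Category.php" or "/artist/".
    if segments:
        last = "/" + segments[-1]
        if path.endswith("/"):
            last += "/"
        candidates.append(last)
    return candidates


def highlight(links, pattern):
    """Return the links that a selected pattern would match."""
    return [url for url in links if pattern in url]
```

Clicking a hypothetical link "http://music.example/artist/123/albums" would suggest "/artist/", "/artist/123/", and "/albums"; a pattern that a heuristic like this cannot produce is the case where you would add it manually.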
Options: