Web Content Extractor Documentation

Script Wizard

Script Wizard lets you create a script that extracts a substring without having to write the script code yourself.

To generate a script, select the part of the text you want to extract and click the "Generate Script" button. The program generates a script and shows its result at the bottom of the window. If the result does not match the substring you want to extract, click the "Set Script Parameters" button to change the script parameters. In the "Predefined Script" window you can change the parameters of the script functions, add new functions, remove the ones you don't need, duplicate an existing function, and change the order in which the functions are executed.

To change the parameters of a script function, select the function and click the "Edit" button.

You can use the following functions:

  1. Add_String_At_The_Beginning - this function adds a string to the beginning of the specified string.

  2. Add_String_To_The_End - this function adds a string to the end of the specified string.

  3. Download_URL - this function downloads the specified URL.

  4. Extract_Email_Addresses - this function extracts email addresses from the specified string using pattern matching (a regular expression).

  5. Extract_Phone_Numbers - this function extracts phone numbers from the specified string using pattern matching (a regular expression).

  6. Left_String - this function extracts the part of the string that starts at the first character and ends at the substring specified in the strSearchFor parameter. The strReverseSearch parameter specifies the search direction and can have one of the following values: 0 - the search starts from the beginning of the string; 1 - the search starts from the end of the string.

  7. Replace_String - this function replaces the substring specified in the strFindWhat parameter with the string specified in the strReplaceWith parameter.

  8. Replace_String_RegEx - this function replaces all substrings that match a regular expression with a replacement string. The pattern parameter specifies the regular expression, and the replacement parameter specifies the replacement string.

  9. Right_String - this function extracts the part of the string located between the substring specified in the strSearchFor parameter and the end of the string. The strReverseSearch parameter specifies the search direction and can have one of the following values: 0 - the search starts from the beginning of the string; 1 - the search starts from the end of the string.

  10. Split_String - this function splits the string into substrings at each occurrence of the delimiter specified in the strDelimiter parameter and returns the substring whose index is specified in the strItemIndex parameter.

  11. Strip_HTML_Tags - this function removes all HTML tags from the specified string.

  12. Sub_String - this function extracts the part of the string located between the substring specified in the strStartSearchFor parameter and the substring specified in the strEndSearchFor parameter. The strReverseSearch parameter specifies the search direction and can have one of the following values: 0 - the search starts from the beginning of the string; 1 - the search starts from the end of the string.

  13. Sub_String_RegEx - this function extracts the part of the string that matches the regular expression specified in the strPattern parameter.

  14. Trim_String - this function removes the character specified in the strSearchFor parameter from both ends of the string.
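The search-based functions can be approximated in a few lines of code. The following is a minimal Python sketch, not the product's implementation: the snake_case function and parameter names are hypothetical, and it assumes that the search string is excluded from the result, that a search string that is not found yields an empty string, that strReverseSearch = 1 means "use the last occurrence", and that strItemIndex is zero-based. The last lines chain several functions the way the "Predefined Script" window runs them in order, assuming each function receives the previous function's output.

```python
# Hypothetical sketch of the search-based Script Wizard functions.
# Assumptions (not confirmed by the documentation):
#   - the search string itself is not part of the result;
#   - if a search string is not found, the result is an empty string;
#   - str_reverse_search: 0 = use the first occurrence, 1 = use the last one;
#   - str_item_index in split_string is zero-based.


def add_string_at_the_beginning(text: str, str_add: str) -> str:
    """Prepend str_add to text."""
    return str_add + text


def add_string_to_the_end(text: str, str_add: str) -> str:
    """Append str_add to text."""
    return text + str_add


def _find(text: str, needle: str, reverse: int) -> int:
    """Index of the first (reverse=0) or last (reverse=1) occurrence, or -1."""
    return text.rfind(needle) if reverse else text.find(needle)


def left_string(text: str, str_search_for: str, str_reverse_search: int = 0) -> str:
    """Text from the first character up to the search string."""
    pos = _find(text, str_search_for, str_reverse_search)
    return text[:pos] if pos >= 0 else ""


def right_string(text: str, str_search_for: str, str_reverse_search: int = 0) -> str:
    """Text after the search string up to the end of the string."""
    pos = _find(text, str_search_for, str_reverse_search)
    return text[pos + len(str_search_for):] if pos >= 0 else ""


def sub_string(text: str, str_start_search_for: str, str_end_search_for: str,
               str_reverse_search: int = 0) -> str:
    """Text between the start marker and the end marker."""
    if str_reverse_search:
        # Backward search: last end marker, then the last start marker before it.
        end = text.rfind(str_end_search_for)
        start = text.rfind(str_start_search_for, 0, end) if end >= 0 else -1
        if start < 0:
            return ""
        return text[start + len(str_start_search_for):end]
    # Forward search: first start marker, then the first end marker after it.
    start = text.find(str_start_search_for)
    if start < 0:
        return ""
    begin = start + len(str_start_search_for)
    end = text.find(str_end_search_for, begin)
    return text[begin:end] if end >= 0 else ""


def split_string(text: str, str_delimiter: str, str_item_index: int) -> str:
    """Split on the delimiter and return the item at str_item_index."""
    parts = text.split(str_delimiter)
    return parts[str_item_index] if 0 <= str_item_index < len(parts) else ""


def trim_string(text: str, str_search_for: str) -> str:
    """Remove leading and trailing occurrences of a single character."""
    return text.strip(str_search_for)


if __name__ == "__main__":
    # Chain functions in order; each step works on the previous step's output.
    cell = '<td class="price">Price: $19.99</td>'
    value = sub_string(cell, '<td class="price">', "</td>")  # 'Price: $19.99'
    value = right_string(value, "$")                          # '19.99'
    print(add_string_to_the_end(value, " USD"))               # prints 19.99 USD
```

Returning an empty string when a marker is missing keeps a chain of functions from failing partway through; the product itself may handle a missing marker differently.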
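The regular-expression functions map naturally onto Python's re module. The sketch below is illustrative only: the patterns that Extract_Email_Addresses, Extract_Phone_Numbers, and Strip_HTML_Tags use in the product are not documented here, so the patterns shown are simple assumptions, the function names are hypothetical, the extractors return a Python list of matches, and sub_string_regex assumes that only the first match is returned.

```python
import re

# Hypothetical patterns; the product's own patterns are not documented.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A digit (optionally preceded by "+"), then digits, spaces, dots, dashes or
# parentheses, ending with a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")
TAG_RE = re.compile(r"<[^>]+>")


def extract_email_addresses(text: str) -> list[str]:
    """All substrings that look like email addresses."""
    return EMAIL_RE.findall(text)


def extract_phone_numbers(text: str) -> list[str]:
    """All substrings that look like phone numbers."""
    return PHONE_RE.findall(text)


def replace_string_regex(text: str, pattern: str, replacement: str) -> str:
    """Replace every match of pattern with replacement."""
    return re.sub(pattern, replacement, text)


def sub_string_regex(text: str, str_pattern: str) -> str:
    """First substring matching str_pattern, or an empty string."""
    match = re.search(str_pattern, text)
    return match.group(0) if match else ""


def strip_html_tags(text: str) -> str:
    """Remove anything shaped like <...>.

    Naive: keeps the text inside <script> and <style> elements and can be
    fooled by a '>' inside an attribute value.
    """
    return TAG_RE.sub("", text)


if __name__ == "__main__":
    page = "<p>Mail <b>info@example.com</b> or call +1 (555) 123-4567.</p>"
    plain = strip_html_tags(page)
    print(plain)
    print(extract_email_addresses(plain))
    print(extract_phone_numbers(plain))
```

In Python's re.sub, a backreference in the replacement is written \1 or \g<1>, whereas JavaScript's String.prototype.replace uses $1, so the syntax you enter in the replacement parameter may differ from this sketch.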

Available constants:

  • \r\n - the newline sequence (carriage return followed by line feed) used in scripts written in JavaScript.

  • \r\n\r\n - two consecutive newlines, used in scripts written in JavaScript.

  • vbNewLine - the newline constant used in scripts written in Basic.

  • vbNewLine + vbNewLine - two consecutive newlines, used in scripts written in Basic.
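These constants are typically useful as separators, for example as the strDelimiter value of Split_String or as the string passed to Add_String_To_The_End. The Python sketch below only illustrates what the characters are: in Python, as in JavaScript, "\r\n" is the two-character carriage-return/line-feed sequence, which is also what the Basic constant vbNewLine normally stands for on Windows. The helper names are hypothetical.

```python
# The JavaScript constants \r\n and \r\n\r\n (vbNewLine in Basic) in Python.
CRLF = "\r\n"           # carriage return + line feed
DOUBLE_CRLF = CRLF * 2  # a blank line between blocks of text


def split_lines(text: str) -> list[str]:
    """Split text into lines on CRLF (a blank line becomes an empty item)."""
    return text.split(CRLF)


def split_paragraphs(text: str) -> list[str]:
    """Split text into blocks separated by a blank line."""
    return text.split(DOUBLE_CRLF)


def join_lines(lines: list[str]) -> str:
    """Join items into one string, one item per line."""
    return CRLF.join(lines)


if __name__ == "__main__":
    record = "Name: Ann" + CRLF + "City: Oslo" + DOUBLE_CRLF + "Note: VIP"
    print(split_paragraphs(record))  # two blocks
    print(split_lines(record))       # four items, the third one empty
```

The order of splitting matters: splitting a record on single newlines turns each blank line into an empty item, so split on the double sequence first when you need whole blocks.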

You can also write your own script. To do so, click the "Edit Script..." button to open the "Script Editor" window, where you can create and edit scripts.