Web Content Extractor Documentation

Menu Bar

The menu bar provides access to all Web Content Extractor commands.

The following items are available in the Web Content Extractor main menu:

File

  • Sub-item New Project opens the New Project Wizard window.
  • Sub-item Open Project opens a dialog to select a previously saved extraction project. After you have selected one, the extraction settings will be loaded.
  • Sub-items Save Project and Save Project As allow you to save the project for future use.
  • Sub-item Upload Project To Web allows you to upload the project to the web scraping platform and run the scraping tasks in the cloud.
  • Sub-item Compact Project allows you to compact the project file. Compacting does not compress your data; it makes the database file smaller by eliminating unused space.
  • Sub-item Recent Project shows a list of the projects that you have recently worked with.
  • Sub-item Exit closes the application.
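
As an illustration of what compacting means, the sketch below rebuilds a database file so that space left behind by deleted rows is returned to the file system. The format of the project file is not described here, so an SQLite database is assumed purely as a stand-in; the function names, file name, and table layout are hypothetical and are not part of Web Content Extractor.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def compact_database(path: str) -> Tuple[int, int]:
    """Rebuild the SQLite file at `path`; return (size_before, size_after) in bytes."""
    size_before = os.path.getsize(path)
    # isolation_level=None selects autocommit mode, because VACUUM
    # cannot run inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        # VACUUM rewrites the file without its free pages. The remaining
        # rows are stored exactly as before; nothing is compressed.
        conn.execute("VACUUM")
    finally:
        conn.close()
    return size_before, os.path.getsize(path)


def demo() -> Tuple[int, int]:
    """Create a throwaway database, delete its rows, then compact it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "project.db")  # hypothetical file name
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, body TEXT)")
        conn.executemany(
            "INSERT INTO results (body) VALUES (?)",
            [("x" * 200,) for _ in range(5000)],
        )
        conn.commit()
        # Deleting rows frees pages inside the file but does not shrink it.
        conn.execute("DELETE FROM results")
        conn.commit()
        conn.close()
        return compact_database(path)


if __name__ == "__main__":
    before, after = demo()
    print(f"before: {before} bytes, after: {after} bytes")
```

In this stand-in, the file keeps its full size after the rows are deleted and only shrinks once it is rebuilt, which mirrors the distinction made above between eliminating unused space and compressing data.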

View

  • Sub-item Project URL View allows you to show only the first 1,000 URLs in the Web Scraper URLs view.
  • Sub-item Results View allows you to show only the first 1,000 records in the Results view.
  • Sub-item Toolbar allows you to choose whether to display the Toolbar in the program window.
  • Sub-item Status Bar allows you to choose whether to display the Status Bar in the program window.

Project

  • Sub-item Start starts the extraction process.
  • Sub-item Stop stops the extraction process.
  • Sub-item Pause suspends the extraction process, which can later be resumed from the exact spot where it was paused.
  • Sub-item Delete All URLs allows you to delete all the URLs included in the current project.
  • Sub-item Reset All URLs allows you to reset all the URLs included in the current project.
  • Sub-item Reset Failed URLs allows you to reset the failed URLs.
  • Sub-item Reset Empty URLs allows you to reset the downloaded URLs with no data and no links.
  • Sub-item Find URL allows you to find and select a URL that matches a search.
  • Sub-item Find Next URL allows you to find and select the next occurrence.
  • Sub-item Export All URLs allows you to export all URLs to a file.
  • Sub-item Export Failed URLs allows you to export only failed URLs to a file.
  • Sub-item Properties opens the project properties window.

Results

  • Sub-item Export allows you to export the extracted data into a file. You can specify the file format. Exporting is not possible while the extraction process is running. For more details, please refer to the Export Data section.
  • Sub-item Quick Export allows you to immediately export the extracted data using the last export settings.
  • Sub-item Filter includes two sub-sub-items: Edit, which opens the dialog where you can edit the conditions of the results filter, and Apply/Remove Filter (a tick shows that the filter is applied).
  • Sub-item Sort allows you to sort the extracted data by one of the parameters (columns). When you click one of the column names, a tick appears next to it, showing that the table is sorted by this parameter.
  • Sub-item Resolve Redirect URL allows you to get the 'final' redirected URL.
  • Sub-item Download Images/Files allows you to download the images and files referenced by the field URLs.
  • Sub-item Rename Images/Files allows you to rename the downloaded image files.
  • Sub-item Update Field Text allows you to update the extracted data by using the Text Transformation Script.
  • Sub-item Delete All allows you to delete all the extraction results.
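
To make the idea of a 'final' redirected URL concrete, here is a minimal sketch of following a redirect chain until a URL no longer redirects. It is not the program's implementation: the function `resolve_final_url`, its `next_location` callback, and the `max_hops` limit are all hypothetical names chosen for illustration. The callback stands in for whatever issues the HTTP request and returns the redirect target, or None when there is no redirect.

```python
from typing import Callable, Optional
from urllib.parse import urljoin


def resolve_final_url(
    url: str,
    next_location: Callable[[str], Optional[str]],
    max_hops: int = 10,
) -> str:
    """Follow redirects starting at `url` and return the last URL in the chain.

    `next_location(u)` must return the redirect target of `u`,
    or None if `u` does not redirect.
    """
    seen = {url}
    current = url
    hops = 0
    while True:
        location = next_location(current)
        if location is None:
            return current  # no further redirect: this is the final URL
        if hops == max_hops:
            raise ValueError(f"more than {max_hops} redirects from {url}")
        hops += 1
        # Redirect targets may be relative, so resolve against the current URL.
        current = urljoin(current, location)
        if current in seen:
            raise ValueError(f"redirect loop at {current}")
        seen.add(current)


if __name__ == "__main__":
    # A fixed table stands in for real HTTP responses.
    redirects = {
        "http://example.com/a": "/b",
        "http://example.com/b": "http://example.com/final",
    }
    print(resolve_final_url("http://example.com/a", redirects.get))
```

Passing the lookup as a callback keeps the sketch runnable without network access; a real lookup would send a request and read the redirect target from the response.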

Tools

  • Sub-item Login To Website opens a browser window that allows you to log in to a website or enter CAPTCHA text.
  • Sub-item WCE scheduler opens the WCE scheduler window.
  • Sub-item Settings opens the settings window.

Help

  • Sub-item Contents launches the Web Content Extractor Help system.
  • Sub-item About provides information about the Web Content Extractor program.