HttpGet Documentation - Version 2.1.0

Copyright 2001-2003, David G. Holm, Berrien Springs, Michigan, USA.

HttpGet is a dual-mode Java application that gets a web page from a web host and saves it with certain changes. This lets you keep a static copy of a web page for later viewing: when you view the saved copy, your browser downloads the specific files that the page referenced at the time it was saved. HttpGet is intended for use with web sites that publish on a regular schedule and use file or image names that vary with each publication. HttpGet works with both Java 1 and Java 2.

Command Line Interface Mode:

Java 1 Syntax: jre -cp .;HttpGet.jar Main host [path [port [test]]]
Java 2 Syntax: java -jar HttpGet.jar host [path [port [test]]]

Where host is the host part of the URL to get, path is the path part of the URL (the default is /), port is the host port (the default is 80), and test generates debug output and gets the web page without making the changes listed below.

Here's an example that uses Java 2 to save the web page at http://www.sluggy.com/daily.php to a file:

java -jar HttpGet.jar www.sluggy.com /daily.php > sluggy.html

Unless the optional test command line parameter is used, HttpGet makes the following changes to the downloaded web page:

1) If the web page does not have a BASE HREF tag, then HttpGet adds one to the HEAD section using the combined host and path values. For the earlier example, the tag ends up as: <BASE HREF="http://www.sluggy.com/daily.php">
2) All SCRIPT sections are removed from the downloaded page.
3) All onload, onclose, and onexit event attributes are removed from all BODY and FRAMESET tags.
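The three changes above can be approximated with regular expressions. The following is a minimal sketch, not HttpGet's actual code: the class name PageTransformSketch and its transform method are hypothetical, it assumes the BASE HREF value is "http://" followed by the host and path, and it needs Java 5 or later (java.util.regex and Matcher.quoteReplacement), unlike HttpGet itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the three page changes; not HttpGet's own code.
// Requires Java 5 or later (java.util.regex and Matcher.quoteReplacement).
public class PageTransformSketch {

    // A complete SCRIPT section, matched case-insensitively across line breaks.
    private static final Pattern SCRIPT = Pattern.compile(
        "<script\\b.*?</script\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // An opening BODY or FRAMESET tag (closing tags start with "</" and do not match).
    private static final Pattern BODY_OR_FRAMESET = Pattern.compile(
        "<(body|frameset)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // An onload, onclose, or onexit attribute with a quoted or unquoted value.
    private static final Pattern EVENT_ATTR = Pattern.compile(
        "\\s+on(load|close|exit)\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+)",
        Pattern.CASE_INSENSITIVE);

    // An existing BASE tag that carries an HREF attribute.
    private static final Pattern BASE_HREF = Pattern.compile(
        "<base\\s[^>]*href", Pattern.CASE_INSENSITIVE);

    // The opening HEAD tag.
    private static final Pattern HEAD_OPEN = Pattern.compile(
        "<head\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    public static String transform(String html, String host, String path) {
        // 1) Add a BASE HREF tag right after <HEAD> if the page has none.
        //    Assumption: the value is "http://" + host + path.
        //    Pages without a HEAD tag are left unchanged in this sketch.
        if (!BASE_HREF.matcher(html).find()) {
            Matcher head = HEAD_OPEN.matcher(html);
            if (head.find()) {
                String base = "<BASE HREF=\"http://" + host + path + "\">";
                html = html.substring(0, head.end()) + base + html.substring(head.end());
            }
        }

        // 2) Remove every SCRIPT section.
        html = SCRIPT.matcher(html).replaceAll("");

        // 3) Strip the event attributes, but only inside BODY and FRAMESET tags.
        Matcher tag = BODY_OR_FRAMESET.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            String cleaned = EVENT_ATTR.matcher(tag.group()).replaceAll("");
            tag.appendReplacement(out, Matcher.quoteReplacement(cleaned));
        }
        tag.appendTail(out);
        return out.toString();
    }
}
```

For example, calling transform(page, "www.sluggy.com", "/daily.php") on a page whose BODY tag carries onload="go()" returns the page with a BASE HREF tag after <HEAD>, no SCRIPT sections, and no onload attribute. A regular expression is not a full HTML parser, so unusual markup (for example, a "</script>" string inside a script) may not be handled the way a browser would handle it.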
Graphical User Interface Mode:

Java 1 Syntax: jre -cp .;HttpGet.jar;swing.jar Main
Java 2 Syntax: java -jar HttpGet.jar

When HttpGet starts up in GUI mode, it loads the contents of 'schedule.dat' into a five-column table with the headings "Name" (a descriptive name for a web page), "Address" (the address of a web page), "Port" (the web server port number to use, with a default of 80), "Schedule" (see below for details), and "Status" (one of "Setup", "Connect", "Fetch", "Convert", "Saving", "Done", "Stopping", "Stopped", or "Failed"). The table is empty if the file 'schedule.dat' does not exist. There are six (6) buttons, named Fetch, Add, Change, Save, Delete, and Exit, with a selection status field located between the Delete and Exit buttons.

The "Schedule" field is a positional field representing the seven days of the week, starting with Sunday. An "X" (or an "x") in any position indicates that the web site is scheduled for retrieval on that day. Use any other character except a space to indicate an unscheduled day (spaces cannot be used because the field is trimmed of spaces when it is saved and loaded). You do not have to fill the field out to seven characters; for example, "X" means retrieve on Sunday only, and "_X_X_X" means retrieve on Monday, Wednesday, and Friday.

The Fetch button fetches the contents of the chosen web page(s), or of all the web pages scheduled for today if none are chosen. The web pages are written to numbered files in the 'html' subdirectory, which must already exist (HttpGet will not create it). The numbers are the zero-based positions of the web pages in the table; for example, the fifth web page in the table is written to 'html/04.htm' (if it is chosen, or if all scheduled pages are being saved). If no web pages are chosen, or if all of the web pages are chosen, then HttpGet also creates a file named 'all.html' in the main directory, with links to all of the files in the 'html' subdirectory.
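The Schedule rules and the file numbering described above can be expressed as two small helpers. This is a hypothetical sketch (ScheduleSketch, isScheduled, isScheduledToday, and outputFileName are illustrative names, not part of HttpGet); it assumes two-digit zero padding based only on the 'html/04.htm' example.

```java
import java.util.Calendar;

// Hypothetical helpers for the Schedule field and output file names;
// not taken from HttpGet's source.
public class ScheduleSketch {

    // dayIndex: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
    // "X" or "x" marks a scheduled day; any other character, or a position
    // past the end of a short field, means the day is not scheduled.
    public static boolean isScheduled(String schedule, int dayIndex) {
        String s = schedule.trim(); // the field is space-trimmed when saved and loaded
        if (dayIndex < 0 || dayIndex >= s.length()) {
            return false;
        }
        return Character.toUpperCase(s.charAt(dayIndex)) == 'X';
    }

    // Checks today's day of the week.
    // Calendar.SUNDAY is 1 and Calendar.SATURDAY is 7, so subtract 1.
    public static boolean isScheduledToday(String schedule) {
        int dayIndex = Calendar.getInstance().get(Calendar.DAY_OF_WEEK) - 1;
        return isScheduled(schedule, dayIndex);
    }

    // Zero-based table row to output file name; assumes two-digit padding,
    // so row 4 (the fifth page) becomes "html/04.htm".
    public static String outputFileName(int rowIndex) {
        String number = (rowIndex < 10 ? "0" : "") + rowIndex;
        return "html/" + number + ".htm";
    }
}
```

For example, isScheduled("_X_X_X", 3) is true because position 3 is Wednesday, and outputFileName(4) returns "html/04.htm" for the fifth row in the table.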
The 'all.html' file is designed to be used with the 'index.html' file included in the HttpGet.ZIP archive, which sets up two frames: a left frame for the 'all.html' list and a right frame for each web page (this frame is initially loaded with the file 'help.html'). If you stop a fetch of all web sites before the fetch completes, then 'all.html' only has links to the web pages before the one that shows the "Stopped" status.

A total of five (5) passes are made through the table. On the first pass, HttpGet attempts to fetch each (or each chosen) web page. On subsequent passes, it retries only the web sites that failed. This maximizes the chances of fetching all (or all chosen) web pages successfully.

The Add button adds one or more blank rows to the table. If one or more rows are selected, a blank row is inserted ahead of each selected row. Otherwise, a blank row is appended to the end of the table. Double-click on the Name column to add a site name (for example, "Sluggy Freelance"). Double-click on the Address column (or tab over to it) to add a web page address (for example, "www.sluggy.com/daily.php"). You do not need to include an "http://" prefix.

The Change button brings up a dialog that lets you edit the Name, Address, and Port values for the highlighted web page (or the first one if more than one is highlighted). In addition to the three fields, this dialog has an OK button and a Cancel button. The OK button accepts the changes without prompting, unless the port value is not an integer, in which case a warning dialog is displayed and the change dialog remains open. The Cancel button discards any changes without prompting, but if you use the window close button after changing the web page settings, a save prompt is displayed.

The Save button saves the contents of the table to the file 'schedule.dat', using an intermediate file named 'schedule.da@' to reduce the risk of data loss.
(HttpGet first saves the schedule to the intermediate file, then deletes the original file, and finally renames the intermediate file to the original file name.)

The Delete button deletes all of the selected rows. There is no confirmation, because the deletion isn't permanent until you use the Save button.

The selection status field displays how many table rows are selected.

The Exit button exits HttpGet, unless the table has been changed, in which case a confirmation prompt is displayed. Choosing Yes exits without saving the changes. Choosing No keeps the program running.

Note: For Fetch, Add, and Delete, multiple consecutive and non-consecutive selections are possible. Use Click and Shift+Click to select the first (or only) consecutive range. Use Ctrl+Click for single non-consecutive selections. Use Ctrl+Click and Ctrl+Shift+Click for additional consecutive ranges.

The columns can be moved and the window can be resized, but the new positions and sizes are not saved; the window always starts up with the same initial size and column positions.

Double-clicking on a row has the same effect as selecting that one row and then clicking the Fetch button. Right-clicking on a row also selects that one row, and additionally brings up a context menu from which you can choose Add, Change, Delete, or Fetch; each choice has the same effect as clicking the corresponding button with that row selected.

Command Line Interface Schedule Mode:

Java 1 Syntax: jre -cp .;HttpGet.jar Main -schedule
Java 2 Syntax: java -jar HttpGet.jar -schedule

When HttpGet starts up in CLI schedule mode, it loads the contents of the file 'schedule.dat' and processes it as if you were in GUI mode and had activated the Fetch button with no web pages selected, which fetches only the web pages scheduled to be fetched today.
However, instead of operating in GUI mode, the program operates in CLI mode and sends all status messages to standard output (stdout).

CSV format description for schedule.dat file:

The schedule.dat file is a comma-separated values file, with quote marks around each data field, commas separating consecutive data fields, and either a single LF character or a CR LF character pair terminating each data record. Each record is one entry in the schedule table when using GUI mode.

The first data field is the name of the web site. The second data field is the URL of the web site. The third data field is the optional port used to access the web site (if no port value is specified, port 80 is used). The fourth data field is the web site access schedule (see the description of the "Schedule" field in the GUI mode section for details). The fifth data field is the last status that was assigned to the web site; when adding a new record to the file for use with the command line interface schedule mode, leave this data field blank.
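The record format above can be read and written with a few lines of Java. The following is a minimal sketch, not HttpGet's own parser: ScheduleFileSketch and its methods are hypothetical names, it assumes that field values never contain a quote mark (the description above does not say how embedded quotes are handled), and it writes records with the platform line separator (LF on Unix-like systems, CR LF on Windows). The save method follows the intermediate-file sequence given for the Save button.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical schedule.dat reader/writer; assumes no quote marks inside field values.
public class ScheduleFileSketch {

    // Splits one record into its fields, dropping the surrounding quote marks.
    public static List<String> parseRecord(String line) {
        List<String> fields = new ArrayList<String>();
        int i = 0;
        while (i < line.length()) {
            if (line.charAt(i) != '"') {
                throw new IllegalArgumentException("Expected a quote mark at position " + i);
            }
            int close = line.indexOf('"', i + 1);
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated field in: " + line);
            }
            fields.add(line.substring(i + 1, close));
            i = close + 1;
            if (i < line.length() && line.charAt(i) == ',') {
                i++; // skip the comma between fields
            }
        }
        return fields;
    }

    // The third field is the optional port; a missing or blank value means port 80.
    public static int portOf(List<String> fields) {
        if (fields.size() < 3 || fields.get(2).trim().length() == 0) {
            return 80;
        }
        return Integer.parseInt(fields.get(2).trim());
    }

    // Reads every non-blank record; readLine() accepts both LF and CR LF endings.
    public static List<List<String>> readAll(File file) throws IOException {
        List<List<String>> records = new ArrayList<List<String>>();
        BufferedReader in = new BufferedReader(new FileReader(file));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().length() > 0) {
                    records.add(parseRecord(line));
                }
            }
        } finally {
            in.close();
        }
        return records;
    }

    // Writes "schedule.da@" (last character replaced by '@'), deletes the
    // original file, then renames the intermediate file to the original name.
    public static void save(List<List<String>> records, File target) throws IOException {
        String name = target.getName();
        File temp = new File(target.getParentFile(), name.substring(0, name.length() - 1) + "@");
        PrintWriter out = new PrintWriter(new FileWriter(temp));
        try {
            for (List<String> record : records) {
                StringBuilder line = new StringBuilder();
                for (int f = 0; f < record.size(); f++) {
                    if (f > 0) {
                        line.append(',');
                    }
                    line.append('"').append(record.get(f)).append('"');
                }
                out.println(line);
            }
        } finally {
            out.close();
        }
        if (out.checkError()) {
            throw new IOException("Write failed: " + temp);
        }
        if (target.exists() && !target.delete()) {
            throw new IOException("Could not delete " + target);
        }
        if (!temp.renameTo(target)) {
            throw new IOException("Could not rename " + temp + " to " + target);
        }
    }
}
```

For example, parseRecord applied to the line "Sluggy Freelance","www.sluggy.com/daily.php","","_X_X_X","" returns five fields, and portOf returns 80 for that record because its port field is blank.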