Here is what you need to be able to produce the movie page.
-----------------------------------------------------------

  afs access to /afs/cs/user/clamen/misc/movies/scheds/
  a working installation of Ruby
  a working installation of the REXML library for Ruby
    (the tar ball for this is in
     /afs/cs/user/clamen/misc/movies/scheds/rexml.tgz;
     just unpack it and run rexml/install.rb.
     It installed in /usr/local/lib/site_ruby/1.6/rexml by default for me.)
  a working installation of lynx
  a copy of city paper is very useful

Here are instructions for producing the movie page for the week [new method].
----------------------------------------------------------------

* modify start_month,start_day,end_month,end_day near the top of
  the gen/grab_data.rb script to match the dates for the week
  (these dates are inclusive)

* Make a new directory to put all of the files in.
    mkdir html/-   (ex. mkdir html/11-16)

* change to that directory
    cd html/-

* grab the data with the following command (this will make
  a _bunch_ of html files in the current directory using lynx)
    source ../../gen/lynx_lines
  This _MUST_ be done on Friday (around noon seems to work) so that
  you pick up the right days from yahoo.

* make sure you have afs tokens

* run the script
    ../../gen/grab_data.rb --grab
  It should take about 15-20 seconds to run on a 400MHz PII, and it
  prints progress as it goes, showing which movie theater/source
  combo it is parsing.
  It will produce 2 files, output_com and output_ind.

  output_ind - this file contains the parsed data for all of the
     theaters, formatted to be put on the movie page. The data source
     for each theater is listed in [] next to the name
     (yh=yahoo, cp=city paper, cm=cmu [afs file])

  output_com - this file contains two entries for each theater.
     The first entry is the script's best guess at the times for this
     theater and is labelled with all of the sources that went into
     forming this guess. A missing source usually means that source's
     data is out of date.
     The second entry contains all of the differences between the
     common best guess and each of the individual sources. It is
     labelled as [diff]. Each line is preceded by a source name and
     either adds or dels. Adds means that these times are present in
     this source but not in the common best guess. Dels means these
     times are not present in this source but are present in the
     common best guess. Times are considered different if they have
     different days associated with them or different parsing notes.

* ??check diffs in output_com for problems with days of week for
  city paper sources
  It seems like the city paper sources on the web don't always have
  the right days of the week associated with *, bold, or parenthesized
  times. You can override this in the script if necessary.

* rerun script if you override day classes

* copy output_com to output_f
    cp output_com output_f

* fix output_f
  For each theater you need to look at the diffs and decide which
  version should be incorporated into the common output. Parse errors
  are indicated with ERROR in the output. These need to be resolved
  as well. output_com and output_ind can be useful references here.
  Delete the differences and difference sections as they are resolved.
  The times are output with days in ()s:
    f=fri, s=sat, u=sun, m=mon, t=tue, w=wed, h=thu.
  Notes from the parser are put in {}s.
  You can add notes for times by putting the notes in [].
  You can add notes for movies by adding 'note:blah' before the
  affected movie.

* run 'grab_data.rb --format' to produce movies2.ht
  Update done to show the movies done.

* fix headers in movies2.ht
  Fix the expire date, the date range, and the currently listed
  theaters at the top of movies2.ht.
  Fix the updater at the bottom of movies2.ht.
  If you don't plan on updating any more for this week, uncomment the
  "This is the last planned update" line in movies2.ht.

*** The rest of the process is currently the same as it used to be. ***

* make the web pages
    make

* check movies2.html/reverse2.html for formatting
  Checking movies2.html: (change in movies2.ht)
    Reformat lines that don't fit in a 1024 pixel wide netscape
    window using ^| so that they fit.
    Reformat lines with movie names that are long enough to spill
    into the times using ^|.
  Checking reverse2.html: (change in movies2.ht)
    Check for the same movie being referred to by multiple names
    (this is pretty likely). Search and replace to fix problems.
  You will need to 'make' after making changes to see the results of
  your edits, and you may need to shift-reload the page in netscape.

* push the page out
    make push


Here are instructions for producing the movie page for the week [old method].
----------------------------------------------------------------

* modify start_month,start_day,end_month,end_day near the top of
  the gen/grab_data.rb script to match the dates for the week
  (these dates are inclusive)

* Make a new directory to put all of the files in.
    mkdir html/-   (ex. mkdir html/11-16)

* change to that directory
    cd html/-

* grab the data with the following command (this will make
  a _bunch_ of html files in the current directory using lynx)
    source ../../gen/lynx_lines
  This _MUST_ be done on Friday (around noon seems to work) so that
  you pick up the right days from yahoo.

* make sure you have afs tokens

* run the script
    ../../gen/grab_data.rb
  It should take about 15-20 seconds to run on a 400MHz PII, and it
  prints progress as it goes, showing which movie theater/source
  combo it is parsing.
  It will produce 2 files, output_com and output_ind.

  output_ind - this file contains the parsed data for all of the
     theaters, formatted to be put on the movie page.
     The data source for each theater is listed in [] next to the name
     (yh=yahoo, cp=city paper, cm=cmu [afs file])

  output_com - this file contains two entries for each theater.
     The first entry is the script's best guess at the times for this
     theater and is labelled with all of the sources that went into
     forming this guess. A missing source usually means that source's
     data is out of date.
     The second entry contains all of the differences between the
     common best guess and each of the individual sources. It is
     labelled as [diff]. Each line is preceded by a source name and
     either adds or dels. Adds means that these times are present in
     this source but not in the common best guess. Dels means these
     times are not present in this source but are present in the
     common best guess. Times are considered different if they have
     different days associated with them or different parsing notes.

* ??check diffs in output_com for problems with days of week for
  city paper sources
  It seems like the city paper sources on the web don't always have
  the right days of the week associated with *, bold, or parenthesized
  times. You can override this in the script if necessary.

* rerun script if you override day classes

* copy output_com to output_f
    cp output_com output_f

* fix output_f
  For each theater you need to look at the diffs and decide which
  version should be incorporated into the common output. Parse errors
  are indicated with ERROR in the output. These need to be resolved
  as well. output_com and output_ind can be useful references here.
  Delete the differences and difference sections as they are resolved.

* copy data to movies2.ht
  Copy the formatted data to movies2.ht. The data in the file is in
  the same order as the movies in movies2.ht, so this is just a bunch
  of cutting and pasting.
  Update done to show the movies copied.

* fix headers in movies2.ht
  Fix the expire date, the date range, and the currently listed
  theaters at the top of movies2.ht.
  Fix the updater at the bottom of movies2.ht.
  If you don't plan on updating any more for this week, uncomment the
  "This is the last planned update" line in movies2.ht.

*** The rest of the process is currently the same as it used to be. ***

* make the web pages
    make

* check movies2.html/reverse2.html for formatting
  Checking movies2.html: (change in movies2.ht)
    Reformat lines that don't fit in a 1024 pixel wide netscape
    window using ^| so that they fit.
    Reformat lines with movie names that are long enough to spill
    into the times using ^|.
  Checking reverse2.html: (change in movies2.ht)
    Check for the same movie being referred to by multiple names
    (this is pretty likely). Search and replace to fix problems.
  You will need to 'make' after making changes to see the results of
  your edits, and you may need to shift-reload the page in netscape.

* push the page out
    make push
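
Appendix: a sketch of how the adds/dels in a [diff] entry can be read.
The description of output_com says that a source's adds are times it
has that the common best guess lacks, that dels are the reverse, and
that two times differ if their days or parsing notes differ. The Ruby
below is only an illustration of that rule, not the code in
gen/grab_data.rb. The Showtime record, the diff_source helper, the
sample times, and the printed line format are all hypothetical, and it
assumes a current Ruby rather than the 1.6 install mentioned above.

```ruby
# Hypothetical showtime record: the time, the day letters it applies to
# (f, s, u, m, t, w, h), and any parser note. Struct compares by value,
# so two entries differ if the time, the days, or the note differ.
Showtime = Struct.new(:time, :days, :note)

# Returns the adds and dels for one source relative to the common best guess.
#   adds: present in the source but not in the common best guess
#   dels: present in the common best guess but not in the source
def diff_source(common, source)
  { adds: source - common, dels: common - source }
end

# Made-up sample data for a single theater.
common = [
  Showtime.new("7:00", "fsu", nil),
  Showtime.new("9:30", "fs", nil)
]
city_paper = [
  Showtime.new("7:00", "fsu", nil),
  Showtime.new("9:30", "fsumtwh", nil) # same time, different days
]

result = diff_source(common, city_paper)

# The line format here is invented for the example.
result[:adds].each { |s| puts "cp adds #{s.time}(#{s.days})" }
result[:dels].each { |s| puts "cp dels #{s.time}(#{s.days})" }
```

In this sample the 9:30 show appears in both lists because the day
letters disagree, which is the kind of entry the "fix output_f" step
asks you to resolve by hand.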