Dave Eckhardt's RSS-monitoring-via-cron page




How can I monitor an RSS feed using a cron job?

Here is a sample line for your crontab, which runs the script once a day at 12:02:

02 12 * * * /usr0/home/XXXXXXX/bin/watchPolicies
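
For readers unfamiliar with the five-field crontab format, the short sketch below labels each field of the sample line. It is only an illustration: the describe helper and the FIELD_NAMES list are hypothetical names, not part of cron or of watchPolicies.

```python
# Illustrative only: label the five schedule fields of a crontab line.
FIELD_NAMES = ['minute', 'hour', 'day of month', 'month', 'day of week']

def describe(crontab_line):
    """Split a crontab line into its schedule fields and its command."""
    parts = crontab_line.split(None, 5)  # five schedule fields, then the command
    schedule = dict(zip(FIELD_NAMES, parts[:5]))
    command = parts[5]
    return schedule, command

schedule, command = describe('02 12 * * * /usr0/home/XXXXXXX/bin/watchPolicies')
for name in FIELD_NAMES:
    print('%-12s %s' % (name, schedule[name]))
print('command      %s' % command)
```

Since cron normally mails whatever a job prints to the crontab's owner (assuming mail delivery works on the machine), a job that prints only when something changed doubles as a change notification.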

And here is watchPolicies (quick & dirty; it depends on the server returning ETag and/or Last-Modified headers, which are what let an unchanged feed be detected):

#!/usr/bin/env python
# Needs feedparser. If pip is not installed yet: apt-get install python-pip
# pip install --user feedparser

url = 'https://www.cmu.edu/policies/news/rss-feeds/news-rss.rss'

# This uses json... could use pickle, shelve, klepto, ...

import sys
import os
import feedparser
import json
import textwrap

statefilename = os.path.join(os.environ['HOME'],'.watchPolicies.json')

try:
    with open(statefilename, 'r') as f:
        predicates = json.load(f)
except (IOError, ValueError):
    # No saved state yet, or the state file is unreadable: start fresh.
    predicates = {}

feed = feedparser.parse(url, request_headers={'Cache-control': 'max-age=14400'}, **predicates)

if feed.bozo:
    print("***** trouble with %s" % url)
    if 'status' in feed:
        print("***** HTTP status %d" % feed.status)
    sys.exit(9)

if feed.status == 304:
    # "not modified", so we are done
    sys.exit(0)

# The last entry in the feed is treated as the most recent one.
entry = feed.entries[-1]

print('')
print('***** Most-recent item *****')
print('')
print(entry.title)
print('')
print(textwrap.fill(entry.summary))
print('')
print(entry.links[0]['href'])
print('')
print(entry.published)
print('')
#print(feed)

wanted_predicate_keys = ['etag', 'modified']
newpredicates = dict((k, feed[k]) for k in wanted_predicate_keys if k in feed)

with open(statefilename, 'w') as f:
    json.dump(newpredicates, f)

sys.exit(2)

# https://packaging.python.org/tutorials/installing-packages/
# https://docs.python.org/3/installing/index.html
# https://pip.pypa.io/en/stable/reference/pip_install/
# https://pythonprogramming.net/python-pickle-module-save-objects-serialization/
# https://stackoverflow.com/a/19201448
# https://stackoverflow.com/a/26057360
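
The dependence on ETag and Last-Modified comes from HTTP conditional requests: the values saved from one fetch are sent back on the next one as If-None-Match and If-Modified-Since, and a server with nothing new can answer 304. feedparser takes care of this when given etag= and modified=. The sketch below shows roughly the same idea with only the standard library and without touching the network. It assumes Python 3; the conditional_request helper and the sample state values are made up for illustration.

```python
import urllib.request

def conditional_request(url, state):
    """Build a GET request carrying validators saved from an earlier fetch.

    `state` is a dict shaped like the JSON file watchPolicies writes:
    optional 'etag' and 'modified' keys holding the server's header values.
    """
    headers = {}
    if 'etag' in state:
        headers['If-None-Match'] = state['etag']
    if 'modified' in state:
        headers['If-Modified-Since'] = state['modified']
    return urllib.request.Request(url, headers=headers)

# Made-up sample values, for illustration only.
state = {'etag': '"abc123"', 'modified': 'Mon, 01 Jan 2018 00:00:00 GMT'}
req = conditional_request(
    'https://www.cmu.edu/policies/news/rss-feeds/news-rss.rss', state)

# Show the headers that would be sent; nothing is fetched here.
for name, value in sorted(req.header_items()):
    print('%s: %s' % (name, value))
```

If such a request were passed to urllib.request.urlopen, an unchanged feed would surface as an HTTPError with code 304, which is the case watchPolicies handles by exiting quietly.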


davide+receptionist@cs.cmu.edu