The first article and its related links are extracted from http://news.bbc.co.uk/ into an HTML file. The output is refreshed every 15 minutes.
Output:
Polling stations close on the first of two days of Egypt's first free presidential election, 15 months after Hosni Mubarak was ousted.
Script source code:
# File: bbc_main.w
# Name: BBC News live headlines
# Description: Extracts the first article and its related links from news.bbc.co.uk into HTML
# Input: URL [http://news.bbc.co.uk]
# Output format: HTML file
# Output fields: Source URL, Link, Title, Description
#<Logger File>
# Global
# FileName bbc_log.log
# Level debug
#</Logger>
<Section>
Name bbc_main
Define $output_file bbc_output.html
# clean output file
<Action Print>
FileName {$output_file}
FileMode Write
</Action>
# define variable $url and assign it value
Define $url http://news.bbc.co.uk
# load content
<Action ContentURL>
URL {$url}
RemoveNewLine
TagsToStrip br,nobr,b
</Action>
<Section>
Name pattern-articles
<Section Or>
NoContext
# match top headline with image
<Pattern>
RegExp <h2 class="top-story-header ">*<a class="story" rel="{:re([^"]*)}" href="{$link:re([^"]*)}">{$title}<img*</a>*</h2>
Trim
Compact
</Pattern>
# match top headline without image
<Pattern>
RegExp <h2 class="top-story-header ">*<a class="story" rel="{:re([^"]*)}" href="{$link:re([^"]*)}">{$title}</a>*</h2>
Trim
Compact
</Pattern>
# match top splash headline
<Pattern>
RegExp <h2 class=" splash-header">*<a class="story" rel="{:re([^"]*)}" href="{$link:re([^"]*)}">{$title}</a>*</h2>
Trim
Compact
</Pattern>
</Section>
# match description for top headline
<Pattern>
RegExp <p>{$desc}
Trim
Compact
</Pattern>
# print parsed data
<Action Print>
FileName {$output_file}
Text <p><h1><a href="{$url}{$link}">{$title}</a></h1></p>\n<p>{$desc}</p>\n<p><h1>Related articles:</h1></p>\n<p><ul>\n
</Action>
# find all news references
<Section While>
Optional
Name news-references
EndAt </ul>
# match news references
<Pattern>
RegExp <li{:re([^>]*)}>*<a class="story" rel="{:re([^"]*)}" href="{$link_url:re([^"]*)}">{$link_title}<
Trim
Compact
</Pattern>
<Action Print>
FileName {$output_file}
Text <li><a href="{$url}{$link_url}">{$link_title}</a></li>\n
</Action>
</Section>
# print HTML footer (closes the list and paragraph opened above)
<Action Print>
FileName {$output_file}
Text </ul></p>\n
</Action>
</Section>
</Section>
Main bbc_main
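
For readers who do not know the script language, the same extraction flow can be sketched in Python. This is only an illustration under stated assumptions, not a translation of the engine's behaviour: it assumes that `*` in the script's patterns means "any text", that `{$name}` captures text up to the next tag, that the `<Section Or>` block takes the first pattern that matches, and that `EndAt </ul>` stops the related-links loop at the first closing list tag. The names `extract`, `render`, and `SAMPLE` are hypothetical, and the sample markup is made up so the sketch runs without network access.

```python
import re

BASE_URL = "http://news.bbc.co.uk"

# Headline patterns tried in order, mirroring the three <Pattern> blocks
# inside <Section Or> (assumption: the first one that matches wins).
HEADLINE_PATTERNS = [
    # top headline with image
    re.compile(r'<h2 class="top-story-header ">.*?<a class="story" rel="[^"]*" '
               r'href="(?P<link>[^"]*)">(?P<title>[^<]*)<img.*?</a>.*?</h2>', re.S),
    # top headline without image
    re.compile(r'<h2 class="top-story-header ">.*?<a class="story" rel="[^"]*" '
               r'href="(?P<link>[^"]*)">(?P<title>[^<]*)</a>.*?</h2>', re.S),
    # splash headline
    re.compile(r'<h2 class=" splash-header">.*?<a class="story" rel="[^"]*" '
               r'href="(?P<link>[^"]*)">(?P<title>[^<]*)</a>.*?</h2>', re.S),
]
DESC_PATTERN = re.compile(r'<p>(?P<desc>[^<]*)')
REF_PATTERN = re.compile(r'<li[^>]*>.*?<a class="story" rel="[^"]*" '
                         r'href="(?P<link>[^"]*)">(?P<title>[^<]*)<', re.S)


def compact(text):
    """Trim and collapse whitespace, like the script's Trim and Compact."""
    return " ".join(text.split())


def extract(html):
    """Return the top story and its related links, or None if nothing matches."""
    head = None
    for pattern in HEADLINE_PATTERNS:
        head = pattern.search(html)
        if head:
            break
    if head is None:
        return None
    rest = html[head.end():]
    desc = DESC_PATTERN.search(rest)
    if desc is None:
        return None
    rest = rest[desc.end():]
    # Stop collecting related links at the first </ul>, like EndAt.
    end = rest.find("</ul>")
    block = rest if end == -1 else rest[:end]
    related = [(m.group("link"), compact(m.group("title")))
               for m in REF_PATTERN.finditer(block)]
    return {
        "link": head.group("link"),
        "title": compact(head.group("title")),
        "desc": compact(desc.group("desc")),
        "related": related,
    }


def render(item):
    """Build the HTML fragment that the Print actions write to the output file."""
    lines = [
        '<h1><a href="{0}{1}">{2}</a></h1>'.format(BASE_URL, item["link"], item["title"]),
        "<p>{0}</p>".format(item["desc"]),
        "<h1>Related articles:</h1>",
        "<ul>",
    ]
    for link, title in item["related"]:
        lines.append('<li><a href="{0}{1}">{2}</a></li>'.format(BASE_URL, link, title))
    lines.append("</ul>")
    return "\n".join(lines) + "\n"


# Made-up sample markup shaped like the patterns above; not real BBC HTML.
SAMPLE = (
    '<h2 class="top-story-header "> <a class="story" rel="published-1" '
    'href="/news/world-1">Egypt votes <img src="x.jpg"/></a> </h2>'
    '<p>Polling stations close on the first day.</p>'
    '<ul><li class="first"> <a class="story" rel="published-2" '
    'href="/news/world-2">Profile</a></li>'
    '<li> <a class="story" rel="published-3" href="/news/world-3">Q and A</a></li></ul>'
    '<ul><li><a class="story" rel="x" href="/other">Other</a></li></ul>'
)

if __name__ == "__main__":
    print(render(extract(SAMPLE)), end="")
```

In this sketch the link in the second list is ignored because collection stops at the first `</ul>`. Unlike the script, the sketch emits the headings and the list without wrapping them in `<p>` elements, since HTML does not allow block elements inside a paragraph.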