Example - Articles from BBC News

The first article and its related links are extracted from the page http://news.bbc.co.uk/ into HTML. The output is refreshed every 15 minutes.

 

Output:

Egyptians vote in landmark poll

Polling stations close on the first of two days of Egypt's first free presidential election, 15 months after Hosni Mubarak was ousted.

Related articles:

Script source code:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Extracts the first article from news.bbc.co.uk into HTML output
# Input: URL [http://news.bbc.co.uk]
# Output format: HTML file
# Output fields: Source URL, Link, Title, Description

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

    # clean output file
    <Action Print>
        FileName {$output_file}
        FileMode Write
    </Action>
    
    # define variable $url and assign it value
    Define $url http://news.bbc.co.uk
    
    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

    <Section>
        Name pattern-articles
    
        <Section Or>
            NoContext

            # match top headline with image
            <Pattern>
                RegExp <h2 class="top-story-header ">*<a class="story" rel="{:re([^"]*)}" href="{$link:re([^"]*)}">{$title}<img*</a>*</h2>
                Trim
                Compact
            </Pattern>

            # match top headline without image
            <Pattern>
                RegExp <h2 class="top-story-header ">*<a class="story" rel="{:re([^"]*)}" href="{$link:re([^"]*)}">{$title}</a>*</h2>
                Trim
                Compact
            </Pattern>

            # match top splash headline
            <Pattern>
                RegExp <h2 class=" splash-header">*<a class="story" rel="{:re([^"]*)}" href="{$link:re([^"]*)}">{$title}</a>*</h2>
                Trim
                Compact
            </Pattern>
        </Section>

        # match description for top headline
        <Pattern>
            RegExp <p>{$desc}
            Trim
            Compact
        </Pattern>

        # print parsed data
        <Action Print>
            FileName {$output_file}
            Text <p><h1><a href="{$url}{$link}">{$title}</a></h1></p>\n<p>{$desc}</p>\n<p><h1>Related articles:</h1></p>\n<p><ul>\n
        </Action>

        # find all news-references
        <Section While>
            Optional
            Name news-references
            EndAt </ul>

            # match news references
            <Pattern>
                RegExp <li{:re([^>]*)}>*<a class="story" rel="{:re([^"]*)}" href="{$link_url:re([^"]*)}">{$link_title}<
                Trim
                Compact
            </Pattern>

            <Action Print>
                FileName {$output_file}
                Text <li><a href="{$url}{$link_url}">{$link_title}</a></li>\n
            </Action>
			
        </Section>

        # print html footer
        <Action Print>
            FileName {$output_file}
            Text </ul></p>\n
        </Action>
    </Section>

</Section>

Main bbc_main  
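
For readers who do not know the script language, the Or section above (try the three headline patterns in order and keep the first match) can be approximated in Python. This is only a sketch under stated assumptions, not the parser's actual implementation. It assumes that `*` in a RegExp line acts as a lazy wildcard and that Trim and Compact together strip and collapse whitespace. The names `HEADLINE_PATTERNS`, `compact` and `match_headline`, and the sample HTML with its `/news/example-article` link, are made up for illustration.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://news.bbc.co.uk"  # same value as $url in the script

# Alternatives are tried in order, mirroring the three <Pattern> blocks.
# Assumption: "*" in the script is a lazy wildcard, written here as ".*?".
HEADLINE_PATTERNS = [
    # top headline with image
    re.compile(r'<h2 class="top-story-header ">.*?<a class="story" rel="[^"]*" '
               r'href="(?P<link>[^"]*)">(?P<title>.*?)<img.*?</a>.*?</h2>', re.S),
    # top headline without image
    re.compile(r'<h2 class="top-story-header ">.*?<a class="story" rel="[^"]*" '
               r'href="(?P<link>[^"]*)">(?P<title>.*?)</a>.*?</h2>', re.S),
    # top splash headline
    re.compile(r'<h2 class=" splash-header">.*?<a class="story" rel="[^"]*" '
               r'href="(?P<link>[^"]*)">(?P<title>.*?)</a>.*?</h2>', re.S),
]


def compact(text):
    """Strip the ends and collapse whitespace runs (assumed Trim + Compact)."""
    return " ".join(text.split())


def match_headline(html):
    """Return (absolute_url, title) from the first matching pattern, else None."""
    for pattern in HEADLINE_PATTERNS:
        m = pattern.search(html)
        if m:
            return urljoin(BASE_URL, m.group("link")), compact(m.group("title"))
    return None


# Hypothetical page fragment; the real BBC markup may differ.
sample = ('<h2 class="top-story-header "><a class="story" rel="published-1" '
          'href="/news/example-article">  Egyptians vote in   landmark poll </a></h2>')
print(match_headline(sample))
```

The sketch uses `urljoin` to build the absolute link, where the script simply concatenates `{$url}` and `{$link}`. Both give the same result when the link starts with a slash and the base URL has no trailing slash.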
© 2004-2012 QualityUnit.com, All rights reserved