PHP Web Grabber

Advanced PHP tag-based web extraction engine

Grabbing a Website Template List from FreeWebsiteTemplates.com

The Need

Suppose you are a web programmer and want to offer your customers a list of website templates to choose from.
The template list has to be displayed within your own website.
This can be done in several ways:

  • displaying the list using a self-made engine,
    not a good solution: it means hard work in both the short and the long term, because you have to develop and maintain the engine and feed the list yourself
  • providing links to sites where your customers can choose a design,
    not a good solution: your commercial image might suffer because you could not display the list on your own site, in one place, alongside the other services you provide
  • or using a service-type API that helps you grab, extract and embed HTML content from a public website into your own site (even a website template list),
    ideal solution: you don't have to develop or maintain anything, feeding the template list is not your responsibility, and the template list is displayed within your own site

The Solution

Grabbing, extracting and embedding HTML content from websites is what WiseLoop PHP Web Grabber is about.
WiseLoop PHP Web Grabber is a set of PHP classes designed to extract HTML content from the web.
This package allows complex content extraction in a flexible manner, using only a few lines of code.
The extraction can be made from any web URL or local file; the content to be grabbed can be a full web page or a set of tags, which may even be specified incompletely.
For more information, please check out the product page at http://wiseloop.com/product/php-web-grabber

The Implementation

Step 1: Install the WiseLoop PHP Web Grabber package

  • Step 1.1: create a folder named /php-web-grabber on your web server;
  • Step 1.2: copy the entire /bin and /cache folders to the newly created /php-web-grabber folder;
  • Step 1.3: make sure that the /php-web-grabber/cache directory is writable;
  • Step 1.4: include /bin/wlWg.php in your application.
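
The checks in Step 1 can be sketched as a small pre-flight function. This is only an illustration: the function name wlWgCheckInstall and the idea of passing the install directory as an argument are assumptions, not part of the package. Only the bin/wlWg.php and cache paths come from the steps above.

```php
<?php
// Hypothetical helper (not part of the package): verifies the layout from Step 1.
// Returns a list of problems; an empty list means the package can be included.
function wlWgCheckInstall(string $installDir): array
{
    $problems  = [];
    $entryFile = rtrim($installDir, '/') . '/bin/wlWg.php';
    $cacheDir  = rtrim($installDir, '/') . '/cache';

    if (!is_file($entryFile)) {
        $problems[] = "missing entry file: $entryFile";
    }
    if (!is_dir($cacheDir)) {
        $problems[] = "missing cache directory: $cacheDir";
    } elseif (!is_writable($cacheDir)) {
        $problems[] = "cache directory is not writable: $cacheDir";
    }
    return $problems;
}

// Usage: include the package only when the checks pass.
$installDir = __DIR__ . '/php-web-grabber';   // assumed location, adjust as needed
$problems   = wlWgCheckInstall($installDir);
if ($problems === []) {
    require_once $installDir . '/bin/wlWg.php';
} else {
    foreach ($problems as $problem) {
        error_log($problem);                  // report what is missing
    }
}
```

Running a check like this once after deployment makes a missing or read-only cache folder visible immediately, instead of surfacing later as a failed extraction.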

Step 2: Check out the FreeWebsiteTemplates.com site and see what content has to be grabbed and displayed

The FreeWebsiteTemplates.com site shows:

[Figure: FreeWebsiteTemplates.com screenshot]
[Figure: FreeWebsiteTemplates.com view-source]

After a careful study of the source code, you can identify the tag that contains the actual list of templates that you want to extract and display inside your website.
The targeted tag is:

 <div id="leftside">

Obviously, this tag also contains a lot of other content that you don't need and don't want displayed on your website, such as:

  • the "Free Website Templates" header text,
  • the upper paragraph starting with "Website templates are pre-designed websites ...",
  • the upper pages links,
  • the Template of the day header,
  • the bottom commercial banner,
  • the bottom Previous and Next links,
  • the bottom paragraph starting with "All free website templates have been coded ..."

Also, there are some CSS classes that will no longer make sense once the content is displayed on your website, because your site's CSS most likely does not define them:

  • class="title"
  • class="ss"
  • class="download"
  • class="preview"
  • class="getwix"
  • class="templateleft"
  • class="templateright"

You will use the PHP Web Grabber tag-based extraction engine to grab the contents of the targeted tag, the tag removal feature to remove the unwanted content, and the string replacement feature to change the CSS classes to match your website's CSS.

[Figure: annotated page source. Green: what to extract; Red: what to remove; Blue: what to replace]
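
To make the replacement and removal ideas concrete, here is a minimal sketch in plain PHP. It does not use the package and is not a description of its internals: the sample fragment, the function name matchesTagSpec, and the reading of an incomplete tag specification as a prefix of the opening tag are all assumptions made for illustration, and the package's real matching rules may be more permissive.

```php
<?php
// Illustration only: plain PHP string functions, not the package's internals.
// Hypothetical sample fragment resembling the markup described above.
$fragment = '<div class="templateleft"><a class="preview" href="#">Preview</a></div>';

// String replacement: needle N is replaced by replacement N (same array index).
$search   = ['class="templateleft"', 'class="preview"'];
$replace  = ['class="your-templateleft-class"', 'class="your-preview-class"'];
$restyled = str_replace($search, $replace, $fragment);

// Incomplete tag specification, read here as a prefix of the opening tag
// (an assumption), so that one entry can match several variants.
function matchesTagSpec(string $openingTag, string $spec): bool
{
    return strncmp($openingTag, $spec, strlen($spec)) === 0;
}

echo $restyled, PHP_EOL;
var_dump(matchesTagSpec('<div style="clear:both;">', '<div style="clear:'));
var_dump(matchesTagSpec('<div class="clear">', '<div style="clear:'));
```

Because the needles and replacements are paired by position, the two arrays must always have the same length and order.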

Step 3: Use the WiseLoop PHP Web Grabber in your application

All you need to do is create a wlWgProcessor object and pass it the required parameters:

  • the URL of the page to be processed,
  • the tag to be extracted,
  • the tags to be removed from the extracted content,
  • the string replacements to be made on the extracted content,
  • the caching time: how long the extracted content is served from a local cache instead of being reloaded from the real URL (this feature saves bandwidth and improves speed).

Your code should look something like this:

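require_once "/php-web-grabber/bin/wlWg.php";                   //the package entry file from Step 1.4 (adjust the path to your installation)
$grabber = new wlWgProcessor(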
    "http://www.freewebsitetemplates.com/",                     //the targeted real url
    new wlWgParam(
        '<div id="leftside">',                                  //the desired tag to be extracted
        array(
            "search" => array(                                  //needles ...
                'class="title"',
                'class="ss"',
                'class="download"',
                'class="preview"',
                'class="getwix"',
                'class="templateleft"',
                'class="templateright"',
            ),
            "replace" => array(                                 //replaces ...
                'class="your-title-class"',
                'class="your-ss-class"',
                'class="your-download-class"',
                'class="your-preview-class"',
                'class="your-getwix-class"',
                'class="your-templateleft-class"',
                'class="your-templateright-class"',
            )
        ),
        array(                                                  //remove the tags (and their contents) that match ...
            '<h1>',                                             //all the <h1> tags, including the Free Website Templates header text
            '<div class="pages"',                               //the page links
            '<div class="about">',                              //the upper paragraph starting with "Website templates are pre-designed websites ..."
            '<div style="clear:',                               //an empty div tag; note that this tag specification is incomplete: it will remove <div style="clear:both;"> and <div style="clear">
            '<div class="clear">',                              //an empty div tag
            '<div style="margin-left:31px;display:block;">',    //the Previous and Next links and the bottom paragraph starting with "All free website templates have been coded ..."
            '<div class="templatedaily">'                       //the Template of the day header
        )
    ),
    wlWgConfig::CACHE_TIME_1_WEEK                               //the caching time (expressed in minutes)
);
$grabber->draw();                                               //print out the extracted processed content
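
The caching time passed as the last argument can be pictured as a time-to-live on a local copy of the extracted content. The sketch below shows that general idea only; it is not the package's implementation. The function name cachedContent, the cache file location and the stand-in fetcher are hypothetical.

```php
<?php
// Hypothetical sketch of time-based caching; not the package's implementation.
// $ttlMinutes mirrors the "caching time (expressed in minutes)" parameter.
function cachedContent(string $cacheFile, int $ttlMinutes, callable $fetch): string
{
    $fresh = is_file($cacheFile)
        && (time() - filemtime($cacheFile)) < $ttlMinutes * 60;
    if ($fresh) {
        return file_get_contents($cacheFile);    // served locally, no network request
    }
    $content = $fetch();                          // e.g. download and process the page
    file_put_contents($cacheFile, $content, LOCK_EX);
    return $content;
}

// Usage with a stand-in fetcher so the sketch runs without network access.
$cacheFile = sys_get_temp_dir() . '/wlwg-cache-demo-' . uniqid() . '.html';
$calls = 0;
$fetch = function () use (&$calls) {
    $calls++;                                     // count how often the "remote" is hit
    return '<ul><li>template list</li></ul>';
};

$first  = cachedContent($cacheFile, 60, $fetch);  // cache miss: calls $fetch
$second = cachedContent($cacheFile, 60, $fetch);  // cache hit: reads the file
unlink($cacheFile);                               // clean up the demo file
```

With a long caching time such as one week, the remote site is contacted rarely, which is where the bandwidth and speed gains mentioned above come from. The trade-off is that new templates appear on your site only after the cached copy expires.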

Fine Tuning and Future Developments

At this point you can see that WiseLoop PHP Web Grabber is a powerful tool that can help you develop complex APIs for various service-type websites.
You could wrap the code above in a class that behaves like a native API for FreeWebsiteTemplates.com.

class wlWgFreeWebsiteTemplatesDotCom extends wlWgProcessor {
    public function __construct() {
        parent::__construct(
            "http://www.freewebsitetemplates.com/",                     //the targeted real url
            new wlWgParam(
                '<div id="leftside">',                                  //the desired tag to be extracted
                array(
                    "search" => array(                                  //needles ...
                        'class="title"',
                        'class="ss"',
                        'class="download"',
                        'class="preview"',
                        'class="getwix"',
                        'class="templateleft"',
                        'class="templateright"',
                    ),
                    "replace" => array(                                 //replaces ...
                        'class="your-title-class"',
                        'class="your-ss-class"',
                        'class="your-download-class"',
                        'class="your-preview-class"',
                        'class="your-getwix-class"',
                        'class="your-templateleft-class"',
                        'class="your-templateright-class"',
                    )
                ),
                array(                                                  //remove the tags (and their contents) that match ...
                    '<h1>',                                             //all the <h1> tags, including the Free Website Templates header text
                    '<div class="pages"',                               //the page links
                    '<div class="about">',                              //the upper paragraph starting with "Website templates are pre-designed websites ..."
                    '<div style="clear:',                               //an empty div tag; note that this tag specification is incomplete: it will remove <div style="clear:both;"> and <div style="clear">
                    '<div class="clear">',                              //an empty div tag
                    '<div style="margin-left:31px;display:block;">',    //the Previous and Next links and the bottom paragraph starting with "All free website templates have been coded ..."
                    '<div class="templatedaily">'                       //the Template of the day header
                )
            ),
            wlWgConfig::CACHE_TIME_1_WEEK                               //the caching time (expressed in minutes)
        );
    }
}
$grabber = new wlWgFreeWebsiteTemplatesDotCom();                        //create the FreeWebsiteTemplates API object
$grabber->draw();                                                       //print out the extracted processed content

Please check out the full example provided in the package, which also supports pagination.

Regular License $10.00
Use by you or one client, in a single end product which end users are not charged for.

Extended License $50.00
Use by you or one client, in a single end product which end users can be charged for.

Short Information

If you want to retrieve content from a public website and display it on your own site, PHP Web Grabber is a tool built to help you do that.
No frames or iframes involved! Real HTML content, grabbed from the web and displayed within your web page just as if it had been generated by your site.
