We need to create an image gallery hosted on our server, using pictures provided by Flickr.
Both the thumbs and the full-size images will be grabbed from Flickr and downloaded to our own host using the PHP Web Image Grabber class from the WiseLoop PHP Web Media Grabber package.
The image gallery will be displayed within our website, and visitors will not see that the pictures come from Flickr.
Grabbing and downloading web media resources from websites is what WiseLoop PHP Web Media Grabber is about.
WiseLoop PHP Web Media Grabber is a set of PHP classes designed to grab, extract or even download media files from the web, such as images, videos, audio, Flash files, documents, JavaScript sources, CSS stylesheet files etc.
This package allows complex media extraction in a flexible manner, using only a few lines of code.
The extraction can be made from any given web URL that contains or refers to media files using web links (a href tags), various tag attributes (src, embed, param, movie etc.) or even inline CSS styling attributes (such as background images). The media grabbing engine is also able to identify more than the obvious media resources that have the most common file extensions: it will find media generated dynamically by the servers, and media files that have no valid extension or no extension at all (such as images generated at runtime by the web servers). The identification is made by checking the server response header when pinging the tested media resource.
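The header-based identification described above can be illustrated in plain PHP. The sketch below is not the package's implementation: looksLikeImage is a hypothetical helper that inspects raw response header lines (the format returned by PHP's get_headers()) and treats any image/* content type as an image, whatever the URL's extension is.

```php
<?php
// Hypothetical helper for illustration; not part of the WiseLoop package.
// $headerLines uses the format returned by PHP's get_headers($url).
function looksLikeImage(array $headerLines): bool
{
    $contentType = '';
    foreach ($headerLines as $line) {
        // Header names are case-insensitive; after redirects the last one wins.
        if (stripos($line, 'Content-Type:') === 0) {
            $contentType = trim(substr($line, strlen('Content-Type:')));
        }
    }
    return stripos($contentType, 'image/') === 0;
}

// A dynamically generated image may have no extension but still announces its type.
$headers = ['HTTP/1.1 200 OK', 'Content-Type: image/jpeg', 'Content-Length: 48213'];
echo looksLikeImage($headers) ? "image\n" : "not an image\n";
```

In real use the header lines would come from a request to the tested URL; the sample array here only stands in for such a response.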
For more information please check out the product page at http://wiseloop.com/product/php-web-media-grabber
Let's say that we want our picture gallery to be about school; after doing an image search with the keyword "school", flickr.com shows:
The web browser's address bar then reveals the URL to be passed to the grabbing engine in order to extract the pictures:
http://www.flickr.com/search/?q=school&f=hp
We want to grab and download to our host all the pictures presented here: the thumbs and the full-size images hidden behind the thumb links.
We will use the PHP Web Image Grabber class to grab and download the images from the target URL, and then some short custom PHP code to nicely display the gallery.
All we need to do is create a wlWmgImageGrabber object and pass it the URL of the page to be processed.
To start, our code should look something like this (for now we focus only on grabbing and leave displaying the gallery for later, so we simply print the results as an array):
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url); //creating the image grabber object
$ig->grab(); //the grab command
echo '<pre>'.print_r($ig->getValidMediaTable(true), true).'</pre>'; //displaying the result array
After running the code above we obtain:
Quite nice, but there are some issues:
Ok, so let's start with the first issue:
If we do not need all media to be extracted from a page, we need to set up some filters and add them to the grabber filter list. To create a filter we must instantiate the wlWmgFilter class by calling its constructor with some arguments (see the class documentation). Also, some filters can be added automatically to the grabber object by passing some additional sets of values to the grabber constructor.
Back to our case, we notice that all the needed picture files have the ".jpg" extension. So let's use that in order to filter the results:
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url, '.jpg'); //creating the filtered (by .jpg extension) image grabber object
$ig->grab(); //the grab command
echo '<pre>'.print_r($ig->getValidMediaTable(true), true).'</pre>'; //displaying the result array
or, if we want more than one extension (use an array):
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url, array('.jpg', '.jpeg')); //creating the filtered (by .jpg and .jpeg extensions) image grabber object
$ig->grab(); //the grab command
echo '<pre>'.print_r($ig->getValidMediaTable(true), true).'</pre>'; //displaying the result array
or, if we want to use the wlWmgFilter objects explicitly:
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url); //creating the image grabber object
$extensionFilter = new wlWmgFilter( //creating the extensions filter object
    array('.jpg', '.jpeg'),
    wlWmgFilter::TYPE_URL,
    wlWmgFilter::OPERATOR_CONTAINS
);
$ig->addFilter($extensionFilter); //adding the filter to the grabber
$ig->grab(); //the grab command
echo '<pre>'.print_r($ig->getValidMediaTable(true), true).'</pre>'; //displaying the result array
Yes, it's better, but there are still some unwanted pictures in our grab results (some of the buddy icons on the right are jpegs).
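To see why an extension filter alone cannot separate gallery thumbs from buddy icons, here is a minimal plain-PHP sketch of a "URL contains" style filter. It does not use the package; filterUrlsContaining is a hypothetical helper and the URLs are made up for illustration.

```php
<?php
// Hypothetical helper; mimics a "URL contains one of these strings" filter.
function filterUrlsContaining(array $urls, array $needles): array
{
    $kept = [];
    foreach ($urls as $url) {
        foreach ($needles as $needle) {
            if (stripos($url, $needle) !== false) {
                $kept[] = $url;
                break; // one match is enough
            }
        }
    }
    return $kept;
}

// Made-up URLs standing in for a grabbed result list.
$urls = [
    'http://example.com/photos/bus_t.jpg',
    'http://example.com/images/logo.png',
    'http://example.com/buddyicons/1234.jpg',
];
print_r(filterUrlsContaining($urls, ['.jpg', '.jpeg']));
```

The logo is dropped, but the buddy icon passes the filter because it is a jpeg too; the filter only looks at the URL, not at where the image sits in the page.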
Ok, let's try something stronger then:
This is a very strong feature: the grabbing engine is able to search for images (or any media) only inside a designated HTML area specified by a tag. In this way you can skip any unwanted pictures from the start, by narrowing the full HTML target page to a smaller area consisting of one tag's content. An incomplete tag (a tag slice) can also be specified; the tag will be auto-completed depending on the contextual HTML content.
In order to do HTML area searching, we need to know a little bit about the HTML DOM structure of our target page.
The page source shows:
After just a minute spent checking the page source, we notice that the area to be searched is inside the following tag:
<div class="ResultsThumbs" id="ResultsThumbsDiv">
So let's search inside that area only for images:
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url, array('.jpg', '.jpeg')); //creating the filtered (by .jpg and .jpeg extensions) image grabber object
$ig->setGrabOnlyFromTagSlice('photo-display-container'); //limiting the searching area: only images found inside that tag will be grabbed
$ig->grab(); //the grab command
echo '<pre>'.print_r($ig->getValidMediaTable(true), true).'</pre>'; //displaying the result array
Quite smart, isn't it? Notice that we didn't specify the full tag definition: we used only a slice, 'photo-display-container'; the grabbing engine was able to identify the full tag from the full page HTML source code. This can be very helpful when the tag that delimits the designated HTML search area is dynamically generated, but has some static properties that can lead to its unique identification (such as a CSS class or an ID).
Just for fun, let's pretend that we do not need all the images, only the first 10 pictures - we'll have a gallery of only 10 pictures.
If we want to limit the number of grabbed media, we'll need to set up the limiter like this:
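The idea of searching only inside a designated area can be sketched with PHP's built-in DOM extension. How the grabbing engine actually completes a tag slice is not shown here; the sketch below simply assumes that the slice matches part of an element's id or class, and imagesInsideSlice is a hypothetical helper, not a package function.

```php
<?php
// Hypothetical helper: returns the src of every <img> found inside any element
// whose id or class contains $slice (assumed matching rule, for illustration).
// $slice must not contain double quotes, since it is placed in an XPath string.
function imagesInsideSlice(string $html, string $slice): array
{
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = sprintf(
        '//*[contains(@id, "%1$s") or contains(@class, "%1$s")]//img/@src',
        $slice
    );
    $urls = [];
    foreach ($xpath->query($query) as $attribute) {
        $urls[] = $attribute->value;
    }
    return $urls;
}

// Made-up page: only the two images in the middle container are wanted.
$html = '<html><body>'
    . '<div id="header"><img src="logo.png"></div>'
    . '<div id="photo-display-container"><img src="a.jpg"><span><img src="b.jpg"></span></div>'
    . '<div class="sidebar"><img src="buddy.jpg"></div>'
    . '</body></html>';
print_r(imagesInsideSlice($html, 'photo-display'));
```

Only a.jpg and b.jpg are returned; the logo and the buddy icon are never considered because they sit outside the matched element.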
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url, array('.jpg', '.jpeg')); //creating the filtered (by .jpg and .jpeg extensions) image grabber object
$ig->setGrabOnlyFromTagSlice('photo-display-container'); //limiting the searching area: only images found inside that tag will be grabbed
$ig->setLimit(10); //limiting the grabbed images count: only first 10 images will be grabbed
$ig->grab(); //the grab command
echo '<pre>'.print_r($ig->getValidMediaTable(true), true).'</pre>'; //displaying the result array
Now we are sure that we have grabbed the needed thumbs.
How about the full-size images that hide behind the thumbs?
Let's click on the first image. The web browser shows:
Oops; this is not very good for us. Of course, we (human beings) know which is the full-size picture behind the followed thumb: obviously it is the bus. But how about a computer program (our PHP script)? Does it know that? There are many images on that page: the Flickr logo, the bus, the buddy icons on the right, some commercial banners, the buddy icons at the bottom etc. How will the computer identify the full-size image that it has to extract?
Simple. Assuming this:
The largest picture (in bytes) will be the needed picture. The grabber will check every picture contained in the sub-page, and the largest one (in bytes) will most likely be the one that we need.
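The "largest in bytes wins" assumption can be sketched in a few lines of plain PHP. This is only an illustration of the heuristic, not the grabber's internal code; pickLargestImage is a hypothetical helper and the sizes are made-up values of the kind a Content-Length header would report.

```php
<?php
// Hypothetical helper: picks the URL with the largest byte size.
// $sizesByUrl maps each candidate image URL to its size in bytes.
function pickLargestImage(array $sizesByUrl): ?string
{
    $bestUrl = null;
    $bestSize = -1;
    foreach ($sizesByUrl as $url => $size) {
        if ($size > $bestSize) {
            $bestUrl = $url;
            $bestSize = $size;
        }
    }
    return $bestUrl; // null when there are no candidates
}

// Made-up sub-page content: logo, full-size photo, buddy icon.
$candidates = [
    'http://example.com/logo.png' => 3200,
    'http://example.com/bus.jpg' => 184000,
    'http://example.com/buddy.jpg' => 2100,
];
echo pickLargestImage($candidates), "\n";
```

Note that the heuristic needs the size of every candidate, which means one request per image on the page.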
Let's now activate sub-page link following:
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url, array('.jpg', '.jpeg')); //creating the filtered (by .jpg and .jpeg extensions) image grabber object
$ig->setGrabOnlyFromTagSlice('photo-display-container'); //limiting the searching area: only images found inside that tag will be grabbed
$ig->setLimit(10); //limiting the grabbed images count: only first 10 images will be grabbed
$ig->setFollowSubpagesLinks(true); //activating the sub-pages links following
$ig->grab(); //the grab command
echo '<pre>'.print_r($ig->getValidMediaTable(true), true).'</pre>'; //displaying the result array
Wow! That was really sloooooow! Why is that?
Because the script had to check every sub-page hidden behind every thumb, and for every sub-page it had to check every picture contained in that sub-page in order to find the largest one (in bytes). That can add up to thousands of URL checks (pages and/or images), which can even hit the PHP execution time limit.
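A rough count shows how quickly the requests pile up. The sketch below is a simplified estimate, with made-up numbers for the thumbs per results page and the images per sub-page; estimateRequests is a hypothetical helper.

```php
<?php
// Rough request count for one grab with sub-page following enabled.
// Both arguments are made-up inputs, not measurements.
function estimateRequests(int $thumbs, int $imagesPerSubpage): int
{
    $mainPage = 1;                              // the search results page
    $subpages = $thumbs;                        // one followed page per thumb
    $imageChecks = $thumbs * $imagesPerSubpage; // every image on every sub-page
    return $mainPage + $subpages + $imageChecks;
}

echo estimateRequests(25, 40), "\n"; // 1 + 25 + 1000 = 1026
```

Since the total grows with the product of the two numbers, reducing either the number of followed thumbs or the number of images examined per sub-page cuts the work sharply.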
So, let's fix that, OK?
Ok, you already know what this is about.
What if we limit the searched area in the followed sub-pages, like we did for the main page?
Yes: if we know which tag holds the full-size image in the followed sub-page, the problem is solved.
After less than a minute spent checking the sub-page source, we notice that the full-size image (the bus) is inside the following tag:
<div class="photo-div">
So our PHP script becomes:
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url, array('.jpg', '.jpeg')); //creating the filtered (by .jpg and .jpeg extensions) image grabber object
$ig->setGrabOnlyFromTagSlice('photo-display-container'); //limiting the searching area: only images found inside that tag will be grabbed
$ig->setLimit(10); //limiting the grabbed images count: only first 10 images will be grabbed
$ig->setFollowSubpagesLinks(true); //activating the sub-pages links following
$ig->setFollowedSubpageGrabOnlyFromTagSlice('main-photo-container'); //limiting the sub-page searching area for finding the full size images
$ig->grab(); //the grab command
echo '<pre>'.print_r($ig->getValidMediaTable(true), true).'</pre>'; //displaying the result array
The result:
Please note the parent/child image relations feature: when grabbing image galleries with link following enabled, the followed thumbs are set as parents of the full-size images found underneath them. In this way you will know, for every grabbed thumb, the corresponding full-size image and vice versa.
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url, array('.jpg', '.jpeg')); //creating the filtered (by .jpg and .jpeg extensions) image grabber object
$ig->setGrabOnlyFromTagSlice('photo-display-container'); //limiting the searching area: only images found inside that tag will be grabbed
$ig->setLimit(10); //limiting the grabbed images count: only first 10 images will be grabbed
$ig->setFollowSubpagesLinks(true); //activating the sub-pages links following
$ig->setFollowedSubpageGrabOnlyFromTagSlice('main-photo-container'); //limiting the sub-page searching area for finding the full size images
$ig->setDoDownload(true); //enabling the download feature
$ig->grab(); //the grab command
echo '<pre>'.print_r($ig->getValidMediaTable(true), true).'</pre>'; //displaying the result array
We will follow a classic recipe for displaying image galleries: the main page shows the thumbs, and clicking a thumb shows the full-size picture corresponding to it.
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url, array('.jpg', '.jpeg')); //creating the filtered (by .jpg and .jpeg extensions) image grabber object
$ig->setGrabOnlyFromTagSlice('photo-display-container'); //limiting the searching area: only images found inside that tag will be grabbed
$ig->setLimit(10); //limiting the grabbed images count: only first 10 images will be grabbed
$ig->setFollowSubpagesLinks(true); //activating the sub-pages links following
$ig->setFollowedSubpageGrabOnlyFromTagSlice('main-photo-container'); //limiting the sub-page searching area for finding the full size images
$ig->setDoDownload(true); //enabling the download feature
$ig->grab(); //the grab command
$thumbs = $ig->getMediaRoots(); //get the images thumbs (media that have kids)
foreach($thumbs as $thumb)
{
    $images = $ig->getMediaKids($thumb); //get the thumb full size images (only one in this case)
    if(count($images))
    {
        $image = $images[0]; //get the thumb full size image
        echo '<a href="'.$image->getGrabbedUrl().'" style="float:left; margin:10px; background-color:#d5d5d5; padding:5px;">'; //link to full size image
        echo '<img src="'.$thumb->getGrabbedUrl().'"/>'; //the thumb
        echo '</a>';
    }
}
The result:
If we want our gallery to be more attractive we can also use a very powerful inline image processor provided by WiseLoop.
For a full feature list you can visit the PHP Graphic Works product page at http://wiseloop.com/product/php-graphic-works
For our purpose we will use the live feature of the product to standardize the thumb dimensions and to apply a rounded mask and a reflection effect to the thumbs:
require_once dirname(__FILE__)."/../bin/wlWmg.php"; //including the WiseLoop PHP Web Media Grabber Package: use your installation path here
require_once dirname(__FILE__)."/../../php-graphic-works/bin/wlGw.php"; //including the WiseLoop PHP Graphic Works Package: use your installation path here
$url = 'http://www.flickr.com/search/?q=school&f=hp'; //the url
$ig = new wlWmgImageGrabber($url, array('.jpg', '.jpeg')); //creating the filtered (by .jpg and .jpeg extensions) image grabber object
$ig->setGrabOnlyFromTagSlice('photo-display-container'); //limiting the searching area: only images found inside that tag will be grabbed
$ig->setLimit(10); //limiting the grabbed images count: only first 10 images will be grabbed
$ig->setFollowSubpagesLinks(true); //activating the sub-pages links following
$ig->setFollowedSubpageGrabOnlyFromTagSlice('main-photo-container'); //limiting the sub-page searching area for finding the full size images
$ig->setDoDownload(true); //enabling the download feature
$ig->grab(); //the grab command
$thumbs = $ig->getMediaRoots(); //get the images thumbs (media that have kids)
foreach($thumbs as $thumb)
{
    $images = $ig->getMediaKids($thumb); //get the thumb full size images (only one in this case)
    if(count($images))
    {
        $image = $images[0]; //get the thumb full size image
        $fxChain = 'CropAlign(center-center, 100, 70);Mask(rounded, 20);Reflection();'; //the effects chain to be applied over the thumbs
        $fxThumbPath = './../../php-graphic-works/live/do.php?img='.$thumb->getGrabbedUrl().'&fx='.$fxChain; //path to the inline graphic processor (requires WiseLoop PHP Graphic Works)
        echo '<a href="'.$image->getGrabbedUrl().'" style="float:left; margin:10px; background-color:#ffffff; padding:5px;">'; //link to full size image
        echo '<img src="'.$fxThumbPath.'"/>'; //the thumb
        echo '</a>';
    }
}
The result:
You can buy WiseLoop PHP Graphic Works from here: http://codecanyon.net/item/php-graphic-works/177929
By now you can see that WiseLoop PHP Web Media Grabber is a powerful tool that can help you develop complex, specialized media grabbers for various websites.
You could wrap the code above into a small class that behaves like a native grabber, in order to grab, extract and download from Flickr with only 3 lines of code!
The class definition could look like:
class wlWmgFlickrGrabber extends wlWmgImageGrabber
{
    public function __construct($search)
    {
        $url = 'http://www.flickr.com/search/?q='.$search.'&f=hp';
        parent::__construct($url, array('.jpg', '.jpeg'));
        $this->setGrabOnlyFromTagSlice('photo-display-container');
        $this->setFollowSubpagesLinks(true);
        $this->setFollowedSubpageGrabOnlyFromTagSlice('main-photo-container');
    }
}
Usage:
$flickr = new wlWmgFlickrGrabber('school'); //create the flickr grabber object
$flickr->setDoDownload(true);
$flickr->grab();
echo '<pre>'.print_r($flickr->getValidMediaTable(true), true).'</pre>'; //displaying the result array
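The wrapper above concatenates the search term directly into the URL, which is fine for a single word like 'school'. If the term can contain spaces or other reserved characters, the query string is better built with encoding. The sketch below uses PHP's built-in http_build_query; flickrSearchUrl is a hypothetical helper that could supply the URL inside such a constructor.

```php
<?php
// Hypothetical helper: builds the search URL with an encoded query string.
function flickrSearchUrl(string $search): string
{
    // Explicit '&' separator so the result does not depend on php.ini settings.
    $query = http_build_query(['q' => $search, 'f' => 'hp'], '', '&');
    return 'http://www.flickr.com/search/?' . $query;
}

echo flickrSearchUrl('high school'), "\n";
```

For a single-word term the result is identical to the hand-written URL used throughout this tutorial.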
If you want to grab, retrieve or download various media files (images, videos, audio, Flash files, documents, JavaScript sources, CSS stylesheet files etc.) from a public web site and use them locally within your site, PHP Web Media Grabber is the tool that will help you do that.
Extract and download anything from anywhere to your own server! Use the downloaded media locally in your website without external hot-linking.