WiseLoop Web Image Grabber Processor class definition
This class retrieves the images referenced or contained by the page at a given URL and stores them in the $_validMedia array variable.
It uses the capabilities of the base class wlWmgProcessor to search a page for images (referenced in img src attributes, in a href links to full-size images, or in inline CSS background images).
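The three image locations named above can be illustrated with a short, language-neutral sketch. It is written in Python for compactness and is not the WiseLoop PHP API: the names ImageCandidateParser and find_candidates, and the example.com URLs, are hypothetical. It only shows how candidates might be collected from img src attributes, a href links, and inline CSS background declarations.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches url(...) inside an inline style value, with optional quotes.
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)", re.IGNORECASE)


class ImageCandidateParser(HTMLParser):
    """Collects (source_kind, absolute_url) pairs from one HTML page.

    Hypothetical helper for illustration only; not part of WiseLoop.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self._add("img", attrs["src"])
        if tag == "a" and attrs.get("href"):
            # A link is only a candidate; it may or may not lead to an image.
            self._add("link", attrs["href"])
        style = attrs.get("style")
        if style and "background" in style.lower():
            for match in CSS_URL.findall(style):
                self._add("css", match)

    def _add(self, kind, url):
        # Resolve relative references against the page URL.
        self.candidates.append((kind, urljoin(self.base_url, url)))


def find_candidates(html, base_url):
    parser = ImageCandidateParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.candidates
```

Link candidates would still need to be followed or checked before being treated as images; this sketch stops at collecting them.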
WiseLoop Web Image Grabber main features:
- smart image recognition (all formats and extensions, all locations: under the img src tag attribute, under a href link tag, under inline css attribute or by content-type);
- default native support for most common web image extensions (jpg, jpeg, gif, png, bmp, tif, tiff, yuv, ai, eps, ps, svg, drw, ief, jfif, cod, ras, cmx, ico, pnm, pbm, pgm, rgb, xbm, xpm, xwd);
- a href link following: the grabbing engine can follow a href links that hide other images behind them; this is a very powerful feature for grabbing entire image galleries (thumbs and full-size images) whose starting page displays only the thumbs, each linked through an a href tag to the real full-size image;
- parent/child image relations: when grabbing image galleries with a href link following enabled, the followed thumbs are set as parents of the full-size images found underneath them; this way, for every grabbed thumb you know the corresponding full-size image and vice versa;
- inline CSS background image recognition: the grabbing engine is able to identify images that are referenced inline in the CSS background or background-image properties;
- image search and identification by the HTTP Content-Type response header: the grabbing engine can identify more than the obvious image resources that have the most common image file extensions; it also finds images generated dynamically by servers and images that have no valid image extension or no extension at all; the identification is made by checking the server response header when requesting the tested media resource;
- image extension filtering: only those images having the specified extensions will be included in the grabbing results;
- image dimensions filtering: only those images having the specified dimensions (width / height) will be included in the grabbing results;
- image format filtering: only those images having the specified format (portrait or landscape) will be included in the grabbing results;
- media URL name (filename) filtering: only those images whose URL names contain the specified strings will be included in the grabbing results;
- media size filtering: only those images having the specified size (in bytes) will be included in the grabbing results;
- image count limiter: the number of grabbed images will be limited to a specified value;
- HTML area searching: the grabbing engine is able to search for images only inside a designated HTML area specified by a tag; this way you can skip unwanted pictures from the start by narrowing the full HTML target page to a smaller area consisting of the content of one tag; an incomplete tag (tag slice) can also be specified, and it will be auto-completed depending on the contextual HTML content;
- downloading capability: the WiseLoop PHP Image Grabber is able to download the grabbed images to the local server, so that those images can be referenced or used as local resources in the future;
- WiseLoop takes no responsibility if the targeted URL changes its tag structure or its HTML DOM tree, resulting in unexpected data retrieval; this will not be considered a malfunction or a bug, and you should check the targeted URL's HTML DOM tree for changes and modify the code that instantiates this class or any inherited classes.
Also, WiseLoop assumes no responsibility for any abusive use of this class and/or violation of the terms of use of the target URL.
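The filtering features listed above (extension, dimensions, format, URL name, size, and count limit) can be pictured as one pass over the grabbed images. The following is a minimal sketch in Python, not the WiseLoop PHP API: GrabbedImage, extension_of, filter_images, and all parameter names are hypothetical, and the sketch assumes that width, height, and byte size are already known for each image.

```python
from dataclasses import dataclass
from os.path import splitext
from urllib.parse import urlparse


@dataclass
class GrabbedImage:
    """Hypothetical record for one grabbed image (illustration only)."""
    url: str
    width: int
    height: int
    size_bytes: int


def extension_of(url):
    # Use only the URL path, so query strings do not affect the result.
    path = urlparse(url).path
    return splitext(path)[1].lstrip(".").lower()


def filter_images(images, extensions=None, min_width=0, min_height=0,
                  orientation=None, name_contains=None,
                  max_bytes=None, limit=None):
    """Return the images that pass every active filter, in input order."""
    kept = []
    for image in images:
        # Count limiter: stop as soon as enough images were kept.
        if limit is not None and len(kept) >= limit:
            break
        if extensions is not None and extension_of(image.url) not in extensions:
            continue
        if image.width < min_width or image.height < min_height:
            continue
        # Square images are neither portrait nor landscape in this sketch.
        if orientation == "portrait" and image.height <= image.width:
            continue
        if orientation == "landscape" and image.width <= image.height:
            continue
        if name_contains is not None and name_contains not in image.url:
            continue
        if max_bytes is not None and image.size_bytes > max_bytes:
            continue
        kept.append(image)
    return kept
```

A call such as filter_images(images, extensions={"jpg", "png"}, orientation="landscape", limit=10) would keep at most ten landscape JPG or PNG images. How the real class treats square images or combines its filters is not stated in this description, so those choices are assumptions of the sketch.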
- See also: