If you want to retrieve content from a public website and display it on your own site, PHP Web Grabber is the tool that will help you do it.
No frames or iframes involved! Real HTML content is grabbed from the web and displayed within your web page just as if it had been generated by your own site.
WiseLoop PHP Web Grabber is a set of PHP classes designed to extract HTML content from the web.
The package allows complex content extraction in a flexible manner, using only a few lines of code. Content can be extracted from any web URL or local file, and the target can be a full web page or a set of tags, even tags that are only partially specified.
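As a rough illustration of extracting "a set of tags" from a partial specification, the sketch below uses PHP's built-in DOM extension, not the WiseLoop classes, whose API is not described here. The function grabTags and its parameters are hypothetical: it returns every element with the given tag name whose attributes include all of the listed name/value pairs, whatever other attributes the element carries.

```php
<?php
// Hypothetical sketch, NOT the WiseLoop API: extract every element named $tag
// whose attributes include all pairs in $attrs (a partial tag specification).
function grabTags($html, $tag, array $attrs = [])
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML rarely validates
    $doc->loadHTML($html);
    libxml_clear_errors();

    $found = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        foreach ($attrs as $name => $value) {
            // Values are compared exactly; unlisted attributes are ignored.
            if ($node->getAttribute($name) !== $value) {
                continue 2; // not a match, try the next element
            }
        }
        $found[] = $doc->saveHTML($node); // outer HTML of the matching element
    }
    return $found;
}

// In practice the HTML would come from file_get_contents($urlOrLocalPath).
$html = '<div class="news" id="a"><p>First</p></div>'
      . '<div class="ads"><p>Buy now</p></div>'
      . '<div class="news" id="b"><p>Second</p></div>';

// Only the class is given; the differing id attributes are left unspecified.
print_r(grabTags($html, 'div', ['class' => 'news']));
```

file_get_contents() accepts both local paths and, when allow_url_fopen is enabled, http(s) URLs, which matches the "web URL or local file" sources mentioned above. Because values are compared exactly, class="news big" would not match ['class' => 'news'] in this sketch.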
The smart proxy support cycles through a proxy list, so two consecutive requests to the same address use different proxies.
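One way to obtain the behavior described above is a per-host round-robin rotator. This is a hypothetical sketch: ProxyRotator, proxyFor and the proxy addresses are placeholders, not the package's implementation.

```php
<?php
// Hypothetical sketch, NOT the WiseLoop implementation: round-robin proxy
// selection kept separately for each target host, so consecutive requests
// to the same host use different proxies (given at least two proxies).
class ProxyRotator
{
    private $proxies;
    private $next = []; // host => index of the proxy to use next

    public function __construct(array $proxies)
    {
        if (count($proxies) === 0) {
            throw new InvalidArgumentException('The proxy list is empty');
        }
        $this->proxies = array_values($proxies);
    }

    // Returns the proxy ("host:port") for $url and advances that host's position.
    public function proxyFor($url)
    {
        $host = (string) parse_url($url, PHP_URL_HOST);
        $i = isset($this->next[$host]) ? $this->next[$host] : 0;
        $this->next[$host] = ($i + 1) % count($this->proxies);
        return $this->proxies[$i];
    }
}

// Placeholder proxy addresses.
$rotator = new ProxyRotator(['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.3:3128']);
echo $rotator->proxyFor('http://example.com/a'), "\n"; // 10.0.0.1:8080
echo $rotator->proxyFor('http://example.com/b'), "\n"; // 10.0.0.2:8080
echo $rotator->proxyFor('http://example.org/'), "\n";  // 10.0.0.1:8080 (other host)
```

The returned address could then be handed to PHP's HTTP stream context, for example stream_context_create(['http' => ['proxy' => 'tcp://' . $proxy, 'request_fulluri' => true]]), and that context passed to file_get_contents().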
The WiseLoop PHP Web Grabber also provides several post-extraction processing capabilities:
- tag removal from the raw grabbed content, so you can get rid of any unwanted tags together with their content;
- tag stripping from the raw grabbed content, so you can remove any tag and keep only its inner HTML content;
- string replacement (just like the str_replace function) in the raw grabbed content, so you can alter the results to suit your needs;
- charset conversion between the grabbed content and the local host, so that special characters such as diacritics display correctly in a local page that uses the grabber to show foreign content.
These capabilities can improve the results, save storage space and increase speed, and they can also help you comply with the terms of use of the grabbed web page.
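The four post-extraction steps can be approximated with PHP's DOM extension, str_replace() and iconv(). The helper names below (parseFragment, innerHtml, removeTags, unwrapTags) are hypothetical and do not reflect the WiseLoop API; the sketch mainly shows the difference between removing a tag together with its content and stripping a tag while keeping its content.

```php
<?php
// Hypothetical sketch, NOT the WiseLoop API: post-extraction processing
// with PHP's DOM extension, str_replace() and iconv().

// Parses an HTML fragment inside a wrapper <div>; returns [document, wrapper].
function parseFragment($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    // The wrapper is the first <div> in document order.
    return [$doc, $doc->getElementsByTagName('div')->item(0)];
}

// Serializes the children of the wrapper, i.e. the processed fragment.
function innerHtml(DOMDocument $doc, DOMNode $root)
{
    $html = '';
    foreach ($root->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

// Tag removal: deletes every <$tag> element below the wrapper, with its content.
function removeTags($html, $tag)
{
    list($doc, $root) = parseFragment($html);
    $xpath = new DOMXPath($doc);
    // Collect the matches first, then modify the tree.
    foreach (iterator_to_array($xpath->query('.//' . $tag, $root)) as $node) {
        $node->parentNode->removeChild($node);
    }
    return innerHtml($doc, $root);
}

// Tag stripping: removes every <$tag> element but keeps its inner content.
function unwrapTags($html, $tag)
{
    list($doc, $root) = parseFragment($html);
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('.//' . $tag, $root)) as $node) {
        // Move each child in front of the element, then drop the empty element.
        while ($node->firstChild) {
            $node->parentNode->insertBefore($node->firstChild, $node);
        }
        $node->parentNode->removeChild($node);
    }
    return innerHtml($doc, $root);
}

$raw = '<p>Hello <b>world</b></p><script>track();</script>';
$clean = removeTags($raw, 'script');         // script element and its code are gone
$clean = unwrapTags($clean, 'b');            // "world" stays, <b> is gone
$clean = str_replace('Hello', 'Hi', $clean); // plain string replacement
echo $clean, "\n";

// Charset conversion: text from an ISO-8859-1 page shown on a UTF-8 site.
$latin1 = "caf\xE9";                           // "café" encoded as ISO-8859-1
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1); // now valid UTF-8
echo $utf8, "\n";
```

Running removal before stripping, as here, means that content inside removed tags never reaches the later steps; which order the package itself applies is not stated in this document.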
The caching feature improves speed, saves bandwidth and avoids needless parsing and processing of grabbed web pages by storing the processed result for a given URL, set of tags and set of post-extraction settings.
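A minimal sketch of such a cache, assuming a file-per-entry layout: the key hashes the URL, the requested tags and the post-extraction settings, so changing any of them yields a separate entry. The GrabCache class, its expiry time ($ttl), the settings keys and the callback that produces the content are illustrative assumptions, not the package's own cache.

```php
<?php
// Hypothetical sketch, NOT the WiseLoop cache: one file per entry, keyed by
// a hash of everything that shapes the result (URL, tags, post-extraction
// settings). The expiry time ($ttl) is an assumption for the example.
class GrabCache
{
    private $dir;
    private $ttl;

    public function __construct($dir, $ttl = 3600)
    {
        $this->dir = rtrim($dir, '/');
        $this->ttl = $ttl;
    }

    // Returns the cached result if it is fresh; otherwise calls $produce()
    // (download + parse + post-process) and stores what it returns.
    public function get($url, array $tags, array $settings, callable $produce)
    {
        $key = md5(serialize([$url, $tags, $settings]));
        $file = $this->dir . '/' . $key . '.html';
        if (is_file($file) && time() - filemtime($file) < $this->ttl) {
            return file_get_contents($file);
        }
        $content = $produce();
        file_put_contents($file, $content, LOCK_EX);
        return $content;
    }
}

// A fresh temporary directory, so the demo starts with an empty cache.
$dir = sys_get_temp_dir() . '/wg-cache-' . uniqid();
mkdir($dir);
$cache = new GrabCache($dir);

$calls = 0;
$produce = function () use (&$calls) {
    $calls++; // stands in for the expensive grab-and-process work
    return '<p>processed content</p>';
};
// Hypothetical post-extraction settings; they only feed the cache key here.
$settings = ['remove' => ['script'], 'replace' => ['Hello' => 'Hi']];

$first = $cache->get('https://example.com/', ['div'], $settings, $produce);
$second = $cache->get('https://example.com/', ['div'], $settings, $produce);
echo $calls, "\n"; // 1: the second request was served from the cache
```

Because the settings are part of the key, changing a replacement rule or the tag list never returns a stale result processed under the old rules.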
The chosen programming model lets you build a personalized library of web grabbers and develop powerful web APIs on top of its simple yet smart HTML DOM tag parser and processor.
Main features:
- grab any resource with HTML content (web or local);
- HTTP proxy support;
- tag-based search and extraction (incomplete tags accepted);
- tag identification by unique characteristics (attributes), by occurrence index in the HTML DOM, or by content;
- tag auto-complete based on the parsed HTML content;
- simple yet smart HTML DOM tag parser and processor;
- tag removal from the raw grabbed content;
- tag stripping from the raw grabbed content;
- string replacement on the raw grabbed content;
- charset conversion between the grabbed content and the local host;
- smart caching for fast processing;
- easy development of web APIs;
- easy development and extension of a personalized grabbers library (5 examples included);
- lightweight, thanks to the auto-loader feature;
- exhaustive documentation;
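To illustrate identification by occurrence index or by content, the hypothetical helpers below (nthTag, tagContaining) select one element among several with the same tag name. They rely on PHP's DOM extension, are not the WiseLoop API, and assume a 0-based index.

```php
<?php
// Hypothetical sketch, NOT the WiseLoop API: pick one element out of several
// same-named ones, by occurrence index or by the text it contains.
function loadDom($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    return $doc;
}

// Occurrence index: the $index-th <$tag> element (0-based, document order), or null.
function nthTag($html, $tag, $index)
{
    $doc = loadDom($html);
    $node = $doc->getElementsByTagName($tag)->item($index);
    return $node ? $doc->saveHTML($node) : null;
}

// Content: the first <$tag> element whose text contains $needle, or null.
function tagContaining($html, $tag, $needle)
{
    $doc = loadDom($html);
    foreach ($doc->getElementsByTagName($tag) as $node) {
        if (strpos($node->textContent, $needle) !== false) {
            return $doc->saveHTML($node);
        }
    }
    return null;
}

$html = '<table><tr><td>Name</td><td>Price</td></tr>'
      . '<tr><td>Widget</td><td>9.99</td></tr></table>';
echo nthTag($html, 'td', 3), "\n";             // the fourth cell (9.99)
echo tagContaining($html, 'td', 'Widg'), "\n"; // the cell containing "Widget"
```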
Requirements:
- Apache Web Server 2.0.0 or above
- PHP 5.0.0 or above
Installation:
- Step 1: create a folder named /php-web-grabber on your web server;
- Step 2: copy the entire /bin and /cache folders into the newly created /php-web-grabber folder;
- Step 3: make sure that the /php-web-grabber/cache directory is writable;
- Step 4: include /php-web-grabber/bin/wlWg.php in your application.
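Steps 2 to 4 can be checked from PHP before the library is loaded. The helper installProblems and the assumed location of the php-web-grabber folder (next to the calling script) are hypothetical; only the bin/wlWg.php and cache paths come from the installation steps.

```php
<?php
// Hypothetical helper: verifies the layout produced by steps 1-3 before the
// library is included (step 4). $base is the php-web-grabber folder.
function installProblems($base)
{
    $problems = [];
    if (!is_file($base . '/bin/wlWg.php')) {
        $problems[] = 'missing ' . $base . '/bin/wlWg.php (step 2)';
    }
    if (!is_dir($base . '/cache')) {
        $problems[] = 'missing ' . $base . '/cache (step 2)';
    } elseif (!is_writable($base . '/cache')) {
        $problems[] = $base . '/cache is not writable (step 3)';
    }
    return $problems;
}

// Assumption: the php-web-grabber folder sits next to this script.
$base = __DIR__ . '/php-web-grabber';
$problems = installProblems($base);
if ($problems) {
    echo implode("\n", $problems), "\n";
} else {
    require_once $base . '/bin/wlWg.php'; // step 4
}
```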
- Project Name: WiseLoop PHP Web Grabber
- Project Website: http://wiseloop.com/product/phpwebgrabber
- Online Tutorial: http://wiseloop.com/tutorial/phpwebgrabber
- Online Demonstration: http://wiseloop.com/demo/phpwebgrabber
- Author: WiseLoop, http://www.wiseloop.com/contact/phpwebgrabber
- Tags: web grabber, web extractor, web scraper, web harvester, web ripper, web processor, html processor, html grabber, html extractor, html ripper, tag extractor, tag ripper, tag processor, HTML DOM parser
WiseLoop assumes no responsibility for any abusive use of this software product and/or any violation of the terms of use of the grabbed web pages.
If you decide to use this software product, do so responsibly: check the terms of use of the target web page to make sure that you are allowed to display its grabbed HTML content.