PHP Web Grabber

Advanced PHP tag-based web extraction engine

FAQ

Frequently asked questions

Q: Are nginx and IIS servers supported?

A: Probably yes.
Although the script has not been tested on other servers, it should work fine as long as the PHP requirements are fulfilled.


Q: Can I get my own HTML page refreshed/updated automatically when the grabbed URL has changed?

A: Yes.
You just need to set the cache time to zero, so that the source URL is grabbed again on every request instead of being served from the cache.


Q: When I access my.domain.com/php-web-grabber, I get just a blank screen. Why?

A: PHP Web Grabber is not a standalone application.
It is a component designed to be used inside an application that you develop.
To create an application using PHP Web Grabber, include a reference to the wlWg.php file in your application's PHP source file.
Please check the demos and see how to create such an application.


Q: Can this software grab contents and store them in a database or a file?

A: Not really.
The script deals only with the grabbing procedure; storing the results in the filesystem, a database, etc. is beyond its design.
That said, saving the results after grabbing is an easy and straightforward task that can be accomplished with very basic knowledge of PHP.
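As an illustration of the saving step, here is a minimal sketch that uses only standard PHP functions. The variable $grabbedHtml stands in for whatever string the grabber returned; the helper names, the target path, and the grabbed_pages table with its columns are hypothetical and not part of the PHP Web Grabber package.

```php
<?php
// Placeholder for the string returned by the grabber (hypothetical value).
$grabbedHtml = '<div class="news">Example grabbed content</div>';

/**
 * Write grabbed content to a file, creating the target directory if needed.
 * Returns the number of bytes written.
 */
function saveGrabbedToFile(string $content, string $path): int
{
    $dir = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }
    // LOCK_EX avoids interleaved writes when several requests save at once.
    $bytes = file_put_contents($path, $content, LOCK_EX);
    if ($bytes === false) {
        throw new RuntimeException("Cannot write file: $path");
    }
    return $bytes;
}

/**
 * Store grabbed content through an existing PDO connection.
 * The table "grabbed_pages" and its columns are assumptions for this sketch.
 */
function saveGrabbedToDatabase(PDO $pdo, string $url, string $content): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO grabbed_pages (url, content, grabbed_at)
         VALUES (:url, :content, :grabbed_at)'
    );
    $stmt->execute([
        ':url' => $url,
        ':content' => $content,
        ':grabbed_at' => date('Y-m-d H:i:s'),
    ]);
}

// Usage: save the grabbed string under the system temp directory.
$target = sys_get_temp_dir() . '/grabbed/example.html';
$written = saveGrabbedToFile($grabbedHtml, $target);
```

The database helper takes a ready-made PDO connection so that the choice of database driver stays with your application.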


Q: Can I grab pages that require login?

A: No.
The grabber is not able to submit login information before the actual grabbing.


Q: Can I grab Facebook?

A: No.
Facebook requires login, and the grabber is not able to submit login information before the actual grabbing.


Q: Can I grab content that was loaded in the source page URL using AJAX?

A: No.
AJAX-loaded content is not part of the response to the first request made by the grabbing engine.
The grabber does not run JavaScript as a regular browser does, so it can only process the content returned by that first request, before any AJAX calls are made.


Q: Is there a way to grab/process custom content (e.g. a string coming from curl) and not only a URL?

A: Yes.
Please refer to the wlWgContentProcessor class, which allows grabbing and parsing custom content.


Q: Can I extract information from multiple websites?

A: Yes, but not at the same time.
The grabber was designed to grab and process only one website at a time.
Just create one grabber instance for each URL.


Q: Are there any specific grabbers available?

A: No.
Currently we do not have any grabbers for specific websites, and we have no intention of developing any.
However, there are some good samples inside the package, and we believe that anybody with some basic PHP skills can create grabbers tailored to their needs.


Q: Can I download a whole website?

A: Yes, but this feature is not offered out-of-the-box.
WiseLoop PHP Web Grabber is designed for, and mostly used for, extracting portions of a single target web page; following links is beyond its scope.
Link following needs a custom implementation, with a new grabber instance used for each link.
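The link discovery part of such a custom implementation could look like the sketch below. It uses PHP's standard DOMDocument class and is not part of the package; the function name extractLinks and the simplified handling of relative links are assumptions made for illustration.

```php
<?php
/**
 * Collect absolute http(s) links from an HTML string.
 * Root-relative links ("/page") are resolved against the scheme and host of
 * $baseUrl; other relative forms, mailto: links and anchors are skipped.
 */
function extractLinks(string $html, string $baseUrl): array
{
    $parts = parse_url($baseUrl);
    if (empty($parts['scheme']) || empty($parts['host'])) {
        throw new InvalidArgumentException("Base URL must be absolute: $baseUrl");
    }
    if ($html === '') {
        return [];
    }
    $origin = $parts['scheme'] . '://' . $parts['host'];

    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // no target, or an in-page anchor
        }
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href; // already absolute
        } elseif (strncmp($href, '//', 2) === 0) {
            $links[] = $parts['scheme'] . ':' . $href; // protocol-relative
        } elseif ($href[0] === '/') {
            $links[] = $origin . $href; // root-relative
        }
    }
    // Remove duplicates while keeping the first-seen order.
    return array_values(array_unique($links));
}
```

A crawler built on top of this would keep a list of already visited URLs and a depth limit, and would create one grabber instance per discovered link.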


Q: Can I give the extracted data a CSS style to match my actual site?

A: Yes.
No CSS files are imported during the grabbing procedure, so your local CSS styles will be applied unless the grabbed content has inline styling.
In that case, the post-extraction string processing capabilities should be used to get rid of those inline styles.
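To show what removing inline styles amounts to, here is a small stand-alone sketch in plain PHP. It does not use the package's own post-extraction processing API; the function name stripInlineStyles is hypothetical, and the regular expression is a pragmatic approximation, not a full HTML parser.

```php
<?php
/**
 * Remove inline style="..." (or style='...') attributes from an HTML fragment.
 * Attributes such as data-style are left alone because the pattern requires
 * whitespace directly before the word "style".
 */
function stripInlineStyles(string $html): string
{
    $pattern = '/\sstyle\s*=\s*("[^"]*"|\'[^\']*\')/i';
    // preg_replace returns null on a regex engine error; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

$clean = stripInlineStyles('<p style="color:red" class="x">Hi</p>');
```

After this step the fragment carries no styling of its own, so the styles of the hosting page apply.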


Q: Will special characters such as diacritics be grabbed and displayed correctly on my website?

A: They should be just fine.
The charset conversion feature converts between the charset of the grabbed content and the charset of your own site, so that special characters such as diacritics are displayed correctly.
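For readers unfamiliar with charset conversion, the idea can be sketched with PHP's standard iconv() function. This is not the package's implementation; the function name convertToUtf8 and the choice of ISO-8859-1 as the source charset are assumptions for the example.

```php
<?php
/**
 * Convert a string from a known source charset to UTF-8.
 * Throws if iconv cannot perform the conversion.
 */
function convertToUtf8(string $content, string $sourceCharset): string
{
    // @ silences the notice iconv emits on failure; the result is checked instead.
    $converted = @iconv($sourceCharset, 'UTF-8', $content);
    if ($converted === false) {
        throw new InvalidArgumentException("Cannot convert from $sourceCharset to UTF-8");
    }
    return $converted;
}

// "café" encoded as ISO-8859-1: the e-acute is the single byte 0xE9.
$latin1 = "caf\xE9";
$utf8 = convertToUtf8($latin1, 'ISO-8859-1');
```

In UTF-8 the same character takes two bytes (0xC3 0xA9), which is why content shown without conversion appears garbled on a page served in a different charset.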


Q: Can I extract multiple div elements from the same page?

A: Yes.
To extract more than one tag from a page, you just have to pass an array of wlWgParam objects to the wlWgProcessor.


Q: I have some problems with the proxy feature. I tried several and they did not work for me. What can I do?

A: The proxy feature works just fine.
It's just a matter of finding a good, valid proxy (only HTTP proxies are supported).
The script is not responsible for how certain proxies deal with incoming requests.
Most of the free proxy lists found on the internet are not really reliable, and their proxies can reject requests based on internal rules such as IP, traffic load, location, etc.


Q: When the caching time expires and the new version of the grabbed page is loaded, what happens to the old cache file?

A: It will be reused.
There will never be more than one cache file for a given URL and set of grabbing parameters.
Of course, it is quite possible that the same URL and parameters will never be used again, in which case the cache file will just stay there.
To control this, the developer can call the wlWgUtils::clearCache() method, which clears the entire cache directory.
If more specific cache handling is needed, the developer can use wlWgUtils::getCacheDir() in combination with a custom cache-clearing algorithm.
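One possible custom algorithm is to delete only cache files older than a given age. The sketch below works on any directory path and uses standard PHP functions only; the function name clearStaleCacheFiles and the one-hour threshold are hypothetical. In a real application the directory would presumably come from wlWgUtils::getCacheDir(), assuming that method returns a filesystem path.

```php
<?php
/**
 * Delete regular files in $cacheDir last modified more than $maxAgeSeconds ago.
 * Returns the number of files removed.
 */
function clearStaleCacheFiles(string $cacheDir, int $maxAgeSeconds): int
{
    $names = scandir($cacheDir);
    if ($names === false) {
        throw new RuntimeException("Cannot read directory: $cacheDir");
    }
    clearstatcache(); // make sure modification times are read fresh
    $now = time();
    $removed = 0;
    foreach ($names as $name) {
        $path = $cacheDir . DIRECTORY_SEPARATOR . $name;
        if (!is_file($path)) {
            continue; // skips ".", ".." and subdirectories
        }
        if ($now - filemtime($path) > $maxAgeSeconds && unlink($path)) {
            $removed++;
        }
    }
    return $removed;
}

// Demonstration on a throwaway directory, not on the real cache.
$demoDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'wg_cache_demo_' . uniqid();
mkdir($demoDir);
file_put_contents($demoDir . '/old.cache', 'stale');
file_put_contents($demoDir . '/new.cache', 'fresh');
touch($demoDir . '/old.cache', time() - 7200); // pretend it is two hours old

$removedCount = clearStaleCacheFiles($demoDir, 3600);
```

Running such a cleanup from a scheduled job keeps the cache directory from accumulating files for URLs that are no longer requested.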


Q: Which browsers are supported?

A: All of them.
It is a server-side solution and it is independent of the browser used.


Q: If I buy an extended license, can I use it in a product that will be sold on CodeCanyon?

A: Licensing terms are imposed by Envato.
For more information please visit http://codecanyon.net/licenses or ask Envato directly.


Q: Is it possible to test your framework with a demo version that has limited features or a time limit?

A: No. However, you are free to play with the online demo for as long as you want, and we will be more than happy to answer any pre-sale questions.

Regular License $10.00
Use by you or one client, in a single end product which end users are not charged for.

Extended License $50.00
Use by you or one client, in a single end product which end users can be charged for.

Short Information

If you want to retrieve various content from a public website and display it on your site, PHP Web Grabber is the perfect tool to help you do that.
No frames or iframes involved! Real HTML content, grabbed from the web and displayed within your web page just as if it were generated by your site.