Web Grabber WordPress Plugin

Tag-based web extractor WordPress plugin

The [webgrab] Shortcode

The shortcode feature lets you insert web-grabbed content into your posts or pages.

The Shortcode Syntax

The following syntax can be used inside posts, pages or custom fields:

 [webgrab url='http://the-url.com' tag='the-tag-to-be-extracted' idx='index-in-dom' contains='filter-content' cache='minutes' proxy='xxx.xxx.xxx.xxx:port,' charset='the-grabbed-page-charset' rtag0='remove-tag' stag0='strip-tag' srch0='search' repl0='replace']
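To make the attribute syntax concrete, here is a minimal Python sketch of how a [webgrab] shortcode string could be split into attributes. parse_webgrab is a hypothetical helper, not part of the plugin. It only understands the single-quoted name='value' form used throughout this page (WordPress's own shortcode parser is more permissive), and it assumes the documented defaults (idx 0, cache 1440 minutes) and that the '{' and '}' substitution applies only to the tag, rtagN and stagN markers.

```python
import re

# Hypothetical parser: matches name='value' pairs (single-quoted values only).
ATTR_RE = re.compile(r"(\w+)\s*=\s*'([^']*)'")
# Assumption: only tag markers use the editor-safe '{' and '}' symbols.
MARKER_RE = re.compile(r"tag|[rs]tag\d+")
# Documented defaults: index 0, cache lifetime of one day in minutes.
DEFAULTS = {"idx": "0", "cache": "1440"}


def parse_webgrab(shortcode: str) -> dict:
    """Return the attributes of a [webgrab ...] shortcode as a dict of strings."""
    attrs = dict(DEFAULTS)
    for name, value in ATTR_RE.findall(shortcode):
        if MARKER_RE.fullmatch(name):
            # '{' and '}' stand in for '<' and '>' so the WYSIWYG editor leaves them alone.
            value = value.replace("{", "<").replace("}", ">")
        attrs[name] = value
    if "url" not in attrs:
        raise ValueError("the url attribute is mandatory")
    return attrs
```

Applied to Example 1's shortcode, this sketch would yield tag = <p class="intro"> with idx and cache left at their defaults.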

Shortcode Attributes

  • url: the target URL to be grabbed; this attribute is mandatory;
  • tag: the tag to extract; the grabbing engine returns the contents of the tag specified here. An incomplete tag can also be specified; it is autocompleted based on the searched HTML content;
    To avoid breaks in the WYSIWYG editor, use the '{' and '}' symbols instead of the usual '<' and '>' HTML tag symbols.
    This attribute is optional; if it is not specified, the entire contents of the target URL are grabbed;
  • idx: use this if you want to grab the contents of a tag that cannot be uniquely identified but you know its index in the HTML DOM;
    Say the target page contains several <p> tags with no other distinguishing marks (such as ID attributes or inline CSS styles) to identify them uniquely; set this value to 2 if you want to grab the contents of the 3rd paragraph (the index is zero-based).
    This attribute is optional; the default value is 0 (zero).
  • contains: filters the tag to be extracted based on its content;
    Use this if you want to grab the contents of a tag that cannot be uniquely identified but that you know for sure contains this value;
    This attribute is optional.
  • cache: cache time measured in minutes; the default value is 1 day (1440 minutes);
  • proxy: comma-separated list of proxies (in xxx.xxx.xxx.xxx:port format) used by the grabbing engine. Use a space as an entry to use the local server's own IP. The grabbing engine cycles through the list, using the next available proxy for each request.
    Only HTTP proxies are supported!
    This attribute is optional.
     [ ... proxy='123.123.123.123:8080, 321.321.321.321:81'] 
    
  • charset: specifies the charset of the target URL so that charset conversion can be applied. Conversion may be needed to correctly display special characters, such as diacritics, within a local page that uses the [webgrab] shortcode to grab and display foreign content.
    The local charset must be specified in the wlWgConfig.php file under the engine/bin directory, and it must match the WordPress charset setting. It defaults to UTF-8, which is usually fine.
    This attribute is optional; if no charset is specified in the shortcode, no charset conversion takes place.
     [ ... charset='ISO-8859-1'] 
    
  • rtag: remove tag: indicates a tag to be removed completely from the result;
    The extraction engine removes these tags and their contents from the result; incomplete tags can also be specified, and they are autocompleted based on the contextual HTML content;
    Use this to get rid of any unwanted tags in the result (e.g. to comply with the target site's terms of use). For example, some sites allow their text content to be used on other sites but do not allow inline image referencing: you may not display images hosted on the target site by using tags such as <img src='target_site.com/image.jpg'/>, which can exist in the grabbed content.
    If you leave the grabbed content as it is, you will break the target site's terms of use.
    To avoid this, specify the tags to be removed like this:
     [ ... rtag0='a-tag' rtag1='another-tag' rtag2='another-tag' ... ]
    
    Removing the image tags this way (for example with rtag0='{img') ensures that images hosted on the target site are not displayed on your site, so you do not break its terms of use.
  • stag: strip tag: indicates a tag that will be stripped from the HTML code, leaving only its inner contents;
     [ ... stag0='a-tag' stag1='another-tag' stag2='another-tag' ... ]
    
    Incomplete tags can also be specified; they are autocompleted based on the contextual HTML content.
  • srch: search: indicates a search string that will be replaced by its corresponding repl (replace) string;
  • repl: replace: indicates the replacement for the corresponding search string specified by the srch (search) attribute;
    The following code,
     [ ... srch0='a-search' repl0='a-replacement' srch1='another-search' repl1='another-replacement' ... ]
    
    replaces all occurrences of 'a-search' inside the grabbed contents with 'a-replacement', all occurrences of 'another-search' with 'another-replacement', and so on.
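The proxy, cache and charset attributes all act on the fetch step. The sketch below models them under stated assumptions: the proxy list is cycled round-robin (failover to the "next available" proxy is not modeled), a space entry means a direct request from the server's own IP, a cached result is reused until cache minutes have passed, and conversion happens only when charset is given. HypotheticalGrabber and the injected fetch callable are illustrative names, not the plugin's API; fetch stands in for whatever performs the HTTP request.

```python
import itertools
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetch callable: (url, proxy or None) -> raw response bytes.
Fetch = Callable[[str, Optional[str]], bytes]


class HypotheticalGrabber:
    """Illustrative model of proxy rotation, minute-based caching and charset conversion."""

    def __init__(self, proxies: str, fetch: Fetch,
                 clock: Callable[[], float] = time.time) -> None:
        # "a:1, b:2" -> ["a:1", "b:2"]; a blank (space) entry becomes None,
        # meaning a direct request from the server's own IP.
        entries = [p.strip() or None for p in proxies.split(",")] if proxies else [None]
        self._proxies = itertools.cycle(entries)  # round-robin over the list
        self._fetch = fetch
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}  # url -> (expires_at, body)

    def get(self, url: str, cache_minutes: int = 1440,
            charset: Optional[str] = None, local_charset: str = "utf-8") -> bytes:
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no request is made and no proxy is consumed
        raw = self._fetch(url, next(self._proxies))
        # Convert only when the remote charset is given; otherwise pass the bytes through.
        body = raw.decode(charset).encode(local_charset) if charset else raw
        self._cache[url] = (now + cache_minutes * 60, body)
        return body
```

Injecting fetch and clock keeps the sketch testable without network access; in it, a cached hit does not advance the proxy rotation.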

Examples

Example 1: Very simple grabbing of the first paragraph from the WordPress website

The shortcode:

 [webgrab url='http://wordpress.org/about/' tag='{p class="intro"}']

The result:

[webgrab] shortcode sample: simple web grabbing

Example 2: Slightly more complex grabbing with some tag removals

The shortcode:

 [webgrab url='http://www.freewebsitetemplates.com' tag='{div id="leftside"}' rtag1='{h1}' rtag2='{div class="pages"' rtag3='{div class="about"}' rtag4='{div class="templatedaily"}']

You can add any number of tags to be removed from the output: use shortcode attributes that start with rtag.
The result:

[webgrab] shortcode sample: web grabbing with tag removals

Example 3: Grabbing with some tag removals and some tag stripping

The shortcode:

 [webgrab url='http://www.freewebsitetemplates.com' tag='{div id="leftside"}' rtag1='{h1}' rtag2='{div class="pages"' rtag3='{div class="about"}' rtag4='{div class="templatedaily"}' stag1='{a' stag2='{span class="title"}']

You can add any number of tags to be stripped from the output: use shortcode attributes that start with stag.
The result:

[webgrab] shortcode sample: web grabbing with tag removals and tag stripping
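Examples 1 to 3 combine tag, rtag and stag with incomplete markers such as '{div class="pages"'. The following Python sketch is one way to model those operations, together with idx and contains; it is not the plugin's engine, and extract, remove_tags and strip_tags are hypothetical names. It assumes that a marker (after '{' and '}' are converted to '<' and '>') selects any opening tag that starts with it, that a bare tag name such as '<a' must be followed by whitespace, '/' or '>', and that contains filters the candidates before idx is applied. A regex tag scanner like this one does not handle comments, scripts or '>' inside attribute values.

```python
import re

# Any opening or closing tag: group 1 is "/" for closing tags, group 2 the tag name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")
BARE_NAME_RE = re.compile(r"<[A-Za-z][A-Za-z0-9]*")
VOID = {"img", "br", "hr", "input", "meta", "link"}  # elements with no closing tag


def _matches(open_tag: str, marker: str) -> bool:
    """Assumed rule: the opening tag must start with the (possibly incomplete) marker."""
    if not open_tag.startswith(marker):
        return False
    if BARE_NAME_RE.fullmatch(marker):
        # A bare '<a' must not select '<abbr>': require a boundary after the name.
        return open_tag[len(marker)] in " \t\r\n/>"
    return True


def _elements(html: str, marker: str) -> list:
    """(open_start, open_end, close_start, close_end) of every element selected by marker."""
    spans = []
    for m in TAG_RE.finditer(html):
        if m.group(1) or not _matches(m.group(0), marker):
            continue
        name = m.group(2).lower()
        if name in VOID or m.group(0).endswith("/>"):
            spans.append((m.start(), m.end(), m.end(), m.end()))
            continue
        depth = 1  # count nested same-name tags to find the matching close tag
        for t in TAG_RE.finditer(html, m.end()):
            if t.group(2).lower() != name or t.group(0).endswith("/>"):
                continue
            depth += -1 if t.group(1) else 1
            if depth == 0:
                spans.append((m.start(), m.end(), t.start(), t.end()))
                break
    return spans


def _cut(html: str, ranges: list) -> str:
    """Delete the given (start, end) ranges; ranges nested in an earlier one are absorbed."""
    out, pos = [], 0
    for start, end in sorted(ranges):
        if start >= pos:
            out.append(html[pos:start])
        pos = max(pos, end)
    out.append(html[pos:])
    return "".join(out)


def extract(html: str, marker: str = "", idx: int = 0, contains: str = "") -> str:
    """tag/idx/contains: inner HTML of the idx-th (zero-based) selected element."""
    if not marker:
        return html  # no tag given: the whole page is grabbed
    inner = [html[b:c] for _, b, c, _ in _elements(html, marker)]
    if contains:
        inner = [s for s in inner if contains in s]
    return inner[idx] if 0 <= idx < len(inner) else ""


def remove_tags(html: str, marker: str) -> str:
    """rtag: drop selected elements together with their contents."""
    return _cut(html, [(a, d) for a, _, _, d in _elements(html, marker)])


def strip_tags(html: str, marker: str) -> str:
    """stag: drop only the selected opening and closing tags, keeping inner content."""
    ranges = []
    for a, b, c, d in _elements(html, marker):
        ranges += [(a, b), (c, d)]
    return _cut(html, ranges)
```

In this model, removal deletes an element with everything inside it, while stripping deletes only the matching opening and closing tags, so the text of a stripped link stays in the output.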

Example 4: Grabbing with tag removals, tag stripping and string replacements

The shortcode:

 [webgrab url='http://www.freewebsitetemplates.com' tag='{div id="leftside"}' rtag1='{h1}' rtag2='{div class="pages"' rtag3='{div class="about"}' rtag4='{div class="templatedaily"}' stag1='{a' stag2='{span class="title"}' srch1='class="ss"' repl1='style="display:block;"' srch2='class="title"' repl2='style="display:block;"']

You can pass any number of search/replace pairs to the processor: for the search string (the needle), use shortcode attributes that start with srch, and for the replacement, use shortcode attributes that start with repl.
The result:

[webgrab] shortcode sample: web grabbing with tag removals, tag stripping and string processing
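Example 4 pairs each srchN with the replN that shares its number. A minimal sketch of that pairing in Python, assuming the pairs are applied in ascending numeric order as plain substring replacements and that a srchN without a matching replN is ignored (the plugin's actual handling of unpaired or out-of-order attributes is not documented here); replacement_pairs and apply_replacements are hypothetical names:

```python
import re


def replacement_pairs(attrs: dict) -> list:
    """Pair srchN with replN by their shared number N, in ascending numeric order."""
    pairs = []
    for name in attrs:
        m = re.fullmatch(r"srch(\d+)", name)
        repl_name = f"repl{m.group(1)}" if m else None
        if repl_name in attrs:  # assumption: an unpaired srchN is skipped
            pairs.append((int(m.group(1)), attrs[name], attrs[repl_name]))
    return [(search, replace) for _, search, replace in sorted(pairs)]


def apply_replacements(html: str, attrs: dict) -> str:
    """Replace every occurrence of each search string, one pair after another."""
    for search, replace in replacement_pairs(attrs):
        html = html.replace(search, replace)
    return html
```

Order matters when one replacement produces text that a later search string matches, which is why the sketch sorts by number (so srch9 runs before srch10) rather than by attribute name.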

Example 5: Grabbing with tag removals, tag stripping and string replacements - advanced

The shortcode:

 [webgrab url='http://www.freewebsitetemplates.com' tag='{div id="leftside"}' rtag1='{h1}' rtag2='{div class="pages"' rtag3='{div class="about"}' rtag4='{div style="clear:' rtag5='{div class="clear"}' rtag6='{div style="margin-left:31px;display:block;"}' rtag7='{div class="templatedaily"}' srch2='class="ss"' repl2='style="display:block;"'/]

The result:

[webgrab] shortcode sample: web grabbing with tag removals, tag stripping and string processing

Example 6: Advanced grabbing with tag removals, tag stripping and string replacements - applying your own custom design formatting

The shortcode:

 [webgrab url='http://www.freewebsitetemplates.com' tag='{div id="leftside"}' rtag1='{h1}' rtag2='{div class="pages"' rtag3='{div class="about"}' rtag4='{div style="clear:' rtag5='{div class="clear"}' rtag6='{div style="margin-left:31px;display:block;"}' rtag7='{div class="templatedaily"}' srch1='class="title"' repl1='style="display:block;font-size:16px;font-weight:bold;margin-bottom:5px;background-color:#dedede;padding:2px;border:1px solid #ababab;"' srch2='class="ss"' repl2='style="display:block;margin-bottom:10px;"' srch3='class="download"' repl3='style="margin-top:17px;padding:5px;font-weight:bold;text-decoration:none;border:1px solid #ababab;background-color:#dedede;color:#000000;"' srch4='class="preview"' repl4='style="margin-top:17px;padding:5px;font-weight:bold;text-decoration:none;border:1px solid #ababab;background-color:#dedede;color:#000000;"' srch5='class="getwix"' repl5='style="margin-top:17px;padding:5px;font-weight:bold;text-decoration:none;border:1px solid #ababab;background-color:#dedede;color:#000000;"' srch6='class="templateleft"' repl6='style="padding:10px 0 10px 0;border-bottom:1px solid #ababab;"' srch7='class="templateright"' repl7='style="padding:10px 0 10px 0;border-bottom:1px solid #ababab;"'/]

The result:

[webgrab] shortcode sample: web grabbing with tag removals, tag stripping and string processing
Note:
Please observe the incomplete tag specifications in the examples above; they act as markers that select the tags whose definitions contain those specifications.

Regular License $18.00
Use by you or one client, in a single end product which end users are not charged for.

Extended License $90.00
Use by you or one client, in a single end product which end users can be charged for.

Short Information

If you want to embed content from public websites into your WordPress blog, the Web Grabber WordPress Plugin is the perfect tool for the job.
No frames or iframes involved! Real HTML content, grabbed from the web and displayed within your blog just as if you had written it yourself.

424 Sales