boilerpipe - Fulltext Extraction from HTML

Property              Value
Distribution          RPM Universal
Repository            JPackage 6.0 all
Package name          boilerpipe
Package version       1.1.0
Package release       1.jpp6
Package architecture  noarch
Package type          rpm
Installed size        98.41 KB
Download size         89.69 KB
Official Mirror
The boilerpipe library provides algorithms to detect and
remove the surplus "clutter" (boilerplate, templates) around
the main textual content of a web page.
The library already provides specific strategies for common
tasks (for example: news article extraction) and may also be
easily extended for individual problem settings.
Extraction is very fast (milliseconds), needs only the input
document (no global or site-level information), and is
usually quite accurate.


Package     Version  Architecture  Repository
boilerpipe  -        -             -


Requires

Name            Value
java            >= 1.6.0
jpackage-utils  >= 1.7.5
nekohtml        -


Provides

Name        Value
boilerpipe  = 1.1.0-1.jpp6


Download

Type            URL
Binary Package  boilerpipe-1.1.0-1.jpp6.noarch.rpm
Source Package  boilerpipe-1.1.0-1.jpp6.src.rpm
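
The package file names in this listing appear to follow the conventional RPM layout name-version-release.arch.rpm. As a minimal sketch, assuming that layout holds and that the file name carries no epoch, the POSIX shell function below splits a file name into its four fields. The helper name parse_rpm_filename is hypothetical; it is not part of rpm, yum, or JPackage.

```shell
#!/bin/sh
# Hypothetical helper: split "name-version-release.arch.rpm" into its fields.
# Assumes the conventional RPM file-name layout; the name itself may contain
# hyphens, so the fields are peeled off from the right-hand side.
parse_rpm_filename() {
    file=${1%.rpm}        # strip the ".rpm" suffix
    arch=${file##*.}      # text after the last dot
    rest=${file%.*}       # drop ".arch"
    release=${rest##*-}   # text after the last hyphen
    rest=${rest%-*}       # drop "-release"
    version=${rest##*-}   # text after the (new) last hyphen
    name=${rest%-*}       # everything that remains is the package name
    printf '%s %s %s %s\n' "$name" "$version" "$release" "$arch"
}

parse_rpm_filename boilerpipe-1.1.0-1.jpp6.noarch.rpm
# prints: boilerpipe 1.1.0 1.jpp6 noarch
```

The printed fields match the package name, version, release, and architecture shown in the property table. For an installed package, querying the RPM database is more reliable than parsing file names.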

Install Howto

Fedora, CentOS, RHEL:
  1. Download latest jpackage-release rpm from
  2. Install jpackage-release rpm:
    # rpm -Uvh jpackage-release*rpm
  3. Install boilerpipe rpm package:
    # yum install boilerpipe
openSUSE:
  1. Add the JPackage 6.0 repository:
    # zypper addrepo jpackage-6.0
  2. Install boilerpipe rpm package:
    # zypper install boilerpipe
Mandriva, Mageia:
  1. Add the JPackage 6.0 repository:
    # urpmi.addmedia jpackage-6.0 with
  2. Update packages list:
    # urpmi.update -a
  3. Install boilerpipe rpm package:
    # urpmi boilerpipe




Changelog

2012-02-11 - Ralph Apel <r.apel at> - 0:1.1.0-1
- First release

See Also

Package                                      Description
boilerpipe-javadoc-1.1.0-1.jpp6.noarch.rpm   Javadoc for boilerpipe
bonecp-0.8.0-1.jpp6.noarch.rpm               BoneCP JDBC pool
bonecp-javadoc-0.8.0-1.jpp6.noarch.rpm       Javadoc for bonecp
bouncycastle-1.46-4.jpp6.noarch.rpm          Bouncy Castle Crypto Package for Java
bouncycastle-javadoc-1.46-4.jpp6.noarch.rpm  Javadoc for bouncycastle
brazil-2.3-9.jpp6.noarch.rpm                 Extremely small footprint Java HTTP stack
brazil-demo-2.3-9.jpp6.noarch.rpm            Demos for brazil
brazil-javadoc-2.3-9.jpp6.noarch.rpm         Javadocs for brazil
brazil-repolib-2.3-9.jpp6.noarch.rpm         Artifacts to be uploaded to a repository library
bsf-2.4.0-11.jpp6.noarch.rpm                 Bean Scripting Framework
bsf-javadoc-2.4.0-11.jpp6.noarch.rpm         Javadoc for bsf
bsf-manual-2.4.0-11.jpp6.noarch.rpm          Manual for bsf
bsf-repolib-2.4.0-11.jpp6.noarch.rpm         Artifacts to be uploaded to a repository library
bsf3-3.1-1.jpp6.noarch.rpm                   Bean Scripting Framework
bsf3-javadoc-3.1-1.jpp6.noarch.rpm           Javadoc for bsf3