tagsoup - SAX-compliant parser written in Java that parses HTML as it is found in the wild: nasty and brutish

Distribution: RPM Universal
Repository: JPackage 5.0 all
Package name: tagsoup
Package version: 1.0.1
Package release: 3.jpp5
Package architecture: noarch
Package type: rpm
Installed size: 70.20 KB
Download size: 60.74 KB
Official Mirror: mirrors.dotsrc.org
TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML.
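As a rough illustration of the description above, the sketch below runs TagSoup's command-line front end on one HTML file and prints the resulting XML on standard output. The helper name `tagsoup_to_xml` and the `TAGSOUP_JAR` variable are hypothetical. The sketch assumes a `java` launcher on the PATH, that the jar sits at the unversioned path this package lists, and that the packaged jar contains the upstream `org.ccil.cowan.tagsoup.CommandLine` class.

```shell
# Hypothetical helper, not part of the package.
# Assumption: the jar lives at the unversioned path from this package's file list.
TAGSOUP_JAR="${TAGSOUP_JAR:-/usr/share/java/tagsoup.jar}"

tagsoup_to_xml() {
    # Fail early with a readable message if the jar is missing.
    if [ ! -r "$TAGSOUP_JAR" ]; then
        echo "tagsoup jar not readable: $TAGSOUP_JAR" >&2
        return 1
    fi
    # Assumption: the upstream command-line class is present in the jar.
    java -cp "$TAGSOUP_JAR" org.ccil.cowan.tagsoup.CommandLine "$1"
}
```

A call such as `tagsoup_to_xml page.html > page.xml` would then hand the cleaned-up markup to any ordinary XML tool.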



  Provides

  • tagsoup = 1.0.1-3.jpp5


    Install Howto

    Fedora, CentOS, RHEL:
    1. Download the latest jpackage-release rpm from
    2. Install jpackage-release rpm:
      # rpm -Uvh jpackage-release*rpm
    3. Install tagsoup rpm package:
      # yum install tagsoup
    openSUSE:
    1. Add the JPackage 5.0 repository:
      # zypper addrepo http://mirrors.dotsrc.org/jpackage/5.0/generic/free/ jpackage-5.0
    2. Install tagsoup rpm package:
      # zypper install tagsoup
    Mandriva, Mageia:
    1. Add the JPackage 5.0 repository:
      # urpmi.addmedia jpackage-5.0 http://mirrors.dotsrc.org/jpackage/5.0/generic/free/ with hdlist.cz
    2. Update packages list:
      # urpmi.update -a
    3. Install tagsoup rpm package:
      # urpmi tagsoup
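The final install step differs per distribution family only in which package manager is used. The sketch below picks the matching install command from the steps above by checking which tool is on the PATH. The function name `tagsoup_install_cmd` is hypothetical, and the sketch assumes the JPackage repository has already been set up as described. It only prints the command and does not run it.

```shell
# Hypothetical helper: print the install command that matches the
# package manager found on PATH. It does not run the command.
tagsoup_install_cmd() {
    if command -v yum >/dev/null 2>&1; then
        echo "yum install tagsoup"        # Fedora, CentOS, RHEL
    elif command -v zypper >/dev/null 2>&1; then
        echo "zypper install tagsoup"     # zypper-based systems
    elif command -v urpmi >/dev/null 2>&1; then
        echo "urpmi tagsoup"              # Mandriva, Mageia
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}
```

The printed command still has to be run as root, as the `#` prompts in the steps indicate.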


    Files

    • /usr/share/doc/tagsoup-1.0.1/CHANGES
    • /usr/share/doc/tagsoup-1.0.1/README
    • /usr/share/java/tagsoup-1.0.1.jar
    • /usr/share/java/tagsoup.jar
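To confirm that an installation matches the file list above, a small check along these lines may help. The function name `check_tagsoup_files` is hypothetical. Its optional argument is a root prefix, included only so the check can be tried against a staging directory; with no argument it looks at the live filesystem.

```shell
# Hypothetical helper: report any file from the package's file list
# that is missing under the given root prefix (default: the real root).
check_tagsoup_files() {
    root="${1:-}"
    missing=0
    for f in \
        /usr/share/doc/tagsoup-1.0.1/CHANGES \
        /usr/share/doc/tagsoup-1.0.1/README \
        /usr/share/java/tagsoup-1.0.1.jar \
        /usr/share/java/tagsoup.jar
    do
        if [ ! -e "$root$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status 0 means every listed file was found.
    return "$missing"
}
```

On an RPM-based system, `rpm -ql tagsoup` lists the same paths directly from the package database.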


    Changelog

    2008-07-26 - David Walluck <dwalluck@redhat.com> 0:1.0.1-3
    - set CLASSPATH and OPT_JAR_LIST
    - BuildRequires: ant-trax
    - set transformer.factory during build

    2008-07-26 - David Walluck <dwalluck@redhat.com> 0:1.0.1-2
    - don't use %ghost for javadoc
    - update License
    - update BuildRoot
    - BuildRequires: java-javadoc
    - don't BuildRequires: /bin/bash

    2007-01-20 - Sebastiano Vigna <vigna@dsi.unimi.it> 0:1.0.1-1jpp
    - Upgraded to 1.0.1

    2006-02-27 - Fernando Nasser <fnasser@redhat.com> 0:1.0rc-2jpp
    - First JPP 1.7 version

    2005-01-28 - Sebastiano Vigna <vigna@acm.org> 0:1.0rc-1jpp
    - First JPackage version