SGMLS - class for postprocessing the output from the sgmls and nsgmls parsers.


NAME

SGMLS - class for postprocessing the output from the sgmls and nsgmls parsers.


SUPPORTED PLATFORMS

This module is not included with the standard ActivePerl distribution. It is available as a separate download using PPM.

SYNOPSIS

  use SGMLS;
  my $parse = new SGMLS(STDIN);
  while (my $event = $parse->next_event) {
    SWITCH: {
      ($event->type eq 'start_element') && do {
        my $element = $event->data;    # An object of class SGMLS_Element
        # ... your code for the beginning of an element
        last SWITCH;
      };
      ($event->type eq 'end_element') && do {
        my $element = $event->data;    # An object of class SGMLS_Element
        # ... your code for the end of an element
        last SWITCH;
      };
      ($event->type eq 'cdata') && do {
        my $cdata = $event->data;      # A string
        # ... your code for character data
        last SWITCH;
      };
      ($event->type eq 'sdata') && do {
        my $sdata = $event->data;      # A string
        # ... your code for specific character data (SDATA)
        last SWITCH;
      };
      ($event->type eq 're') && do {
        # ... your code for a record end
        last SWITCH;
      };
      ($event->type eq 'pi') && do {
        my $pi = $event->data;         # A string
        # ... your code for a processing instruction
        last SWITCH;
      };
      ($event->type eq 'entity') && do {
        my $entity = $event->data;     # An object of class SGMLS_Entity
        # ... your code for an external entity
        last SWITCH;
      };
      ($event->type eq 'start_subdoc') && do {
        my $entity = $event->data;     # An object of class SGMLS_Entity
        # ... your code for the beginning of a subdoc entity
        last SWITCH;
      };
      ($event->type eq 'end_subdoc') && do {
        my $entity = $event->data;     # An object of class SGMLS_Entity
        # ... your code for the end of a subdoc entity
        last SWITCH;
      };
      ($event->type eq 'conforming') && do {
        # ... your code for a conforming document
        last SWITCH;
      };
      die "Internal error: unknown event type " . $event->type . "\n";
    }
  }


DESCRIPTION

The SGMLS package consists of several related classes: see SGMLS, SGMLS_Event, SGMLS_Element, SGMLS_Attribute, SGMLS_Notation, and SGMLS_Entity. All of these classes are available when you specify

  use SGMLS;

Generally, the only object which you will create explicitly will belong to the SGMLS class; all of the others will then be created automatically for you over the course of the parse. Much fuller documentation is available in the .sgml files in the DOC/ directory of the SGMLS.pm distribution.

The SGMLS class

This class holds a single parse. When you create an instance of it, you specify a file handle as an argument (if you are reading the output of sgmls or nsgmls from a pipe, the file handle will ordinarily be STDIN):

  my $parse = new SGMLS(STDIN);

The most important method for this class is next_event, which reads and returns the next major event from the input stream. It is important to note that the SGMLS class deals with most ESIS events itself: attributes and entity definitions, for example, are collected and stored automatically and invisibly to the user. The following list contains all of the methods for the SGMLS class:

next_event(): Return an SGMLS_Event object containing the next major event from the SGML parse.
element(): Return an SGMLS_Element object containing the current element in the document.
file(): Return a string containing the name of the current SGML source file (this will work only if the -l option was given to sgmls or nsgmls).
line(): Return a string containing the current line number from the source file (this will work only if the -l option was given to sgmls or nsgmls).
appinfo(): Return a string containing the APPINFO parameter (if any) from the SGML declaration.
notation(NNAME): Return an SGMLS_Notation object representing the notation named NNAME. With newer versions of nsgmls, all notations are available; otherwise, only the notations which are actually used will be available.
entity(ENAME): Return an SGMLS_Entity object representing the entity named ENAME. With newer versions of nsgmls, all entities are available; otherwise, only external data entities and internal entities used as attribute values will be available.
ext(): Return a reference to an associative array for user-defined extensions.
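A small diagnostic helper shows how file() and line() fit together. The sketch below is hypothetical (location_prefix is not part of SGMLS.pm) and assumes that both methods return an empty or undefined value when the -l option was not used. Since SGMLS_Event offers the same two methods, it should accept either kind of object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): build a "file:line: " prefix
# for diagnostics. file() and line() are only meaningful when sgmls or
# nsgmls was run with -l, so missing values are assumed to be empty or
# undefined and are simply left out of the prefix.
sub location_prefix {
    my ($parse) = @_;    # an SGMLS or SGMLS_Event object
    my $file = $parse->file;
    my $line = $parse->line;
    return '' unless defined $file && length $file;
    return defined $line && length $line ? "$file:$line: " : "$file: ";
}
```

Inside the event loop, a call such as warn location_prefix($parse), "unexpected element\n"; would then prefix warnings with the source position whenever it is known.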

The SGMLS_Event class

This class holds a single major event, as generated by the next_event method in the SGMLS class. It uses the following methods:

type(): Return a string describing the type of event: one of ``start_element'', ``end_element'', ``cdata'', ``sdata'', ``re'', ``pi'', ``entity'', ``start_subdoc'', ``end_subdoc'', or ``conforming''. See SYNOPSIS, above, for the values associated with each of these.
data(): Return the data associated with the current event (if any). For ``start_element'' and ``end_element'', returns an SGMLS_Element object; for ``entity'', ``start_subdoc'', and ``end_subdoc'', returns an SGMLS_Entity object; for ``cdata'', ``sdata'', and ``pi'', returns a string; and for ``re'' and ``conforming'', returns the empty string. See SYNOPSIS, above, for an example of this method's use.
key(): Return a string key to the event, such as an element or entity name (otherwise, the same as data()).
file(): Return the current file name, as in the SGMLS class.
line(): Return the current line number, as in the SGMLS class.
element(): Return the current element, as in the SGMLS class.
parse(): Return the SGMLS object which generated the event.
entity(ENAME): Look up an entity, as in the SGMLS class.
notation(NNAME): Look up a notation, as in the SGMLS class.
ext(): Return a reference to an associative array for user-defined extensions.
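The SWITCH block in the SYNOPSIS can also be written as a dispatch table keyed on type(). This is only a sketch, not part of the SGMLS API: make_dispatcher is a hypothetical name, and unlike the SYNOPSIS it quietly skips event types that have no registered handler instead of dying.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): map event type strings, as
# returned by type(), to code references. The returned closure calls the
# matching handler with the event and returns its result; events whose
# type has no handler are skipped and yield undef.
sub make_dispatcher {
    my (%handlers) = @_;
    return sub {
        my ($event) = @_;
        my $handler = $handlers{ $event->type };
        return unless $handler;    # no handler registered: skip this event
        return $handler->($event);
    };
}
```

For example, my $dispatch = make_dispatcher(cdata => sub { print $_[0]->data }); followed by while (my $event = $parse->next_event) { $dispatch->($event) } would print only the character data. A table keeps each handler in its own small sub, at the cost of no longer catching unknown event types.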

The SGMLS_Element class

This class is used for elements, and contains all associated information (such as the element's attributes). It recognises the following methods:

name(): Return a string containing the name, or Generic Identifier, of the element, in upper case.
parent(): Return the SGMLS_Element object for the element's parent (if any).
parse(): Return the SGMLS object for the current parse.
attributes(): Return a reference to an associative array of attribute names and SGMLS_Attribute structures. Attribute names will be all in upper case.
attribute_names(): Return an array of strings containing the names of all attributes defined for the current element, in upper case.
attribute(ANAME): Return the SGMLS_Attribute structure for the attribute ANAME.
set_attribute(ATTRIB): Add the SGMLS_Attribute object ATTRIB to the current element, replacing any other attribute structure with the same name.
in(GI): Return true (i.e. 1) if the string GI is the name of the current element's parent, or false (i.e. 0) if it is not.
within(GI): Return true (i.e. 1) if the string GI is the name of any of the ancestors of the current element, or false (i.e. 0) if it is not.
ext(): Return a reference to an associative array for user-defined extensions.
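Because parent() links each element to the one containing it, the full ancestry of an element can be recovered by walking upward. The sketch below uses a hypothetical helper, element_path, and assumes that parent() returns a false value for the document element, which has no parent.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): build a slash-separated path
# of generic identifiers from the document element down to $element by
# following parent() links until a false value is returned.
sub element_path {
    my ($element) = @_;
    my @names;
    while ($element) {
        unshift @names, $element->name;    # names are already upper case
        $element = $element->parent;
    }
    return join '/', @names;
}
```

For a TITLE inside a CHAPTER inside a BOOK this yields BOOK/CHAPTER/TITLE; when only a yes/no answer is needed, within('CHAPTER') answers it directly.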

The SGMLS_Attribute class

Each instance of an attribute for each SGMLS_Element is an object belonging to this class, which recognises the following methods:

name(): Return a string containing the name of the current attribute, all in upper case.
type(): Return a string containing the type of the current attribute, all in upper case. Available types are ``IMPLIED'', ``CDATA'', ``NOTATION'', ``ENTITY'', and ``TOKEN''.
value(): Return the value of the current attribute, if any. This will be an empty string if the type is ``IMPLIED'', a string if the type is ``CDATA'' or ``TOKEN'' (for ``TOKEN'', you may want to split the string into separate tokens), an SGMLS_Notation object if the type is ``NOTATION'', or an SGMLS_Entity object if the type is ``ENTITY''. Note that if the type is ``CDATA'', escape sequences for 8-bit characters, record ends, and SDATA will not have been processed; that is your responsibility.
is_implied(): Return true (i.e. 1) if the value of the attribute is implied, or false (i.e. 0) if it is specified in the document.
set_type(TYPE): Change the type of the attribute to the string TYPE (which should be all in upper case). Available types are ``IMPLIED'', ``CDATA'', ``NOTATION'', ``ENTITY'', and ``TOKEN''.
set_value(VALUE): Change the value of the attribute to VALUE, which may be a string, an SGMLS_Entity object, or an SGMLS_Notation object, depending on the attribute's type.
ext(): Return a reference to an associative array available for user-defined extensions.
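Since the shape of value() depends on the attribute's type, code that consumes attributes usually has to branch on type(). A minimal sketch, assuming the value conventions listed above; attribute_to_plain is a hypothetical helper, not part of SGMLS.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): reduce an SGMLS_Attribute to
# plain Perl data, following the type/value rules described above:
#   IMPLIED           -> undef
#   TOKEN             -> array reference of whitespace-separated tokens
#   NOTATION, ENTITY  -> the name of the referenced object
#   CDATA             -> the raw string (escape sequences left unprocessed)
sub attribute_to_plain {
    my ($attr) = @_;
    return undef if $attr->is_implied;
    my $type  = $attr->type;
    my $value = $attr->value;
    return [ split ' ', $value ] if $type eq 'TOKEN';
    return $value->name if $type eq 'NOTATION' || $type eq 'ENTITY';
    return $value;    # CDATA
}
```

Splitting TOKEN values on whitespace follows the suggestion above; CDATA strings are returned untouched, so escape sequences and SDATA still need separate handling.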

The SGMLS_Notation class

All declared notations appear as objects belonging to this class, which recognises the following methods:

name(): Return a string containing the name of the notation.
sysid(): Return a string containing the system identifier of the notation, if any.
pubid(): Return a string containing the public identifier of the notation, if any.
ext(): Return a reference to an associative array available for user-defined extensions.
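As an illustration of these accessors, the sketch below rebuilds an SGML notation declaration from an SGMLS_Notation object, which might be useful when writing out a transformed document with its own prolog. notation_declaration and sgml_literal are hypothetical helpers; the sketch assumes that sysid() and pubid() return an empty or undefined value when an identifier is absent, and that an identifier never contains both kinds of quote.

```perl
use strict;
use warnings;

# Quote an identifier as an SGML literal, switching to single quotes if it
# contains a double quote (assumes it never contains both kinds).
sub sgml_literal {
    my ($text) = @_;
    return index($text, '"') < 0 ? qq{"$text"} : qq{'$text'};
}

# Hypothetical helper (not part of SGMLS.pm): rebuild a notation declaration
# from an SGMLS_Notation object. A public identifier selects the PUBLIC
# form; otherwise SYSTEM is used, with the system literal being optional.
sub notation_declaration {
    my ($notation) = @_;
    my $pubid = $notation->pubid;
    my $sysid = $notation->sysid;
    my $decl  = '<!NOTATION ' . $notation->name;
    if (defined $pubid && length $pubid) {
        $decl .= ' PUBLIC ' . sgml_literal($pubid);
    }
    else {
        $decl .= ' SYSTEM';
    }
    $decl .= ' ' . sgml_literal($sysid) if defined $sysid && length $sysid;
    return $decl . '>';
}
```

A notation named GIF with only the system identifier viewer comes out as <!NOTATION GIF SYSTEM "viewer">, and one with neither identifier as <!NOTATION GIF SYSTEM>.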

The SGMLS_Entity class

All declared entities appear as objects belonging to this class, which recognises the following methods:

name(): Return a string containing the name of the entity, in mixed case.
type(): Return a string containing the type of the entity, in upper case. Available types are ``CDATA'', ``SDATA'', ``NDATA'' (external entities only), ``SUBDOC'', ``PI'' (newer versions of nsgmls only), or ``TEXT'' (newer versions of nsgmls only).
value(): Return a string containing the value of the entity, if it is internal.
sysid(): Return a string containing the system identifier of the entity (if any), if it is external.
pubid(): Return a string containing the public identifier of the entity (if any), if it is external.
filenames(): Return an array of strings containing any file names generated from the identifiers, if the entity is external.
notation(): Return the SGMLS_Notation object associated with the entity, if it is external.
data_attributes(): Return a reference to an associative array of data attribute names (in upper case) and the associated SGMLS_Attribute objects for the current entity.
data_attribute_names(): Return an array of data attribute names (in upper case) for the current entity.
data_attribute(ANAME): Return the SGMLS_Attribute object for the data attribute named ANAME for the current entity.
set_data_attribute(ATTRIB): Add the SGMLS_Attribute object ATTRIB to the current entity, replacing any other data attribute with the same name.
ext(): Return a reference to an associative array for user-defined extensions.
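External data entities typically arrive through ``entity'' events. The following sketch turns such an entity into a textual placeholder using type(), notation(), sysid(), and name(); ndata_placeholder and the bracketed output format are hypothetical, and entities of any type other than ``NDATA'' are passed over.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): describe the SGMLS_Entity
# carried by an 'entity' event as a placeholder such as "[GIF: fig1.gif]".
# Only NDATA entities are handled; anything else yields undef. The system
# identifier is used as the location, falling back to the entity name.
sub ndata_placeholder {
    my ($entity) = @_;
    return undef unless $entity->type eq 'NDATA';
    my $notation = $entity->notation;    # an SGMLS_Notation object, if any
    my $label    = $notation ? $notation->name : 'UNKNOWN';
    my $where    = $entity->sysid;
    $where = $entity->name unless defined $where && length $where;
    return "[$label: $where]";
}
```

Called on the data of an ``entity'' event for an NDATA entity fig1 with notation GIF and system identifier fig1.gif, it returns [GIF: fig1.gif].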


AUTHOR AND COPYRIGHT

Copyright 1994 and 1995 by David Megginson, dmeggins@aix1.uottawa.ca. Distributed under the terms of the GNU General Public License (version 2, 1991); see the file COPYING, which is included in the SGMLS.pm distribution.


SEE ALSO

The SGMLS::Output manpage and the SGMLS::Refs manpage.
