Set::IntSpan - Manages sets of integers |
Set::IntSpan - Manages sets of integers
use Set::IntSpan qw(grep_set map_set);
$Set::IntSpan::Empty_String = $string;
$set = new Set::IntSpan $set_spec; $valid = valid Set::IntSpan $run_list; $set = copy $set $set_spec;
$run_list = run_list $set; @elements = elements $set;
$u_set = union $set $set_spec; $i_set = intersect $set $set_spec; $x_set = xor $set $set_spec; $d_set = diff $set $set_spec; $c_set = complement $set;
equal $set $set_spec; equivalent $set $set_spec; superset $set $set_spec; subset $set $set_spec;
$n = cardinality $set;
empty $set; finite $set; neg_inf $set; pos_inf $set; infinite $set; universal $set;
member $set $n; insert $set $n; remove $set $n;
$min = min $set; $max = max $set;
$subset = grep_set { ... } $set; $mapset = map_set { ... } $set;
for ($element=$set->first; defined $element; $element=$set->next) { ... } for ($element=$set->last ; defined $element; $element=$set->prev) { ... }
$element = $set->start($n); $element = $set->current;
Perl 5.004, Exporter
@EXPORT
Nothing
@EXPORT_OK
grep_set
, map_set
Set::IntSpan
manages sets of integers.
It is optimized for sets that have long runs of consecutive integers.
These arise, for example, in .newsrc files, which maintain lists of articles:
alt.foo: 1-21,28,31 alt.bar: 1-14192,14194,14196-14221
Sets are stored internally in a run-length coded form. This provides for both compact storage and efficient computation. In particular, set operations can be performed directly on the encoded representation.
Set::IntSpan
is designed to manage finite sets.
However, it can also represent some simple infinite sets, such as
{x | x>n}.
This allows operations involving complements to be carried out consistently,
without having to worry about the actual value of INT_MAX on your machine.
Many of the methods take a set specification. There are four kinds of set specifications.
If a set specification is omitted, then the empty set is assumed. Thus,
$set = new Set::IntSpan;
creates a new, empty set. Similarly,
copy $set;
removes all elements from $set.
If an object reference is given, it is taken to be a Set::IntSpan
object.
If an array reference is given, then the elements of the array are taken to be the elements of the set. The array may contain duplicate elements. The elements of the array may be in any order.
If a string is given, it is taken to be a run list. A run list specifies a set using a syntax similar to that in newsrc files.
A run list is a comma-separated list of runs. Each run specifies a set of consecutive integers. The set is the union of all the runs.
Runs may be written in any of several forms.
The empty set is consistently written as '' (the null string). It is also denoted by the special form '-' (a single dash).
The runs in a run list must be disjoint, and must be listed in increasing order.
Valid characters in a run list are 0-9, '(', ')', '-' and ','. White space and underscore (_) are ignored. Other characters are not allowed.
Each set has a single iterator,
which is shared by all calls to
first
, last
, start
, next
, prev
, and current
.
At all times,
the iterator is either an element of the set,
or undef
.
first
, last
, and start
set the iterator;
next
, and prev
move it;
and current
returns it.
Calls to these methods may be freely intermixed.
Using next
and prev
,
a single loop can move both forwards and backwards through a set.
Using start
, a loop can iterate over portions of an infinite set.
new
Set::IntSpan
$set_specSet::IntSpan
object.
The initial contents of the set are given by $set_spec.
valid
Set::IntSpan
$run_listcopy
$set $set_speccopy
returns $set.
run_list
$setBy default, the empty set is formatted as '-';
a different string may be specified in $Set::IntSpan::Empty_String
.
elements
$set
union
$set $set_specintersect
$set $set_specxor
$set $set_specdiff
$set $set_speccomplement
$setFor all set operations,
a new Set::IntSpan
object is created and returned.
The operands are not affected.
equal
$set $set_specequivalent
$set $set_specsuperset
$set $set_specsubset
$set $set_spec
cardinality
$setempty
$setfinite
$setneg_inf
$setpos_inf
$setinfinite
$set
member
$set $ninsert
$set $nremove
$set $n
min
$setundef
if there is none.
max
$setundef
if there is none.
first
undef
.
Returns the iterator.
last
undef
.
Returns the iterator.
start
($n)undef
.
Returns the iterator.
next
undef
.
Returns the iterator.
next
will return undef
only once;
the next call to next
will reset the iterator to
the smallest element of $set.
prev
undef
.
Returns the iterator.
prev
will return undef
only once;
the next call to prev
will reset the iterator to
the largest element of $set.
current
grep_set
{ ... } $set$_
to each integer)
and returns a Set::IntSpan
object containing those integers
for which the BLOCK returns TRUE.
Returns undef
if $set is infinite.
map_set
{ ... } $set$_
to each integer)
and returns a Set::IntSpan
object containg
all the integers returned as results of all those evaluations.
Evaluates the BLOCK in list context, so each element of $set may produce zero, one, or more elements in the returned set.
Returns undef
if $set is infinite.
$Set::IntSpan::Empty_String
$Set::IntSpan::Empty_String
contains the string that is returned when
run_list
is called on the empty set.
$Empty_String
is initially '-';
alternatively, it may be set to ''.
Other values should be avoided,
to ensure that run_list
always returns a valid run list.
run_list
accesses $Empty_String
through a reference
stored in $set->{empty_string
}.
Subclasses that wish to override the value of $Empty_String
can
reassign this reference.
Any method (except valid
) will die
if it is passed an invalid run list.
Set::IntSpan::_copy_run_list: Bad syntax:
$runListSet::IntSpan::_copy_run_list: Bad order:
$runListSet::IntSpan::elements: infinite set
elements
.
elements
$set can generate an ``Out of memory!''
message on sufficiently large finite sets.
Beware of forms like
union $set [1..5];
This passes an element of @set to union, which is probably not what you want. To force interpretation of $set and [1..5] as separate arguments, use forms like
union $set +[1..5];
or
$set->union([1..5]);
You cannot use the obvious comparison routine
{ $a->cardinality <=> $b->cardinality }
to sort sets by size,
because cardinality
returns -1 for infinte sets.
(All the non-negative integers were taken. Sorry.)
Instead, you have to write something like
{ my $a_card = $a->cardinality; my $b_card = $b->cardinality;
$a_card == $b_card and return 0; $a_card < 0 and return 1; $b_card < 0 and return -1; $a_card <=> $b_card }
grep_set
and map_set
make it easy to construct
sets for which the internal representation used by Set::IntSpan
is not small. Consider:
$billion = new Set::IntSpan '0-1_000_000_000'; # OK $odd = grep_set { $_ & 1 } $billion; # trouble $even = map_set { $_ * 2 } $billion; # double trouble
There are two common approaches to error handling:
exceptions and return codes.
There seems to be some religion on the topic,
so Set::IntSpan
provides support for both.
To catch exceptions, protect method calls with an eval:
$run_list = <STDIN>; eval { $set = new Set::IntSpan $run_list }; $@ and print "$@: try again\n";
To check return codes, use an appropriate method call to validate arguments:
$run_list = <STDIN>; if (valid Set::IntSpan $run_list) { $set = new Set::IntSpan $run_list } else { print "$@ try again\n" }
Similarly, use finite
to protect calls to elements
:
finite $set and @elements = elements $set;
Calling elements
on a large, finite set can generate an ``Out of
memory!'' message, which cannot (easily) be trapped.
Applications that must retain control after an error can use intersect
to
protect calls to elements
:
@elements = elements { intersect $set "-1_000_000 - 1_000_000" };
or check the size of $set first:
finite $set and cardinality $set < 2_000_000 and @elements = elements $set;
Although Set::IntSpan
can represent some infinite sets,
it does not perform infinite-precision arithmetic.
Therefore,
finite elements are restricted to the range of integers on your machine.
The sets implemented here are based on a Macintosh data structure called a region. See Inside Macintosh for more information.
Steven McDougall, swmcd@world.std.com
Copyright 1996-1998 by Steven McDougall. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Set::IntSpan - Manages sets of integers |