B::Graph - Perl compiler backend to produce graphs of OP trees
perl -MO=Graph,-text prog.pl >graph.txt
perl -MO=Graph,-vcg prog.pl >graph.vcg
xvcg graph.vcg
perl -MO=Graph,-dot prog.pl | dot -Tps >graph.ps
This module is a backend to the perl compiler (B::*) which, instead of
outputting bytecode or C based on perl's compiled version of a program,
writes descriptions in graph-description languages specifying graphs that
show the program's structure. It currently generates descriptions for the
VCG tool (http://www.cs.uni-sb.de/RW/users/sander/html/gsvcg1.html) and
Dot (part of the graph visualization toolkit from AT&T:
http://www.research.att.com/sw/tools/graphviz/). It can also produce
plain text output (which is more useful for debugging the module itself than
anything else, though you might be able to cut the nodes out and make
a mobile or something similar).
Like any other compiler backend, this module needs to be invoked using the
O module to run correctly:
perl -MO=Graph,-opt,-opt,-opt program.pl
OR
perl -MO=Graph,-opt,obj -e 'BEGIN {$obj = ["hi"]}; print $obj'
OR EVEN
perl -e 'use O qw(Graph -opt obj obj); print "hi!\n";'
Obj is the name of a perl variable whose contents will be examined.
It can't be a my() variable, and it shouldn't have a prefix symbol
('$@^*'), though you can specify a package -- the name will be used to
look up a GV, whose various fields will lead to the scalar, array, and
other values that correspond to the named variable. If no object is
specified, the whole main program, including the CV that points to its
pad, will be displayed.
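The name-to-GV lookup described above can be tried out with the core B module alone. The following is only a sketch of the idea, not B::Graph's own code: the helper gv_for and the sample variables are hypothetical names introduced for illustration.

```perl
use strict;
use warnings;
use B qw(svref_2object class);

# Package variables only: my() variables live in pads and have no GV.
our $obj = ["hi"];
our @obj = (1, 2, 3);

# Hypothetical helper: find the glob for a name given without a sigil.
# Unqualified names are assumed to live in package main.
# Note that taking \*{$name} creates the glob if it does not exist yet.
sub gv_for {
    my ($name) = @_;
    $name = "main::$name" unless $name =~ /::/;
    no strict 'refs';
    return svref_2object(\*{$name});
}

my $gv = gv_for('obj');
printf "%s in stash %s\n", $gv->NAME, $gv->STASH->NAME;
printf "scalar slot: B::%s\n", class($gv->SV);
printf "array slot:  B::%s with %d elements\n",
    class($gv->AV), $gv->AV->FILL + 1;
```

One GV leads to both the scalar and the array named obj, which is why the name is given without a prefix symbol.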
Each of the opts can come from one of the following (each set is
mutually exclusive; case and underscores are insignificant):
Produce output of the appropriate type. The default is '-text', which isn't useful for much of anything (it does draw some nice ASCII boxes, though).
Each of the nodes on the graph produced corresponds to a C structure that has an address and includes pointers to other structures. The module uses these addresses to decide how to draw edges, but it makes the graph more compact if they aren't printed. The default is '-no_addrs'.
The collection of OPs that perl compiles a script into has two different layers of structure. It has a tree structure which corresponds roughly to the syntactic nesting of constructs in the source text, and a roughly linked-list representation, essentially a postorder traversal of this tree, which is used at runtime to decide what to do next. The graph can be drawn to emphasize one structure or the other. The former, 'compile_order', is the default, as it tends to lead to graphs with aspect ratios close to those of standard paper.
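To see the two structures side by side without drawing a graph, the core B module can walk both. This is a minimal sketch under stated assumptions: sample, tree_order and run_order are hypothetical names, and the exact OP names printed depend on the perl version.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# A small straight-line sub whose compiled OPs we inspect.
sub sample { my $x = $_[0] + 1; return $x * 2 }

# Tree order: record an OP, then descend through op_first/op_sibling.
sub tree_order {
    my ($op, $depth, $out) = @_;
    push @$out, ('  ' x $depth) . $op->name;
    return $out unless $op->flags & OPf_KIDS;   # only OPs with kids have ->first
    for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
        tree_order($kid, $depth + 1, $out);
    }
    return $out;
}

# Run order: follow op_next from the starting OP until the NULL op.
# (Branches and loops are not followed; this is enough for straight-line code.)
sub run_order {
    my ($op) = @_;
    my @names;
    while ($$op) {
        push @names, $op->name;
        $op = $op->next;
    }
    return \@names;
}

my $cv = svref_2object(\&sample);
print "tree order:\n", map { "$_\n" }   @{ tree_order($cv->ROOT, 0, []) };
print "run order:\n",  map { "  $_\n" } @{ run_order($cv->START) };
```

The tree listing starts at the root (leavesub) and works inward, while the run-order listing ends there, which reflects the roughly postorder relationship between the two.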
If OPs represent a program's compiled code, SVs represent its data. This
includes literal numbers and strings (IVs, NVs, PVs, PVIVs, and PVNVs),
regular arrays, hashes, and references (AVs, HVs, and RVs), but also the
structures that correspond to individual variables (special HVs for symbol
tables and GVs to represent values within them, and special AVs that hold
my() variables (as well as compiler temporaries)), structures that keep
track of code (CVs), and a variety of others. The default is to display
all these too, to give a complete picture, but if you aren't in a holistic
mood, you can make them disappear.
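The SV kinds named above can be observed directly with the core B module. The sketch below uses a hypothetical helper, sv_class, and reports only the container types; the class reported for plain numbers and references varies between perl versions, so those are left out.

```perl
use strict;
use warnings;
use B qw(svref_2object class);

our $count = 42;

# Hypothetical helper: name of the B:: class that wraps the referenced value.
sub sv_class {
    my ($ref) = @_;
    return class(svref_2object($ref));
}

my @samples = (
    [ 'array', [1, 2, 3]  ],
    [ 'hash',  { a => 1 } ],
    [ 'code',  sub { 1 }  ],
    [ 'glob',  \*count    ],
);

for my $sample (@samples) {
    my ($label, $ref) = @$sample;
    printf "%-6s B::%s\n", $label, sv_class($ref);
}
```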
The module tries to give the nodes representing SVs a different shape from those of OPs. OPs are usually rectangular, so two obvious shapes for SVs are ellipses and rhombuses (stretched diamonds). This option currently only makes a difference for VCG (ellipse is the default).
The hashes that perl uses to represent symbol tables are called 'stashes'. Since every GV has a pointer back to its stash, it's virtually inevitable for the links in a graph to lead to the main stash. Unfortunately stashes, especially the main one, can be quite big, and lead to forests of other structures -- there's one GV and another SV for each magic variable, plus all of @INC and %ENV, and so on. To prevent information overload, then, the display of stashes is disabled by default.
Another kind of graph element that can be annoying is the pointer from every GV and COP (a kind of OP that occurs for every statement) to the GV that represents the file from which that code came (used for error messages). By default, these links aren't shown, to keep them from cluttering the graph.
As it is visited in the peephole optimization phase, each OP gets a sequence number, which isn't currently used by anything (except the peephole optimizer, to avoid visiting OPs twice). If you want to see these, ask for them. (COPs have their own sequence numbers too, but they're more generally useful.)
B::Graph always gives the type of each OP symbolically ('entersub'), but it can also print the numeric value of the type field, if you want. The default is no_types.
Almost every OP has an op_next and an op_sibling pointer, and B::Graph colors them distinctively (pink and light blue, respectively). Because of this, it isn't strictly necessary to 'anchor' the arrow on a line in the OP's box saying 'op_next'. To avoid these extra lines, you can use the 'float' option. Unlabeled arrows can be confusing, though, so the default is not to float.
Lexical (my()) variables and temporary values used by individual OPs are stored in 'pads', per-code arrays linked to the CV. OPs store indexes into these arrays in the 'op_targ' field, but B::Graph can often also draw links directly from the OP to the SV that stores the name of the variable. These links don't correspond to any real pointers, however, and they can make the graph more complicated, so they are disabled by default.
Pb  SVs_PADBUSY   reserved for tmp or my already
Pt  SVs_PADTMP    in use as tmp
Pm  SVs_PADMY     in use as a "my" variable
T   SVs_TEMP      string is stealable?
O   SVs_OBJECT    is "blessed"
Mg  SVs_GMG       has magical get method
Ms  SVs_SMG       has magical set method
Mr  SVs_RMG       has random magical methods
I   SVf_IOK       has valid public integer value
N   SVf_NOK       has valid public numeric (float) value
P   SVf_POK       has valid public pointer (string) value
R   SVf_ROK       has a valid reference pointer
F   SVf_FAKE      glob or lexical is just a copy
L   SVf_OOK       has valid offset value (mnemonic: lvalue)
B   SVf_BREAK     refcnt is artificially low
Ro  SVf_READONLY  may not be modified
i   SVp_IOK       has valid non-public integer value
n   SVp_NOK       has valid non-public numeric value
p   SVp_POK       has valid non-public pointer value
S   SVp_SCREAM    has been studied?
V   SVf_AMAGIC    has magical overloaded methods
V   OPf_WANT_VOID    Want nothing (void context)
S   OPf_WANT_SCALAR  Want single value (scalar context)
L   OPf_WANT_LIST    Want list of any length (list context)
K   OPf_KIDS         There is a firstborn child.
P   OPf_PARENS       This operator was parenthesized.
                     (Or block needs explicit scope entry.)
R   OPf_REF          Certified reference.
                     (Return container, not containee).
M   OPf_MOD          Will modify (lvalue).
T   OPf_STACKED      Some arg is arriving on the stack.
*   OPf_SPECIAL      Do something weird for this op (see op.h)
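As an illustration of how the OP abbreviations above relate to the op_flags field, here is a rough sketch that decodes a flags value using the constants exported by the core B module. The function op_flag_letters is a hypothetical name, and the order of letters simply follows the table; B::Graph's own formatting may differ.

```perl
use strict;
use warnings;
use B qw(svref_2object
         OPf_WANT OPf_WANT_VOID OPf_WANT_SCALAR OPf_WANT_LIST
         OPf_KIDS OPf_PARENS OPf_REF OPf_MOD OPf_STACKED OPf_SPECIAL);

# Hypothetical helper: turn an op_flags value into the letters listed above.
sub op_flag_letters {
    my ($flags) = @_;

    # The context flags share one bit field, so compare the masked value
    # instead of testing the three constants as independent bits.
    my $want = $flags & OPf_WANT;
    my $out  = $want == OPf_WANT_VOID   ? 'V'
             : $want == OPf_WANT_SCALAR ? 'S'
             : $want == OPf_WANT_LIST   ? 'L'
             :                            '';

    # The remaining flags are independent bits.
    my @bits = (
        [ OPf_KIDS,    'K' ],
        [ OPf_PARENS,  'P' ],
        [ OPf_REF,     'R' ],
        [ OPf_MOD,     'M' ],
        [ OPf_STACKED, 'T' ],
        [ OPf_SPECIAL, '*' ],
    );
    for my $pair (@bits) {
        $out .= $pair->[1] if $flags & $pair->[0];
    }
    return $out;
}

# Example: flags of the root OP of a small sub.
sub demo { return 1 }
my $root = svref_2object(\&demo)->ROOT;
printf "%s: %s\n", $root->name, op_flag_letters($root->flags);
```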
VCG has a problem with boxes that have more than about 55 arrows coming out of them, so with large arrays and hashes B::Graph will stop outputting edges and some boxes may be disconnected.
Stephen McCamant <alias@mcs.com>
dot(1), xvcg(1), perl(1), perlguts(1).
If you like B::Graph, you might also be interested in Gisle Aas's
PerlGuts Illustrated, at http://home.sol.no/~aas/perl/guts/.