Discussion:
Compiling pattern matches
Jon Harrop
2010-04-27 21:24:13 UTC
HLVM now includes an example pattern match compiler in the third example
compiler front-end that can compile some quite sophisticated ML-like
programs:

http://flyingfrogblog.blogspot.com/2010/04/variant-types-and-pattern-matching-in.html

However, there is a performance bug in my pattern match compiler: the amount
of HLVM IR generated grows exponentially with the number of match cases. This
seems unavoidable when compiling to nested "if" expressions, so my first
thought is to pull the pattern match compiler into HLVM itself and generate
shared LLVM basic blocks directly. However, I believe most pattern match
compilers in ML compilers don't do this and instead compile to a higher-level
IR, so I'm wondering how they get around this redundant code issue. I can
think of two alternative solutions. One is to convey sharing some other way
and have HLVM detect it and reuse the same basic block. The other is to
factor the shared code into separate functions.
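
To make the duplication concrete, here is a minimal OCaml sketch
(illustrative source code, not HLVM IR; the type and function names are
made up):

(* A two-column match over an illustrative variant type. *)
type t = A | B

let same x y =
  match x, y with
  | A, A -> true
  | B, B -> true
  | _, _ -> false

(* Hand-expanded into nested "if"s: the wildcard RHS `false` is emitted
   once per failing leaf. With k columns a shared RHS can end up copied
   at as many as 2^k leaves, which is the exponential growth in
   generated IR described above. *)
let same_expanded x y =
  if x = A then (if y = A then true else false)
  else (if y = B then true else false)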

Is this a fair assessment?
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com
Torben Ægidius Mogensen
2010-05-03 10:50:13 UTC
Post by Jon Harrop
HLVM now includes an example pattern match compiler in the third
example compiler front-end that can compile some quite sophisticated
ML-like programs:
http://flyingfrogblog.blogspot.com/2010/04/variant-types-and-pattern-matching-in.html
However, there is a performance bug in my pattern match compiler: the
amount of HLVM IR generated grows exponentially with the number of
match cases. This seems unavoidable when compiling to nested "if"
expressions, so my first thought is to pull the pattern match compiler
into HLVM itself and generate shared LLVM basic blocks
directly. However, I believe most pattern match compilers in ML
compilers don't do this and instead compile to a higher-level IR, so
I'm wondering how they get around this redundant code issue. I can
think of two alternative solutions. One is to convey sharing some
other way and have HLVM detect it and reuse the same basic block. The
other is to factor the shared code into separate functions.
Is this a fair assessment?
Generally, you can divide ML/Haskell pattern-match compilers into three
groups:

1. Trivial. These compile each pattern in isolation and so may cause a
lot of repeated testing, but the code is guaranteed to be linear in
the size of the pattern.

2. Optimal. These compile a set of patterns to code that never
compares the same constructor in the value twice against the same
constructor-pattern constant. These may even exploit that once you
have compared against N-1 possibilities, there is no need to compare
to the Nth possibility. These will sometimes cause exponential
explosion, for the same reason that DFAs can be exponentially larger
than NFAs (a small example follows this list). The algorithm in Peter
Sestoft's article "ML Pattern Match Compilation and Partial
Evaluation" (LNCS 1110) is in this category.

3. Linear code. These compile a set of patterns to code that is linear
in the size of the patterns; they try to avoid comparing the same
things twice, but give up optimality for code size. The algorithm
described by Wadler in Simon Peyton Jones' book "The Implementation
of Functional Programming Languages" is in this category.
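
To make category 2's blow-up concrete, here is a small OCaml example
(my own illustration, not taken from either reference) of a pattern
matrix whose decision tree duplicates sub-trees:

(* A decision tree that tests x first leaves rows {1,3,4} live on the
   x = true branch and rows {1,2,3,4} live on the x = false branch, so
   the tests for rows 1, 3 and 4 are compiled on both sides. Stacking
   more columns like this multiplies the duplication: the same effect
   that makes a DFA exponentially larger than the NFA it came from. *)
let f x y z =
  match x, y, z with
  | _, false, true -> 1
  | false, true, _ -> 2
  | _, _, false -> 3
  | _, _, true -> 4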

In practice, exponential explosion occurs rarely, and even when it does,
pattern sets are typically small enough that you can live with it. So
it is, IMO, acceptable to always compile to optimal code.

Optimal code will often have several places where the same RHS is
selected. You should avoid copying the RHS code at each of these
places and instead jump to a single copy of the RHS from all of them.
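
As a sketch of that sharing (written in OCaml source form rather than
a compiler IR, with made-up names), each RHS can be bound once as a
local function and called from every leaf of the decision tree; a tail
call to such a join point compiles to a plain jump:

(* Each RHS is emitted once as a local function; the nested-if
   decision tree reaches rhs_equal from two different leaves without
   copying its body. *)
let classify x y =
  let rhs_equal () = "equal" in
  let rhs_diff () = "different" in
  if x then (if y then rhs_equal () else rhs_diff ())
  else (if y then rhs_diff () else rhs_equal ())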

Torben
Jon Harrop
2010-05-06 23:49:44 UTC
Post by Torben Ægidius Mogensen
Generally, you can divide ML/Haskell pattern-match compilers into three
Great! Thanks for the references.

Cheers,
Jon.
