This file documents the transformation phase.

Contents
  - Overview
  - The analysis phase
  - The transformation phase
    - Overview
    - Generating a new function
  - Status

Originally the transformation phase was designed to solve three tasks:
  - specialisation of overloaded functions (in the following just "overloading specialisation")
  - inlining of higher order arguments in functions that are defined locally in a macro
    (Nijmegen slang: "fold optimalisation")
  - deforestation

All these optimisations have some things in common (more or less): they are all a kind of
specialisation, and specialisation involves creating new versions of functions. That is why it
was decided to implement one algorithm that can handle all three optimisations at the same
time. We call this algorithm the "fusion" algorithm.

Overview
********

In fact there are two phases involved:
  - the analysis phase
  - the transformation phase

The goal of the analysis is to assign two properties to each icl function argument (or rather:
function argument position): the consumer classification (::Int) and the linearity (::Bool).
These values are returned by analyseGroups in the {!ConsClasses} value. Furthermore, so called
"active" cases are marked as such (these markings are stored in the ExpressionHeap).

The consumer classification can take one of four values:

  cPassive:
    none of the following applies.

  cActive:
    only these function arguments can be specialised. The variable ("x" below) appears in at
    least one of the following positions:
      - x.member          (record selection: could be overloading specialised)
      - x @ ...           (higher order application: fold optimalisation)
      - case x of ...     (could be deforested)
      - in a call to a function that is also cActive in that argument

  cAccumulating:
    the function argument cannot be specialised because there is a recursive call where the
    argument is not a variable, which could lead to non-termination of the algorithm.

  cVarOfMultimatchCase:
    a multimatch case is a case where one constructor could match twice, e.g.:

      g x = case x of
              Cons h t | p1 h -> t
              Cons h t | p2 h -> t

    Such cases were excluded from deforestation in a first approximation because they were
    difficult to handle (e.g. in the linearity analysis).

The linearity classification indicates, well, whether a function argument is used linearly.
Only linear variables can be substituted with expressions (otherwise work could be duplicated).

The transformation phase uses this information to generate new functions. The transformation
algorithm runs over all icl functions, beginning with the leaves of the component tree. For
every icl function it runs through the body expression (postorder!) and when it finds a
function application that can be specialised it does so (if the function has not been
specialised in that way already). The criteria for specialising a function "f" in an
argument "A" are the following:
  - f must be cActive in that argument.
  - A must be an application.
  - If A is an application of a dictionary constructor: YEAH, go for it!
  - If A is a curried application and f is defined as a local macro function: YEAH!!!
  - If A is a function application (g E1 .. En) and f is linear in that argument:
    YEAH! Deforest! Cut all trees.
Otherwise nothing happens.

E.g. if (f E1 (g E2 E3) E4) meets the above conditions then a new function f_g will be created
and that function application will become

  f_g E1 E2 E3 E4    (if g is a function)
  f_g E1 E4          (if g is a dictionary constructor)

(it's not always that simple). f is called the _consumer_ and g the _producer_.
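The criteria above can be restated compactly. Here is a minimal Haskell sketch (not the
compiler's actual code; ConsClass, Producer and specialise are names invented for this file,
cActive etc. correspond to the classifications above):

  data ConsClass = CPassive | CActive | CAccumulating | CVarOfMultimatchCase
    deriving Eq

  -- the kind of application found at an argument position, if any
  data Producer = PDictionary | PCurriedApplication | PFunctionApplication

  -- Should consumer f be specialised in one of its argument positions?
  -- 'isLocalMacroFun' stands for "f is defined as a local macro function",
  -- 'linear' for the linearity of that argument position.
  specialise :: ConsClass -> Bool -> Bool -> Maybe Producer -> Bool
  specialise consClass isLocalMacroFun linear producer
    | consClass /= CActive = False                        -- only cActive arguments qualify
    | otherwise =
        case producer of
          Nothing                   -> False              -- the argument is not an application
          Just PDictionary          -> True               -- overloading specialisation
          Just PCurriedApplication  -> isLocalMacroFun    -- fold optimalisation
          Just PFunctionApplication -> linear             -- deforestation needs linearity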
Of course a new type for f_g has to be derived.

How is f_g created? In case g was
  - a dictionary: the formal argument in f is substituted with the dictionary application
  - a curried function: the formal (higher order) argument in f is substituted with "g"
  - an uncurried function: the formal argument in f is substituted with the _body_ of g.

The analysis phase
******************

The analysis happens component-wise, beginning with the leaves. The fundamental function here
is the consumerRequirements class, whose instances run through an expression. The main data
structure it works on is the *ConsClassSubst :== {#Int}. Each variable that is used in the
function currently being analysed is assigned a number. This number is an index into the
ConsClassSubst array; so the ConsClassSubst assigns a consumer classification to every
variable. The classification process is reminiscent of typing. If an array element is negative
then the corresponding variable has currently been assigned one of the four classifications:

  cPassive             :== -1
  cActive              :== -2
  cAccumulating        :== -3
  cVarOfMultimatchCase :== -4

If it is non-negative (say i) then the classification is the same as that of the variable
which corresponds to i.

The instances of consumerRequirements return a boolean that indicates whether the expression
is unsafe. E.g. in

  case x of
    Cons h t -> case p h of
                  True -> t
    ...

(assuming that these cases result from patterns and guards) the consumerRequirements instance
would return True for the inner case, because this case could fail, so that control would
GOTO the outer case. In this situation the boolean would be used to classify x as
cVarOfMultimatchCase.

The ai_cur_ref_counts field of the AnalyseInfo record is used to determine the linearity of
all variables, both the arguments and the local variables.

The ai_cases_of_vars_for_function field of the AnalyseInfo record accumulates all cases whose
case_expr is a variable. At the end of analysing a component the case_info_ptr is marked
(in function set_case_expr_info) with an EEI_ActiveCase tag if the corresponding variable is
active. So after the analysis phase every case is either "active" or not.

The information whether a case is active is stored with the case_info_ptr. But there is
already type information stored in the ExpressionHeap: for every case the case_info_ptr
points to an EI_CaseType entry, and for every "let" expression the let_info_ptr points to an
EI_LetType entry. These entries assign a type to the variables that are introduced by a
pattern or a let binding. This information is needed for lifting (also in the following
phases). So I invented the EI_Extended constructor, which allows several independent pieces
of information to be stored for one pointer. At the end of the transformation phase the
original heap contents have to be restored again (function "cleanup_attributes") (the
advantages of functional programming shine through here very clearly).

The transformation phase
************************

The transformation phase - Overview
************************^^^^^^^^^^^

There are two basic transformations that are applied for deforestation (overloading
specialisation and the fold optimalisation are just simple inlining!):

  - The "case in case" transformation:

      case (case x of
              P1 -> A1
              P2 -> A2) of
        P3 -> A3
        P4 -> A4

    transforms into

      case x of
        P1 -> case A1 of
                P3 -> A3
                P4 -> A4
        P2 -> case A2 of
                P3 -> A3
                P4 -> A4

    This happens in function lift_case. Note that this transformation duplicates code.

  - The "match" transformation:

      case (Cons A1 A2) of
        Cons h t -> A3
        ...

    transforms into

      let h = A1
          t = A2
      in  A3

    If "h" or "t" are used linearly in A3 they can be inlined, too. The match transformation
    reduces the amount of code (all the other pattern alternatives disappear). Note that this
    transformation is only valid if the case is not a multimatch case.
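The two rewrites can be illustrated on a toy expression type. The following Haskell sketch is
purely illustrative (it ignores guards and multimatch cases); the data type and the names
(Expr, liftCase, matchReduce, subst) are made up for this file and are not the compiler's
actual representation:

  data Expr
    = Var String
    | Nil
    | Cons Expr Expr
    | Case Expr [Alt]               -- at most one alternative per constructor here
    deriving Show

  data Alt
    = AltCons String String Expr    -- Cons h t -> rhs
    | AltNil  Expr                  -- Nil      -> rhs
    deriving Show

  -- "case in case": push the outer alternatives into every branch of the inner case.
  -- As noted above, this duplicates the outer alternatives.
  liftCase :: Expr -> Expr
  liftCase (Case (Case e innerAlts) outerAlts) = Case e (map push innerAlts)
    where
      push (AltCons h t rhs) = AltCons h t (Case rhs outerAlts)
      push (AltNil rhs)      = AltNil      (Case rhs outerAlts)
  liftCase e = e

  -- "match": a case on a known constructor selects the matching alternative; the pattern
  -- variables are bound to the constructor arguments (substituted directly here, i.e. the
  -- inlining that is allowed when they are used linearly). Only valid for non-multimatch cases.
  matchReduce :: Expr -> Expr
  matchReduce (Case (Cons a1 a2) alts) =
    head [subst h a1 (subst t a2 rhs) | AltCons h t rhs <- alts]
  matchReduce (Case Nil alts) =
    head [rhs | AltNil rhs <- alts]
  matchReduce e = e

  -- naive variable substitution, just enough for this sketch
  subst :: String -> Expr -> Expr -> Expr
  subst x e (Var y) | x == y  = e
  subst x e (Cons a b)        = Cons (subst x e a) (subst x e b)
  subst x e (Case scrut alts) = Case (subst x e scrut) (map substAlt alts)
    where
      substAlt (AltCons h t rhs)
        | x == h || x == t    = AltCons h t rhs           -- x is shadowed
        | otherwise           = AltCons h t (subst x e rhs)
      substAlt (AltNil rhs)   = AltNil (subst x e rhs)
  subst _ _ e                 = e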
Let's consider some examples first:

  sum l = case l of
            Nil      -> 0
            Cons h t -> +I h (sum t)        // +I: plus on integers

  every_second toggle l2
    = case l2 of
        Cons h2 t2 -> case toggle of
                        True  -> Cons h2 (every_second (not toggle) t2)
                        False -> (every_second (not toggle) t2)
        Nil        -> Nil

The analysis phase will assign l (the argument of sum) "cActive" and "linear". "cActive"
because
  - the recursive call has a variable (t) as argument (with a non-variable it would be
    cAccumulating)
  - l is used as a case expression (case l of) (otherwise it would be cPassive or
    cVarOfMultimatchCase)
  - the case is not a multimatch case
"linear" because l appears only once.

The case in the sum function will be marked as "active case of 'sum' in 'l'":

  sum l = case<*sum*> l of ...

Indeed, sum is a typical list consumer!

The arguments of every_second are classified as:
  - toggle: cAccumulating, because there is a recursive call (in fact two) with a
    non-variable argument (not toggle).
  - l2: cActive (for the same reasons as "l" in function sum)

Now if we transform the following function

  Start = sum (every_second True [42,42,42])

we see that we have an uncurried function application in the active consumer position of sum.
We fuse these two functions by replacing "l" in sum with the body of every_second (this
happens in "generateFunction"):

  Start = sum_every_second True [42,42,42]

  sum_every_second toggle l2
    = case<*sum*> (case l2 of
                     Cons h2 t2 -> case toggle of
                                     True  -> Cons h2 (every_second (not toggle) t2)
                                     False -> (every_second (not toggle) t2)
                     Nil        -> Nil) of
        Nil      -> 0
        Cons h t -> +I h (sum t)

This newly generated function will be stored in the FunctionHeap. Now we transform
sum_every_second. First we apply the case in case transformation:

  sum_every_second toggle l2
    = case l2 of
        Cons h2 t2 -> case<*sum*> (case toggle of
                                     True  -> Cons h2 (every_second (not toggle) t2)
                                     False -> (every_second (not toggle) t2)) of
                        Nil      -> 0
                        Cons h t -> +I h (sum t)
        Nil        -> case<*sum*> Nil of
                        Nil      -> 0
                        Cons h t -> +I h (sum t)

Now we apply the case in case transformation once more in the Cons case, and in the Nil case
we do the match:

  sum_every_second toggle l2
    = case l2 of
        Cons h2 t2 -> case toggle of
                        True  -> case<*sum*> Cons h2 (every_second (not toggle) t2) of
                                   Nil      -> 0
                                   Cons h t -> +I h (sum t)
                        False -> case<*sum*> every_second (not toggle) t2 of
                                   Nil      -> 0
                                   Cons h t -> +I h (sum t)
        Nil        -> 0

Finally we apply the match in the True case and we do the FOLD step:

  sum_every_second toggle l2
    = case l2 of
        Cons h2 t2 -> case toggle of
                        True  -> +I h2 (sum (every_second (not toggle) t2))
                        False -> sum_every_second (not toggle) t2
        Nil        -> 0

The fold step can be applied because we know that this particular case is the root case of
the sum function, hence

  case<*sum*> every_second (not toggle) t2 of
    Nil      -> 0
    Cons h t -> +I h (sum t)

is equivalent to

  sum (every_second (not toggle) t2)

which in turn is known to be specialised already:

  sum_every_second (not toggle) t2

So we can apply the fold step if the root case comes together with the producer again.
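To see that the derivation above preserves semantics, here is the example rendered in Haskell
(the document's examples are Clean-like pseudocode); sum', everySecond and sumEverySecond are
names chosen for this sketch, with the fused body written out by hand exactly as derived
above:

  sum' :: [Int] -> Int
  sum' l = case l of
             []      -> 0
             (h : t) -> h + sum' t

  everySecond :: Bool -> [Int] -> [Int]
  everySecond toggle l2 =
    case l2 of
      (h2 : t2) | toggle    -> h2 : everySecond (not toggle) t2
                | otherwise -> everySecond (not toggle) t2
      []                    -> []

  -- the fusion result derived above; as in the derivation, the True branch still goes
  -- through sum' and everySecond, only the root case was folded
  sumEverySecond :: Bool -> [Int] -> Int
  sumEverySecond toggle l2 =
    case l2 of
      (h2 : t2) | toggle    -> h2 + sum' (everySecond (not toggle) t2)
                | otherwise -> sumEverySecond (not toggle) t2
      []                    -> 0

  main :: IO ()
  main = print ( sum' (everySecond True [1 .. 10])   -- 25 = 1+3+5+7+9
               , sumEverySecond True [1 .. 10] )     -- also 25, but the False branch no
                                                     -- longer builds an intermediate list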
What do we do when during transformation a non-root case comes together with the producer?
Answer: we generate a new function. Comment: This is bullshit. It would be better to generate
functions for _all_ non-root cases in a separate phase beforehand, because currently this is
done anyway after the transformation phase (while interfacing with the backend)! By the way,
the every_second function has no non-root case. Generating a new function would be necessary
in the following case:

  consumer l = 1 + (case l of
                      Cons h t -> h + consumer t
                      Nil      -> 0)

  filter p l2 = case l2 of
                  Nil        -> Nil
                  Cons h2 t2 -> case p @ h2 of
                                  True  -> Cons h2 (filter p t2)
                                  False -> filter p t2

Fusing these two would proceed as follows:

  consumer_filter p l2
    = 1 + (case<*consumer*> (case l2 of
                               Nil        -> Nil
                               Cons h2 t2 -> case p h2 of
                                               True  -> Cons h2 (filter p t2)
                                               False -> filter p t2) of
             Cons h t -> h + consumer t
             Nil      -> 0)

  consumer_filter p l2
    = 1 + (case l2 of
             Nil        -> case<*consumer*> Nil of
                             Cons h t -> h + consumer t
                             Nil      -> 0
             Cons h2 t2 -> case<*consumer*> (case p h2 of
                                               True  -> Cons h2 (filter p t2)
                                               False -> filter p t2) of
                             Cons h t -> h + consumer t
                             Nil      -> 0)

  consumer_filter p l2
    = 1 + (case l2 of
             Nil        -> 0
             Cons h2 t2 -> case p h2 of
                             True  -> case<*consumer*> Cons h2 (filter p t2) of
                                        Cons h t -> h + consumer t
                                        Nil      -> 0
                             False -> case<*consumer*> filter p t2 of
                                        Cons h t -> h + consumer t
                                        Nil      -> 0)

  consumer_filter p l2
    = 1 + (case l2 of
             Nil        -> 0
             Cons h2 t2 -> case p h2 of
                             True  -> h2 + (consumer_filter p t2)
                             False -> case<*consumer*> filter p t2 of
                                        Cons h t -> h + consumer t
                                        Nil      -> 0)

Now we should _not_ apply the fold step like

  consumer_filter p l2
    = 1 + (case l2 of
             Nil        -> 0
             Cons h2 t2 -> case p h2 of
                             True  -> h2 + (consumer_filter p t2)
                             False -> consumer_filter p t2)

Then (consumer_filter isPositive [-1]) would evaluate to 2 while
(consumer (filter isPositive [-1])) evaluates to 1! Instead we act as if consumer were
defined like

  consumer l = 1 + (consumer_case l)

  consumer_case l = case l of
                      Cons h t -> h + consumer t
                      Nil      -> 0

Now we can do the fold step. So the result would be

  consumer_filter p l2 = 1 + (consumer_case_filter p l2)

  consumer_case_filter p l2
    = case l2 of
        Nil        -> 0
        Cons h2 t2 -> case p h2 of
                        True  -> h2 + (consumer_filter p t2)
                        False -> consumer_case_filter p t2

The additional functions that are generated for non-root cases are created in function
"generate_case_function".

To finish this overview I give an example of fold optimalisation:

  foldl f l init = case l of
                     Nil      -> init
                     Cons h t -> foldl f t (f @ h init)

  Start = foldl (+) [] 0

Classification:

  foldl   f:    (cActive, not linear)
          l:    (cActive, linear)
          init: (cAccumulating, linear)

The first argument is active because it is applied with @ (f @ h init). I don't fuse the
constructor []! (Sjaak does -> code explosion) But I do specialise in the first argument.
Not being linear doesn't matter for higher order arguments, because no calculation is
duplicated (same with dictionaries).

  foldl_plus l init = case l of
                        Nil      -> init
                        Cons h t -> foldl_plus t (h + init)

  Start = foldl_plus [] 0

And here is an example of overloading specialisation. The programmer writes:

  gimme_da_first :: (a e) -> e | Array a e
  gimme_da_first a = a.[0]

  gimme_a_strict_array :: {!Int}
  gimme_a_strict_array = {42}

  Start = gimme_da_first gimme_a_strict_array

During typing the compiler turns this into:

  :: ArrayDictionary a e =
     { select :: (a e) Int -> e
     , update :: (a e) Int e -> (a e)
     , size   :: (a e) -> Int
     ..

  gimme_da_first :: (ArrayDictionary a e) (a e) -> e
  gimme_da_first dictionary a = dictionary.select @ a 0

  Start = gimme_da_first { select = strictArraySelect, update = strictArrayUpdate,
                           size = strictArraySize .. } gimme_a_strict_array

We see that the first (dictionary) argument of "gimme_da_first" is classified as a consumer
(cActive because of the selection, and even linear). So we fuse "gimme_da_first" with the
record constructor:

  gimme_da_first_StrictArray :: {!e} -> e
  gimme_da_first_StrictArray a = strictArraySelect a 0

  Start = gimme_da_first_StrictArray gimme_a_strict_array

In this example we can see very clearly that through fusion (be it overloading specialisation,
deforestation or fold optimalisation) the type of the newly created function becomes
specialised, too. In this case, in (a e) the a has been substituted with {!}, yielding {!e}.
This happens by unifying the producer's result type with the type of the corresponding
argument of the consumer. In this case we unify

  (ArrayDictionary {!} e)   // producer { select = strictArraySelect, ...
  with
  (ArrayDictionary a e)     // consumer gimme_da_first

yielding the substitution a -> {!}.
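What overloading specialisation buys can be mimicked at the source level. The following
Haskell sketch plays through the gimme_da_first example with an explicit dictionary record;
all names (ArrayDict, ListArr, listSelect, ...) and the list-backed "array" are stand-ins
invented for this sketch:

  -- an explicit dictionary record, analogous to ArrayDictionary above
  data ArrayDict a e = ArrayDict
    { select :: a e -> Int -> e
    , size   :: a e -> Int
    }

  -- a stand-in "array" type plus its concrete dictionary
  newtype ListArr e = ListArr [e]

  listSelect :: ListArr e -> Int -> e
  listSelect (ListArr xs) i = xs !! i

  listSize :: ListArr e -> Int
  listSize (ListArr xs) = length xs

  listArrDict :: ArrayDict ListArr e
  listArrDict = ArrayDict { select = listSelect, size = listSize }

  -- the overloaded consumer, as the compiler sees it after typing:
  -- a record selection followed by a higher order application
  gimmeDaFirst :: ArrayDict a e -> a e -> e
  gimmeDaFirst dict a = select dict a 0

  -- what fusing with the concrete dictionary produces: the record selection is gone
  -- and the type has been specialised from (a e) to (ListArr e)
  gimmeDaFirst_ListArr :: ListArr e -> e
  gimmeDaFirst_ListArr a = listSelect a 0

  main :: IO ()
  main = print ( gimmeDaFirst listArrDict (ListArr [42])
               , gimmeDaFirst_ListArr (ListArr [42]) )     -- both yield 42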
The transformation phase - Generating a new function
************************^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The function "generateFunction" is called when a function is specialised. As its important
parameters it gets the consumer's function definition, the classification of its arguments
and an array "prods" that contains a producer description for each consumer argument. So
generateFunction indeed specialises one consumer with possibly multiple producers. A producer
description of type "Producer" indicates whether the producer is a dictionary, a curried
application or a function.

First some trouble with types happens. As stated before we have to apply type unification.
For this we reuse the infrastructure from the typing phase. Unfortunately this infrastructure
makes certain assumptions about the given types, and some work has to be done to fulfill
them. The unification algorithm assumes type variables in "integer form" (TempV) instead of
"pointer form" (TV). Furthermore we have to take care of uniqueness information: we have to
add propagation information (see function "add_propagation_attributes"). This transforms a
type

  consumer :: [*{Int}] -> Int

into

  consumer :: *[*{Int}] -> Int

After giving every involved type variable a number we create an initial type variable
substitution "subst". This array assigns a type to every type variable (number). With every
producer this substitution can be further specialised.

The "determine_args" function is then called. To explain what this function does, here is an
example (I write (List a) instead of [a]):

  consumer :: (List a) (List b) (a, b) -> (Int, Int, Int)
  consumer x y z = (case x of ..., case y of ..., fun0 z z z)

  producer1 :: (List c) T2 -> (List Int)
  producer1 a b = fun1 a b

  producer2 :: T3 -> (List Real)
  producer2 c = fun2 c

Our application is:

  Start = consumer (producer1 E1 E2) (producer2 E3) E4

Because we are generating a new function, all variables in that function should be fresh. The
generated function should look like this:

  consumer` a1 b1 c1 z1 = (case fun1 a1 b1 of ..., case fun2 c1 of ..., fun0 z1 z1 z1)

The RHS expression is later created by calling the "unfold" function. "unfold" substitutes
all local variables of an expression with fresh ones, but each free variable (here x, y and
z) is replaced with the expression to which the free variable has been bound before. So
determine_args establishes the following bindings (stored in the var_heap):

  x -> fun1 a1 b1
  y -> fun2 c1
  z -> z1

As its first result value determine_args delivers the new function arguments:

  [a1, b1, c1, z1]
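The way "unfold" uses these bindings can be sketched in a few lines of Haskell. The expression
type and all names below are hypothetical, not the compiler's; the point is only the
mechanism: local binders are renamed to fresh variables, free variables are looked up in the
environment that determine_args filled:

  import qualified Data.Map as M

  data Expr
    = Var String
    | App String [Expr]
    | Let String Expr Expr        -- a local binding whose variable must be freshened
    deriving Show

  type Env = M.Map String Expr    -- what determine_args stored (x -> fun1 a1 b1, ...)

  -- a fresh-name supply is threaded through as an Int
  unfold :: Env -> Int -> Expr -> (Expr, Int)
  unfold env n (Var x)     = (M.findWithDefault (Var x) x env, n)
  unfold env n (App f es)  = let (es', n') = unfoldList env n es in (App f es', n')
  unfold env n (Let x e b) =
    let x'       = "v" ++ show n                        -- fresh name for the local variable
        (e', n1) = unfold env (n + 1) e
        (b', n2) = unfold (M.insert x (Var x') env) n1 b
    in  (Let x' e' b', n2)

  unfoldList :: Env -> Int -> [Expr] -> ([Expr], Int)
  unfoldList _   n []       = ([], n)
  unfoldList env n (e : es) = let (e', n1)  = unfold env n e
                                  (es', n2) = unfoldList env n1 es
                              in  (e' : es', n2)

  -- e.g. unfolding the consumer body with the bindings above:
  --   unfold (M.fromList [("x", App "fun1" [Var "a1", Var "b1"]),
  --                       ("y", App "fun2" [Var "c1"]),
  --                       ("z", Var "z1")]) 0 consumerBody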
What about the types of these arguments? The second result value of determine_args is an
array whose elements contain either the producer's argument types or (in case there was no
producer for that argument) the consumer's original argument type. In the example above this
would be:

  {[List c, T2], [T3], [(a, b)]}

But the following type would be wrong:

  consumer` :: (List c) T2 T3 (a, b) -> (Int, Int, Int)

The right type is

  consumer` :: (List c1) T2 T3 (Int, Real) -> (Int, Int, Int)     // c1 "fresh"

Fortunately we can infer this type correctly because "determine_args" corrects our type
variable substitution "subst" with every producer. So the substitution would look like this:

  a -> Int     // from producer1
  b -> Real    // from producer2
  c -> c1      // fresh

Furthermore determine_args also accumulates uniqueness requirements: inequalities between
attributed types. Later the inference of the attributes of the type of the generated function
happens as in the typing phase: the (attribute-less) substitution is lifted, attribute
coercions are derived from the expanded uniqueness requirements, the coercion graph is
partitioned, and unused attribute variables are removed. Finally, the "integer" type
representation is copied back into a "pointer" type representation.

After we have let "unfold" create the RHS of our new function, we of course transform it.
There is a lot of type information stored in the ExpressionHeap. This information has to be
kept up to date when creating the RHS expression. In particular the type information for case
and let expressions has to be refreshed and specialised. E.g. when in the original consumer
above the polymorphic type "a" was stored for some local variable, then this should be "Int"
in the specialised consumer`.

Status (June 2001)
******

The following issues are suboptimal:

  - The algorithm does not terminate, e.g. when transforming:

      zipWith f l1 l2 :== [f e1 e2 \\ e1 <- l1 & e2 <- l2]

      l = [1 : zipWith (+) [1] l]

      t []    = []
      t [e:l] = [e : t l]

      Start = t l

    One could try to find criteria that ensure that fusion will terminate. Alternatively one
    could decide while fusing whether a fusion should be aborted, using ad hoc methods (e.g.
    a counter that keeps track of the recursion depth, or whatever). It can be quite nice if
    a compiler is known to always terminate.

  - It is possible that through fusion some functions become unused. Imagine a function that
    is called only once. When this call is specialised, the original won't be called anymore.
    But this superfluous function still remains in the functions array, doesn't get garbage
    collected and is processed further (e.g. in the convertcases phase). One could infer the
    components once more after the transformation phase to deal with this issue.

  - Local definitions don't get inlined. E.g. let "c" be a consumer and "p" a producer.
    The following would be fused:

      c p

    while the following would not:

      let pp = p in c pp

    This could be improved.

  - Only two functions are fused together; a third one does not get involved. E.g. when
    fusing the following:

      i x = x
      producer x = i [x : producer x]
      consumer l = case l of ...

    the algorithm stops at

      consumer_producer x = case (i [x : producer x]) of ...

  - Multimatch cases are ignored (see above). The compiler generates lots of multimatch cases
    (I remember the zip2 function).

  - Only function arguments get deforested, but not e.g. the following:

      case [] of
        [] -> 1
The following are real bugs:

  - Strict expressions get optimised away although they shouldn't:

      :: T = C !Int

      producer = C (abort "ABORT")

      consumer x = case x of
                     C _ -> 1

      Start = consumer producer

    fusion result:

      Start = consumer_producer

      consumer_producer = 1

    (no abort). A runnable rendering of this example is sketched at the end of this file.

  - It can happen that while fusing one sees that a case will never match:

      producer = []

      consumer [_:_] = 1

      Start = consumer producer

    fusion result:

      Start = consumer_producer

      consumer_producer =

  - Deforestation had better be done _after_ strictness analysis. Look at:

      consumer :: ![a] -> [Int]
      consumer l = [case l of
                      []    -> 0
                      [_:t] -> 1
                   ]

      producer = undef

      Start = consumer producer

    In this case "consumer" and "producer" must not be fused, although this happens with the
    current implementation. The problem here is that the argument of "consumer" has been made
    artificially strict. Possible solutions:
      - FORBID IT!!!! Never fuse consumer arguments that have been marked strict.
      - If a consumer argument has been marked strict, ensure that it is indeed strict. For
        this one has to reason about the strictness of expressions. A primitive strictness
        analysis could be part of the transformation phase (great!).

  - The algorithm makes no distinction between cases that come from patterns/guards and cases
    that were specified by the programmer. The algorithm assumes that no case was specified
    by the programmer. I think two places where this leads to wrong results are:
      - the second return value of the consumerRequirements class
      - the removeNeverMatchingSubcases function
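The first bug above (a strict constructor field disappearing together with its effect) can be
reproduced directly in Haskell, whose strict fields behave the same way for this purpose; the
names are chosen for this sketch only:

  data T = C !Int                      -- strict field, as in :: T = C !Int

  producer :: T
  producer = C (error "ABORT")

  consumer :: T -> Int
  consumer x = case x of C _ -> 1

  -- what the buggy fusion produces: the strict field, and with it the error, is gone
  consumer_producer :: Int
  consumer_producer = 1

  main :: IO ()
  main = do
    print consumer_producer            -- prints 1
    print (consumer producer)          -- raises "ABORT": matching on C forces producer,
                                       -- and building C evaluates its strict field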