Your personal theory is wrong. Yes, typically GHC optimizes simple cases well, but introduce a single obstacle and GHC won’t save you.
Here’s a simple program:
data NonEmpty a = One a | Cons a (NonEmpty a)
  deriving (Show)

instance Semigroup (NonEmpty a) where
  xs0 <> ys = go xs0 where
    go (One x) = Cons x ys
    go (Cons x xs) = Cons x (go xs)
  {-# INLINE (<>) #-}

instance Functor NonEmpty where
  fmap f = go where
    go (One x) = One (f x)
    go (Cons x xs) = Cons (f x) (go xs)
  {-# INLINE fmap #-}

instance Applicative NonEmpty where
  pure = One
  {-# INLINE pure #-}
  fs0 <*> xs = go fs0 where
    go (One f) = fmap f xs
    go (Cons f fs) = fmap f xs <> go fs
  {-# INLINE (<*>) #-}

instance Monad NonEmpty where
  xs0 >>= f = go xs0 where
    go (One x) = f x
    go (Cons x xs) = f x <> go xs
  {-# INLINE (>>=) #-}

loop :: Int -> NonEmpty String
loop 0 = pure "b"
loop n = pure "a" >> loop (n - 1)

main = print $ loop (3 * 10 ^ (7 :: Int))
Running it with -O2 and -sstderr gives me
51,128 bytes allocated in the heap
3,272 bytes copied during GC
44,328 bytes maximum residency (1 sample(s))
25,304 bytes maximum slop
6 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.001s 0.001s 0.0006s 0.0006s
INIT time 0.001s ( 0.001s elapsed)
MUT time 0.021s ( 0.021s elapsed)
GC time 0.001s ( 0.001s elapsed)
EXIT time 0.000s ( 0.008s elapsed)
Total time 0.023s ( 0.031s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 2,403,063 bytes per MUT second
Productivity 93.0% of total user, 69.5% of total elapsed
Now if I replace >> with *>, I get
1,689,186,232 bytes allocated in the heap
3,342,376,848 bytes copied during GC
828,730,056 bytes maximum residency (9 sample(s))
260,908,344 bytes maximum slop
2151 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 395 colls, 0 par 1.107s 1.108s 0.0028s 0.0057s
Gen 1 9 colls, 0 par 2.070s 2.070s 0.2300s 0.7351s
INIT time 0.001s ( 0.000s elapsed)
MUT time 0.901s ( 0.890s elapsed)
GC time 3.178s ( 3.178s elapsed)
EXIT time 0.001s ( 0.002s elapsed)
Total time 4.080s ( 4.070s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 1,874,759,220 bytes per MUT second
Productivity 22.1% of total user, 21.9% of total elapsed
This is an example of *> being leaky when >> isn’t. However, that’s not the only issue.
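As far as I can tell, the difference comes from what the two class defaults reduce to when the left operand is a single element. Here is a minimal sketch, assuming the defaults from base (m >> k = m >>= \_ -> k and a1 *> a2 = (id <$ a1) <*> a2); thenViaBind, thenViaAp, loopBind and loopAp are hypothetical names used only for illustration.

```haskell
-- Same type and instances as the program above, minus the INLINE pragmas.
data NonEmpty a = One a | Cons a (NonEmpty a)
  deriving (Show, Eq)

instance Semigroup (NonEmpty a) where
  xs0 <> ys = go xs0 where
    go (One x) = Cons x ys
    go (Cons x xs) = Cons x (go xs)

instance Functor NonEmpty where
  fmap f = go where
    go (One x) = One (f x)
    go (Cons x xs) = Cons (f x) (go xs)

instance Applicative NonEmpty where
  pure = One
  fs0 <*> xs = go fs0 where
    go (One f) = fmap f xs
    go (Cons f fs) = fmap f xs <> go fs

instance Monad NonEmpty where
  xs0 >>= f = go xs0 where
    go (One x) = f x
    go (Cons x xs) = f x <> go xs

-- The class defaults from base, written out as ordinary functions
-- (hypothetical names, not part of any library).
thenViaBind :: NonEmpty a -> NonEmpty b -> NonEmpty b
thenViaBind m k = m >>= \_ -> k        -- default (>>)

thenViaAp :: NonEmpty a -> NonEmpty b -> NonEmpty b
thenViaAp a1 a2 = (id <$ a1) <*> a2    -- default (*>)

-- Unfolding both for a one-element left operand:
--   thenViaBind (One x) k  =  (\_ -> k) x          =  k
--   thenViaAp   (One x) k  =  fmap (const id x) k  =  fmap id k
-- The first hands k back untouched, so a recursive call in k's position
-- can be a tail call. The second has to force k and rebuild it with fmap
-- after the recursive call returns.

loopBind, loopAp :: Int -> NonEmpty String
loopBind 0 = pure "b"
loopBind n = pure "a" `thenViaBind` loopBind (n - 1)

loopAp 0 = pure "b"
loopAp n = pure "a" `thenViaAp` loopAp (n - 1)
```

Both loops return the same value; the fmap id wrapped around every recursive call in loopAp is what I'd expect to account for the extra residency in the numbers above, though I haven't checked the Core.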
Even when both of them have to allocate, *> can still lose. Defining <*> in terms of >>= the default way is pretty common and appears even in base. And guess what: *> defined in terms of <*> defined in terms of >>= is very likely going to be slower than >> defined in terms of >>= directly.
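To make the chain concrete, here is a rough sketch of what each operator turns into when an instance sets (<*>) = ap and leaves *> and >> at their defaults. The names thenDirect, thenViaAp and thenUnfolded are hypothetical, and the unfolding assumes the usual definition of ap from Control.Monad.

```haskell
import Control.Monad (ap)
import Data.List.NonEmpty (NonEmpty (..))

-- Default (>>): a single bind that ignores the left-hand values.
thenDirect :: Monad m => m a -> m b -> m b
thenDirect m k = m >>= \_ -> k

-- Default (*>) for an instance whose (<*>) is ap.
thenViaAp :: Monad m => m a -> m b -> m b
thenViaAp m k = (id <$ m) `ap` k

-- The same thing with (<$) and ap unfolded by hand: one fmap over the
-- left operand, an outer bind, and for every element of the left operand
-- an inner bind over k that re-wraps each result with return.
thenUnfolded :: Monad m => m a -> m b -> m b
thenUnfolded m k =
  fmap (const id) m >>= \f ->
    k >>= \x ->
      return (f x)

-- All three agree on the result; they differ in how much work and
-- intermediate structure it takes to get there.
example :: NonEmpty String
example = ("a" :| ["b"]) `thenViaAp` ("c" :| [])
```

Unless GHC manages to fuse all of that away, thenViaAp does strictly more work than thenDirect for the same answer.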
We can take the same NonEmpty example, but use the definition from base:
import Data.List.NonEmpty
loop :: Int -> NonEmpty String
loop 0 = "c" :| []
loop n = ("a" :| ["b"]) >> loop (n - 1)
main = print . Prelude.length $ loop 23
Running it with -O2 and -sstderr gives me
939,613,912 bytes allocated in the heap
449,564,896 bytes copied during GC
107,699,184 bytes maximum residency (7 sample(s))
1,729,552 bytes maximum slop
254 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 218 colls, 0 par 0.141s 0.141s 0.0006s 0.0035s
Gen 1 7 colls, 0 par 0.259s 0.259s 0.0370s 0.1055s
INIT time 0.001s ( 0.000s elapsed)
MUT time 0.149s ( 0.147s elapsed)
GC time 0.400s ( 0.400s elapsed)
EXIT time 0.000s ( 0.003s elapsed)
Total time 0.550s ( 0.551s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 6,296,104,197 bytes per MUT second
Productivity 27.1% of total user, 26.8% of total elapsed
Now if I replace >> with *>, I get
4,697,714,080 bytes allocated in the heap
1,498,159,192 bytes copied during GC
320,043,184 bytes maximum residency (11 sample(s))
4,331,152 bytes maximum slop
727 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 1116 colls, 0 par 0.542s 0.543s 0.0005s 0.0018s
Gen 1 11 colls, 0 par 0.815s 0.815s 0.0741s 0.2862s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.448s ( 0.442s elapsed)
GC time 1.357s ( 1.358s elapsed)
EXIT time 0.000s ( 0.010s elapsed)
Total time 1.806s ( 1.810s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 10,479,510,041 bytes per MUT second
Productivity 24.8% of total user, 24.4% of total elapsed
I.e. using *> instead of >> makes the program about 3.3x slower (1.806s vs 0.550s total time) and makes it use about 2.9x more memory (727 MiB vs 254 MiB total memory in use).
Please give up this futile idea of ditching >> while still defining *> in terms of <*> by default. It’s a nice idea in theory, but in practice it would be punishing industrial users for choosing Haskell, and nothing more than that.