Discussion:
Syntax Disambiguation in Functional Language
(too old to reply)
Paulo J. Matos
2010-11-23 10:54:53 UTC
Permalink
Hi,

I have been thinking about a syntax of programming languages and I was
wondering that if I were to create a new functional programming language
I would like that, like in Scheme, everything is an expression
(everything as a value) and when I have a sequence of expressions, I want
all of them to be evaluated and the last one be returned, like so:
1
(x + y)
f(x)

This would be an example of a sequence of 3 expressions, where all of
them are evaluated (possibly for side-effects) and the last one returned.
In this specific case 1 and (x + y) are redundant. However, how does one
disambiguate between a function call IDENT '(' IDENT ') and IDENT '('
IDENT ')' where these are two expressions, one IDENT and '(' IDENT ')'?

The previous case showed no problem but:
f
(x + y)
f(x)

does show the problem. One way is to add ';' after each expression but
how does python and sml for example handle this issue?

Cheers,
--
PMatos
Pascal J. Bourguignon
2010-11-23 12:28:09 UTC
Permalink
Post by Paulo J. Matos
Hi,
I have been thinking about a syntax of programming languages and I was
wondering that if I were to create a new functional programming language
I would like that, like in Scheme, everything is an expression
(everything as a value) and when I have a sequence of expressions, I want
1
(x + y)
f(x)
This would be an example of a sequence of 3 expressions, where all of
them are evaluated (possibly for side-effects) and the last one returned.
In this specific case 1 and (x + y) are redundant. However, how does one
disambiguate between a function call IDENT '(' IDENT ') and IDENT '('
IDENT ')' where these are two expressions, one IDENT and '(' IDENT ')'?
Trivially: put the function inside the parentheses!
Post by Paulo J. Matos
f
(x + y)
f(x)
(begin
f
(+ x y)
(f x))
Post by Paulo J. Matos
does show the problem. One way is to add ';' after each expression but
how does python and sml for example handle this issue?
Also, have a look at Liskell.

http://www.liskell.org/
--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.
Paulo J. Matos
2010-11-23 13:37:51 UTC
Permalink
Post by Pascal J. Bourguignon
Trivially: put the function inside the parentheses!
That would be a possible solution if I was ok with having a lisp-style
syntax but I was specifically thinking about cases where this doesn't
happen like sml and python.
Post by Pascal J. Bourguignon
Also, have a look at Liskell.
http://www.liskell.org/
I know liskell does the way you said it but the point of my question was:
assumming I don't want to do it the lisp way... how can you sort this
ambiguity out?
--
PMatos
Pascal J. Bourguignon
2010-11-23 14:17:27 UTC
Permalink
Post by Paulo J. Matos
Post by Pascal J. Bourguignon
Trivially: put the function inside the parentheses!
That would be a possible solution if I was ok with having a lisp-style
syntax but I was specifically thinking about cases where this doesn't
happen like sml and python.
Post by Pascal J. Bourguignon
Also, have a look at Liskell.
http://www.liskell.org/
assumming I don't want to do it the lisp way... how can you sort this
ambiguity out?
If there is an ambiguity, by definition, you cannot sort it out.

If you don't want ambiguities, then don't put them in! Ie. design your
grammar so that there's no ambiguity. Anything does, as long as there
is no ambiguity.
--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.
Paulo J. Matos
2010-11-23 16:01:21 UTC
Permalink
Post by Pascal J. Bourguignon
If there is an ambiguity, by definition, you cannot sort it out.
If you don't want ambiguities, then don't put them in! Ie. design your
grammar so that there's no ambiguity. Anything does, as long as there
is no ambiguity.
Excuse me but either you didn't understand my point or you just like to
post things that do not answer the question.

afaik python and sml allow things that _look like_:

f
(x)

how do they know if this is a function call or two separate expressions?
--
PMatos
Jussi Piitulainen
2010-11-23 16:35:11 UTC
Permalink
Post by Paulo J. Matos
f
(x)
how do they know if this is a function call or two separate
expressions?
Python ends the expression at the end of line if there are no
parentheses pending. On the line, it seems to prefer the
interpretation where f (x) is a function call. Makes sense.

This:

from math import acos

def f(x): print(acos(x))
x = -1

print(1)

f
(-1)

print(2)

f ; (-1)

print(3)

f (-1)

print(4)

(f
(-1))

print(5)

f \
(-1)

print(6)

in python 2.3.4 (old but default on this server) prints this:

1
2
3
3.14159265359
4
3.14159265359
5
3.14159265359
6

You could just have a binary operator that returns the value of its
right operand. The usual symbol for that operator is ;, with expected
order of evaluation left to right.
Paulo J. Matos
2010-11-23 17:14:13 UTC
Permalink
Post by Jussi Piitulainen
You could just have a binary operator that returns the value of its
right operand. The usual symbol for that operator is ;, with expected
order of evaluation left to right.
Thanks for your take on this. It was interesting to see your working
example in python. I never did much python but the code must look crazy
when you start getting long lines...
--
PMatos
Tim Harig
2010-11-23 16:53:48 UTC
Permalink
Post by Paulo J. Matos
Post by Pascal J. Bourguignon
If there is an ambiguity, by definition, you cannot sort it out.
If you don't want ambiguities, then don't put them in! Ie. design your
grammar so that there's no ambiguity. Anything does, as long as there
is no ambiguity.
Excuse me but either you didn't understand my point or you just like to
post things that do not answer the question.
Mr. Bourguignon is a LISP fanboi. If you ask him a question, he will give
you an answer in LISP whether you want it, speak LISP, or not.
Pascal J. Bourguignon
2010-11-23 17:10:56 UTC
Permalink
Post by Paulo J. Matos
Post by Pascal J. Bourguignon
If there is an ambiguity, by definition, you cannot sort it out.
If you don't want ambiguities, then don't put them in! Ie. design your
grammar so that there's no ambiguity. Anything does, as long as there
is no ambiguity.
Excuse me but either you didn't understand my point or you just like to
post things that do not answer the question.
There is no such thing as "look like" in programming languages.

What there is, is a formal grammar, which defines the syntax of the
language.
Post by Paulo J. Matos
f
(x)
how do they know if this is a function call or two separate expressions?
They use the newline as a token in that grammar, instead of ignoring it
(as most other programming languages do, for good reasons).
--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.
Nobody
2010-11-23 17:11:27 UTC
Permalink
Post by Paulo J. Matos
f
(x)
how do they know if this is a function call or two separate expressions?
In languages which use layout, a newline isn't the same as a space.
"f (x)" would be a function call, "f<newline>(x)" would be two expressions,
"f" and "(x)".
George Neuner
2010-11-23 18:04:00 UTC
Permalink
On Tue, 23 Nov 2010 16:01:21 +0000 (UTC), "Paulo J. Matos"
Post by Paulo J. Matos
Post by Pascal J. Bourguignon
If there is an ambiguity, by definition, you cannot sort it out.
If you don't want ambiguities, then don't put them in! Ie. design your
grammar so that there's no ambiguity. Anything does, as long as there
is no ambiguity.
Excuse me but either you didn't understand my point or you just like to
post things that do not answer the question.
f
(x)
how do they know if this is a function call or two separate expressions?
Sorry, but there isn't any *simple* way to disambiguate without using
some kind of punctuation. Pascal is simply recommending Lisp's
method.

A complex way would be to use a GLR parser which forks when
encountering ambiguity - the parser then follows all possible syntax
paths in parallel until one emerges as correct (or until all paths end
in failure).

However, even GLR can get hopelessly confused by a grammar that
permits composition of ambiguity and GLR parsers in general have
*TERRIBLE* diagnostics - the fact that the grammar is expected to be
ambiguous means that almost no parsing errors are possible ...
virtually all meaningful checking has to be done on the parse trees.

It might be better to take this question over to comp.compilers. I'm
pretty well versed in parsing technology, but there are some real
experts over there that might know some tricks you could use.

George
Chris F Clark
2010-11-24 19:13:40 UTC
Permalink
Post by George Neuner
On Tue, 23 Nov 2010 16:01:21 +0000 (UTC), "Paulo J. Matos"
Post by Paulo J. Matos
Post by Pascal J. Bourguignon
If there is an ambiguity, by definition, you cannot sort it out.
If you don't want ambiguities, then don't put them in! Ie. design your
grammar so that there's no ambiguity. Anything does, as long as there
is no ambiguity.
Excuse me but either you didn't understand my point or you just like to
post things that do not answer the question.
f
(x)
how do they know if this is a function call or two separate expressions?
Sorry, but there isn't any *simple* way to disambiguate without using
some kind of punctuation. Pascal is simply recommending Lisp's
method.
A complex way would be to use a GLR parser which forks when
encountering ambiguity - the parser then follows all possible syntax
paths in parallel until one emerges as correct (or until all paths end
in failure).
However, even GLR can get hopelessly confused by a grammar that
permits composition of ambiguity and GLR parsers in general have
*TERRIBLE* diagnostics - the fact that the grammar is expected to be
ambiguous means that almost no parsing errors are possible ...
virtually all meaningful checking has to be done on the parse trees.
It might be better to take this question over to comp.compilers. I'm
pretty well versed in parsing technology, but there are some real
experts over there that might know some tricks you could use.
George
Topic 1:

As has been pointed out in this thread, many of the languages that
disambiguate these lines do so by respecting newlines (and often
indentation). Neither feature is particularly well suited to
currently existing formal grammars. I'm not sure why no one has
bothered to formalize them, except that from a theoretical perspective
they don't add much expressive power (you can write everything you can
write with an indentation oriented language with a parenthesized
language) and from a practical perspective they require "counting"
things and counting is non-regular, so it doesn't axiomatize well.

Thus, while it is easy to say an increase in the indentation level is
equivalent to an open parenthesis (or possibly a set of them) and a
descrease in the indentation is equivalent to a close parenthesis (or
set of them). The practical issues in getting that formalized
precisely right is not as trivial as it sounds and their aren't tools
that automate writing grammars in that style (that I know of).

Topic 2:

George's solution of using GLR parsing and them his pointing out its
shortcomings is essentially spot-on. GLR parsing allows too much
ambiguity, especially in cases like these, because as long as there
are a set of legal parses, the parser will keep going hoping that at
some point they get disambiguated. However, without newline and
indentation style hints, there really isn't anything to disambiguate
them. Thus, the result is simply an undisambiguated parse forest.

An alternative that is a little more practical is the use of PEGs
(parsing expression grammars). In this case, the grammar effectively
defines a preferred method of disambiguating the cases where two
parses are possible. The result is that the parse forest generated by
a GLR type tool is pruned automatically down to one result. The rules
for doing such are generally simple and intuitive (at least if one can
see the grammar). PEGs do at time still produce counter-intuitive
results and the diagnostic tools are still quite limited. If one
tries to be too subtle or too sophisticated, one can quite easily
produce something unintelligible.

To my mind even better is the close sibling predicated grammars. They
share the characteristic that the ambiguity is resolved by parsing
order. However, they limit the ambiguity to where the grammar writer
has specifically allowed it. Thus, one can think of predicated
grammars as being primarily unambiguous with some cwertain limited
ambiguities allowed and disambiguated.

However, as I mentioned, I'm not aware of tools in any of these
grammar classes that have "operators" that comprehend indentation
levels. Without that hook, you will never quite get something that
looks like Python or Haskell from a grammar based tool (unless you
build it by yourself).

Hope this helps,
-Chris

******************************************************************************
Chris Clark email: ***@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris

Lauri Alanko
2010-11-24 00:53:14 UTC
Permalink
Post by Paulo J. Matos
f
(x)
how do they know if this is a function call or two separate expressions?
In SML, a sequence expression always requires a semicolon separator.
Hence, two juxtaposed expressions like in your example (the newline is
just ordinary whitespace) are always unambiguously an application.

(However, SML does not require separators between _declarations_. The
beginning of a new declaration can still be unambiguously recognized
because all declarations begin with a keyword.)


Lauri
Tim Harig
2010-11-23 17:01:05 UTC
Permalink
Post by Paulo J. Matos
f
(x + y)
f(x)
does show the problem. One way is to add ';' after each expression but
how does python and sml for example handle this issue?
I see two ways of handing this. One is to use a statement terminator.
C like languages use semicolons, Python uses a newline. You could use
whatever seems apropriate to you.

I second possiblity is to define your grammer such that a function call
ends if there is whitespace after the name such that:

f(x+y)

means call function f using function (x+y) as an argument while:

f (x+y)

means call function f followed by calling function (x+y).
Paulo J. Matos
2010-11-23 17:10:43 UTC
Permalink
I see two ways of handing this. One is to use a statement terminator. C
like languages use semicolons, Python uses a newline. You could use
whatever seems apropriate to you.
ahah.. that's something I was missing... so python does really require
the newline as statement terminator.
I second possiblity is to define your grammer such that a function call
f(x+y)
f (x+y)
means call function f followed by calling function (x+y).
Thanks for your insight.

Cheers,
--
PMatos
Tim Harig
2010-11-23 17:25:26 UTC
Permalink
Post by Paulo J. Matos
I see two ways of handing this. One is to use a statement terminator. C
like languages use semicolons, Python uses a newline. You could use
whatever seems apropriate to you.
ahah.. that's something I was missing... so python does really require
the newline as statement terminator.
I should point out that it also allows a semicolon as an optional statement
terminator so that it is possible to place two statements on a single line
should you wish to do so.
Richard
2010-11-23 17:07:52 UTC
Permalink
Post by Paulo J. Matos
f
(x + y)
f(x)
One way to parse these statements would be to use semantical
information in addition to the syntax defined in the language grammar.

In this example, after scanning the symbol "f" the compiler would look
up the data type of "f" in its symbol table. If "f" was a function, it
would continue parsing the function parameters; otherwise, it would
start with the next expression.

Of course, this would only work in a programming language where
functions always have to be defined before use and function names can
not be expressions themselves.

Richard
Torben Ægidius Mogensen
2010-11-24 10:34:16 UTC
Permalink
Post by Paulo J. Matos
I have been thinking about a syntax of programming languages and I was
wondering that if I were to create a new functional programming language
I would like that, like in Scheme, everything is an expression
(everything as a value) and when I have a sequence of expressions, I want
1
(x + y)
f(x)
This would be an example of a sequence of 3 expressions, where all of
them are evaluated (possibly for side-effects) and the last one returned.
In this specific case 1 and (x + y) are redundant. However, how does one
disambiguate between a function call IDENT '(' IDENT ') and IDENT '('
IDENT ')' where these are two expressions, one IDENT and '(' IDENT ')'?
how does python and sml for example handle this issue?
SML requires ';' between expressions that are evaluated in sequence.
The above would be written as

(1;
x+y;
f x)

Note that arguments to functions do not need to be in parentheses
(unless they use infix operators, as these have lower precedence than
function application).

An alternative is making newlines and indentation significant. Haskell
and Python both do that. In functional languages, you often need long
expressions, so you should allow multiline expressions. It is common to
use indentation to indicate if a new line continues an expression or
starts a new expression, such as: Every extra leading space is read as a
'(' and every less leading space is read as a ')'. If there is no
difference in leading spaces, you can add a ';' or similar symbol. This
is, basically, what Haskell does, except (IIRC) that it uses indentation
only in declarations, so the bracketing symbols are '{' and '}'.
Expressions in Haskell don't have side effects, so it doesn't make sense
to evaluate a sequence of expressions and only use the value of the
last.

Torben
Continue reading on narkive:
Loading...