Thant Tessman
2009-01-23 20:32:02 UTC
I've been thinking about graphics file formats on and off for many years
now. Somewhere along the way I figured out that people were thinking
about the problem from the wrong point of view.
The problem is that graphics file formats invariably have to let the
semantics of the data they represent drive the syntax of the format. For
example, geometric vertex position data usually--but not
necessarily--takes the form of homogeneous arrays of 3-dimensional
coordinates. Applications are forced to work within the semantic
confines of the data structures the format can faithfully represent. If
the format hopes to allow for the efficient parsing of data directly
into memory, its design is forced to anticipate every kind of data its
designers think might need to be stored.
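As a rough illustration of that constraint, here is a minimal Python sketch of a format built around one anticipated kind of data. The layout (a 4-byte count followed by packed 32-bit floats) and the names write_vertices and read_vertices are assumptions made up for this example, not taken from any real format. It loads quickly precisely because the byte layout mirrors the one data structure it was designed for, and it can express nothing else.

```python
import struct

# Hypothetical fixed layout, invented for illustration:
#   4-byte little-endian vertex count,
#   then count * 3 little-endian 32-bit floats (x, y, z per vertex).

def write_vertices(vertices):
    """Serialize a list of (x, y, z) tuples into the fixed layout."""
    payload = struct.pack("<I", len(vertices))
    for x, y, z in vertices:
        payload += struct.pack("<3f", x, y, z)
    return payload

def read_vertices(blob):
    """Parse the fixed layout back into a list of (x, y, z) tuples."""
    (count,) = struct.unpack_from("<I", blob, 0)
    flat = struct.unpack_from("<%df" % (count * 3), blob, 4)
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

blob = write_vertices([(0.0, 0.0, 0.0), (1.0, 0.5, -2.0)])
vertices = read_vertices(blob)
```

Storing 2-dimensional texture coordinates, integer indices, or per-vertex records in such a file would mean changing the format itself, which is the problem described above.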
The result is invariably a complex format that never quite transcends
an overly specific approach to the development of computer graphics
applications.
What we need is not a file format, but a programming language--or at
least something like a programming language. This programming language's
job is merely to read in a 'program' and build the data structures
described. It doesn't need to evaluate anything. It doesn't need control
structures. It doesn't need functions. But it does need a type system.
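To make the idea concrete, here is a minimal Python sketch of a typed, evaluation-free data description. It is not Data Language's syntax or implementation (the PDF linked in this post is the actual description). The names Prim, Array, Record and build are hypothetical, and nested Python literals stand in for the 'program' text. The reader only checks the description against a declared type and builds the corresponding structure. The description itself contains no expressions, control structures or functions.

```python
from dataclasses import dataclass

# Hypothetical type descriptors, invented for this sketch.
@dataclass(frozen=True)
class Prim:
    name: str            # "int" or "float"

@dataclass(frozen=True)
class Array:
    elem: object         # element type
    length: int = None   # None means any length

@dataclass(frozen=True)
class Record:
    fields: tuple        # tuple of (field_name, type) pairs

INT = Prim("int")
FLOAT = Prim("float")

def build(ty, value):
    """Check value against ty and return the data structure it describes."""
    if isinstance(ty, Prim):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if ty.name == "int" and is_number and isinstance(value, int):
            return value
        if ty.name == "float" and is_number:
            return float(value)
        raise TypeError(f"expected {ty.name}, got {value!r}")
    if isinstance(ty, Array):
        if not isinstance(value, list):
            raise TypeError(f"expected array, got {value!r}")
        if ty.length is not None and len(value) != ty.length:
            raise TypeError(f"expected {ty.length} elements, got {len(value)}")
        return [build(ty.elem, v) for v in value]
    if isinstance(ty, Record):
        names = [name for name, _ in ty.fields]
        if not isinstance(value, dict) or set(value) != set(names):
            raise TypeError(f"expected record with fields {names}")
        return {name: build(t, value[name]) for name, t in ty.fields}
    raise TypeError(f"unknown type {ty!r}")

# The application, not the format, decides what a mesh looks like.
VEC3 = Array(FLOAT, 3)
MESH = Record((("positions", Array(VEC3)), ("indices", Array(INT))))

mesh = build(MESH, {
    "positions": [[0, 0, 0], [1, 0, 0], [0, 1, 0]],
    "indices": [0, 1, 2],
})
```

In this sketch a malformed description, such as a two-element list where a VEC3 is expected, is rejected with a TypeError while reading, before the application sees any data.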
The end goal was always to provide the services of this 'language' as a
highly optimized C++ library, but on my third or fourth attempt to
create just such a 'language' I decided to do a 'reference'
implementation in Standard ML. What this bought me was a huge amount of
confidence that there were no holes in the semantics or unaccounted-for
corner cases in the design.
The language is described here:
http://www.thant.com/projects/dl/dl090117.pdf
The SML 'reference' implementation is available here:
http://www.thant.com/projects/dl/dl_sml_090122.tar.gz
It builds for both SML/NJ and MLton.
Data Language can be thought of as playing the role that XML plays in
the Collada standard, only it's more efficient, easier to implement,
safer, and can be described in a dozen pages instead of a hundred. And I
can't help but think that even my SML implementation is far more
efficient than any XML library written in C. The Collada standard
explicitly states that it is not a "run-time delivery format" or
"streaming-friendly." But my C++ implementation of my "Data Language" is
already serving as exactly that.
Although I do have some experience with XML, I'm no expert. What am I
missing? What does using XML for these kinds of things buy people?
-thant