Version 2 (Kornelius Kalnbach, 12/05/2010 03:39 AM) → Version 3/6 (Kornelius Kalnbach, 12/05/2010 03:54 AM)
h1. Comparison with Pygments
h2. General differences
* CodeRay is a Ruby library; Pygments is written in Python.
* CodeRay supports 19 languages, while Pygments supports over 90.
* CodeRay has handwritten scanners. In Pygments, scanners are defined with a scanner DSL.
h2. Handwritten vs. DSL, Pro & Contra
The last two differences in the list above are closely related: the effort needed to write a scanner limits how many languages a library can support.
h3. Handwritten scanners (CodeRay)
Pro:
* faster
** lots of fine tuning is possible
** no overhead for DSL transformation and interpretation
* more flexible
Contra:
* writing scanners is a lot of work
* almost nobody understands how to create good scanners
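To make the trade-off concrete, here is a minimal sketch of a handwritten scanner, written in Python for brevity rather than in CodeRay's Ruby. The function name @scan_diff@ and the token kind strings are hypothetical and are not taken from CodeRay; the point is only that every rule is ordinary control flow, which can be tuned by hand but has to be written and understood line by line.

```python
def scan_diff(source):
    """Handwritten scanner sketch: yield (kind, text) pairs for a diff.

    Hypothetical example; the kind names are not CodeRay's real token kinds.
    """
    for line in source.splitlines(keepends=True):
        # Check the multi-character prefix first, then dispatch on one character.
        if line.startswith("Index"):
            kind = "head"
        elif line[0] == "+":
            kind = "insert"
        elif line[0] == "-":
            kind = "delete"
        elif line[0] == "@":
            kind = "change"
        elif line[0] == "=":
            kind = "head"
        else:
            kind = "text"
        yield kind, line


tokens = list(scan_diff("Index: a.txt\n-old\n+new\n same\n"))
```

Each new language needs a function like this written from scratch, which is where the extra work comes from.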
h3. Scanner definition DSL (Pygments)
(Note: In Pygments, scanners are called "lexers".)
Pro:
* easier to write, read, and maintain
** less code
** even beginners can write decent scanners
* DSL interpreter can be optimized/changed independently
* porting scanners is easier
* use of higher-level features (like token groups or stacks) is simple
Contra:
* may need hacks for complex languages (e.g. the "ExtendedRegexLexer":http://pygments.org/docs/lexerdevelopment/#the-extendedregexlexer-class)
h3. Thoughts: LexDL
A common scanner/lexer definition language that can be read by both Pygments and hypothetical ports in other languages would be most useful. The definitions could be maintained in a common code repository.
Here's a spontaneous example of a possible JSON representation:
{{{
{
  "name": "Diff",
  "aliases": ["diff"],
  "filenames": ["*.diff"],
  "tokens": {
    "root": [
      [" .*\n", "Text"],
      ["\\+.*\n", "Generic.Inserted"],
      ["-.*\n", "Generic.Deleted"],
      ["@.*\n", "Generic.Subheading"],
      ["Index.*\n", "Generic.Heading"],
      ["=.*\n", "Generic.Heading"],
      [".*\n", "Text"]
    ],
    ...
  }
}
}}}
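As a rough illustration of how a port might consume such a definition, the following Python sketch loads a cut-down version of the JSON above and applies the rules of one state in order. The helper names @load_rules@ and @tokenize@ are hypothetical, only a single state without a state stack is handled, and the plus sign is written as @\\+@ because JSON does not accept a bare @\+@ escape.

```python
import json
import re

# Hypothetical LexDL document, following the JSON shape sketched above.
LEXDL = r"""
{
  "name": "Diff",
  "tokens": {
    "root": [
      ["\\+.*\n", "Generic.Inserted"],
      ["-.*\n", "Generic.Deleted"],
      ["@.*\n", "Generic.Subheading"],
      ["Index.*\n", "Generic.Heading"],
      ["=.*\n", "Generic.Heading"],
      [".*\n", "Text"]
    ]
  }
}
"""


def load_rules(text, state="root"):
    """Compile the [pattern, token type] pairs of one state."""
    definition = json.loads(text)
    return [(re.compile(p), t) for p, t in definition["tokens"][state]]


def tokenize(source, rules):
    """Yield (token type, text) pairs; the first matching rule wins."""
    pos = 0
    while pos < len(source):
        for pattern, token_type in rules:
            match = pattern.match(source, pos)
            if match and match.end() > pos:
                yield token_type, match.group()
                pos = match.end()
                break
        else:
            # No rule matched: emit one character as an error token.
            yield "Error", source[pos]
            pos += 1


rules = load_rules(LEXDL)
result = list(tokenize("Index: a.txt\n-old\n+new\n same\n", rules))
```

The interpreter is about twenty lines and is written once; every further language is then only data, which is the main argument for a shared definition format.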
h2. Other differences
h3. Regular expressions engine
Python's regexps are more powerful than the regexps of Ruby 1.8, and less powerful than the new Ruby 1.9 ones. However, most expressions used in the scanners can be interpreted by all engines. Ruby's StringScanner has some limitations in the use of regexps.
h3. Token kinds vs. token types
CodeRay represents tokens with a Token Kind (see #122), which is just a Ruby symbol ("source":http://redmine.rubychan.de/projects/coderay/repository/entry/trunk/lib/coderay/token_classes.rb?rev=452).
Pygments uses a hierarchical token type/subtype system ("source":http://bitbucket.org/birkenfeld/pygments-main/src/f90ec0252e78/pygments/token.py#cl-47), which is more complex to implement (and slower), but more flexible and easier to understand for authors of new language definitions.
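The difference can be sketched with a simplified model in Python. This is neither CodeRay's nor Pygments' implementation: the kind names, the colors, and the tuple representation of hierarchical types are assumptions made for illustration. A flat kind needs one lookup, while a hierarchical type may walk up to its parents, which costs a little time but lets a style omit subtypes it does not care about.

```python
# Flat token kinds: each kind needs its own style entry.
# Hypothetical kind names and colors, for illustration only.
FLAT_STYLES = {"inserted": "green", "deleted": "red", "text": "black"}


def flat_color(kind):
    """Single dictionary lookup; an unknown kind raises KeyError."""
    return FLAT_STYLES[kind]


# Hierarchical token types, modelled here as tuples of path segments.
TREE_STYLES = {
    (): "black",                      # root: default color
    ("Generic",): "gray",
    ("Generic", "Inserted"): "green",
}


def tree_color(token_type):
    """Walk from the type up to the root until a style is found."""
    while token_type not in TREE_STYLES:
        token_type = token_type[:-1]  # drop the last segment: go to the parent
    return TREE_STYLES[token_type]


inserted = tree_color(("Generic", "Inserted"))
deleted = tree_color(("Generic", "Deleted"))   # falls back to ("Generic",)
unknown = tree_color(("Name", "Function"))     # falls back to the root
```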
h3. Token groups
CodeRay supports token groups, which map nicely to SPANs in the HTML output. A token group has a token kind and can contain tokens and other token groups. The final color of a token depends on the group nesting it is in (for example, @string/delimiter@ has a different color than @regexp/delimiter@). Groups are represented with special @:open@ and @:close@ tokens.
Token groups allow CSS-style color definitions, which are most useful for HTML output. Pygments doesn't have a comparable feature; you can see that strings are usually a single token in Pygments, while the delimiting quotes are usually separate tokens in CodeRay.
CodeRay is optimized for HTML/CSS output. The concept of token groups may be ported to LaTeX or console output, but it's not trivial.
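The mapping from token groups to nested SPANs can be sketched as follows. This is a Python illustration under assumptions, not CodeRay's encoder: the sentinels @OPEN@ and @CLOSE@ stand in for the special @:open@ and @:close@ tokens, and the class names are hypothetical.

```python
from html import escape

# Stand-ins for the special :open / :close tokens (hypothetical names).
OPEN, CLOSE = object(), object()

# A string literal as a group: the quotes are separate delimiter tokens.
TOKENS = [
    (OPEN, "string"),
    ('"', "delimiter"),
    ("hi", "content"),
    ('"', "delimiter"),
    (CLOSE, "string"),
]


def to_html(tokens):
    """Render groups as nested SPANs so CSS can style by nesting."""
    out = []
    for text, kind in tokens:
        if text is OPEN:
            out.append(f'<span class="{kind}">')      # group start
        elif text is CLOSE:
            out.append("</span>")                     # group end
        else:
            out.append(f'<span class="{kind}">{escape(text)}</span>')
    return "".join(out)


markup = to_html(TOKENS)
```

With this output, a descendant selector such as @.string .delimiter@ can be given a different color than @.regexp .delimiter@, which is the CSS-style definition described above. A console or LaTeX encoder has no such selector mechanism and would have to track the group nesting itself.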
h3. Filters
Pygments has "filters":http://pygments.org/docs/filters/#builtin-filters, which manipulate the token stream in some way. You can do some cool tricks with these. CodeRay currently lacks such a feature.
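Conceptually, a filter is a function from one token stream to another. The following Python sketch shows two such functions as plain generators; it does not use the Pygments filter API, and the names @merge_adjacent@ and @upcase@ as well as the token type strings are hypothetical.

```python
def merge_adjacent(tokens):
    """Filter sketch: merge consecutive tokens of the same type."""
    pending_type, pending_text = None, ""
    for token_type, text in tokens:
        if token_type == pending_type:
            pending_text += text
        else:
            if pending_type is not None:
                yield pending_type, pending_text
            pending_type, pending_text = token_type, text
    if pending_type is not None:
        yield pending_type, pending_text  # flush the last run


def upcase(tokens, wanted):
    """Filter sketch: upper-case the text of one token type."""
    for token_type, text in tokens:
        yield token_type, (text.upper() if token_type == wanted else text)


stream = [("Keyword", "if"), ("Text", " "), ("Text", " "), ("Name", "x")]
# Filters compose by wrapping one generator in another.
filtered = list(upcase(merge_adjacent(stream), "Keyword"))
```

Because each filter only consumes and produces a token stream, filters can be chained in any order without the scanner or the output formatter knowing about them.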
h3. Plugins
Both Pygments and CodeRay can be extended via plugins. The details differ, but writing a plugin is simple in either library.