ComparisonWithPygments » History » Version 5

Kornelius Kalnbach, 12/05/2010 03:56 AM

h1. Comparison with Pygments

h2. General differences

* CodeRay is a Ruby library; Pygments is written in Python.
* CodeRay supports 19 languages, while Pygments supports over 90.
* CodeRay has handwritten scanners. In Pygments, scanners are defined with a scanner DSL.

h2. Handwritten vs. DSL, Pro & Contra

The last two differences in the list above are closely related.

h3. Handwritten scanners (CodeRay)

Pro:

* faster
** lots of fine-tuning is possible
** no overhead for DSL transformation and interpretation
* more flexible

Contra:

* writing scanners is a lot of work
* almost nobody understands how to create good scanners
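
To make "handwritten" concrete, here is a minimal sketch of a hand-coded scanner for a tiny subset of the diff format. It is written in Python purely for illustration; CodeRay's real scanners are Ruby classes, and the function name @scan_diff@ and the token kinds used here are assumptions made up for this example.

<pre><code class="python">
def scan_diff(text):
    """Hand-coded scanner for a tiny subset of the diff format.

    Yields (kind, line) pairs. The kinds are illustrative, not CodeRay's.
    """
    pos = 0
    length = len(text)
    while pos < length:
        # Find the end of the current line (including the newline, if any).
        end = text.find("\n", pos)
        end = length if end == -1 else end + 1
        line = text[pos:end]
        first = line[0]
        # Dispatch on the first character instead of trying regexps in turn;
        # this is the kind of fine-tuning a handwritten scanner allows.
        if first == "+":
            kind = "insert"
        elif first == "-":
            kind = "delete"
        elif first == "@":
            kind = "change"
        else:
            kind = "text"
        yield kind, line
        pos = end


tokens = list(scan_diff("@@ -1 +1 @@\n-old\n+new\n same\n"))
</code></pre>

Every decision is spelled out in code, which is where both the speed and the amount of work come from.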

h3. Scanner definition (Pygments)

(Note: In Pygments, scanners are called "lexers".)

Pro:

* easier to write, read, and maintain
** less code
** even beginners can write decent scanners
* DSL interpreter can be optimized/changed independently
* porting scanners is easier
* use of higher-level features (like token groups or stacks) is simple

Contra:

* may need hacks for complex languages (e.g. the "ExtendedRegexLexer":http://pygments.org/docs/lexerdevelopment/#the-extendedregexlexer-class)

h3. Thoughts: LexDL

A common scanner/lexer definition language that can be read by both Pygments and hypothetical ports in other languages would be most useful. The definitions could be maintained in a common code repository.

Here's a spontaneous example of a possible JSON representation:

<pre><code class="json">
  {
    "name": "Diff",
    "aliases": ["diff"],
    "filenames": ["*.diff"],
    "tokens": {
      "root": [
        [" .*\n", "Text"],
        ["\\+.*\n", "Generic.Inserted"],
        ["-.*\n", "Generic.Deleted"],
        ["@.*\n", "Generic.Subheading"],
        ["Index.*\n", "Generic.Heading"],
        ["=.*\n", "Generic.Heading"],
        [".*\n", "Text"]
      ],
      ...
    }
  }
</code></pre>
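
As a rough sketch of how a port could consume such a definition, the following Python code loads a JSON document of this shape and applies its rules in order. It assumes a single @root@ state without state transitions; the names @load_rules@ and @tokenize@ and the @Error@ fallback are hypothetical and not part of Pygments or CodeRay.

<pre><code class="python">
import json
import re

# A complete, valid JSON variant of the definition above (assumption: only
# a "root" state, no state transitions).
DEFINITION = r"""
{
  "name": "Diff",
  "tokens": {
    "root": [
      [" .*\n", "Text"],
      ["\\+.*\n", "Generic.Inserted"],
      ["-.*\n", "Generic.Deleted"],
      ["@.*\n", "Generic.Subheading"],
      ["Index.*\n", "Generic.Heading"],
      ["=.*\n", "Generic.Heading"],
      [".*\n", "Text"]
    ]
  }
}
"""


def load_rules(definition_json):
    """Compile the (pattern, token type) pairs of the root state."""
    definition = json.loads(definition_json)
    return [(re.compile(pattern), token_type)
            for pattern, token_type in definition["tokens"]["root"]]


def tokenize(rules, text):
    """Yield (token type, text) pairs; the first matching rule wins."""
    pos = 0
    while pos < len(text):
        for regex, token_type in rules:
            match = regex.match(text, pos)
            if match and match.end() > pos:
                yield token_type, match.group()
                pos = match.end()
                break
        else:
            # No rule matched (e.g. a last line without a newline): emit
            # one character as an error token so the loop always advances.
            yield "Error", text[pos]
            pos += 1


rules = load_rules(DEFINITION)
tokens = list(tokenize(rules, "Index: a.rb\n-old\n+new\n same\n"))
</code></pre>

The interpreter is only a few lines long and knows nothing about diffs, which is the point: the language knowledge lives entirely in the shared definition.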

h2. Other differences

h3. Regular expression engine

Python's regexps are more powerful than those of Ruby 1.8, but less powerful than the new ones in Ruby 1.9. However, most expressions used in the scanners can be interpreted by all three engines. Ruby's StringScanner has some limitations in the use of regexps.

h3. Token kinds vs. token types

CodeRay represents tokens with a Token Kind (see #122), which is just a Ruby symbol ("source":http://redmine.rubychan.de/projects/coderay/repository/entry/trunk/lib/coderay/token_classes.rb?rev=452).
Pygments uses a hierarchical token type/subtype system ("source":http://bitbucket.org/birkenfeld/pygments-main/src/f90ec0252e78/pygments/token.py#cl-47), which is more complex to implement (and slower), but more flexible and easier to understand for authors of new language definitions.
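
The difference can be sketched in a few lines of Python. The @TokenType@ class below is a simplified illustration of a hierarchical type system with subtype tests; it is an assumption-laden sketch written for this page, not Pygments' actual implementation. A flat kind, by contrast, is just a single identifier.

<pre><code class="python">
class TokenType(tuple):
    """A hierarchical token type such as Token.Literal.String.

    Simplified illustration only; not the code used by Pygments.
    """

    def __getattr__(self, name):
        # Attribute access creates (and caches) a subtype on demand.
        if not name[:1].isupper():
            raise AttributeError(name)
        subtype = TokenType(self + (name,))
        setattr(self, name, subtype)
        return subtype

    def __contains__(self, other):
        # "other in self" is true if other is self or one of its subtypes.
        return other[:len(self)] == self

    def __repr__(self):
        return "Token" + "".join("." + part for part in self)


Token = TokenType()
String = Token.Literal.String

# Hierarchical: a formatter can style all strings with one subtype test.
double_is_string = String.Double in String      # True
string_is_comment = String in Token.Comment     # False

# Flat (CodeRay style): a kind is one plain identifier, compared directly.
kind = "string"
kind_is_string = kind == "string"
</code></pre>

The flat comparison is cheaper, while the hierarchy lets a style fall back from a specific subtype to its parent.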

h3. Token groups

CodeRay supports token groups, which map nicely to SPANs in the HTML output. A token group has a token kind and can contain tokens and other token groups. The final color of a token depends on the group nesting it is in (for example, @string/delimiter@ has a different color than @regexp/delimiter@). Groups are represented with special @:open@ and @:close@ tokens.

Token groups allow CSS-style color definitions, which are most useful for HTML output. Pygments doesn't have a comparable feature; you can see that strings are usually a single token in Pygments, while the delimiting quotes are usually separate tokens in CodeRay.

CodeRay is optimized for HTML/CSS output. The concept of token groups could be ported to LaTeX or console output, but that is not trivial.
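
The mapping from groups to nested SPANs can be sketched as follows. This is a Python illustration under assumptions: the @OPEN@ and @CLOSE@ sentinels stand in for CodeRay's @:open@ and @:close@ tokens, and the (text, kind) pair layout and the @render_html@ function are made up for this example.

<pre><code class="python">
from html import escape

# Sentinels standing in for the special :open and :close tokens.
OPEN = object()
CLOSE = object()


def render_html(tokens):
    """Render a flat token stream with group markers as nested SPANs."""
    out = []
    for text, kind in tokens:
        if text is OPEN:
            out.append('<span class="%s">' % kind)
        elif text is CLOSE:
            out.append("</span>")
        else:
            out.append('<span class="%s">%s</span>' % (kind, escape(text)))
    return "".join(out)


tokens = [
    (OPEN, "string"),
    ('"', "delimiter"),
    ("a<b", "content"),
    ('"', "delimiter"),
    (CLOSE, "string"),
]
html_output = render_html(tokens)
</code></pre>

With output nested like this, a CSS rule such as @.string .delimiter@ can color the quotes of a string differently from @.regexp .delimiter@.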

h3. Filters

Pygments has "filters":http://pygments.org/docs/filters/#builtin-filters, which manipulate the token stream in some way. You can do some cool tricks with these. CodeRay currently lacks such a feature.
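
The idea behind a filter is easy to sketch with plain Python generators that take a token stream and yield a modified one. The two functions below are hypothetical examples written for this page and do not use Pygments' own filter API.

<pre><code class="python">
def uppercase_keywords(tokens):
    """Yield the stream with the text of keyword tokens upper-cased."""
    for token_type, text in tokens:
        if token_type == "Keyword":
            text = text.upper()
        yield token_type, text


def merge_adjacent(tokens):
    """Merge consecutive tokens that have the same type."""
    pending_type, pending_text = None, ""
    for token_type, text in tokens:
        if token_type == pending_type:
            pending_text += text
        else:
            if pending_type is not None:
                yield pending_type, pending_text
            pending_type, pending_text = token_type, text
    if pending_type is not None:
        yield pending_type, pending_text


stream = [("Keyword", "def"), ("Text", " "), ("Text", " "), ("Name", "f")]
# Filters compose by wrapping one generator in another.
filtered = list(merge_adjacent(uppercase_keywords(stream)))
</code></pre>

Because each filter only sees (type, text) pairs, it works for every language without touching the scanners.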

h3. Plugins

Both Pygments and CodeRay can be extended via plugins. The details differ, but writing a plugin is simple in both.