Kornelius Kalnbach, 12/05/2010 03:56 AM

h1. Comparison with Pygments

h2. General differences

* CodeRay is a Ruby library; Pygments is written in Python.
* CodeRay supports 19 languages, while Pygments supports over 90.
* CodeRay has handwritten scanners. In Pygments, scanners are defined with a scanner DSL.

h2. Handwritten vs. DSL, Pro & Contra

The last two differences in the list above are closely related: handwritten scanners take much more work to write, which is one reason CodeRay supports fewer languages.

h3. Handwritten scanners (CodeRay)

Pro:

* faster
** lots of fine-tuning is possible
** no overhead for DSL transformation and interpretation
* more flexible

Contra:

* writing scanners is a lot of work
* few developers know how to write good scanners
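
To make the contrast concrete, here is a minimal sketch of a handwritten scanner for diff output, built on Ruby's @StringScanner@. It is an illustration only: @scan_diff@ and its token kinds are hypothetical, not CodeRay's actual Diff scanner. The hand-tuned part is the dispatch: it peeks at the first character of each line once and branches on it, instead of trying a list of regexps in order.

<pre><code class="ruby">
require 'strscan'

# Hypothetical handwritten diff scanner (not CodeRay's real Diff scanner).
# Returns an array of [text, kind] pairs, one per line.
def scan_diff(code)
  tokens = []
  s = StringScanner.new(code)
  until s.eos?
    # Hand-tuned dispatch: look at the first character once and branch.
    kind =
      case s.peek(1)
      when '+' then :insert
      when '-' then :delete
      when '@' then :subheading
      when '=' then :head
      when 'I' then s.match?(/Index/) ? :head : :text
      else :text
      end
    # Consume the rest of the line, including its newline if there is one.
    tokens.push([s.scan(/.*\n?/), kind])
  end
  tokens
end

p scan_diff("Index: a.rb\n@@ -1 +1 @@\n-old\n+new\n same\n")
</code></pre>

Each line becomes one token, so the example yields the kinds @:head@, @:subheading@, @:delete@, @:insert@ and @:text@, in that order.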

h3. Scanner definition (Pygments)

(Note: In Pygments, scanners are called "lexers".)

Pro:

* easier to write, read, and maintain
** less code
** even beginners can write decent scanners
* the DSL interpreter can be optimized or changed independently of the definitions
* porting scanners is easier
* use of higher-level features (like token groups or state stacks) is simple

Contra:

* may need hacks for complex languages (e.g. the "ExtendedRegexLexer":http://pygments.org/docs/lexerdevelopment/#the-extendedregexlexer-class)
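
For comparison, here is a minimal sketch of the DSL approach, also in Ruby. The language is described as data: a table of states, each a list of @[regexp, token_type, next_state]@ rules, loosely modeled on the @tokens@ table of a Pygments @RegexLexer@. One generic interpreter runs any such table. The rule format and @run_rules@ are assumptions for illustration, not Pygments' API. The @:string@ state shows how a state stack handles a nested construct without extra code.

<pre><code class="ruby">
require 'strscan'

# Hypothetical rule table, loosely modeled on Pygments' RegexLexer:
# state => list of [regexp, token_type, next_state]. next_state is nil
# (stay in this state), a state name (push it), or :pop.
DSL_RULES = {
  root: [
    [/"/,       'String.Delimiter', :string],
    [/\d+/,     'Number',           nil],
    [/\w+/,     'Name',             nil],
    [/\s+/,     'Text',             nil],
    [/./m,      'Error',            nil]
  ],
  string: [
    [/\\./m,    'String.Escape',    nil],
    [/[^"\\]+/, 'String',           nil],
    [/"/,       'String.Delimiter', :pop]
  ]
}.freeze

# Generic interpreter: it knows nothing about any particular language.
def run_rules(rules, code)
  s = StringScanner.new(code)
  stack = [:root]
  tokens = []
  until s.eos?
    # The first rule of the current state that matches at the pointer wins.
    rule = rules.fetch(stack.last).find { |re, _type, _next| s.match?(re) }
    raise ArgumentError, "no rule matches at offset #{s.pos}" unless rule

    re, type, next_state = rule
    tokens.push([s.scan(re), type])
    case next_state
    when :pop then stack.pop
    when Symbol then stack.push(next_state)
    end
  end
  tokens
end

p run_rules(DSL_RULES, 'x "a\"b" 42')
</code></pre>

Because @run_rules@ never looks inside a particular language, it could be optimized (say, by combining the rules of a state into a single regexp) without touching any rule table.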

h3. Thoughts: LexDL

A common scanner/lexer definition language, readable by both Pygments and hypothetical ports to other languages, would be very useful. The definitions could then be maintained in a common code repository.

Here's a quick, unpolished example of a possible JSON representation:

<pre><code class="json">
  {
    "name": "Diff",
    "aliases": ["diff"],
    "filenames": ["*.diff"],
    "tokens": {
      "root": [
        [" .*\n", "Text"],
        ["\\+.*\n", "Generic.Inserted"],
        ["-.*\n", "Generic.Deleted"],
        ["@.*\n", "Generic.Subheading"],
        ["Index.*\n", "Generic.Heading"],
        ["=.*\n", "Generic.Heading"],
        [".*\n", "Text"]
      ],
      ...
    }
  }
</code></pre>
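
As a rough feasibility check, here is a sketch of how a Ruby port might load such a definition: parse the JSON, compile each pattern with @Regexp.new@, and apply the rules first-match-wins at the current position. The loader only handles the @root@ state, and the embedded definition is the example above without the @...@ placeholder (which is not valid JSON). @load_lexdl@ and @lex@ are hypothetical names.

<pre><code class="ruby">
require 'json'
require 'strscan'

# The Diff example from above, minus the "..." placeholder.
# The single-quoted heredoc keeps every backslash exactly as written.
LEXDL_DIFF = <<~'LEXDL'
  {
    "name": "Diff",
    "aliases": ["diff"],
    "filenames": ["*.diff"],
    "tokens": {
      "root": [
        [" .*\n", "Text"],
        ["\\+.*\n", "Generic.Inserted"],
        ["-.*\n", "Generic.Deleted"],
        ["@.*\n", "Generic.Subheading"],
        ["Index.*\n", "Generic.Heading"],
        ["=.*\n", "Generic.Heading"],
        [".*\n", "Text"]
      ]
    }
  }
LEXDL

# Hypothetical loader: compiles each [pattern, token_type] pair of "root".
def load_lexdl(json_text)
  defn = JSON.parse(json_text)
  rules = defn.fetch('tokens').fetch('root').map do |pattern, type|
    [Regexp.new(pattern), type]
  end
  { name: defn.fetch('name'), rules: rules }
end

# First rule that matches at the scan pointer wins.
def lex(lexer, code)
  s = StringScanner.new(code)
  tokens = []
  until s.eos?
    re, type = lexer[:rules].find { |r, _| s.match?(r) }
    raise ArgumentError, "no rule matches at offset #{s.pos}" unless re

    tokens.push([s.scan(re), type])
  end
  tokens
end

diff = load_lexdl(LEXDL_DIFF)
p lex(diff, "Index: x\n-old\n+new\n")
</code></pre>

Like the rules themselves, this expects every line to end with a newline; input without a trailing newline raises an error on the last line.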

h2. Other differences

h3. Regular expression engines

Python's regular expressions are more powerful than those of Ruby 1.8, but less powerful than those of Ruby 1.9, which uses the Oniguruma engine. However, most expressions used in the scanners can be interpreted by all of these engines. Ruby's StringScanner also has some limitations in how regexps can be used.
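
One @StringScanner@ behavior that shapes how scanners use regexps: @scan@, @check@ and @match?@ only try to match at the current scan pointer, so every pattern is implicitly anchored there, while @scan_until@ searches ahead. Whether this is the limitation meant above is an assumption; the snippet only demonstrates the behavior.

<pre><code class="ruby">
require 'strscan'

s = StringScanner.new('foo = 42')
first  = s.scan(/\d+/)       # nil: scan only tries at the pointer, where "foo" starts
word   = s.scan(/\w+/)       # "foo"
peeked = s.check(/\s*=/)     # " =": like scan, but does not advance the pointer
rest   = s.scan_until(/\d+/) # " = 42": searches ahead, returns text up to the match end
num    = s.matched           # "42": only the part matched by the last regexp

p [first, word, peeked, rest, num, s.eos?]
</code></pre>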

h3. Token kinds vs. token types

CodeRay represents tokens with a token kind (see #122), which is just a Ruby symbol ("source":http://redmine.rubychan.de/projects/coderay/repository/entry/trunk/lib/coderay/token_classes.rb?rev=452).
Pygments uses a hierarchical token type/subtype system ("source":http://bitbucket.org/birkenfeld/pygments-main/src/f90ec0252e78/pygments/token.py#cl-47), which is more complex to implement (and slower), but more flexible and easier to understand for authors of new language definitions.
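
To illustrate the difference, here is a rough Ruby sketch of a Pygments-style hierarchical token type (Pygments itself implements this in Python, in @pygments/token.py@). The @TokenType@ class and the @style_for@ helper are hypothetical. Subtypes are created on demand, and a style lookup falls back to the parent type, so a style table needs entries only for the types it wants to distinguish. With plain symbols, two kinds such as @:insert@ and @:delete@ share no parent, so each needs its own entry.

<pre><code class="ruby">
# Hypothetical Ruby sketch of a Pygments-style hierarchical token type.
class TokenType
  attr_reader :name, :parent

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @children = {}
  end

  # Subtypes are created on first access and then reused, so
  # TOKEN[:Generic][:Inserted] always returns the same object.
  def [](child)
    @children[child] ||= TokenType.new(child, self)
  end

  # True if self is other or one of its descendants.
  def subtype_of?(other)
    type = self
    while type
      return true if type.equal?(other)

      type = type.parent
    end
    false
  end

  def to_s
    parent ? "#{parent}.#{name}" : name.to_s
  end
end

TOKEN = TokenType.new(:Token)
INSERTED = TOKEN[:Generic][:Inserted]

# Styles only for some types; every other type inherits from its parent.
STYLES = { TOKEN[:Generic] => 'color: gray', INSERTED => 'color: green' }.freeze

# Walk up the hierarchy until a type with a style is found (nil if none).
def style_for(type, styles)
  type = type.parent until type.nil? || styles.key?(type)
  type && styles[type]
end

puts INSERTED                                     # Token.Generic.Inserted
puts style_for(TOKEN[:Generic][:Deleted], STYLES) # color: gray (inherited from Generic)
</code></pre>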

h3. Token groups

CodeRay supports token groups, which map nicely to SPANs in the HTML output. A token group has a token kind and can contain tokens and other token groups. The final color of a token depends on the groups it is nested in (for example, @string/delimiter@ has a different color than @regexp/delimiter@). Groups are represented with special @:open@ and @:close@ tokens.
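
A minimal sketch of how @:open@/@:close@ tokens map to nested SPANs, assuming a simplified token list in which ordinary tokens are @[text, kind]@ pairs and a group is delimited by @[:open, kind]@ and @[:close, kind]@ (the exact layout of CodeRay's token stream may differ). @tokens_to_html@ is a hypothetical name. Because each group becomes a SPAN around its contents, a CSS rule such as @.string .delimiter@ can color a delimiter differently depending on the group it is in.

<pre><code class="ruby">
# Hypothetical, simplified token stream: [text, kind] pairs, with a group
# delimited by [:open, kind] ... [:close, kind].
STRING_TOKENS = [
  [:open, :string],
  ['"', :delimiter],
  ['a < b', :content],
  ['"', :delimiter],
  [:close, :string]
].freeze

HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Every group becomes a SPAN wrapping its contents; every token a leaf SPAN.
def tokens_to_html(tokens)
  out = +''
  depth = 0
  tokens.each do |text, kind|
    case text
    when :open
      out << %(<span class="#{kind}">)
      depth += 1
    when :close
      raise ArgumentError, "unbalanced :close for #{kind}" if depth.zero?

      out << '</span>'
      depth -= 1
    else
      out << %(<span class="#{kind}">#{text.gsub(/[&<>"]/, HTML_ESCAPES)}</span>)
    end
  end
  raise ArgumentError, 'unclosed token group' unless depth.zero?

  out
end

puts tokens_to_html(STRING_TOKENS)
# <span class="string"><span class="delimiter">&quot;</span><span class="content">a &lt; b</span><span class="delimiter">&quot;</span></span>
</code></pre>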

Token groups allow CSS-style color definitions, which are most useful for HTML output. Pygments has no comparable feature: strings are usually a single token in Pygments, while in CodeRay the delimiting quotes are usually separate tokens.

CodeRay is optimized for HTML/CSS output. The concept of token groups could be ported to LaTeX or console output, but it's not trivial.
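
In HTML output, the browser resolves nested styles through CSS descendant selectors. A LaTeX or console encoder would have to do that resolution itself, which is presumably part of what makes such a port non-trivial. A minimal sketch of such a lookup, assuming a hypothetical style table keyed by CSS-like @"group kind"@ selectors: the innermost enclosing group wins, then the bare token kind, then the color of an enclosing group. (Real CSS would break a tie between two matching group selectors by rule order, not nesting depth; this is a simplification.)

<pre><code class="ruby">
# Hypothetical style table keyed by CSS-like selectors:
# "string delimiter" means "a delimiter token inside a string group".
STYLE_RULES = {
  'delimiter'        => 'black',
  'string delimiter' => 'red',
  'regexp delimiter' => 'purple',
  'string'           => 'brown'
}.freeze

# groups is the stack of enclosing group kinds, outermost first.
# Candidates, most specific first: "group kind" for each enclosing group
# (innermost first), then the bare kind, then the enclosing groups themselves.
def color_for(kind, groups, rules = STYLE_RULES)
  inner_first = groups.reverse
  candidates = inner_first.map { |g| "#{g} #{kind}" } + [kind.to_s] + inner_first.map(&:to_s)
  rules[candidates.find { |selector| rules.key?(selector) }]
end

p color_for(:delimiter, [:string]) # "red"
p color_for(:delimiter, [:regexp]) # "purple"
p color_for(:content, [:string])   # "brown" (inherits the group's color)
</code></pre>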

h3. Filters

Pygments has "filters":http://pygments.org/docs/filters/#builtin-filters, which transform the token stream between the lexer and the formatter. You can do some cool tricks with these, such as changing the case of keywords (@keywordcase@) or highlighting markers like TODO in comments (@codetagify@). CodeRay currently lacks such a feature.
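
A filter is essentially a function from token stream to token stream, so filters compose. Here is a minimal Ruby sketch of what such a feature could look like, assuming a stream of @[text, kind]@ pairs. @keyword_case@ and @tag_todos@ are hypothetical, written in the spirit of Pygments' @keywordcase@ and @codetagify@ filters rather than as ports of them. Each filter returns a lazy enumerator, so a chain of filters processes tokens one at a time.

<pre><code class="ruby">
# Hypothetical token filters over [text, kind] pairs. Each takes any
# Enumerable of tokens and returns a lazy enumerator, so filters chain.

# Changes the case of keyword tokens; mode is a String method name such as
# :upcase, :downcase or :capitalize.
def keyword_case(tokens, mode = :upcase)
  tokens.lazy.map do |text, kind|
    kind == :keyword ? [text.public_send(mode), kind] : [text, kind]
  end
end

# Splits TODO/FIXME markers out of comment tokens into separate :todo tokens.
def tag_todos(tokens)
  tokens.lazy.flat_map do |text, kind|
    next [[text, kind]] unless kind == :comment

    text.split(/(TODO|FIXME)/).reject(&:empty?).map do |part|
      [part, part.match?(/\A(?:TODO|FIXME)\z/) ? :todo : kind]
    end
  end
end

tokens = [['if', :keyword], [' ', :space], ['# TODO: fix', :comment]]
p tag_todos(keyword_case(tokens)).to_a
# [["IF", :keyword], [" ", :space], ["# ", :comment], ["TODO", :todo], [": fix", :comment]]
</code></pre>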

h3. Plugins

Both Pygments and CodeRay can be extended via plugins. The details differ, but in both cases it's simple.
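
Neither project's actual plugin mechanism is reproduced here, but the common idea is small: a host that looks plugins up by name, and plugins that register themselves when they are defined. A minimal Ruby sketch; @ScannerPlugin@, @register_as@ and @PlainTextScanner@ are hypothetical names.

<pre><code class="ruby">
# Hypothetical plugin host: subclasses register under a name, and the
# host looks them up by that name.
class ScannerPlugin
  REGISTRY = {}

  # Called in a subclass body to make it available under a name.
  def self.register_as(name)
    REGISTRY[name.to_sym] = self
  end

  # Look a plugin up by name (Symbol or String).
  def self.[](name)
    REGISTRY.fetch(name.to_sym) { raise ArgumentError, "no scanner plugin for #{name.inspect}" }
  end

  def tokenize(_code)
    raise NotImplementedError, "#{self.class} must implement tokenize"
  end
end

# A third-party plugin only needs to subclass and register itself.
class PlainTextScanner < ScannerPlugin
  register_as :text

  def tokenize(code)
    [[code, :plain]]
  end
end

p ScannerPlugin[:text].new.tokenize('hello')
</code></pre>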