Version 6, Kornelius Kalnbach, 12/05/2010 03:56 AM

h1. Comparison with Pygments

h2. General differences

* CodeRay is a Ruby library; Pygments is written in Python.
* CodeRay supports 19 languages, while Pygments supports over 90.
* CodeRay has handwritten scanners. In Pygments, scanners are defined with a scanner DSL.

h2. Handwritten vs. DSL, Pro & Contra

The last two differences in the list above are closely related: writing a scanner by hand takes much more effort than writing a DSL definition, which helps explain the gap in language support.

h3. Handwritten scanners (CodeRay)

Pro:

* faster
** lots of fine tuning is possible
** no overhead for DSL transformation and interpretation
* more flexible

Contra:

* writing scanners is a lot of work
* few people know how to write good scanners
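
As an illustration of the fine tuning a handwritten scanner allows, here is a minimal sketch in Python (not CodeRay's actual Ruby code) of a hand-written scanner for the diff format used in the LexDL example further down. Instead of trying each regexp in order, it classifies every line by its first character with a single dictionary lookup. The names @scan_diff@ and @FIRST_CHAR_KINDS@ are hypothetical, and the token names are borrowed from Pygments' naming scheme as an assumption.

<pre><code class="python">
from typing import List, Tuple

# Token names borrowed from Pygments' naming scheme (an assumption for this sketch).
# Lines are classified by their first character with one dict lookup,
# instead of trying a list of regexps one after the other.
FIRST_CHAR_KINDS = {
    " ": "Text",
    "+": "Generic.Inserted",
    "-": "Generic.Deleted",
    "@": "Generic.Subheading",
    "=": "Generic.Heading",
}


def scan_diff(source: str) -> List[Tuple[str, str]]:
    """Split a diff into (kind, text) tokens, one token per line."""
    tokens = []
    for line in source.splitlines(keepends=True):
        if line.startswith("Index"):
            kind = "Generic.Heading"
        else:
            # Unknown first characters (and empty lines) fall back to plain text.
            kind = FIRST_CHAR_KINDS.get(line[:1], "Text")
        tokens.append((kind, line))
    return tokens
</code></pre>

The trade-off is visible even in this small case: the dispatch is fast, but every rule is encoded in control flow, so adding or changing a rule means editing code rather than a table.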

h3. Scanner definition (Pygments)

(Note: In Pygments, scanners are called "lexers".)

Pro:

* easier to write, read, and maintain
** less code
** even beginners can write decent scanners
* the DSL interpreter can be optimized or changed independently of the scanner definitions
* porting scanners is easier
* higher-level features (like token groups or state stacks) are simple to use

Contra:

* may need hacks for complex languages (e.g. the "ExtendedRegexLexer":http://pygments.org/docs/lexerdevelopment/#the-extendedregexlexer-class)
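
To show why a definition table means less code, and why state stacks come almost for free, here is a minimal sketch of a table-driven lexer in Python. It is loosely modeled on the shape of the @tokens@ dictionary of Pygments' @RegexLexer@ (rules made of a regexp, a token type, and an optional state change such as @#pop@), but it is a simplified stand-in, not Pygments' implementation. @run_table@, @TOY_RULES@, and the toy string language are hypothetical.

<pre><code class="python">
import re
from typing import Dict, List, Optional, Tuple

# A rule is (regexp, token kind, optional action). The action is a state
# name to push, "#pop" to leave the current state, or None to stay.
Rule = Tuple[str, str, Optional[str]]

# Hypothetical definition for a toy language: words, whitespace, and
# double-quoted strings with backslash escapes.
TOY_RULES: Dict[str, List[Rule]] = {
    "root": [
        (r"\s+", "Text", None),
        (r"\w+", "Name", None),
        (r'"', "String.Delimiter", "string"),
    ],
    "string": [
        (r"\\.", "String.Escape", None),
        (r'[^"\\]+', "String", None),
        (r'"', "String.Delimiter", "#pop"),
    ],
}


def run_table(rules: Dict[str, List[Rule]], text: str) -> List[Tuple[str, str]]:
    """Interpret a rule table: try the current state's rules in order at each position."""
    compiled = {
        state: [(re.compile(rx), kind, action) for rx, kind, action in state_rules]
        for state, state_rules in rules.items()
    }
    stack = ["root"]
    pos = 0
    tokens: List[Tuple[str, str]] = []
    while pos < len(text):
        for regex, kind, action in compiled[stack[-1]]:
            m = regex.match(text, pos)
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
                if action == "#pop":
                    stack.pop()
                elif action is not None:
                    stack.append(action)
                break
        else:
            # No rule matched: emit one error character and move on.
            tokens.append(("Error", text[pos]))
            pos += 1
    return tokens
</code></pre>

All the language-specific knowledge lives in @TOY_RULES@; the interpreter in @run_table@ could be optimized or replaced without touching any definition.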

h3. Thoughts: LexDL

A common scanner/lexer definition language that could be read both by Pygments and by hypothetical ports in other languages would be very useful. The definitions could be maintained in a common code repository.

Here's a quick, informal example of a possible JSON representation:

<pre><code class="json">
{
  "name": "Diff",
  "aliases": ["diff"],
  "filenames": ["*.diff"],
  "tokens": {
    "root": [
      [" .*\n", "Text"],
      ["\\+.*\n", "Generic.Inserted"],
      ["-.*\n", "Generic.Deleted"],
      ["@.*\n", "Generic.Subheading"],
      ["Index.*\n", "Generic.Heading"],
      ["=.*\n", "Generic.Heading"],
      [".*\n", "Text"]
    ],
    ...
  }
}
</code></pre>
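
A port would only need a JSON parser and a regexp engine to use such a definition. The sketch below, in Python, loads a complete version of the diff definition (without the @...@ placeholder) and applies the @root@ rules in order at each position. @load_lexer@, @lex@, and @DIFF_LEXDL@ are illustrative names, not part of any existing library. Note that JSON requires the backslash in the regexp @\+@ to be written as @\\+@.

<pre><code class="python">
import json
import re
from typing import List, Tuple

# A complete (hypothetical) LexDL definition for diffs, without the "..."
# placeholder. In JSON, "\\+" decodes to the regexp \+ and "\n" to a newline.
DIFF_LEXDL = r'''
{
  "name": "Diff",
  "aliases": ["diff"],
  "filenames": ["*.diff"],
  "tokens": {
    "root": [
      [" .*\n", "Text"],
      ["\\+.*\n", "Generic.Inserted"],
      ["-.*\n", "Generic.Deleted"],
      ["@.*\n", "Generic.Subheading"],
      ["Index.*\n", "Generic.Heading"],
      ["=.*\n", "Generic.Heading"],
      [".*\n", "Text"]
    ]
  }
}
'''

Rules = List[Tuple[re.Pattern, str]]


def load_lexer(definition: str) -> Tuple[str, Rules]:
    """Parse a LexDL document and compile the regexps of its root state."""
    data = json.loads(definition)
    rules = [(re.compile(rx), kind) for rx, kind in data["tokens"]["root"]]
    return data["name"], rules


def lex(rules: Rules, text: str) -> List[Tuple[str, str]]:
    """Try the rules in order at each position; the first match wins."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        for regex, kind in rules:
            m = regex.match(text, pos)
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            # Every rule needs a trailing newline; emit an unterminated last line as Text.
            tokens.append(("Text", text[pos:]))
            break
    return tokens
</code></pre>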

h2. Other differences

h3. Regular expression engines

Python's regexps are more powerful than the regexps of Ruby 1.8, and less powerful than the new Ruby 1.9 ones. However, most expressions used in the scanners can be interpreted by all engines. Ruby's StringScanner has some limitations in the use of regexps.
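
One practical detail when porting scanners between engines is how matching at the current position works. In Python's @re@ module, a scanner loop usually calls @Pattern.match(string, pos)@, roughly the counterpart of Ruby's @StringScanner#scan@, which also matches at the current position. Python documents that this is not equivalent to slicing the string: @^@ still refers to the real start of the string. Likewise, lookbehind can see characters before @pos@. The comparison below uses a hypothetical helper, @compare_anchoring@, to show both effects.

<pre><code class="python">
import re


def compare_anchoring(text: str = "a+b", pos: int = 1) -> dict:
    """Compare matching at pos with matching on the sliced string text[pos:]."""
    caret = re.compile(r"^\+")        # "+" at the start of the string
    behind = re.compile(r"(?<=a)\+")  # "+" preceded by "a" (lookbehind)
    rest = text[pos:]
    return {
        # ^ refers to the real start of text, so it fails at pos 1 ...
        "caret_at_pos": caret.match(text, pos) is not None,
        # ... but succeeds once slicing makes "+" the first character.
        "caret_on_slice": caret.match(rest) is not None,
        # Lookbehind sees text[pos - 1] when matching at pos ...
        "behind_at_pos": behind.match(text, pos) is not None,
        # ... but after slicing, the "a" is gone.
        "behind_on_slice": behind.match(rest) is not None,
    }
</code></pre>

For a port, this means a rule that relies on @^@ or on lookbehind can behave differently depending on whether the scanner slices the input or matches in place.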

h3. Token kinds vs. token types

CodeRay represents tokens with a Token Kind (see #122), which is just a Ruby symbol ("source":http://redmine.rubychan.de/projects/coderay/repository/entry/trunk/lib/coderay/token_classes.rb?rev=452).
Pygments uses a hierarchical token type/subtype system ("source":http://bitbucket.org/birkenfeld/pygments-main/src/f90ec0252e78/pygments/token.py#cl-47), which is more complex to implement (and slower), but more flexible and easier to understand for authors of new language definitions.
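
To illustrate the difference, here is a minimal Python sketch of a hierarchical token type, loosely modeled on Pygments' @token.py@ but not copied from it. Attribute access creates and caches subtypes, so @Token.Generic.Inserted@ is a child of @Token.Generic@, and a style lookup can fall back to the parent when a subtype has no style of its own. With flat token kinds like CodeRay's symbols, every kind needs its own entry. The class @TokenType@, the helper @style_for@, and the colors are hypothetical.

<pre><code class="python">
from typing import Dict, Tuple


class TokenType(tuple):
    """A token type is the tuple of its path, e.g. ("Generic", "Inserted").

    Hypothetical sketch, loosely modeled on Pygments' token.py.
    """

    def __getattr__(self, name: str) -> "TokenType":
        # Only capitalized names create subtypes; anything else is a normal miss.
        if not name[:1].isupper():
            raise AttributeError(name)
        sub = TokenType(self + (name,))
        setattr(self, name, sub)  # cache, so Token.Generic is always the same object
        return sub

    def is_subtype_of(self, other: tuple) -> bool:
        return self[: len(other)] == other


Token = TokenType()

# Hypothetical style table: only some types get explicit colors.
STYLE: Dict[Tuple[str, ...], str] = {
    Token: "#000000",
    Token.Generic: "#555555",
    Token.Generic.Inserted: "#00a000",
}


def style_for(ttype: tuple) -> str:
    """Walk up the hierarchy until a type with an explicit style is found."""
    while ttype not in STYLE:
        ttype = ttype[:-1]
    return STYLE[ttype]
</code></pre>

Here @Token.Generic.Deleted@ has no entry of its own, so it inherits the color of @Token.Generic@.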

h3. Token groups

CodeRay supports token groups, which map nicely to SPAN elements in the HTML output. A token group has a token kind and can contain tokens and other token groups. The final color of a token depends on the groups it is nested in (for example, @string/delimiter@ has a different color than @regexp/delimiter@). Groups are represented with special @:open@ and @:close@ tokens.

Token groups allow CSS-style color definitions, which are most useful for HTML output. Pygments doesn't have a comparable feature: strings are usually a single token in Pygments, while in CodeRay the delimiting quotes are usually separate tokens.

CodeRay is optimized for HTML/CSS output. The concept of token groups could be ported to LaTeX or console output, but it's not trivial.
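
The following Python sketch shows how a token stream with group markers can be turned into nested SPAN elements. The representation (@(action, kind, text)@ tuples with @open@, @close@, and @text@ actions) is a hypothetical stand-in for CodeRay's Ruby tokens, and @to_html@ and @STRING_EVENTS@ are illustrative names. Because groups become nested spans, a stylesheet can color @.string .delimiter@ and @.regexp .delimiter@ differently.

<pre><code class="python">
import html
from typing import List, Tuple

# Hypothetical event format: (action, kind, text), where action is
# "open" or "close" for group boundaries and "text" for an ordinary token.
Event = Tuple[str, str, str]


def to_html(events: List[Event]) -> str:
    """Render events as nested spans; raises ValueError on unbalanced groups."""
    out: List[str] = []
    open_groups: List[str] = []
    for action, kind, text in events:
        if action == "open":
            open_groups.append(kind)
            out.append(f'<span class="{kind}">')
        elif action == "close":
            if not open_groups or open_groups.pop() != kind:
                raise ValueError(f"unexpected close of group {kind!r}")
            out.append("</span>")
        else:
            out.append(f'<span class="{kind}">{html.escape(text)}</span>')
    if open_groups:
        raise ValueError(f"unclosed groups: {open_groups}")
    return "".join(out)


# A string literal "hi": the quotes are separate "delimiter" tokens inside a "string" group.
STRING_EVENTS: List[Event] = [
    ("open", "string", ""),
    ("text", "delimiter", '"'),
    ("text", "content", "hi"),
    ("text", "delimiter", '"'),
    ("close", "string", ""),
]

# A stylesheet can then distinguish delimiters by context (colors are made up):
#   .string .delimiter { color: #710; }
#   .regexp .delimiter { color: #404; }
</code></pre>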

h3. Filters

Pygments has "filters":http://pygments.org/docs/filters/#builtin-filters, which manipulate the token stream in some way, for example by changing the case of keywords or by highlighting markers like TODO in comments. CodeRay currently lacks such a feature.
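
Filters are essentially functions from a token stream to a token stream, so they can be chained. The sketch below writes two such filters as plain Python generators over hypothetical @(token type, text)@ pairs: @keyword_case@ changes the case of keywords, and @merge_adjacent@ merges adjacent tokens of the same type. In Pygments itself, filters are classes with a @filter(lexer, stream)@ method; these stand-alone functions only mimic the idea.

<pre><code class="python">
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical token representation: (token type name, text).
Tok = Tuple[str, str]


def keyword_case(stream: Iterable[Tok], case: str = "upper") -> Iterator[Tok]:
    """Change the case of keyword tokens; pass everything else through."""
    convert = {"upper": str.upper, "lower": str.lower, "capitalize": str.capitalize}[case]
    for ttype, value in stream:
        if ttype == "Keyword" or ttype.startswith("Keyword."):
            yield ttype, convert(value)
        else:
            yield ttype, value


def merge_adjacent(stream: Iterable[Tok]) -> Iterator[Tok]:
    """Merge consecutive tokens of the same type into a single token."""
    current: Optional[str] = None
    parts: List[str] = []
    for ttype, value in stream:
        if parts and ttype == current:
            parts.append(value)
        else:
            if parts:
                yield current, "".join(parts)
            current, parts = ttype, [value]
    if parts:
        yield current, "".join(parts)
</code></pre>

Because each filter consumes and produces the same kind of stream, @merge_adjacent(keyword_case(tokens))@ applies both in one pass without either filter knowing about the other.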

h3. Plugins

Both Pygments and CodeRay can be extended with plugins. The details differ, but in both cases adding a plugin is simple.
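
On the Python side, Pygments discovers third-party lexers, formatters, styles, and filters through setuptools entry points in groups such as @pygments.lexers@. The sketch below lists what is registered in those groups using only the standard library (@importlib.metadata@). It does not import the plugins and does not use Pygments' own plugin module. The helper @installed_plugins@ is hypothetical, and nothing here describes how CodeRay finds its Ruby plugins.

<pre><code class="python">
from importlib.metadata import entry_points
from typing import Dict

# Entry point groups that Pygments scans for third-party plugins.
PYGMENTS_GROUPS = (
    "pygments.lexers",
    "pygments.formatters",
    "pygments.styles",
    "pygments.filters",
)


def installed_plugins(group: str) -> Dict[str, str]:
    """Map entry point name to its "module:attr" target, without importing anything."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict of groups
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}
</code></pre>

On a system without third-party Pygments plugins, every group simply maps to an empty dictionary.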