Kornelius Kalnbach, 01/30/2009 05:29 AM
1 | 1 | Kornelius Kalnbach | h1. Scanner Requests |
---|---|---|---|
2 | 1 | Kornelius Kalnbach | |
3 | 1 | Kornelius Kalnbach | Scanners are the heart of CodeRay. They split input code into tokens and classify them. |
4 | 1 | Kornelius Kalnbach | |
5 | 1 | Kornelius Kalnbach | Each language has its own scanner. You can see which languages are currently supported in the "repository":http://code.licenser.net/repositories/browse/coderay/trunk/lib/coderay/scanners. |
6 | 1 | Kornelius Kalnbach | |
7 | 1 | Kornelius Kalnbach | h2. Why is the CodeRay language support list so short? |
8 | 1 | Kornelius Kalnbach | |
9 | 1 | Kornelius Kalnbach | CodeRay development is a slow process because there is only one active developer, and he insists on high software quality. |
10 | 1 | Kornelius Kalnbach | |
11 | 2 | Kornelius Kalnbach | Special attention is paid to the scanners: every CodeRay scanner is tested carefully against lots of example source code, as well as randomized and junk input, to make it safe. A CodeRay scanner is not officially released unless it highlights very, very well. |
12 | 1 | Kornelius Kalnbach | |
13 | 1 | Kornelius Kalnbach | h2. I need a new Scanner - What can I do? |
14 | 1 | Kornelius Kalnbach | |
15 | 1 | Kornelius Kalnbach | Here's what you can do to speed up the development of a new scanner: |
16 | 1 | Kornelius Kalnbach | |
17 | 1 | Kornelius Kalnbach | # Request it! File a "new ticket":http://code.licenser.net/projects/coderay/issues/new unless it already "exists":http://code.licenser.net/projects/coderay/issues?query_id=3; add a +1 or something to existing tickets to show your interest. |
18 | 1 | Kornelius Kalnbach | # Upload or link to *example code* in the ticket discussion. |
19 | 1 | Kornelius Kalnbach | #* Typical code in large quantities is very helpful, also for benchmarking. |
20 | 1 | Kornelius Kalnbach | #* But we also need the most *weird and strange code* you can find to make the scanner robust. |
21 | 1 | Kornelius Kalnbach | # Provide links to useful *information about the lexical structure of the language*, such as: |
22 | 1 | Kornelius Kalnbach | #* a list of reserved words (Did you know that "void" is a JavaScript keyword?) |
23 | 1 | Kornelius Kalnbach | #* rules for string and number literals (Can a double quoted string contain a newline?) |
24 | 1 | Kornelius Kalnbach | #* rules for comments and other token types (Does XYZ have a special syntax for multiline comments?) |
25 | 1 | Kornelius Kalnbach | #* a description of any unusual syntactic features (There's this weird %w() thing in Ruby...) |
26 | 1 | Kornelius Kalnbach | #* If there are different versions / implementations / dialects of this language: How do they differ? |
27 | 1 | Kornelius Kalnbach | # Give examples of *good and bad highlighters / syntax definitions* for the language (usually from editors or other libraries). |
28 | 1 | Kornelius Kalnbach | # Find *more example code*! |
29 | 1 | Kornelius Kalnbach | |
30 | 1 | Kornelius Kalnbach | Also, read the next section. |
31 | 1 | Kornelius Kalnbach | |
32 | 1 | Kornelius Kalnbach | h2. I want to write a Scanner myself |
33 | 1 | Kornelius Kalnbach | |
34 | 1 | Kornelius Kalnbach | Wow, you're brave! Writing CodeRay scanners is not an easy task because: |
35 | 1 | Kornelius Kalnbach | |
36 | 1 | Kornelius Kalnbach | * You need excellent knowledge about the language you want to scan. Every language has a dark side! |
37 | 1 | Kornelius Kalnbach | * You need good knowledge of (Ruby) regular expressions. |
38 | 1 | Kornelius Kalnbach | * There's no documentation to speak of. |
39 | 1 | Kornelius Kalnbach | ** But this is a wiki ^hint hint^ ;o) |
40 | 1 | Kornelius Kalnbach | |
41 | 1 | Kornelius Kalnbach | But it has been done before, so go and try it! |
42 | 1 | Kornelius Kalnbach | |
43 | 1 | Kornelius Kalnbach | # You should still request the scanner (as described above) and announce that you are working on a patch yourself. |
44 | 1 | Kornelius Kalnbach | # Check out the "repository":http://code.licenser.net/wiki/coderay/Repository and try the test suite (@[lang=xyz] rake test:scanners@). |
45 | 1 | Kornelius Kalnbach | # Copy a scanner of your choice as a base. You know best which language comes closest. |
46 | 1 | Kornelius Kalnbach | # Create a test case directory in @test/scanners@. |
47 | 1 | Kornelius Kalnbach | # --- Advertisement --- (No, just kidding.) |
48 | 1 | Kornelius Kalnbach | # Write your scanner! |
49 | 1 | Kornelius Kalnbach | # Also, look into @lib/coderay/scanners/_map.rb@ and @lib/coderay/helpers/file_type.rb@. |
50 | 1 | Kornelius Kalnbach | # Make a patch (scanner, test cases and other changes) and upload it to the ticket. |
51 | 1 | Kornelius Kalnbach | # Follow the ensuing discussion. |
52 | 1 | Kornelius Kalnbach | # Prepare to be added to the THX list. |
53 | 1 | Kornelius Kalnbach | |
54 | 1 | Kornelius Kalnbach | Contact me (murphy) if you have any questions. |
55 | 3 | Kornelius Kalnbach | |
56 | 3 | Kornelius Kalnbach | h2. What does a Scanner look like? |
57 | 3 | Kornelius Kalnbach | |
58 | 3 | Kornelius Kalnbach | For example, the JSON scanner: |
59 | 3 | Kornelius Kalnbach | |
60 | 3 | Kornelius Kalnbach | <pre><code class="ruby"> |
61 | 3 | Kornelius Kalnbach | module CodeRay |
62 | 3 | Kornelius Kalnbach | module Scanners |
63 | 3 | Kornelius Kalnbach | |
64 | 3 | Kornelius Kalnbach | class JSON < Scanner |
65 | 3 | Kornelius Kalnbach | |
66 | 3 | Kornelius Kalnbach | include Streamable |
67 | 3 | Kornelius Kalnbach | |
68 | 3 | Kornelius Kalnbach | register_for :json |
69 | 3 | Kornelius Kalnbach | |
70 | 3 | Kornelius Kalnbach | CONSTANTS = %w( true false null ) |
71 | 3 | Kornelius Kalnbach | IDENT_KIND = WordList.new(:key).add(CONSTANTS, :reserved) |
72 | 3 | Kornelius Kalnbach | |
73 | 3 | Kornelius Kalnbach | ESCAPE = / [bfnrt\\"\/] /x |
74 | 3 | Kornelius Kalnbach | UNICODE_ESCAPE = / u[a-fA-F0-9]{4} /x |
75 | 3 | Kornelius Kalnbach | |
76 | 3 | Kornelius Kalnbach | def scan_tokens tokens, options |
77 | 3 | Kornelius Kalnbach | |
78 | 3 | Kornelius Kalnbach | state = :initial |
79 | 3 | Kornelius Kalnbach | stack = [] |
80 | 3 | Kornelius Kalnbach | string_delimiter = nil |
81 | 3 | Kornelius Kalnbach | key_expected = false |
82 | 3 | Kornelius Kalnbach | |
83 | 3 | Kornelius Kalnbach | until eos? |
84 | 3 | Kornelius Kalnbach | |
85 | 3 | Kornelius Kalnbach | kind = nil |
86 | 3 | Kornelius Kalnbach | match = nil |
87 | 3 | Kornelius Kalnbach | |
88 | 3 | Kornelius Kalnbach | case state |
89 | 3 | Kornelius Kalnbach | |
90 | 3 | Kornelius Kalnbach | when :initial |
91 | 3 | Kornelius Kalnbach | if match = scan(/ \s+ | \\\n /x) |
92 | 3 | Kornelius Kalnbach | tokens << [match, :space] |
93 | 3 | Kornelius Kalnbach | next |
94 | 3 | Kornelius Kalnbach | elsif match = scan(/ [:,\[{\]}] /x) |
95 | 3 | Kornelius Kalnbach | kind = :operator |
96 | 3 | Kornelius Kalnbach | case match |
97 | 3 | Kornelius Kalnbach | when '{' then stack << :object; key_expected = true |
98 | 3 | Kornelius Kalnbach | when '[' then stack << :array |
99 | 3 | Kornelius Kalnbach | when ':' then key_expected = false |
100 | 3 | Kornelius Kalnbach | when ',' then key_expected = true if stack.last == :object |
101 | 3 | Kornelius Kalnbach | when '}', ']' then stack.pop # no error recovery, but works for valid JSON |
102 | 3 | Kornelius Kalnbach | end |
103 | 3 | Kornelius Kalnbach | elsif match = scan(/ true | false | null /x) |
104 | 3 | Kornelius Kalnbach | kind = IDENT_KIND[match] |
105 | 3 | Kornelius Kalnbach | elsif match = scan(/-?(?:0|[1-9]\d*)/) |
106 | 3 | Kornelius Kalnbach | kind = :integer |
107 | 3 | Kornelius Kalnbach | if scan(/\.\d+(?:[eE][-+]?\d+)?|[eE][-+]?\d+/) |
108 | 3 | Kornelius Kalnbach | match << matched |
109 | 3 | Kornelius Kalnbach | kind = :float |
110 | 3 | Kornelius Kalnbach | end |
111 | 3 | Kornelius Kalnbach | elsif match = scan(/"/) |
112 | 3 | Kornelius Kalnbach | state = key_expected ? :key : :string |
113 | 3 | Kornelius Kalnbach | tokens << [:open, state] |
114 | 3 | Kornelius Kalnbach | kind = :delimiter |
115 | 3 | Kornelius Kalnbach | else |
116 | 3 | Kornelius Kalnbach | getch |
117 | 3 | Kornelius Kalnbach | kind = :error |
118 | 3 | Kornelius Kalnbach | end |
119 | 3 | Kornelius Kalnbach | |
120 | 3 | Kornelius Kalnbach | when :string, :key |
121 | 3 | Kornelius Kalnbach | if scan(/[^\\"]+/) |
122 | 3 | Kornelius Kalnbach | kind = :content |
123 | 3 | Kornelius Kalnbach | elsif scan(/"/) |
124 | 3 | Kornelius Kalnbach | tokens << ['"', :delimiter] |
125 | 3 | Kornelius Kalnbach | tokens << [:close, state] |
126 | 3 | Kornelius Kalnbach | state = :initial |
127 | 3 | Kornelius Kalnbach | next |
128 | 3 | Kornelius Kalnbach | elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox) |
129 | 3 | Kornelius Kalnbach | kind = :char |
130 | 3 | Kornelius Kalnbach | elsif scan(/\\./m) |
131 | 3 | Kornelius Kalnbach | kind = :content |
132 | 3 | Kornelius Kalnbach | elsif scan(/ \\ | $ /x) |
133 | 3 | Kornelius Kalnbach | tokens << [:close, state] # close the group that was opened with [:open, state] |
134 | 3 | Kornelius Kalnbach | kind = :error |
135 | 3 | Kornelius Kalnbach | state = :initial |
136 | 3 | Kornelius Kalnbach | else |
137 | 3 | Kornelius Kalnbach | raise_inspect "else case \" reached; %p not handled." % peek(1), tokens |
138 | 3 | Kornelius Kalnbach | end |
139 | 3 | Kornelius Kalnbach | |
140 | 3 | Kornelius Kalnbach | else |
141 | 3 | Kornelius Kalnbach | raise_inspect 'Unknown state', tokens |
142 | 3 | Kornelius Kalnbach | |
143 | 3 | Kornelius Kalnbach | end |
144 | 3 | Kornelius Kalnbach | |
145 | 3 | Kornelius Kalnbach | match ||= matched |
146 | 3 | Kornelius Kalnbach | if $DEBUG and not kind |
147 | 3 | Kornelius Kalnbach | raise_inspect 'Error token %p in line %d' % |
148 | 3 | Kornelius Kalnbach | [[match, kind], line], tokens |
149 | 3 | Kornelius Kalnbach | end |
150 | 3 | Kornelius Kalnbach | raise_inspect 'Empty token', tokens unless match |
151 | 3 | Kornelius Kalnbach | |
152 | 3 | Kornelius Kalnbach | tokens << [match, kind] |
153 | 3 | Kornelius Kalnbach | |
154 | 3 | Kornelius Kalnbach | end |
155 | 3 | Kornelius Kalnbach | |
156 | 3 | Kornelius Kalnbach | if [:string, :key].include? state |
157 | 3 | Kornelius Kalnbach | tokens << [:close, state] |
158 | 3 | Kornelius Kalnbach | end |
159 | 3 | Kornelius Kalnbach | |
160 | 3 | Kornelius Kalnbach | tokens |
161 | 3 | Kornelius Kalnbach | end |
162 | 3 | Kornelius Kalnbach | |
163 | 3 | Kornelius Kalnbach | end |
164 | 3 | Kornelius Kalnbach | |
165 | 3 | Kornelius Kalnbach | end |
166 | 3 | Kornelius Kalnbach | end |
167 | 3 | Kornelius Kalnbach | </code></pre> |
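
To experiment with the scan-and-classify loop without checking out CodeRay, here is a minimal sketch that uses only Ruby's standard @StringScanner@, which provides the @scan@, @eos?@, @getch@ and @matched@ methods used above. The helper name @tiny_scan@ is hypothetical and not part of CodeRay. It handles only whitespace, the three JSON constants and numbers, and it returns plain @[text, kind]@ pairs rather than a real CodeRay token stream.

```ruby
require 'strscan'

# Hypothetical helper, not part of CodeRay: splits a string of JSON-like
# numbers and constants into [text, kind] pairs.
def tiny_scan(code)
  scanner = StringScanner.new(code)
  tokens = []

  until scanner.eos?
    if match = scanner.scan(/\s+/)
      kind = :space
    elsif match = scanner.scan(/true|false|null/)
      kind = :reserved
    elsif match = scanner.scan(/-?(?:0|[1-9]\d*)/)
      kind = :integer
      # A fraction and/or an exponent turns the integer into a float.
      if scanner.scan(/\.\d+(?:[eE][-+]?\d+)?|[eE][-+]?\d+/)
        match += scanner.matched
        kind = :float
      end
    else
      # Unknown input: consume one character so the loop always advances.
      match = scanner.getch
      kind = :error
    end

    tokens << [match, kind]
  end

  tokens
end

p tiny_scan('12 -3.5e2 null ?')
```

The final @else@ branch is what keeps junk input from hanging the loop: every pass consumes at least one character, and anything unrecognized is reported as an @:error@ token instead of raising.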