ScannerRequests » History » Version 3

Kornelius Kalnbach, 01/30/2009 05:29 AM

h1. Scanner Requests

Scanners are the heart of CodeRay. They split input code into tokens and classify them.

Each language has its own scanner. You can see which languages are currently supported in the "repository":http://code.licenser.net/repositories/browse/coderay/trunk/lib/coderay/scanners.

h2. Why is the CodeRay language support list so short?

CodeRay development is a slow process because there is only one active developer, and he insists on high software quality.

Special attention is paid to the scanners: every CodeRay scanner is tested carefully against lots of example source code, and also against randomized and junk code to make it safe. A CodeRay scanner is not officially released unless it highlights very, very well.
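
The idea of testing against randomized and junk code can be sketched in plain Ruby. This is not CodeRay's test suite: @toy_scan@ and @fuzz@ are hypothetical names, and the toy scanner only stands in for a real one. The sketch checks two properties that any scanner should keep for arbitrary input: no token is empty, and the token texts add up to the original input.

```ruby
require 'strscan'

# Hypothetical toy scanner, NOT part of CodeRay: splits input into
# [text, kind] pairs and never raises, whatever the input is.
def toy_scan(code)
  scanner = StringScanner.new(code)
  tokens = []
  until scanner.eos?
    if (match = scanner.scan(/\s+/))
      tokens << [match, :space]
    elsif (match = scanner.scan(/\d+/))
      tokens << [match, :integer]
    elsif (match = scanner.scan(/[A-Za-z_]\w*/))
      tokens << [match, :ident]
    else
      # Fallback: consume exactly one character so the loop always advances.
      tokens << [scanner.getch, :error]
    end
  end
  tokens
end

# Feed random junk to the scanner and check two invariants:
# no token is empty, and the tokens add up to the input again.
def fuzz(rounds, seed)
  random = Random.new(seed)
  alphabet = ['a', 'Z', '_', '7', ' ', "\n", '"', '\\', '{', '%', "\u00e9"]
  rounds.times do
    junk = Array.new(random.rand(0..40)) { alphabet[random.rand(alphabet.size)] }.join
    tokens = toy_scan(junk)
    raise "empty token for #{junk.inspect}" if tokens.any? { |text, _| text.empty? }
    raise "lost input for #{junk.inspect}" unless tokens.map(&:first).join == junk
  end
  true
end
```

Calling @fuzz(200, 42)@ runs 200 rounds with a fixed seed, so a failing input can be reproduced.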

h2. I need a new Scanner - What can I do?

Here's what you can do to speed up the development of a new scanner:

# Request it! File a "new ticket":http://code.licenser.net/projects/coderay/issues/new unless it already "exists":http://code.licenser.net/projects/coderay/issues?query_id=3; add a +1 or similar to an existing ticket to show your interest.
# Upload or link to *example code* in the ticket discussion.
#* Typical code in large quantities is very helpful, also for benchmarking.
#* But we also need the most *weird and strange code* you can find to make the scanner safe.
# Provide links to useful *information about the language's lexical structure*, such as:
#* a list of reserved words (Did you know that "void" is a JavaScript keyword?)
#* rules for string and number literals (Can a double-quoted string contain a newline?)
#* rules for comments and other token types (Does XYZ have a special syntax for multiline comments?)
#* a description of any unusual syntactic features (There's this weird %w() thing in Ruby...)
#* If there are different versions / implementations / dialects of this language: How do they differ?
# Give examples of *good and bad highlighters / syntax definitions* for the language (usually from editors or other libraries)
# Find *more example code*!
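
A list of reserved words matters because a scanner uses it to tell keywords from plain identifiers. The following is a minimal sketch in plain Ruby, not CodeRay code: @JS_RESERVED@ is a hypothetical and deliberately incomplete JavaScript keyword list, and @classify@ is a made-up helper. It also uses the Ruby @%w@ literal mentioned above, which builds an array of strings from whitespace-separated words.

```ruby
# Deliberately incomplete, hypothetical keyword list for illustration only.
# %w[...] is Ruby's word-array literal: %w[a b] is the same as ['a', 'b'].
JS_RESERVED = %w[break case function return typeof var void while].freeze

# Lookup table: reserved words map to :reserved, everything else to :ident.
IDENT_KIND = Hash.new(:ident)
JS_RESERVED.each { |word| IDENT_KIND[word] = :reserved }
IDENT_KIND.freeze

# Returns the token kind for a word that the scanner matched as an identifier.
def classify(word)
  IDENT_KIND[word]
end
```

With this table, @classify('void')@ gives @:reserved@ and @classify('voids')@ gives @:ident@. A missing or wrong entry in the word list shows up directly as wrong highlighting, which is why a reliable list is worth linking in the ticket.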

Also, read the next section.

h2. I want to write a Scanner myself

Wow, you're brave! Writing CodeRay scanners is not an easy task because:

* You need excellent knowledge of the language you want to scan. Every language has a dark side!
* You need good knowledge of (Ruby) regular expressions.
* There's no documentation to speak of.
** But this is a wiki ^hint hint^ ;o)

But it has been done before, so go and try it!

# You should still request the scanner (as described above) and announce that you are working on a patch yourself.
# Check out the "repository":http://code.licenser.net/wiki/coderay/Repository and try the test suite (@[lang=xyz] rake test:scanners@).
# Copy a scanner of your choice as a base. You know best which language comes closest.
# Create a test case directory in @test/scanners@.
# --- Advertisement --- (No, just kidding.)
# Write your scanner!
# Also, look into @lib/coderay/scanners/_map.rb@ and @lib/coderay/helpers/file_type.rb@.
# Make a patch (scanner, test cases and other changes) and upload it to the ticket.
# Follow the ensuing discussion.
# Prepare to be added to the THX list.

Contact me (murphy) if you have any questions.

h2. What does a Scanner look like?

For example, the JSON scanner:

<pre><code class="ruby">
module CodeRay
module Scanners

  class JSON < Scanner

    include Streamable

    register_for :json

    CONSTANTS = %w( true false null )
    IDENT_KIND = WordList.new(:key).add(CONSTANTS, :reserved)

    ESCAPE = / [bfnrt\\"\/] /x
    UNICODE_ESCAPE =  / u[a-fA-F0-9]{4} /x

    def scan_tokens tokens, options

      state = :initial
      stack = []  # open containers: :object or :array
      string_delimiter = nil
      key_expected = false  # true where the next string is an object key

      until eos?

        kind = nil
        match = nil

        case state

        when :initial
          if match = scan(/ \s+ | \\\n /x)
            tokens << [match, :space]
            next
          elsif match = scan(/ [:,\[{\]}] /x)
            kind = :operator
            case match
            when '{' then stack << :object; key_expected = true
            when '[' then stack << :array
            when ':' then key_expected = false
            when ',' then key_expected = true if stack.last == :object
            when '}', ']' then stack.pop  # no error recovery, but works for valid JSON
            end
          elsif match = scan(/ true | false | null /x)
            kind = IDENT_KIND[match]
          elsif match = scan(/-?(?:0|[1-9]\d*)/)
            kind = :integer
            # a fraction and/or exponent turns the integer into a float
            if scan(/\.\d+(?:[eE][-+]?\d+)?|[eE][-+]?\d+/)
              match << matched
              kind = :float
            end
          elsif match = scan(/"/)
            state = key_expected ? :key : :string
            tokens << [:open, state]
            kind = :delimiter
          else
            # unknown character: consume it and mark it as an error
            getch
            kind = :error
          end

        when :string, :key
          if scan(/[^\\"]+/)
            kind = :content
          elsif scan(/"/)
            tokens << ['"', :delimiter]
            tokens << [:close, state]
            state = :initial
            next
          elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)
            kind = :char
          elsif scan(/\\./m)
            kind = :content
          elsif scan(/ \\ | $ /x)
            # close the group that was opened as :string or :key
            tokens << [:close, state]
            kind = :error
            state = :initial
          else
            raise_inspect "else case \" reached; %p not handled." % peek(1), tokens
          end

        else
          raise_inspect 'Unknown state', tokens

        end

        # branches that did not assign match rely on the last scan result
        match ||= matched
        if $DEBUG and not kind
          raise_inspect 'Error token %p in line %d' %
            [[match, kind], line], tokens
        end
        raise_inspect 'Empty token', tokens unless match

        tokens << [match, kind]

      end

      # close a string or key that was still open at the end of the input
      if [:string, :key].include? state
        tokens << [:close, state]
      end

      tokens
    end

  end

end
end
</code></pre>
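
The methods @scan@, @eos?@, @getch@ and @matched@ used above are also available on Ruby's standard @StringScanner@, so the state-machine pattern can be tried without CodeRay. The following is a stand-alone sketch under that assumption, not CodeRay code: @scan_strings@ is a hypothetical name, and it handles only whitespace and double-quoted strings. Tokens are @[text, kind]@ pairs, with @[:open, kind]@ and @[:close, kind]@ wrapping a string, as in the JSON scanner.

```ruby
require 'strscan'

# Hypothetical stand-alone sketch, not CodeRay code: tokenizes only
# whitespace and double-quoted strings, using a two-state machine.
def scan_strings(code)
  scanner = StringScanner.new(code)
  tokens = []
  state = :initial

  until scanner.eos?
    case state
    when :initial
      if (match = scanner.scan(/\s+/))
        tokens << [match, :space]
      elsif (match = scanner.scan(/"/))
        tokens << [:open, :string]
        tokens << [match, :delimiter]
        state = :string
      else
        # Anything else is outside this sketch: consume one character.
        tokens << [scanner.getch, :error]
      end
    when :string
      if (match = scanner.scan(/[^\\"]+/))
        tokens << [match, :content]
      elsif (match = scanner.scan(/\\./m))
        tokens << [match, :char]
      elsif (match = scanner.scan(/"/))
        tokens << [match, :delimiter]
        tokens << [:close, :string]
        state = :initial
      else
        # Only a lone backslash at the end of the input can get here.
        tokens << [scanner.getch, :error]
      end
    end
  end

  # Close a string that was still open when the input ended.
  tokens << [:close, :string] if state == :string
  tokens
end
```

Every branch consumes at least one character, so the loop always terminates, and the final check keeps open and close tokens balanced even for unterminated input such as @"ab@.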