Version 2 (Kornelius Kalnbach, 01/29/2009 12:48 AM) → Version 3/22 (Kornelius Kalnbach, 01/30/2009 05:29 AM)

h1. Scanner Requests

Scanners are the heart of CodeRay. They split input code into tokens and classify them.
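
To make "split and classify" concrete, here is a toy scanner built on Ruby's standard @StringScanner@. CodeRay's own @Scanner@ class appears to build on it as well, judging by the @scan@, @matched@ and @getch@ calls in the JSON scanner at the end of this page. The tiny language, the method name @toy_scan@ and the token kinds are made up for illustration; this is not CodeRay's API.

<pre><code class="ruby">
require 'strscan'

# Hypothetical toy scanner: splits the input into [text, kind] pairs.
def toy_scan(code)
  s = StringScanner.new(code)
  tokens = []
  until s.eos?
    if (m = s.scan(/\s+/))
      tokens << [m, :space]
    elsif (m = s.scan(/\d+/))
      tokens << [m, :integer]
    elsif (m = s.scan(/[A-Za-z_]\w*/))
      tokens << [m, :ident]
    elsif (m = s.scan(/[-+*\/=()]/))
      tokens << [m, :operator]
    else
      # Never get stuck: consume one unknown character as an error token.
      tokens << [s.getch, :error]
    end
  end
  tokens
end

p toy_scan("x = 42")
# prints [["x", :ident], [" ", :space], ["=", :operator], [" ", :space], ["42", :integer]]
</code></pre>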

Each language has its own scanner. You can see which languages are currently supported in the "repository":http://code.licenser.net/repositories/browse/coderay/trunk/lib/coderay/scanners.

h2. Why is the CodeRay language support list so short?

CodeRay development is a slow process because there is only one active developer, and he insists on high software quality.

Special attention is paid to the scanners: every CodeRay scanner is tested carefully against lots of example source code, as well as randomized and junk input, to make it robust. A CodeRay scanner is not officially released unless it highlights very, very well.
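
The actual scanner tests are not shown here, but one property a randomized test can check is easy to sketch: whatever junk comes in, the scanner must not raise, must not emit empty tokens, and the token texts must add up to the original input. The sketch assumes a small hypothetical scanner (@scan_tokens@ with made-up @RULES@) and a hypothetical @fuzz@ helper; it is not CodeRay's test suite.

<pre><code class="ruby">
require 'strscan'

# Hypothetical token rules, tried in order at the current position.
RULES = [
  [/\s+/,                 :space],
  [/\d+/,                 :integer],
  [/[A-Za-z_]\w*/,        :ident],
  [/"(?:[^"\\]|\\.)*"?/m, :string]  # closing quote optional, so '"abc' still scans
]

# Hypothetical scanner: unknown input is consumed one character at a time
# as an :error token, so it can never loop forever.
def scan_tokens(code)
  s = StringScanner.new(code)
  tokens = []
  until s.eos?
    rule = RULES.find { |regexp, _| s.match?(regexp) }
    if rule
      tokens << [s.scan(rule[0]), rule[1]]
    else
      tokens << [s.getch, :error]
    end
  end
  tokens
end

# Randomized "junk" test: no token may be empty, and the token texts must
# concatenate back to the exact input. An exception also fails the test.
# The fixed seed makes failures reproducible.
def fuzz(iterations: 1000, seed: 42)
  rng = Random.new(seed)
  alphabet = "ab1 \"\\\n{}[]:,-.eE".chars
  iterations.times do
    input = Array.new(rng.rand(0..30)) { alphabet[rng.rand(alphabet.size)] }.join
    tokens = scan_tokens(input)
    raise "empty token in #{input.inspect}" if tokens.any? { |text, _| text.empty? }
    raise "text lost in #{input.inspect}" unless tokens.map(&:first).join == input
  end
  true
end

puts fuzz
</code></pre>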

h2. I need a new Scanner - What can I do?

Here's what you can do to speed up the development of a new scanner:

# Request it! File a "new ticket":http://code.licenser.net/projects/coderay/issues/new unless one already "exists":http://code.licenser.net/projects/coderay/issues?query_id=3; if it does, add a +1 to show your interest.
# Upload or link to *example code* in the ticket discussion.
#* Typical code in large quantities is very helpful, also for benchmarking.
#* But we also need the *weirdest and strangest code* you can find to make the scanner robust.
# Provide links to useful *information about the language's lexical structure*, such as:
#* a list of reserved words (Did you know that "void" is a JavaScript keyword?)
#* rules for string and number literals (Can a double quoted string contain a newline?)
#* rules for comments and other token types (Does XYZ have a special syntax for multiline comments?)
#* a description of any unusual syntactic features (There's this weird %w() thing in Ruby...)
#* if there are different versions, implementations, or dialects of the language: how do they differ?
# Give examples of *good and bad highlighters / syntax definitions* for the language (usually from editors or other libraries).
# Find *more example code*!
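
To see why these details matter, here is how two of them might turn into scanner rules, sketched for a hypothetical JavaScript-like language: a reserved-word list becomes a lookup table, and the answer to "can a double-quoted string contain a newline?" decides a single character class. The keyword subset and all constant names are illustrative assumptions; the JSON scanner on this page uses CodeRay's own @WordList@ for the lookup rather than a plain @Hash@.

<pre><code class="ruby">
require 'strscan'

# 1. Reserved words (a small subset of JavaScript's, including "void"),
#    written with Ruby's %w() word-array literal. Unknown words are :ident.
KEYWORDS = %w(var function return void typeof new)
IDENT_KIND = Hash.new(:ident)
KEYWORDS.each { |word| IDENT_KIND[word] = :reserved }

# 2. String literals: excluding "\n" from the character class forbids
#    multi-line strings; including it allows them.
SINGLE_LINE_STRING = /"(?:[^"\\\n]|\\.)*"/
MULTI_LINE_STRING  = /"(?:[^"\\]|\\.)*"/m

IDENT_KIND['void']  # => :reserved
IDENT_KIND['foo']   # => :ident

scanner = StringScanner.new(%("line one\nline two"))
scanner.scan(SINGLE_LINE_STRING)  # => nil (the newline is not allowed)
scanner.scan(MULTI_LINE_STRING)   # => "\"line one\nline two\""
</code></pre>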

Also, read the next section.

h2. I want to write a Scanner myself

Wow, you're brave! Writing CodeRay scanners is not an easy task because:

* You need excellent knowledge of the language you want to scan. Every language has a dark side!
* You need good knowledge of (Ruby) regular expressions.
* There's no documentation to speak of.
** But this is a wiki ^hint hint^ ;o)
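
Two Ruby details show up constantly in scanner code: the @/x@ flag (free-spacing mode, so whitespace inside the pattern is ignored) and the fact that @StringScanner#scan@ only matches at the current position. The sketch reuses @ESCAPE@ and @UNICODE_ESCAPE@ from the JSON scanner on this page; @CONSTANT@ and @STRING_ESCAPE@ are names made up here.

<pre><code class="ruby">
require 'strscan'

# /x ignores the spaces, so this is the same as /true|false|null/.
CONSTANT = / true | false | null /x

# Sub-patterns can be interpolated into larger ones; /o interpolates only once.
ESCAPE = / [bfnrt\\"\/] /x
UNICODE_ESCAPE = / u[a-fA-F0-9]{4} /x
STRING_ESCAPE = / \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /ox

s = StringScanner.new('null, true')
s.scan(CONSTANT)        # => "null"   (matches at the current position)
s.scan(CONSTANT)        # => nil      (scan never skips ahead past ", ")
s.scan_until(CONSTANT)  # => ", true" (scan_until searches forward)

StringScanner.new('\u00e9').scan(STRING_ESCAPE)  # => "\\u00e9"
StringScanner.new('\x').scan(STRING_ESCAPE)      # => nil
</code></pre>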

But it has been done before, so go and try it!

# You should still request the scanner (as described above) and announce that you are working on a patch yourself.
# Check out the "repository":http://code.licenser.net/wiki/coderay/Repository and try the test suite (@[lang=xyz] rake test:scanners@).
# Copy a scanner of your choice as a base; you will know best which language comes closest.
# Create a test case directory in @test/scanners@.
# --- Advertisement --- (No, just kidding.)
# Write your scanner!
# Also, look into @lib/coderay/scanners/_map.rb@ and @lib/coderay/helpers/file_type.rb@; you may need to add aliases or file extensions for your language there.
# Make a patch (scanner, test cases and other changes) and upload it to the ticket.
# Follow the ensuing discussion.
# Prepare to be added to the THX list.
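
The @_map.rb@ / @file_type.rb@ step is easy to overlook. The sketch below is not CodeRay code: it only assumes, from the file names, that one file maps alternative language names to registered scanners and the other maps file extensions to languages. @LANGUAGE_ALIASES@, @EXTENSION_TO_LANG@, @resolve_language@, @language_for_file@ and the language @xyz@ are all hypothetical.

<pre><code class="ruby">
# Hypothetical alias table: alternative names a user might pass, mapped to
# the name the scanner was registered under (compare register_for :json).
LANGUAGE_ALIASES = {
  xyz_lang: :xyz,
  xyzscript: :xyz
}

# Hypothetical extension table: file extension => scanner name.
EXTENSION_TO_LANG = {
  'json' => :json,
  'xyz'  => :xyz
}

# Resolve a user-supplied language name to a registered scanner name.
def resolve_language(name)
  lang = name.to_s.downcase.to_sym
  LANGUAGE_ALIASES.fetch(lang, lang)
end

# Guess the language from a file name; returns nil for unknown extensions.
def language_for_file(filename)
  ext = File.extname(filename).sub(/\A\./, '').downcase
  EXTENSION_TO_LANG[ext]
end

resolve_language('XYZ_Lang')           # => :xyz
language_for_file('test/example.xyz')  # => :xyz
language_for_file('Rakefile')          # => nil
</code></pre>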

Contact me (murphy) if you have any questions.

h2. What does a Scanner look like?

For example, the JSON scanner:

<pre><code class="ruby">
module CodeRay
module Scanners

  # Scanner for JSON (JavaScript Object Notation).
  class JSON < Scanner

    include Streamable

    register_for :json

    CONSTANTS = %w( true false null )
    # Unknown words map to :key; the JSON constants are marked :reserved.
    IDENT_KIND = WordList.new(:key).add(CONSTANTS, :reserved)

    ESCAPE = / [bfnrt\\"\/] /x
    UNICODE_ESCAPE = / u[a-fA-F0-9]{4} /x

    def scan_tokens tokens, options

      state = :initial
      stack = []              # open containers: :object or :array
      string_delimiter = nil
      key_expected = false    # true if the next string is an object key

      until eos?

        kind = nil
        match = nil

        case state

        when :initial
          if match = scan(/ \s+ | \\\n /x)
            tokens << [match, :space]
            next
          elsif match = scan(/ [:,\[{\]}] /x)
            kind = :operator
            # Track nesting to decide whether the next string is a key.
            case match
            when '{' then stack << :object; key_expected = true
            when '[' then stack << :array
            when ':' then key_expected = false
            when ',' then key_expected = true if stack.last == :object
            when '}', ']' then stack.pop  # no error recovery, but works for valid JSON
            end
          elsif match = scan(/ true | false | null /x)
            kind = IDENT_KIND[match]
          elsif match = scan(/-?(?:0|[1-9]\d*)/)
            kind = :integer
            # An optional fraction and/or exponent makes it a float.
            if scan(/\.\d+(?:[eE][-+]?\d+)?|[eE][-+]?\d+/)
              match << matched
              kind = :float
            end
          elsif match = scan(/"/)
            state = key_expected ? :key : :string
            tokens << [:open, state]
            kind = :delimiter
          else
            getch
            kind = :error
          end

        when :string, :key
          if scan(/[^\\"]+/)
            kind = :content
          elsif scan(/"/)
            tokens << ['"', :delimiter]
            tokens << [:close, state]
            state = :initial
            next
          elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)
            kind = :char
          elsif scan(/\\./m)
            kind = :content
          elsif scan(/ \\ | $ /x)
            # Unterminated string: close the :string/:key group that was opened.
            tokens << [:close, state]
            kind = :error
            state = :initial
          else
            raise_inspect "else case \" reached; %p not handled." % peek(1), tokens
          end

        else
          raise_inspect 'Unknown state', tokens

        end

        match ||= matched
        if $DEBUG and not kind
          raise_inspect 'Error token %p in line %d' %
            [[match, kind], line], tokens
        end
        raise_inspect 'Empty token', tokens unless match

        tokens << [match, kind]

      end

      # Close a string or key group left open at the end of input.
      if [:string, :key].include? state
        tokens << [:close, state]
      end

      tokens
    end

  end

end
end
</code></pre>
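
The subtlest part of the JSON scanner is how it tells object keys from string values: the @stack@ of open containers plus the @key_expected@ flag. The following standalone sketch isolates just that bookkeeping using Ruby's standard @StringScanner@. It is not part of CodeRay, @json_string_kinds@ is a hypothetical name, and, like the original, it does no error recovery for invalid JSON.

<pre><code class="ruby">
require 'strscan'

# Returns [string, :key or :string] for every complete string literal,
# using the same stack / key_expected rules as the scanner above.
def json_string_kinds(json)
  s = StringScanner.new(json)
  stack = []            # open containers: :object or :array
  key_expected = false
  kinds = []
  until s.eos?
    if (op = s.scan(/[:,\[{\]}]/))
      case op
      when '{' then stack << :object; key_expected = true
      when '[' then stack << :array
      when ':' then key_expected = false
      when ',' then key_expected = true if stack.last == :object
      when '}', ']' then stack.pop
      end
    elsif (str = s.scan(/"(?:[^"\\]|\\.)*"/m))
      kinds << [str, key_expected ? :key : :string]
    else
      s.getch  # whitespace, numbers, constants: irrelevant for this sketch
    end
  end
  kinds
end

json_string_kinds('{"a": [1, "x"], "b": 2}')
# => [["\"a\"", :key], ["\"x\"", :string], ["\"b\"", :key]]
</code></pre>

Inside an array a comma does not switch back to keys, which is why @"x"@ stays a @:string@ even though it follows a comma.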