Kornelius Kalnbach, 01/30/2009 05:29 AM


Scanner Requests

Scanners are the heart of CodeRay. They split input code into tokens and classify them.
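Here is a minimal sketch of what splitting input into tokens and classifying them looks like in practice. It uses Ruby's standard StringScanner and is not CodeRay's API: the method name tokenize and the token kinds are assumptions made for illustration. The [text, kind] pairs do mirror what the JSON scanner further down pushes into its tokens.

```ruby
require 'strscan'

# Minimal sketch of a scanner's job: walk the input with a StringScanner,
# try one regular expression per token type, and emit [text, kind] pairs.
# The kinds used here (:space, :integer, :ident, :operator, :error) are
# illustrative, not taken from any particular CodeRay scanner.
def tokenize(code)
  s = StringScanner.new(code)
  tokens = []
  until s.eos?
    if (text = s.scan(/\s+/))
      tokens << [text, :space]
    elsif (text = s.scan(/\d+/))
      tokens << [text, :integer]
    elsif (text = s.scan(/[A-Za-z_]\w*/))
      tokens << [text, :ident]
    elsif (text = s.scan(%r{[-+*/=()]}))
      tokens << [text, :operator]
    else
      # Anything unexpected becomes a one-character error token,
      # so the loop always advances and never raises on junk input.
      tokens << [s.getch, :error]
    end
  end
  tokens
end
```

For example, tokenize("x = 42 + y") classifies x and y as :ident, 42 as :integer, = and + as :operator, and the spaces as :space. Joining the token texts gives back the input unchanged.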

Each language has its own scanner; you can see which languages are currently supported in the repository.

Why is the CodeRay language support list so short?

CodeRay development is a slow process because there is only one active developer, and he insists on high software quality.

Special attention is paid to the scanners: every CodeRay scanner is tested carefully against lots of example source code, as well as randomized and junk input, to make it safe. A CodeRay scanner is not officially released unless it highlights very, very well.
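One simple property a junk-input test can check is that the scanner never fails and never loses text: the token texts must concatenate back to the exact input. The sketch below assumes a hypothetical toy scanner (toy_scan) and a seeded random generator. It illustrates the idea only and is not CodeRay's actual test suite.

```ruby
require 'strscan'

# Hypothetical toy scanner, defined only so this sketch is self-contained.
# It returns the token texts; every branch consumes at least one character.
def toy_scan(code)
  s = StringScanner.new(code)
  tokens = []
  until s.eos?
    tokens << (s.scan(/\s+/) || s.scan(/\d+/) || s.scan(/"[^"\n]*"?/) || s.getch)
  end
  tokens
end

# Feed seeded random junk to the scanner and check two properties on every input:
# no token is empty, and the token texts concatenate back to the exact input.
def fuzz(iterations: 500, seed: 1)
  rng = Random.new(seed)
  alphabet = ['a', '1', ' ', "\n", "\t", '"', '\\', '{', '}', "\u00e9"]
  iterations.times do
    input = Array.new(rng.rand(0..40)) { alphabet.sample(random: rng) }.join
    tokens = toy_scan(input)
    raise "empty token for #{input.inspect}" if tokens.any?(&:empty?)
    raise "input not reproduced for #{input.inspect}" unless tokens.join == input
  end
  true
end
```

Seeding the generator keeps a failing input reproducible, which matters when the failure has to be turned into a regular test case.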

I need a new Scanner - What can I do?

Here's what you can do to speed up the development of a new scanner:

  1. Request it! File a new ticket unless one already exists; otherwise, add a +1 or a comment to the existing ticket to show your interest.
  2. Upload or link to example code in the ticket discussion.
    • Typical code in large quantities is very helpful, also for benchmarking.
    • But we also need the weirdest, strangest code you can find to make the scanner robust.
  3. Provide links to useful information about the lexical rules of the language, such as:
    • a list of reserved words (Did you know that "void" is a JavaScript keyword?)
    • rules for string and number literals (Can a double quoted string contain a newline?)
    • rules for comments and other token types (Does XYZ have a special syntax for multiline comments?)
    • a description of any unusual syntactic features (There's this weird %w() thing in Ruby...)
    • If there are different versions / implementations / dialects of this language: How do they differ?
  4. Give examples of good and bad highlighters / syntax definitions for the language (usually from editors or other libraries).
  5. Find more example code!
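A reserved-word list like the one asked for in item 3 typically ends up as a lookup table from word to token kind. In this sketch a plain Hash with a default value stands in for CodeRay's WordList (which the JSON scanner on this page uses), and only a subset of JavaScript's keywords is listed. The names JS_KEYWORDS, IDENT_KIND and classify are illustrative.

```ruby
# Sketch: a reserved-word list becomes a lookup table from word to token kind.
# A plain Hash with a default value stands in for CodeRay's WordList here.
# Only a subset of JavaScript's keywords is listed, for illustration.
JS_KEYWORDS = %w(
  break case catch continue default delete do else finally for function
  if in instanceof new return switch this throw try typeof var void while with
).freeze

# Any word not in the list falls back to :ident.
IDENT_KIND = Hash.new(:ident)
JS_KEYWORDS.each { |word| IDENT_KIND[word] = :keyword }

def classify(word)
  IDENT_KIND[word]
end
```

classify("void") returns :keyword while classify("voidable") returns :ident, so a scanner can match any identifier with one regexp and decide its kind afterwards, much as the JSON scanner does with IDENT_KIND[match]. The lookup is case-sensitive, which fits JavaScript.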

Also, read the next section.

I want to write a Scanner myself

Wow, you're brave! Writing CodeRay scanners is not an easy task because:

  • You need excellent knowledge about the language you want to scan. Every language has a dark side!
  • You need good knowledge of (Ruby) regular expressions.
  • There's no documentation to speak of.
    • But this is a wiki, hint hint ;o)
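Three Ruby regexp details matter a lot in scanners like the JSON one on this page. The /x flag enables free-spacing. The /m flag in Ruby makes "." match newlines; this corresponds to Perl's /s, not Perl's /m. A Regexp can also be interpolated into another. The JSON scanner additionally uses /o, which performs the #{...} interpolation only once. Here is a small self-contained demonstration, with constant names chosen for illustration:

```ruby
# /x (free-spacing): whitespace inside the pattern is ignored,
# so this pattern matches "ab", not "a b".
FREE_SPACING = / a b /x

# /m: in Ruby this makes "." match a newline as well
# (Ruby's ^ and $ are always line anchors, so /m does not affect them).
BACKSLASH_ANY       = /\A\\.\z/
BACKSLASH_ANY_MULTI = /\A\\.\z/m

# Interpolating one Regexp into another keeps the inner pattern's own flags.
ESCAPE_CHAR  = / [bfnrt\\"\/] /x
KNOWN_ESCAPE = /\A \\ #{ESCAPE_CHAR} \z/x
```

The /m case is why the JSON scanner writes scan(/\\./m): a backslash followed by a newline is still consumed as one piece of string content.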

But it has been done before, so go and try it!

  1. You should still request the scanner (as described above) and announce that you are working on a patch yourself.
  2. Check out the repository and try the test suite ([lang=xyz] rake test:scanners).
  3. Copy a scanner of your choice as a base. You probably know best which language comes closest.
  4. Create a test case directory in test/scanners.
  5. --- Advertisement --- (No, just kidding.)
  6. Write your scanner!
  7. Also, look into lib/coderay/scanners/_map.rb and lib/coderay/helpers/file_type.rb.
  8. Make a patch (scanner, test cases and other changes) and upload it to the ticket.
  9. Follow the discussion that ensues.
  10. Prepare to be added to the THX list.
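Steps 3 and 6 usually produce a loop shaped like the JSON scanner's scan_tokens: a state variable, one regexp per branch, and [:open, kind] / [:close, kind] markers around nested groups such as strings. The skeleton below imitates that shape in plain Ruby for an imaginary language "xyz". The class name, token kinds and lexical rules are all hypothetical, and the class does not plug into CodeRay's plugin system.

```ruby
require 'strscan'

# Hypothetical skeleton of a stateful scanner for an imaginary language "xyz",
# modelled on the loop structure of a CodeRay-style scan_tokens method.
# Names and token kinds are illustrative, not CodeRay's API.
class XyzScanner
  def initialize(code)
    @s = StringScanner.new(code)
  end

  def tokens
    tokens = []
    state = :initial
    until @s.eos?
      case state
      when :initial
        if (m = @s.scan(/\s+/))
          tokens << [m, :space]
        elsif (m = @s.scan(/"/))
          tokens << [:open, :string] << [m, :delimiter]
          state = :string
        elsif (m = @s.scan(/\w+/))
          tokens << [m, :ident]
        else
          tokens << [@s.getch, :error]
        end
      when :string
        if (m = @s.scan(/[^"\\\n]+/))
          tokens << [m, :content]
        elsif (m = @s.scan(/\\./m))
          tokens << [m, :char]
        elsif (m = @s.scan(/"/))
          tokens << [m, :delimiter] << [:close, :string]
          state = :initial
        else
          # Newline or trailing backslash: the string is unterminated.
          # Close the group without consuming; :initial handles the character.
          tokens << [:close, :string]
          state = :initial
        end
      end
    end
    # Close a string group still open at the end of input.
    tokens << [:close, :string] if state == :string
    tokens
  end
end
```

Every opening marker gets a matching close, even for an unterminated string, which keeps the token stream well-formed for whatever consumes it.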

Contact me (murphy) if you have any questions.

What does a Scanner look like?

For example, the JSON scanner:

module CodeRay
module Scanners

  class JSON < Scanner

    include Streamable

    register_for :json

    CONSTANTS = %w( true false null )
    IDENT_KIND = WordList.new(:key).add(CONSTANTS, :reserved)

    ESCAPE = / [bfnrt\\"\/] /x
    UNICODE_ESCAPE =  / u[a-fA-F0-9]{4} /x

    def scan_tokens tokens, options

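      state = :initial          # :initial, :string or :key
      stack = []                # open containers (:object / :array), innermost last
      string_delimiter = nil    # not used by this scanner
      key_expected = false      # true when the next string would be an object key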

      until eos?

        kind = nil
        match = nil

        case state

        when :initial
          if match = scan(/ \s+ | \\\n /x)
            tokens << [match, :space]
            next
          elsif match = scan(/ [:,\[{\]}] /x)
            kind = :operator
            case match
            when '{' then stack << :object; key_expected = true
            when '[' then stack << :array
            when ':' then key_expected = false
            when ',' then key_expected = (stack.last == :object)  # inside an array, a comma never starts a key
            when '}', ']' then stack.pop  # no error recovery, but works for valid JSON
            end
          elsif match = scan(/ true | false | null /x)
            kind = IDENT_KIND[match]
          elsif match = scan(/-?(?:0|[1-9]\d*)/)
            kind = :integer
            if scan(/\.\d+(?:[eE][-+]?\d+)?|[eE][-+]?\d+/)
              match << matched
              kind = :float
            end
          elsif match = scan(/"/)
            state = key_expected ? :key : :string
            tokens << [:open, state]
            kind = :delimiter
          else
            getch
            kind = :error
          end

        when :string, :key
          if scan(/[^\\"]+/)
            kind = :content
          elsif scan(/"/)
            tokens << ['"', :delimiter]
            tokens << [:close, state]
            state = :initial
            next
          elsif scan(/ \\ (?: #{ESCAPE} | #{UNICODE_ESCAPE} ) /mox)
            kind = :char
          elsif scan(/\\./m)
            kind = :content
          elsif scan(/ \\ | $ /x)
            # lone backslash or end of line: close the open string/key group, then flag an error
            tokens << [:close, state]
            kind = :error
            state = :initial
          else
            raise_inspect "else case \" reached; %p not handled." % peek(1), tokens
          end

        else
          raise_inspect 'Unknown state', tokens

        end

        match ||= matched
        if $DEBUG and not kind
          raise_inspect 'Error token %p in line %d' %
            [[match, kind], line], tokens
        end
        raise_inspect 'Empty token', tokens unless match

        tokens << [match, kind]

      end

      if [:string, :key].include? state
        tokens << [:close, state]
      end

      tokens
    end

  end

end
end
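The interesting part of this scanner is how it tells object keys from string values. The stripped-down sketch below keeps just that logic (the stack and the key_expected flag) in plain Ruby and assumes valid JSON. It simplifies the scanner in a few ways. Strings are matched with one regexp, escapes included. Other values are skipped. A comma sets the flag to whether the innermost container is an object, so that in [{}, "x"] the "x" is not taken for a key. The method name json_string_kinds is hypothetical.

```ruby
require 'strscan'

# Stripped-down sketch of JSON key detection: a stack of open containers
# plus a key_expected flag decides whether a string is an object key or a
# plain value. Assumes valid JSON input.
def json_string_kinds(json)
  s = StringScanner.new(json)
  stack = []
  key_expected = false
  kinds = []
  until s.eos?
    if s.skip(/\s+/)
      # whitespace carries no information here
    elsif (op = s.scan(/[:,\[{\]}]/))
      case op
      when '{' then stack << :object; key_expected = true
      when '[' then stack << :array
      when ':' then key_expected = false
      when ',' then key_expected = (stack.last == :object)
      when '}', ']' then stack.pop
      end
    elsif (str = s.scan(/"(?:[^"\\]|\\.)*"/m))
      kinds << [str, key_expected ? :key : :string]
    else
      s.getch  # numbers, true, false, null: not classified in this sketch
    end
  end
  kinds
end
```

For {"a": "b", "c": ["d"]} it reports "a" and "c" as :key and "b" and "d" as :string.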