Indexing Bug with Mixed Unicode and ASCII String #40974

@bradcarman

The following code shows strange behavior when indexing a string that mixes ASCII and non-ASCII (Unicode) characters.

cc = Char.([84, 83, 61, 109, 47, 115, 178, 13, 0, 67, 72, 65, 78, 78, 69, 76])

cs = String(cc)

length(cc)
L = length(cs)
display(cs)
display(cs[1:end])
display(cs[1:L])
display(cs[1:L+1])

This produces the following output:

julia> cc = Char.([84, 83, 61, 109, 47, 115, 178, 13, 0, 67, 72, 65, 78, 78, 69, 76])
16-element Vector{Char}:
 'T': ASCII/Unicode U+0054 (category Lu: Letter, uppercase)
 'S': ASCII/Unicode U+0053 (category Lu: Letter, uppercase)
 '=': ASCII/Unicode U+003D (category Sm: Symbol, math)
 'm': ASCII/Unicode U+006D (category Ll: Letter, lowercase)
 '/': ASCII/Unicode U+002F (category Po: Punctuation, other)
 's': ASCII/Unicode U+0073 (category Ll: Letter, lowercase)
 '²': Unicode U+00B2 (category No: Number, other)
 '\r': ASCII/Unicode U+000D (category Cc: Other, control)
 '\0': ASCII/Unicode U+0000 (category Cc: Other, control)
 'C': ASCII/Unicode U+0043 (category Lu: Letter, uppercase)
 'H': ASCII/Unicode U+0048 (category Lu: Letter, uppercase)
 'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)
 'N': ASCII/Unicode U+004E (category Lu: Letter, uppercase)
 'N': ASCII/Unicode U+004E (category Lu: Letter, uppercase)
 'E': ASCII/Unicode U+0045 (category Lu: Letter, uppercase)
 'L': ASCII/Unicode U+004C (category Lu: Letter, uppercase)

julia> cs = String(cc)
"TS=m/s²\r\0CHANNEL"

julia> length(cc)
16

julia> L = length(cs)
16

julia> display(cs) #OK
"TS=m/s²\r\0CHANNEL"

julia> display(cs[1:end]) #OK
"TS=m/s²\r\0CHANNEL"

julia> display(cs[1:L]) # NOT OK!!!
"TS=m/s²\r\0CHANNE"

julia> display(cs[1:L+1]) # DOESN'T MAKE SENSE
"TS=m/s²\r\0CHANNEL"

As the output shows, indexing cs with 1:L, where L = length(cs) = 16, drops the final character. To get the full string I have to extend the range by one and index 1:17, even though length(cs) reports 16.
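A likely explanation, offered as a sketch and not as a diagnosis confirmed in this report: Julia's String is indexed by byte (UTF-8 code unit) offsets, while length counts characters. The character '²' (U+00B2) takes two bytes in UTF-8, so this string has 16 characters but 17 code units, and the last character sits at byte index 17. The variable names nbytes, lastidx, whole, idx, and same below are illustrative only; the functions used (ncodeunits, lastindex, isvalid, first, nextind) are from Base.

```julia
cc = Char.([84, 83, 61, 109, 47, 115, 178, 13, 0, 67, 72, 65, 78, 78, 69, 76])
cs = String(cc)

L       = length(cs)       # number of characters
nbytes  = ncodeunits(cs)   # number of UTF-8 code units (bytes)
lastidx = lastindex(cs)    # byte index where the last character starts

println((L, nbytes, lastidx))

# '²' starts at byte 7 and also occupies byte 8, so 8 is not a valid index.
println(isvalid(cs, 7), " ", isvalid(cs, 8))

# Character-based ways to take the first L characters:
whole = first(cs, L)        # first L characters, no byte arithmetic needed
idx   = nextind(cs, 0, L)   # byte index of the L-th character
same  = cs[1:idx]

println(whole == cs, " ", same == cs)
```

Under this reading, cs[1:end] and cs[1:lastindex(cs)] return the whole string because end resolves to the last valid byte index, while cs[1:length(cs)] stops one byte short. When a count of characters is what you have, first(cs, n) or nextind(cs, 0, n) avoids mixing the two kinds of numbers.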

My version info is

julia> versioninfo()
Julia Version 1.6.0-rc1
Commit a58bdd9010 (2021-02-06 15:49 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_DIR = D:\Programs\julia-1.6.0-rc1
  JULIA_NUM_THREADS = 8
