Fix multibyte character tokenization bug in ERB::Util #53655

Merged
byroot merged 1 commit into rails:main from martinemde/byteslice-erb-tokenize on Nov 17, 2024

Conversation

martinemde
Contributor

Motivation / Background

This Pull Request has been created because ERB templates that contain multibyte characters are tokenized incorrectly, which makes error highlighting wrong.

Detail

StringScanner positions are byte offsets, so slice the source with byteslice so that all offset and length calculations are done in bytes.
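
A minimal sketch of the mismatch, not the actual Rails code: the helper name text_before_open_tag and the hard-coded "<%=" pattern are assumptions made for illustration. It takes a byte offset from StringScanner and uses it once as a character count and once as a byte count.

```ruby
require "strscan"

# Hypothetical helper (not from the Rails source): returns the text that
# precedes the first "<%=" tag, sliced two different ways.
def text_before_open_tag(source)
  scanner = StringScanner.new(source)
  return nil unless scanner.scan_until(/<%=/)

  # StringScanner#pos and #matched_size are both measured in bytes.
  tag_start = scanner.pos - scanner.matched_size

  {
    by_chars: source[0, tag_start],          # byte offset misread as a character count
    by_bytes: source.byteslice(0, tag_start) # byte offset used as a byte count
  }
end

result = text_before_open_tag("こんにちは<%= name %>")
puts result[:by_chars] # prints "こんにちは<%= name %"
puts result[:by_bytes] # prints "こんにちは"
```

Each of the five Japanese characters is three bytes in UTF-8, so the tag starts at byte 15 but character 5. Slicing by characters therefore overruns into the tag, while byteslice stops exactly where the scanner did.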

Additional information

Without this change, the tests I added fail as follows when multibyte characters precede a tag:

Failure:
ActiveSupport::ERBUtilTest#test_multibyte_characters_start [test/core_ext/erb_util_test.rb:141]:
--- expected
+++ actual
@@ -1 +1 @@
-[[:TEXT, "こんにちは"], [:OPEN, "<%="], [:CODE, " name "], [:CLOSE, "%>"]]
+[[:TEXT, "こんにちは<%= name %"], [:OPEN, "<%="], [:CODE, " name "], [:CLOSE, "%>"]]

Failure:
ActiveSupport::ERBUtilTest#test_multibyte_characters_end [test/core_ext/erb_util_test.rb:151]:
--- expected
+++ actual
@@ -1 +1 @@
-[[:CODE, " 'こんにちは' "], [:CLOSE, "%>"]]
+[[:CODE, " 'こんにちは' %>"], [:CLOSE, "%>"]]

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Unrelated changes should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

martinemde changed the title from "Fix multibyte character offset problem before ERB tag" to "Fix multibyte character tokenization problem in ERB::Util" on Nov 17, 2024
martinemde changed the title from "Fix multibyte character tokenization problem in ERB::Util" to "Fix multibyte character tokenization bug in ERB::Util" on Nov 17, 2024
martinemde force-pushed the martinemde/byteslice-erb-tokenize branch from 0b0d04e to 3c7ce9a on November 17, 2024 04:05
martinemde force-pushed the martinemde/byteslice-erb-tokenize branch from 3c7ce9a to 30010bb on November 17, 2024 04:09
byroot merged commit 51f3fa6 into rails:main on Nov 17, 2024
3 checks passed
byroot added a commit that referenced this pull request Nov 17, 2024
…kenize

Fix multibyte character tokenization bug in ERB::Util
byroot added a commit that referenced this pull request Nov 17, 2024
…kenize

Fix multibyte character tokenization bug in ERB::Util
martinemde deleted the martinemde/byteslice-erb-tokenize branch on November 17, 2024 20:44