Skip to content

Powershell Parser can't tell the difference between En dash and dash #1308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gbuktenica opened this issue May 2, 2018 · 14 comments
Closed
Labels
Issue-Discussion Let's talk about it.

Comments

@gbuktenica
Copy link

gbuktenica commented May 2, 2018

Issue Type: Bug

This issue will happen when code is copied from an external source such as the internet.

  1. Open VScode and create a new PowerShell File.
  2. Type the following: New-Item -Path $env:TEMP\hello1.txt [Press ENTER]
  3. Type the following: New-Item [Hold ALT]0150[Release ALT]Path $env:TEMP\hello2.txt
  4. Save the file and run in PowerShell
    Error returned:
At C:\Scripts\fail.ps1:2 char:12
+ New-Item –Path $env:TEMP\hello2.txt
+            ~~~~~~~~~~~~~~~~~~~~~~~~~~
The string is missing the terminator: ".

Extension version: 1.7.0
VS Code version: Code 1.22.2 (3aeede733d9a3098f7b4bdc1f66b63b0f48c1ef9, 2018-04-12T16:38:45.278Z)
OS version: Windows_NT x64 10.0.15063

System Info
Item Value
CPUs Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz (4 x 2808)
Memory (System) 15.92GB (7.72GB free)
Process Argv C:\Program Files\Microsoft VS Code\Code.exe
Screen Reader no
VM 0%
@gbuktenica
Copy link
Author

gbuktenica commented May 2, 2018

When the file created above is opened in PowerShell ISE line 2 is highlighted and the En dash is shown as:
–
e.g.
New-Item -Path $env:TEMP\hello1.txt
New-Item –Path $env:TEMP\hello2.txt

Vscode displays the above as:
New-Item -Path $env:TEMP\hello1.txt
New-Item –Path $env:TEMP\hello2.txt

@TylerLeonhardt
Copy link
Member

Thanks for this! Just curious, what do you expect to happen? That PowerShell would be able to handle the En dash? Or that VSCode was more outward on the fact that it is an en dash?

@omniomi
Copy link
Contributor

omniomi commented May 2, 2018

For me if I use an en dash or em dash it does break highlighting which is an indicator of the wrong dash being used although not super in your face. Seems like something that should perhaps be handled by PSScriptAnalyzer though as an en/em dash could be valid in a string and with positional parameters "<en-dash>something" might be a string not a parameter which would be valid.

PSScriptAnalyzer could have some sort of "potential illegal dash" warning that could be suppressed when it's intentional.

@TylerLeonhardt
Copy link
Member

That's a great idea @omniomi 👍

We should open an issue on:
https://github.com/PowerShell/PSScriptAnalyzer

to create such a rule.

@rjmholt
Copy link
Contributor

rjmholt commented May 2, 2018

To clarify, is the problem that an em-dash is working like a dash, or that it's not working like a dash? Or is the problem that the encoding is bad and we're displaying it not as a dash but as a nasty character sequence?

PowerShell accepts all dash variants equally by design. So I guess there's a problem if we're not doing that.

@rjmholt
Copy link
Contributor

rjmholt commented May 2, 2018

If I type:

Get-Item —Path ~

And execute it with F8, it works.

But we don't get any completions on it, so we should definitely fix the regex/completions mechanism there.

@gbuktenica Can you explain what the hold-Alt functionality does and what provides it?

@gbuktenica
Copy link
Author

@rjmholt The ALT function is just to show you how to recreate the problem. A developer would never normally do this. Normally this issue arises when a segment of code is copied from an external source, say github, stackoverflow or more likely a blog and during the copy and paste a valid dash is converted to an en dash. Potentially the original hosting platform had already converted a dash.
Similar behavior can be seen with double quotes
Type Write-Output "Blah" into Microsoft word then copy and paste it into Vscode and save.
Word has converted the double quotes into start and finish double quotes.
Doing the same into PowerShell ISE and the start and finish double quotes are correctly converted back to normal double quotes.

@gbuktenica
Copy link
Author

gbuktenica commented May 3, 2018

@omniomi and @tylerl0706 I have raised a request on PSScriptAnalyzer.
New Rule: Detect common Unicode character substitutions.
PowerShell/PSScriptAnalyzer#981

@rjmholt
Copy link
Contributor

rjmholt commented May 3, 2018

So I've done some research and think I finally understand the Alt-code thing as a Windows CP1252 code-point entry feature, except that I can't find any specification of how it's supposed to function these days (i.e. how much code-point translation it does).

In CP1252, en-dash is code-point 150, but it looks like VSCode converts that to UTF-8, where it is encoded as 0xE2 0x80 0x93. Those bytes, when interpreted as CP1252, render as – (see the CP1252 link above again).

Because I haven't been able to reproduce the Alt-code input, I don't know if VSCode is displaying – into the buffer, or putting in as it should be. If it's the first case, it looks like that would be a bug in VSCode not dealing with Windows Alt-codes. In the second case, that seems like pretty reasonable behaviour — converting the Alt-code to the correct encoding to display the intended glyph in the document. The third possibility, that VSCode is displaying the en-dash but sending – to EditorServices for execution, is especially bad. But would definitely be a bug in VSCode.

But I did copy and paste an em-dash (—) into the buffer and it worked properly — recognised as a dash by the PowerShell parser/tokenizer, so that New-Item —Path ~/myfile.ps1 works when I press F8 (just not completions or syntax highlighting, which we should fix).

In terms of other applications converting characters and then copying them back into VSCode, I don't really see how we can filter characters copied into the document. Even if VSCode allows the extension to do it, PowerShell supports unicode strings, so backquotes and other things generated by MSWord are legitimate string contents.

Also, PowerShell already supports using left/right double quotes and en/em-dashes where ASCII quotes and dashes are conventionally used respectively. That is, they're not wrong or invalid — you're allowed to use them.

@rjmholt
Copy link
Contributor

rjmholt commented May 3, 2018

Also @gbuktenica, hello from a fellow sandgroper.

@gbuktenica
Copy link
Author

@rjmholt Yes, I included the Alt-code thing so that a Windows developer recreate the Unicode character I had the issue with (em-dash).
My issue arose when I copied example code into a ps1 file using vscode and one of the dashes was an em-dash. Another developer using PowerShell ISE pulled my code from source control and had the execution error above and the syntax highlighting showed the dash as –

@rjmholt
Copy link
Contributor

rjmholt commented May 3, 2018

Ah, I see!

Well, for completeness, VSCode lets you change your file encoding.

And it also looks like there's a way to make the ISE use UTF-8.

@TylerLeonhardt
Copy link
Member

So now that we have an issue on PSSA, do we have any action to take in the extension? I think no - and we can probably close this.

@joeyaiello
Copy link

joeyaiello commented May 3, 2018

@rjmholt and I are discussing this. I think we generally agree that this is already being handled correctly in PowerShell, and that we shouldn't automatically transform what people paste into VS Code.

That being said, it does make sense to me to have a PSSA rule that can help detect where these characters are showing up in your file (and said PSSA rule can have a "quick fix" in VS Code that allows you to snap to one specific dash universally). However, I wouldn't place it at a particularly high priority.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue-Discussion Let's talk about it.
Projects
None yet
Development

No branches or pull requests

5 participants