Description
The first rule for collapsing segment breaks is:
If the character immediately before or immediately after the segment break is the zero-width space character (U+200B), then the break is removed, leaving behind the zero-width space.
It is not clear to me what should happen if multiple segment breaks are involved here. For example, if I have ZWSP LF LF LF x, would this rule produce ZWSP LF LF x (with only the first LF removed), or ZWSP x (with all LFs removed because the rule is applied recursively)?
(In the first case, the remaining LFs would be converted to spaces by the last rule there, and the second space would be removed by step 4 of Phase I, so the final result would be ZWSP WS x.)
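To make the two readings concrete, here is a minimal Python sketch over a symbolic token list. The token names and the helpers `zwsp_pass`, `zwsp_fixpoint`, and `finish` are hypothetical and only model the steps described above. In particular, `finish` assumes every leftover LF becomes a space and ignores the East Asian Width rule, which does not apply to this input.

```python
ZWSP, LF, WS = "ZWSP", "LF", "WS"

def zwsp_pass(tokens):
    """Reading 1: a single pass that drops each LF directly adjacent to a ZWSP."""
    out = []
    for i, tok in enumerate(tokens):
        before = tokens[i - 1] if i > 0 else None
        after = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == LF and ZWSP in (before, after):
            continue  # this segment break is removed
        out.append(tok)
    return out

def zwsp_fixpoint(tokens):
    """Reading 2: reapply the pass until nothing changes (recursive application)."""
    while True:
        nxt = zwsp_pass(tokens)
        if nxt == tokens:
            return nxt
        tokens = nxt

def finish(tokens):
    """Assumed tail of the pipeline: leftover LFs become spaces,
    and a space that follows another space is dropped."""
    out = []
    for tok in tokens:
        tok = WS if tok == LF else tok
        if tok == WS and out and out[-1] == WS:
            continue
        out.append(tok)
    return out

sample = [ZWSP, LF, LF, LF, "x"]
single_pass = finish(zwsp_pass(sample))    # ['ZWSP', 'WS', 'x']
recursive = finish(zwsp_fixpoint(sample))  # ['ZWSP', 'x']
```

The two readings give visibly different results for the same input, which is the ambiguity in question.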
This may also affect the second rule:
Otherwise, if the East Asian Width property of both the character before and after the line feed is F, W, or H (not A), and neither side is Hangul, then the segment break is removed.
If I have W LF LF W, should the two LFs be removed by this rule?
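As a rough illustration of the "remove all segment breaks together" idea, the sketch below treats each run of consecutive LFs as a single unit and looks only at the characters on either side of the run. It is an assumption-laden model, not spec text. The function names are hypothetical, the Hangul test is approximated by checking the Unicode character name, and line-start and line-end handling is ignored.

```python
import re
import unicodedata

ZWSP = "\u200b"

def is_wide_non_hangul(ch):
    """East Asian Width is F, W, or H, and the character is not Hangul.
    The Hangul check via the character name is only an approximation."""
    wide = unicodedata.east_asian_width(ch) in ("F", "W", "H")
    hangul = "HANGUL" in unicodedata.name(ch, "")
    return wide and not hangul

def collapse_break_runs(text):
    """Apply the segment break rules once per run of LFs, not once per LF."""
    def replace(match):
        before = text[match.start() - 1] if match.start() > 0 else ""
        after = text[match.end()] if match.end() < len(text) else ""
        if ZWSP in (before, after):
            return ""  # first rule: the whole run is removed
        if before and after and is_wide_non_hangul(before) and is_wide_non_hangul(after):
            return ""  # second rule: the whole run is removed
        return " "     # otherwise the run becomes a single space
    return re.sub(r"\n+", replace, text)

collapse_break_runs("漢\n\n字")         # '漢字'
collapse_break_runs("\u200b\n\n\nx")   # '\u200bx'
collapse_break_runs("a\n\nb")          # 'a b'
```

Under this model, W LF LF W collapses to W W with no space in between, whereas a strictly per-break reading would find no LF with a wide character on both sides.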
It seems to me that removing all segment breaks together would be easier for implementation, so I would propose making the rules that way if there are no other concerns.