Improve comment in regc_pg_locale.c.

author Jeff Davis <[email protected]>

Wed, 16 Apr 2025 23:46:16 +0000 (16:46 -0700)

committer Jeff Davis <[email protected]>

Wed, 16 Apr 2025 23:49:35 +0000 (16:49 -0700)
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index ed7411df83d20e2a3ba83e7745aae8f75f1cc731..ffc787cd7b45e03edcf5bb1eb84d75ea8df26247 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -21,22 +21,22 @@
  #include "utils/pg_locale.h"
  
  /*
- * To provide as much functionality as possible on a variety of platforms,
- * without going so far as to implement everything from scratch, we use
- * several implementation strategies depending on the situation:
+ * For the libc provider, to provide as much functionality as possible on a
+ * variety of platforms without going so far as to implement everything from
+ * scratch, we use several implementation strategies depending on the
+ * situation:
   *
   * 1. In C/POSIX collations, we use hard-wired code.  We can't depend on
   * the <ctype.h> functions since those will obey LC_CTYPE.  Note that these
   * collations don't give a fig about multibyte characters.
   *
- * 2. In the "default" collation (which is supposed to obey LC_CTYPE):
- *
- * 2a. When working in UTF8 encoding, we use the <wctype.h> functions.
+ * 2. When working in UTF8 encoding, we use the <wctype.h> functions.
   * This assumes that every platform uses Unicode codepoints directly
- * as the wchar_t representation of Unicode.  On some platforms
+ * as the wchar_t representation of Unicode.  (XXX: ICU makes this assumption
+ * even for non-UTF8 encodings, which may be a problem.)  On some platforms
   * wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.
   *
- * 2b. In all other encodings, we use the <ctype.h> functions for pg_wchar
+ * 3. In all other encodings, we use the <ctype.h> functions for pg_wchar
   * values up to 255, and punt for values above that.  This is 100% correct
   * only in single-byte encodings such as LATINn.  However, non-Unicode
   * multibyte encodings are mostly Far Eastern character sets for which the
@@ -46,14 +46,11 @@
   * the platform's wchar_t representation matches what we do in pg_wchar
   * conversions.
   *
- * 3. Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
- * functions, under exactly the same cases as #2.
- *
- * There is one notable difference between cases 2 and 3: in the "default"
- * collation we force ASCII letters to follow ASCII upcase/downcase rules,
- * while in a non-default collation we just let the library functions do what
- * they will.  The case where this matters is treatment of I/i in Turkish,
- * and the behavior is meant to match the upper()/lower() SQL functions.
+ * As a special case, in the "default" collation, (2) and (3) force ASCII
+ * letters to follow ASCII upcase/downcase rules, while in a non-default
+ * collation we just let the library functions do what they will.  The case
+ * where this matters is treatment of I/i in Turkish, and the behavior is
+ * meant to match the upper()/lower() SQL functions.
   *
   * We store the active collation setting in static variables.  In principle
   * it could be passed down to here via the regex library's "struct vars" data
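The revised comment describes three libc-provider strategies for character classification and case mapping, plus a special rule for the "default" collation. The following is a minimal, hypothetical sketch of how an upcasing routine could dispatch among them. It is not the code in regc_pg_locale.c. The names `sketch_wchar`, `sketch_strategy` and `sketch_toupper` are invented for illustration. For simplicity, the sketch also calls the plain `towupper()`/`toupper()` functions under the process locale instead of their `locale_t` variants.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for PostgreSQL's pg_wchar (a 32-bit code). */
typedef uint32_t sketch_wchar;

/* Hypothetical labels for strategies 1-3 in the comment above. */
typedef enum
{
    SKETCH_C_POSIX,     /* 1. C/POSIX collation: hard-wired ASCII rules */
    SKETCH_UTF8_WCTYPE, /* 2. UTF8 encoding: <wctype.h> functions */
    SKETCH_OTHER_CTYPE  /* 3. other encodings: <ctype.h> up to 255 */
} sketch_strategy;

/* ASCII-only upcasing that ignores LC_CTYPE entirely. */
static sketch_wchar
sketch_ascii_toupper(sketch_wchar c)
{
    if (c >= 'a' && c <= 'z')
        return c - 'a' + 'A';
    return c;
}

static sketch_wchar
sketch_toupper(sketch_wchar c, sketch_strategy strat, bool is_default)
{
    switch (strat)
    {
        case SKETCH_C_POSIX:
            /* Non-ASCII (multibyte) characters are left unchanged. */
            return sketch_ascii_toupper(c);

        case SKETCH_UTF8_WCTYPE:
            /* "default" collation: ASCII letters follow ASCII rules,
             * so Turkish i still maps to I, as upper() does. */
            if (is_default && c <= 127)
                return sketch_ascii_toupper(c);
            /* Punt when a 16-bit wchar_t cannot hold the codepoint. */
            if (sizeof(wchar_t) >= 4 || c <= 0xFFFF)
                return (sketch_wchar) towupper((wint_t) c);
            return c;

        case SKETCH_OTHER_CTYPE:
            if (is_default && c <= 127)
                return sketch_ascii_toupper(c);
            /* Only values up to 255 go to <ctype.h>; punt above that. */
            if (c <= 255)
                return (sketch_wchar) toupper((int) c);
            return c;
    }
    assert(!"unknown strategy");
    return c;
}
```

The ASCII short-circuit runs before the library call. In the "default" collation, a Turkish locale therefore cannot map `i` to a dotted capital I. In a non-default collation, the library is free to apply its own rules.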