I work at Red Hat on GCC, the GNU Compiler Collection, and I spent most of the past year making GCC easier to use. Let's look at C and C++ improvements that will be in the next major release of GCC, GCC 9.
A new look for diagnostics
By way of example, let's look at how GCC 8 reports an attempt to use a missing binary "+" in C++:
$ gcc-8 t.cc
t.cc: In function ‘int test(const shape&, const shape&)’:
t.cc:15:4: error: no match for ‘operator+’ (operand types are ‘boxed_value<double>’ and ‘boxed_value<double>’)
   return (width(s1) * height(s1)
           ~~~~~~~~~~~~~~~~~~~~~~
    + width(s2) * height(s2));
    ^~~~~~~~~~~~~~~~~~~~~~~~
Here's what it looks like in GCC 9:
$ gcc-9 t.cc
t.cc: In function ‘int test(const shape&, const shape&)’:
t.cc:15:4: error: no match for ‘operator+’ (operand types are ‘boxed_value<double>’ and ‘boxed_value<double>’)
   14 |   return (width(s1) * height(s1)
      |           ~~~~~~~~~~~~~~~~~~~~~~
      |                     |
      |                     boxed_value<[...]>
   15 |    + width(s2) * height(s2));
      |    ^ ~~~~~~~~~~~~~~~~~~~~~~
      |                |
      |                boxed_value<[...]>
There are a few changes here. I've added a left-hand margin, showing line numbers. The "error" line mentions line 15, but the expression in question spans multiple lines, and we're actually starting with line 14. I think it's worth a little extra horizontal space to make it clear which line is which. The margin also helps distinguish your source code from the annotations that GCC emits, and I believe it makes it a little easier to see where each diagnostic starts, by visually breaking things up at the leftmost column.
Speaking of annotations, this example shows another new GCC 9 feature: diagnostics can label regions of the source code to show pertinent information. Here, what's most important are the types of the left-hand and right-hand sides of the "+" operator, so GCC highlights them inline. Notice how the diagnostic also uses color to distinguish the two operands from each other and from the operator.
The left margin affects how we print things like fix-it hints for missing header files:
$ gcc-9 -xc++ -c incomplete.c
incomplete.c:1:6: error: ‘string’ in namespace ‘std’ does not name a type
    1 | std::string test(void)
      |      ^~~~~~
incomplete.c:1:1: note: ‘std::string’ is defined in header ‘<string>’; did you forget to ‘#include <string>’?
  +++ |+#include <string>
    1 | std::string test(void)
I've turned on these changes by default; they can be disabled via -fno-diagnostics-show-line-numbers and -fno-diagnostics-show-labels, respectively.
Another example can be seen in the type-mismatch error from the article I wrote last year, Usability improvements in GCC 8:
extern int callee(int one, const char *two, float three);

int caller(int first, int second, float third)
{
  return callee(first, second, third);
}
where the bogus type of the expression is now highlighted inline:
$ gcc-9 -c param-type-mismatch.c
param-type-mismatch.c: In function ‘caller’:
param-type-mismatch.c:5:24: warning: passing argument 2 of ‘callee’ makes pointer from integer without a cast [-Wint-conversion]
    5 |   return callee(first, second, third);
      |                        ^~~~~~
      |                        |
      |                        int
param-type-mismatch.c:1:40: note: expected ‘const char *’ but argument is of type ‘int’
    1 | extern int callee(int one, const char *two, float three);
      |                            ~~~~~~~~~~~~^~~
Yet another example can be seen in this bad printf call:
$ g++-9 -c bad-printf.cc -Wall
bad-printf.cc: In function ‘void print_field(const char*, float, long int, long int)’:
bad-printf.cc:6:17: warning: field width specifier ‘*’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
    6 |   printf ("%s: %*ld ", fieldname, column - width, value);
      |                ~^~~               ~~~~~~~~~~~~~~
      |                 |                        |
      |                 int                      long int
bad-printf.cc:6:19: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has type ‘double’ [-Wformat=]
    6 |   printf ("%s: %*ld ", fieldname, column - width, value);
      |                ~~~^                               ~~~~~
      |                   |                               |
      |                   long int                        double
      |                %*f
which contrasts "inline" the type expected by the format string with the type that was actually passed in. (Embarrassingly, we didn't properly highlight format string locations in older versions of the C++ front end; for GCC 9, I've implemented this so that it has parity with the C front end, as shown here.)
Not just for humans
One concern I've heard when changing how GCC prints diagnostics is that it might break someone's script for parsing GCC output. I don't think these changes will do that: most such scripts are set up to parse the "FILENAME:LINE:COL: error: MESSAGE" lines and ignore the rest, and I'm not touching that part of the output.
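To make the shape of such a script concrete, here is a minimal sketch in C++17 of a parser for those header lines. The names diagnostic_header and parse_header are hypothetical, and the regular expression is an assumption about what a typical script matches rather than anything GCC provides; it ignores complications such as drive letters in Windows paths.

```cpp
#include <optional>
#include <regex>
#include <string>

// One "FILENAME:LINE:COL: KIND: MESSAGE" header line, split into its parts.
// (Hypothetical type, for illustration only.)
struct diagnostic_header {
  std::string file;
  int line;
  int column;
  std::string kind;     // "error", "warning", or "note"
  std::string message;
};

// Returns the parsed header, or std::nullopt for any other line (quoted
// source, margins, labels, fix-it hints), which the caller can skip.
std::optional<diagnostic_header> parse_header(const std::string &text) {
  // Assumption: the file name contains no ':' and does not start with a space.
  static const std::regex pattern(
      R"(^([^: ][^:]*):(\d+):(\d+): (error|warning|note): (.*)$)");
  std::smatch m;
  if (!std::regex_match(text, m, pattern))
    return std::nullopt;
  return diagnostic_header{m[1].str(), std::stoi(m[2].str()),
                           std::stoi(m[3].str()), m[4].str(), m[5].str()};
}
```

Because lines that begin with the new margin (for example "   14 | ...") never match the pattern, a parser written along these lines is unaffected by the line numbers and labels.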
But it made me think it was about time we had a machine-readable output format for diagnostics, so for GCC 9, I've added a JSON output format: -fdiagnostics-format=json.
Consider this warning:
$ gcc-9 -c cve-2014-1266.c -Wall
cve-2014-1266.c: In function ‘SSLVerifySignedServerKeyExchange’:
cve-2014-1266.c:629:2: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
  629 |  if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
      |  ^~
cve-2014-1266.c:631:3: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’
  631 |   goto fail;
      |   ^~~~
With -fdiagnostics-format=json, the diagnostics are emitted as a big blob of JSON to stderr. Running them through the handy python -m json.tool to format them gives an idea of the structure:
$ (gcc-9 -c cve-2014-1266.c -Wall -fdiagnostics-format=json 2>&1) | python -m json.tool | pygmentize -l json
[
    {
        "children": [
            {
                "kind": "note",
                "locations": [
                    {
                        "caret": {
                            "column": 3,
                            "file": "cve-2014-1266.c",
                            "line": 631
                        },
                        "finish": {
                            "column": 6,
                            "file": "cve-2014-1266.c",
                            "line": 631
                        }
                    }
                ],
                "message": "...this statement, but the latter is misleadingly indented as if it were guarded by the \u2018if\u2019"
            }
        ],
        "kind": "warning",
        "locations": [
            {
                "caret": {
                    "column": 2,
                    "file": "cve-2014-1266.c",
                    "line": 629
                },
                "finish": {
                    "column": 3,
                    "file": "cve-2014-1266.c",
                    "line": 629
                }
            }
        ],
        "message": "this \u2018if\u2019 clause does not guard...",
        "option": "-Wmisleading-indentation"
    }
]
In particular, the supplementary "note" is nested within the "warning" at the JSON level, allowing, for example, IDEs to group them. Some of our C++ diagnostics can have numerous child diagnostics giving additional detail, so being able to group them, for example, via a disclosure widget, could be helpful.
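As a rough illustration of how a consumer might use that nesting, the following C++17 sketch models a subset of the keys shown above and renders each diagnostic with its children indented beneath it. The location and diagnostic types and the render function are hypothetical names; the sketch assumes the JSON has already been parsed into these structures by a JSON library of your choice, and it keeps only the caret of the first location.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory model of a subset of the JSON keys shown above.
struct location {
  std::string file;
  int line;
  int column;
};

struct diagnostic {
  std::string kind;                  // "warning", "note", ...
  std::string message;
  location caret;                    // caret of the first entry in "locations"
  std::vector<diagnostic> children;  // nested notes, from "children"
};

// Renders a diagnostic followed by its children, each level indented by two
// spaces, roughly as an IDE might group them in a tree view.
std::string render(const diagnostic &d, int depth = 0) {
  std::string out(depth * 2, ' ');
  out += d.caret.file + ":" + std::to_string(d.caret.line) + ":" +
         std::to_string(d.caret.column) + ": " + d.kind + ": " + d.message +
         "\n";
  for (const diagnostic &child : d.children)
    out += render(child, depth + 1);
  return out;
}
```

With the flat text output, a tool has to guess which notes belong to which warning; with the nested form, the grouping is explicit in the data.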
Simpler C++ errors
C++ is a complicated language. For example, the rules for figuring out which C++ function is to be invoked at a call site are non-trivial.
The compiler may need to consider several functions at a given call site and reject each of them for a different reason, and g++'s error messages have to cope with this generality, explaining why each candidate was rejected.
This generality can make simple cases harder to read than they need to be, so for GCC 9, I've added special-casing to simplify some g++ errors for common cases where there's just one candidate function.
For example, GCC 8 could emit this:
$ g++-8 param-type-mismatch.cc
param-type-mismatch.cc: In function ‘int test(int, const char*, float)’:
param-type-mismatch.cc:8:45: error: no matching function for call to ‘foo::member_1(int&, const char*&, float&)’
   return foo::member_1 (first, second, third);
                                             ^
param-type-mismatch.cc:3:14: note: candidate: ‘static int foo::member_1(int, const char**, float)’
   static int member_1 (int one, const char **two, float three);
              ^~~~~~~~
param-type-mismatch.cc:3:14: note: no known conversion for argument 2 from ‘const char*’ to ‘const char**’
For GCC 9, I've special-cased this, giving a more direct error message, which highlights both the problematic argument and the parameter that it can't be converted to:
$ g++-9 param-type-mismatch.cc
param-type-mismatch.cc: In function ‘int test(int, const char*, float)’:
param-type-mismatch.cc:8:32: error: cannot convert ‘const char*’ to ‘const char**’
    8 |   return foo::member_1 (first, second, third);
      |                                ^~~~~~
      |                                |
      |                                const char*
param-type-mismatch.cc:3:46: note: initializing argument 2 of ‘static int foo::member_1(int, const char**, float)’
    3 |   static int member_1 (int one, const char **two, float three);
      |                                 ~~~~~~~~~~~~~^~~
Similarly, GCC 8 took two messages to offer suggestions for various kinds of misspelled names:
$ g++-8 typo.cc
typo.cc:5:13: error: ‘BUFSIZE’ was not declared in this scope
 uint8_t buf[BUFSIZE];
             ^~~~~~~
typo.cc:5:13: note: suggested alternative: ‘BUF_SIZE’
 uint8_t buf[BUFSIZE];
             ^~~~~~~
             BUF_SIZE
so for GCC 9, I've consolidated the messages:
$ g++-9 typo.cc
typo.cc:5:13: error: ‘BUFSIZE’ was not declared in this scope; did you mean ‘BUF_SIZE’?
    5 | uint8_t buf[BUFSIZE];
      |             ^~~~~~~
      |             BUF_SIZE
In some cases, where GCC 8 knew to offer suggestions within namespaces:
$ g++-8 typo-2.cc
typo-2.cc: In function ‘void mesh_to_strip()’:
typo-2.cc:8:3: error: ‘tri_strip’ was not declared in this scope
   tri_strip result;
   ^~~~~~~~~
typo-2.cc:8:3: note: suggested alternative:
typo-2.cc:2:9: note: ‘engine::tri_strip’
   class tri_strip {
         ^~~~~~~~~
GCC 9 can now offer fix-it hints:
$ g++-9 typo-2.cc
typo-2.cc: In function ‘void mesh_to_strip()’:
typo-2.cc:8:3: error: ‘tri_strip’ was not declared in this scope; did you mean ‘engine::tri_strip’?
    8 |   tri_strip result;
      |   ^~~~~~~~~
      |   engine::tri_strip
typo-2.cc:2:9: note: ‘engine::tri_strip’ declared here
    2 |   class tri_strip {
      |         ^~~~~~~~~
Location, location, location
A long-standing issue within GCC's internal representation is that not every node within the syntax tree has a source location.
For GCC 8, I added a way to ensure that every argument at a C++ call site has a source location.
For GCC 9, I've extended this work so that many more places in the C++ syntax tree now retain location information for longer.
This really helps when tracking down bad initializations. GCC 8 and earlier might unhelpfully emit errors on the final closing parenthesis or brace, for example:
$ g++-8 bad-inits.cc
bad-inits.cc:12:1: error: cannot convert ‘json’ to ‘int’ in initialization
 };
 ^
bad-inits.cc:14:47: error: initializer-string for array of chars is too long [-fpermissive]
 char buffers[3][5] = { "red", "green", "blue" };
                                               ^
bad-inits.cc: In constructor ‘X::X()’:
bad-inits.cc:17:35: error: invalid conversion from ‘int’ to ‘void*’ [-fpermissive]
   X() : one(42), two(42), three(42)
                                   ^
whereas now, GCC 9 can highlight exactly where the various problems are:
$ g++-9 bad-inits.cc
bad-inits.cc:10:14: error: cannot convert ‘json’ to ‘int’ in initialization
   10 |   { 3, json::object },
      |        ~~~~~~^~~~~~
      |              |
      |              json
bad-inits.cc:14:31: error: initializer-string for array of chars is too long [-fpermissive]
   14 | char buffers[3][5] = { "red", "green", "blue" };
      |                               ^~~~~~~
bad-inits.cc: In constructor ‘X::X()’:
bad-inits.cc:17:13: error: invalid conversion from ‘int’ to ‘void*’ [-fpermissive]
   17 |   X() : one(42), two(42), three(42)
      |             ^~
      |             |
      |             int
What is the optimizer doing?
GCC can automatically "vectorize" loops, reorganizing them to work on multiple iterations at once, to take advantage of the vector units on your CPU. However, it can do this only for some loops; if you stray from the path, GCC will have to use scalar code instead.
Unfortunately, historically it hasn't been easy to get a sense from GCC about the decisions it's making as it's optimizing your code. We have an option, -fopt-info, that emits optimization information, but it's been more of a tool for the developers of GCC itself, rather than something aimed at end users.
Consider this (contrived) example:
#define N 1024

void test (int *p, int *q)
{
  int i;

  for (i = 0; i < N; i++)
    {
      p[i] = q[i];
      asm volatile ("" ::: "memory");
    }
}
I tried compiling it with GCC 8 with -O3 -fopt-info-all-vec, but it wasn't very enlightening:
$ gcc-8 -c v.c -O3 -fopt-info-all-vec
Analyzing loop at v.c:7
v.c:7:3: note: ===== analyze_loop_nest =====
v.c:7:3: note: === vect_analyze_loop_form ===
v.c:7:3: note: === get_loop_niters ===
v.c:7:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
v.c:3:6: note: vectorized 0 loops in function.
v.c:3:6: note: ===vect_slp_analyze_bb===
v.c:3:6: note: ===vect_slp_analyze_bb===
v.c:10:7: note: === vect_analyze_data_refs ===
v.c:10:7: note: got vectype for stmt: _5 = *_3; vector(4) int
v.c:10:7: note: got vectype for stmt: *_4 = _5; vector(4) int
v.c:10:7: note: === vect_analyze_data_ref_accesses ===
v.c:10:7: note: not consecutive access _5 = *_3;
v.c:10:7: note: not consecutive access *_4 = _5;
v.c:10:7: note: not vectorized: no grouped stores in basic block.
v.c:7:3: note: === vect_analyze_data_refs ===
v.c:7:3: note: not vectorized: not enough data-refs in basic block.
v.c:7:3: note: ===vect_slp_analyze_bb===
v.c:7:3: note: ===vect_slp_analyze_bb===
v.c:12:1: note: === vect_analyze_data_refs ===
v.c:12:1: note: not vectorized: not enough data-refs in basic block.
For GCC 9, I've reorganized problem-tracking within the vectorizer so that the output is of the form:
[LOOP-LOCATION]: couldn't vectorize this loop
[PROBLEM-LOCATION]: because of [REASON]
For the example above, this gives the following, identifying the location of the construct within the loop that the vectorizer couldn't handle (I hoped to have it also show the source code, but that didn't make feature freeze):
$ gcc-9 -c v.c -O3 -fopt-info-all-vec
v.c:7:3: missed: couldn't vectorize loop
v.c:10:7: missed: statement clobbers memory: __asm__ __volatile__("" : : : "memory");
v.c:3:6: note: vectorized 0 loops in function.
v.c:10:7: missed: statement clobbers memory: __asm__ __volatile__("" : : : "memory");
This improves things, but still has some limitations, so for GCC 9 I've also added a new option to emit machine-readable optimization information: -fsave-optimization-record.
This writes out a SRCFILE.opt-record.json.gz file with much richer data: for example, every message is tagged with profile information (if available), so that you can look at the "hottest" part of the code, and it captures inlining information, so that if a function has been inlined into several places, you can see how each instance of the function has been optimized.
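For contrast with the contrived loop above, here is a sketch of a similar loop without the asm memory barrier. The function name add_one and the "+ 1" are made up for illustration. A loop of this shape is the kind the vectorizer is designed to handle, but whether it is actually vectorized depends on the GCC version, the target, and the flags used, so treat that as an assumption to check with -fopt-info-vec rather than a guarantee.

```cpp
#include <cstddef>

constexpr std::size_t N = 1024;

// Hypothetical variant of the contrived example: a simple element-wise loop
// with no asm statement clobbering memory between iterations.
void add_one(int *p, const int *q) {
  for (std::size_t i = 0; i < N; i++)
    p[i] = q[i] + 1;
}
```

Compiling a file like this with -O3 -fopt-info-vec is one way to see what the vectorizer reports for it with your compiler.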
Other improvements
GCC can emit "fix-it hints" that suggest how to fix a problem in your code. These can be automatically applied by an IDE.
For GCC 9, I've added various new fix-it hints. There are now fix-it hints for forgetting the return *this; needed by various C++ operators:
$ g++-9 -c operator.cc
operator.cc: In member function ‘boxed_ptr& boxed_ptr::operator=(const boxed_ptr&)’:
operator.cc:7:3: warning: no return statement in function returning non-void [-Wreturn-type]
    6 |     m_ptr = other.m_ptr;
  +++ |+    return *this;
    7 |   }
      |   ^
and for when the compiler needs a typename:
$ g++-9 -c template.cc
template.cc:3:3: error: need ‘typename’ before ‘Traits::type’ because ‘Traits’ is a dependent scope
    3 |   Traits::type type;
      |   ^~~~~~
      |   typename
and when you try to use an accessor member as if it were a data member:
$ g++-9 -c fncall.cc
fncall.cc: In function ‘void hangman(const mystring&)’:
fncall.cc:12:11: error: invalid use of member function ‘int mystring::get_length() const’ (did you forget the ‘()’ ?)
   12 |   if (str.get_length > 0)
      |       ~~~~^~~~~~~~~~
      |                     ()
and for C++11's scoped enums:
$ g++-9 -c enums.cc
enums.cc: In function ‘void json::test(const json::value&)’:
enums.cc:12:26: error: ‘STRING’ was not declared in this scope; did you mean ‘json::kind::STRING’?
   12 |     if (v.get_kind () == STRING)
      |                          ^~~~~~
      |                          json::kind::STRING
enums.cc:3:44: note: ‘json::kind::STRING’ declared here
    3 |   enum class kind { OBJECT, ARRAY, NUMBER, STRING, TRUE, FALSE, NULL_ };
      |                                            ^~~~~~
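For reference, here is a sketch of what the code looks like once each of these four fix-it hints has been applied. The source files themselves aren't shown above, so every definition below is a hypothetical stand-in that reuses only the names visible in the diagnostics; the enum is trimmed, and helpers such as int_traits, holder, is_nonempty, and is_string are invented for the example.

```cpp
#include <string>
#include <utility>

// Stand-in for operator.cc: an assignment operator has to return *this.
struct boxed_ptr {
  void *m_ptr = nullptr;
  boxed_ptr &operator=(const boxed_ptr &other) {
    m_ptr = other.m_ptr;
    return *this;  // the line the fix-it hint inserts
  }
};

// Stand-in for template.cc: a type that depends on a template parameter
// has to be introduced with "typename".
struct int_traits {
  using type = int;
};

template <typename Traits>
struct holder {
  typename Traits::type value;
};

// Stand-in for fncall.cc: call the accessor instead of naming it.
class mystring {
public:
  explicit mystring(std::string text) : m_text(std::move(text)) {}
  int get_length() const { return static_cast<int>(m_text.size()); }

private:
  std::string m_text;
};

bool is_nonempty(const mystring &str) {
  return str.get_length() > 0;  // note the "()"
}

// Stand-in for enums.cc: enumerators of a scoped enum need qualifying.
namespace json {
enum class kind { OBJECT, ARRAY, NUMBER, STRING };

bool is_string(kind k) {
  return k == kind::STRING;  // a bare STRING would not be found
}
}  // namespace json
```

In each case the fix is a small, mechanical edit, which is what makes these good candidates for fix-it hints that an IDE can apply automatically.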
And I added a tweak to integrate the suggestions about misspelled members with that for accessors:
$ g++-9 -c accessor-fixit.cc
accessor-fixit.cc: In function ‘int test(t*)’:
accessor-fixit.cc:17:15: error: ‘class t’ has no member named ‘ratio’; did you mean ‘int t::m_ratio’? (accessible via ‘int t::get_ratio() const’)
   17 |   return ptr->ratio;
      |               ^~~~~
      |               get_ratio()
I've also tweaked the suggestions code so it considers transposed letters, so it should do a better job of figuring out misspellings.
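To illustrate why counting a transposition as a single edit helps, here is a sketch of a well-known edit-distance variant (often called "optimal string alignment") in which swapping two adjacent characters costs 1, the same as an insertion, deletion, or substitution. This is a generic textbook formulation with a hypothetical function name, not GCC's actual implementation, whose details and cost weights may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Edit distance where inserting, deleting, or substituting one character,
// or swapping two adjacent characters, each cost 1.
std::size_t edit_distance(const std::string &a, const std::string &b) {
  const std::size_t n = a.size();
  const std::size_t m = b.size();
  // d[i][j] is the distance between the first i chars of a and first j of b.
  std::vector<std::vector<std::size_t>> d(n + 1,
                                          std::vector<std::size_t>(m + 1));
  for (std::size_t i = 0; i <= n; i++) d[i][0] = i;
  for (std::size_t j = 0; j <= m; j++) d[0][j] = j;
  for (std::size_t i = 1; i <= n; i++) {
    for (std::size_t j = 1; j <= m; j++) {
      const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
      d[i][j] = std::min({d[i - 1][j] + 1,           // deletion
                          d[i][j - 1] + 1,           // insertion
                          d[i - 1][j - 1] + cost});  // substitution or match
      if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
        d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);  // transposition
    }
  }
  return d[n][m];
}
```

Under this measure, a misspelling such as "raito" is one edit away from "ratio", whereas without the transposition rule it would be two, which can be enough to push the intended name below other candidates.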
Looking to the future
The above covers some of the changes I've made for GCC 9.
Perhaps a deeper change is that we now have a set of user experience guidelines for GCC, to try to keep a focus on the programmer's experience as we implement new diagnostics. If you'd like to get involved in GCC development, please join us on the GCC mailing list. Hacking on diagnostics is a great way to get started.
Trying it out
GCC 9 will be in Fedora 30, which should be out in a few weeks.
For simple code examples, you can play around with the new GCC at https://godbolt.org/ (select GCC "trunk").
Have fun!
Last updated: March 7, 2019