One of the many cool features of Google is the “Did you mean” phrase it suggests when you fat-finger your search keywords. Go ahead, search for the clong compiler. Google says, yeah, we think you might have meant clang compiler. Thanks Google!
While working one day with some OpenCV code (I’ll blog about that some day, when I know what I’m doing), I noticed I had mistyped cvNamedWindow as cvNameWindow, and clang replied:
motion.cpp:12:3: error: use of undeclared identifier 'cvNameWindow'; did you mean 'cvNamedWindow'?
Well, isn’t that cool? Intrigued, I decided to see how gcc would respond to some typos, and then compare it to clang.
For a quick illustration I’m going to use g++ and clang++.
mathr.h
[c]
int addTwoInts(int a, int b);
int addThreeInts(int a, int b, int c);
[/c]
mathr.c
[c]
int addTwoInts(int a, int b) { return a + b; }
int addThreeInts(int a, int b, int c) { return a + b + c; }
[/c]
umath.c
[c]
#include "mathr.h"

int main(void) {
  addTwInts(4, 5);
}
[/c]
Notice the addTwInts function call. Obviously I meant addTwoInts.
Compiling with g++-4.8 gives
g++-4.8 -c umath.c
umath.c: In function 'int main()':
umath.c:4:17: error: 'addTwInts' was not declared in this scope
  addTwInts(4, 5);
                ^
make: *** [umath.o] Error 1
Okay, addTwInts was not declared in this scope. Since this is a really simple example, I know where I went wrong. But take a look at what clang++ can tell me:
clang++ -c umath.c
clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated
umath.c:4:3: error: use of undeclared identifier 'addTwInts'; did you mean 'addTwoInts'?
  addTwInts(4, 5);
  ^~~~~~~~~
  addTwoInts
./mathr.h:1:5: note: 'addTwoInts' declared here
int addTwoInts(int a, int b);
    ^
1 error generated.
Nice! Yes, I did mean addTwoInts!
Searching on Google led me to Chris Lattner’s 2010 article on Clang’s neat error recovery features, and there’s more than just the spell-checking-suggestion-engine.
Clang uses the Levenshtein distance algorithm to find possible corrections for typos. Notice that addTwInts is just 1 edit (a single insertion) away from addTwoInts, and Clang is able to recognize that and make the suggestion.
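If you would like to see the distance calculation itself, here is a minimal sketch in plain C (C99) of the textbook Levenshtein algorithm. The function name levenshtein and the single-row layout are my own choices for illustration; this is not Clang's actual code, just the standard dynamic-programming formulation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Textbook Levenshtein distance: the minimum number of single-character
 * insertions, deletions, and substitutions needed to turn a into b.
 * Returns (size_t)-1 if memory allocation fails.
 */
size_t levenshtein(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a);
    size_t lb = strlen(b);

    /* One row of the DP table; row[j] holds the distance to b[0..j). */
    size_t *row = malloc((lb + 1) * sizeof *row);
    if (row == NULL)
        return (size_t)-1;

    /* Turning the empty string into b[0..j) takes j insertions. */
    for (size_t j = 0; j <= lb; j++)
        row[j] = j;

    for (size_t i = 1; i <= la; i++) {
        size_t diag = row[0];   /* previous row, previous column */
        row[0] = i;             /* a[0..i) to "" takes i deletions */

        for (size_t j = 1; j <= lb; j++) {
            size_t above = row[j];  /* previous row, same column */
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;

            size_t best = diag + cost;          /* match or substitute */
            if (above + 1 < best)
                best = above + 1;               /* delete a[i-1] */
            if (row[j - 1] + 1 < best)
                best = row[j - 1] + 1;          /* insert b[j-1] */

            diag = above;
            row[j] = best;
        }
    }

    size_t result = row[lb];
    free(row);
    return result;
}
```

Calling levenshtein("addTwInts", "addTwoInts") returns 1, which matches the single missing character in my typo.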
Of course there are limits. Substituting addTwInts with addInts results in:
umath.c:4:3: error: use of undeclared identifier 'addInts'
  addInts(4, 5);
The distance is 3 insertions from addTwoInts and 5 insertions from addThreeInts. What’s interesting, however, is that a “strong match” (not a technical term!) will boost Clang’s ability to make a suggestion. For example, you can change the call to addTwoInts_____ (your underscore key was stuck that day), and Clang will still make the suggestion: “use of undeclared identifier 'addTwoInts_____'; did you mean 'addTwoInts'?” Add another underscore (for a total of 6) and you are back to use of undeclared identifier with no suggestions. To quickly calculate the Levenshtein distance of two arbitrary strings, you can try an online Levenshtein distance calculator.
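One way to make sense of these results is a cutoff that scales with the length of what you typed, rather than a fixed maximum distance. The sketch below is purely hypothetical: the function would_suggest and both of its rules are assumptions I picked because they reproduce the results above (distance 3 rejected for the short addInts, distance 5 accepted and distance 6 rejected for the long underscore version). I have not confirmed them against Clang's source, so treat this as an illustration of the idea, not a description of the implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical length-relative cutoff (an assumption, not Clang's code).
 * Returns 1 if a candidate at the given edit distance would be suggested
 * for a typo of typo_len characters, 0 otherwise.
 */
int would_suggest(size_t typo_len, size_t candidate_len, size_t distance)
{
    size_t len_diff = typo_len > candidate_len ? typo_len - candidate_len
                                               : candidate_len - typo_len;

    /* An edit distance can never be smaller than the length difference. */
    assert(distance >= len_diff);

    /* Assumed rule 1: give up when the length gap is large relative
       to the length of the typo. */
    if (len_diff != 0 && typo_len / len_diff < 3)
        return 0;

    /* Assumed rule 2: allow roughly one edit per three characters typed. */
    size_t max_distance = (typo_len + 2) / 3;
    return distance <= max_distance;
}
```

With these assumed rules, would_suggest(7, 10, 3) is 0 for addInts, would_suggest(15, 10, 5) is 1 for five stray underscores, and would_suggest(16, 10, 6) is 0 for six, which lines up with what I saw. A longer identifier simply earns a larger error budget.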
I’m currently only using Clang on my Mac, where the diagnostics features work for C, C++, and Objective-C. (Swift has its own LLVM-based compiler rather than going through Clang.) I’m definitely looking forward to using Clang on Linux, where GCC has been the only game in town for decades. Of course, the rising adoption of Clang has spurred significant improvements in GCC’s diagnostics. Only time will tell if GCC will remain the premier compiler on Linux systems (sorry, GNU/Linux. *smirk*).