A regex to find C/C++ preprocessor includes
2019-10-10
Yes I know .. “parsing” C or C++ with regular expressions is bad. There’s of course libclang (1, 2) and also some way to hook into GCC … but sometimes it’s just easier to slap a regex on the problem. In this case getting the names of the directly included files:
grep -oP '#[[:blank:]]*include[[:blank:]]*("|<)\K[\w.\/]+(?=("|>))' FILEThis calls grep with “Perl-compatible” regex support (-P) and prints only the matches (-o). The regex:
#… literally that character[[:blank:]]*: some amount of whitespace; except newlinesinclude… literally that word[[:blank:]]*: again some whitespace; no newlines("|<): Either"or<\Kresets the starting point of the reported match. This basically means “only treat the following part of the match as the match” (which is then printed by grep)[\w.\/]+1 or more of “word characters” (a-z and A-Z), the literal dot.or forward slashes/(?= )is a “positive lookahead” … meaning make sure that this comes next, but don’t add it the the reported match.("|<)(inside the positive lookahead): Either"or>
I’ve found this regex to work well in practice, though of course it may return false positives due to:
#includein comments; these will be picked up, too#include <foo"and#include "foo>will be picked up, too. This could be fixed … but honestly … I don’t think that this is a likely case to happen
And of course it doesn’t care about conditional compilation … so this will yield foo:
#if 0
#include "foo"
#endifBut that’s exactly what I need in my case.