博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
深入研究Clang(七) Clang Lexer代码阅读笔记之Lexer
阅读量:5976 次
发布时间:2019-06-20

本文共 4465 字,大约阅读时间需要 14 分钟。

作者:

源码位置:clang/lib/Lexer.cpp

源码网络地址:

Lexer.cpp这个文件,是Clang这个前端的词法分析器的主要文件,它的内容是对Lexer这个类的具体实现,原文件的注释中:“This file implements the Lexer and Token interfaces.” 这么解释这个文件的,但是Token只有两个简单函数的实现,剩下的都是Lexer的实现。所以要想搞清楚Clang的词法分析器是怎么实现的,那么必须对这个文件有着深入的理解。

从Lexer的初始化函数开始入手:

void Lexer::InitLexer(const char *BufStart, const char *BufPtr,   56                       const char *BufEnd) {   57   BufferStart = BufStart;   58   BufferPtr = BufPtr;   59   BufferEnd = BufEnd;   60    61   assert(BufEnd[0] == 0 &&   62          "We assume that the input buffer has a null character at the end"   63          " to simplify lexing!");   64    65   // Check whether we have a BOM in the beginning of the buffer. If yes - act   66   // accordingly. Right now we support only UTF-8 with and without BOM, so, just   67   // skip the UTF-8 BOM if it's present.   68   if (BufferStart == BufferPtr) {   69     // Determine the size of the BOM.   70     StringRef Buf(BufferStart, BufferEnd - BufferStart);   71     size_t BOMLength = llvm::StringSwitch
(Buf) 72 .StartsWith("\xEF\xBB\xBF", 3) // UTF-8 BOM 73 .Default(0); 74 75 // Skip the BOM. 76 BufferPtr += BOMLength; 77 } 78 79 Is_PragmaLexer = false; 80 CurrentConflictMarkerState = CMK_None; 81 82 // Start of the file is a start of line. 83 IsAtStartOfLine = true; 84 IsAtPhysicalStartOfLine = true; 85 86 HasLeadingSpace = false; 87 HasLeadingEmptyMacro = false; 88 89 // We are not after parsing a #. 90 ParsingPreprocessorDirective = false; 91 92 // We are not after parsing #include. 93 ParsingFilename = false; 94 95 // We are not in raw mode. Raw mode disables diagnostics and interpretation 96 // of tokens (e.g. identifiers, thus disabling macro expansion). It is used 97 // to quickly lex the tokens of the buffer, e.g. when handling a "#if 0" block 98 // or otherwise skipping over tokens. 99 LexingRawMode = false; 100 101 // Default to not keeping comments. 102 ExtendedTokenMode = 0; 103 }

这个初始化函数,是在Lexer类的两个构造函数里被调用的,具体代码如下:

104   105 /// Lexer constructor - Create a new lexer object for the specified buffer  106 /// with the specified preprocessor managing the lexing process.  This lexer  107 /// assumes that the associated file buffer and Preprocessor objects will  108 /// outlive it, so it doesn't take ownership of either of them.  109 Lexer::Lexer(FileID FID, const llvm::MemoryBuffer *InputFile, Preprocessor &PP)  110   : PreprocessorLexer(&PP, FID),  111     FileLoc(PP.getSourceManager().getLocForStartOfFile(FID)),  112     LangOpts(PP.getLangOpts()) {  113   114   InitLexer(InputFile->getBufferStart(), InputFile->getBufferStart(),  115             InputFile->getBufferEnd());  116   117   resetExtendedTokenMode();  118 }  119   120 void Lexer::resetExtendedTokenMode() {  121   assert(PP && "Cannot reset token mode without a preprocessor");  122   if (LangOpts.TraditionalCPP)  123     SetKeepWhitespaceMode(true);  124   else  125     SetCommentRetentionState(PP->getCommentRetentionState());  126 }  127   128 /// Lexer constructor - Create a new raw lexer object.  This object is only  129 /// suitable for calls to 'LexFromRawLexer'.  This lexer assumes that the text  130 /// range will outlive it, so it doesn't take ownership of it.  131 Lexer::Lexer(SourceLocation fileloc, const LangOptions &langOpts,  132              const char *BufStart, const char *BufPtr, const char *BufEnd)  133   : FileLoc(fileloc), LangOpts(langOpts) {  134   135   InitLexer(BufStart, BufPtr, BufEnd);  136   137   // We *are* in raw mode.  138   LexingRawMode = true;  139 }
这两个构造函数各有不同,从输入参数上就可以看出。也有相同的地方,就是对一些参数只是引用的关系,并没有获取这些参数的所有权。

Lexer的构造函数,在自己的类内部,分别被以下的函数所调用:

Create_PragmaLexer:  constructor - Create a new lexer object for _Pragma expansion.

http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#ac7f3b1ce4f2eeaec8d787d22bf197cd0

getSpelling - This method is used to get the spelling of a token into a preallocated buffer, instead of as an std::string.

http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#a94f2c5710332ae19d7955c609ac37adb

getRawToken

http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#adac8b8cf001621ec3b109d82a7074f05

 getBeginningOfFileToken

http://clang.llvm.org/doxygen/Lexer_8cpp.html#a4845396d18432c436e605303b057dbb4

findLocationAfterToken

http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#a099b99b2d19ef5cdd8fcb80d8cf4064e

转载地址:http://topox.baihongyu.com/

你可能感兴趣的文章
【Linux】Linux 在线安装yum
查看>>
Atom 编辑器系列视频课程
查看>>
[原][osgearth]osgearthviewer读取earth文件,代码解析(earth文件读取的一帧)
查看>>
mybatis update返回值的意义
查看>>
expdp 详解及实例
查看>>
通过IP判断登录地址
查看>>
深入浅出JavaScript (五) 详解Document.write()方法
查看>>
Beta冲刺——day6
查看>>
在一个程序中调用另一个程序并且传输数据到选择屏幕执行这个程序
查看>>
代码生成工具Database2Sharp中增加视图的代码生成以及主从表界面生成功能
查看>>
关于在VS2005中编写DLL遇到 C4251 警告的解决办法
查看>>
提高信息安全意识对网络勒索病毒说不
查看>>
我的友情链接
查看>>
IDE---Python IDE之Eric5在window下的安装
查看>>
Mybatis调用Oracle中的存储过程和function
查看>>
基本安装lnmp环境
查看>>
yum源资料汇总
查看>>
7、MTC与MTV,http请求介绍
查看>>
logstash消费阿里云kafka消息
查看>>
unix 环境高级编程
查看>>