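Preliminary note: the command line (commandline) is broken up by the operating system's command-line parser (hereafter cmdlineparser) into the arguments argv[0]…[n]. Here, commandline is the input and argv is the output.
The Microsoft C/C++ startup code uses the following rules when parsing the arguments given on the operating system command line: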
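- cmdlineparser uses white space to split commandline into argv entries. A white-space character is either a space (0x20) or a tab (0x09). Note that white space does not always separate arguments, because it can be part of an argument.
- Unlike 0x20 and 0x09, the caret ^ (0x5E) is not recognized as an escape character or a delimiter. It is handled entirely by the operating system's command-line parser before argv is produced.
- In commandline, a string enclosed in double quotes ("string") is interpreted as a single argument, even if it contains spaces (0x20). For example, "a string" is parsed as a string. A quoted string can be embedded in an argument: d"e f"g is parsed by cmdlineparser as de fg.
- In commandline, a double quote preceded by a backslash (0x5C), written \", is interpreted as a literal double quote character (") in argv.
- Following the previous rule, backslashes are copied to argv literally unless they immediately precede a double quote.
- In commandline, if an even number of backslashes is followed by a double quote, cmdlineparser places one backslash in argv for every pair of backslashes, and the double quote that follows is treated as a string delimiter: it opens or closes a quoted section and is not itself copied to argv.
- In commandline, if an odd number of backslashes is followed by a double quote, cmdlineparser places one backslash in argv for every pair of backslashes, and the remaining backslash plus the double quote is interpreted as an escaped, literal double quote, as in the \" rule above.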
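The text above is translated from , and it mainly reflects my own understanding of the semantics. The original reads as follows: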
Microsoft C/C++ startup code uses the following rules when interpreting arguments given on the operating system command line:
- Arguments are delimited by white space, which is either a space or a tab.
- The caret character (^) is not recognized as an escape character or delimiter. The character is handled completely by the command-line parser in the operating system before being passed to the argv array in the program.
- A string surrounded by double quotation marks ("string") is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
- A double quotation mark preceded by a backslash (\") is interpreted as a literal double quotation mark character (").
- Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
- If an even number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is interpreted as a string delimiter.
- If an odd number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is "escaped" by the remaining backslash, causing a literal double quotation mark (") to be placed in argv.
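The last two rules come down to simple arithmetic on the length of the backslash run that precedes a double quote. The following is a minimal sketch of that arithmetic only, not the actual startup code. The names `BackslashRun` and `resolveBackslashRun` are hypothetical and introduced purely for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Microsoft API): describes what a run of `m`
// backslashes immediately followed by a double quote contributes to argv.
struct BackslashRun {
    std::string emitted;    // characters placed in argv for this run
    bool quoteIsDelimiter;  // true if the quote acts as a string delimiter
};

BackslashRun resolveBackslashRun(std::size_t m) {
    BackslashRun r;
    r.emitted.assign(m / 2, '\\');      // one backslash for every pair
    r.quoteIsDelimiter = (m % 2 == 0);  // even count: quote is a delimiter
    if (!r.quoteIsDelimiter) {
        r.emitted += '"';               // odd count: leftover backslash escapes the quote
    }
    return r;
}
```

For instance, `resolveBackslashRun(3)` reports one backslash followed by a literal double quote, while `resolveBackslashRun(4)` reports two backslashes and a quote that only delimits a string.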
Example
The following program demonstrates how command-line arguments are passed:
    // command_line_arguments.cpp
    // compile with: /EHsc
    #include <iostream>

    using namespace std;

    int main( int argc,       // Number of strings in array argv
              char *argv[],   // Array of command-line argument strings
              char *envp[] )  // Array of environment variable strings
    {
        int count;

        // Display each command-line argument.
        cout << "\nCommand-line arguments:\n";
        for( count = 0; count < argc; count++ )
            cout << "  argv[" << count << "]   " << argv[count] << "\n";
    }
The following table shows example inputs and the expected output, illustrating the rules listed above:
Command-line input | argv[1]  | argv[2] | argv[3]
-------------------|----------|---------|--------
"abc" d e          | abc      | d       | e
a\\b d"e f"g h     | a\\b     | de fg   | h
a\\\"b c d         | a\"b     | c       | d
a\\\\"b c" d e     | a\\b c   | d       | e
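Also:
The parsing of several consecutive double quotes is particularly messy. See the following discussion:
- (for easier reading, set your browser's character encoding to ISO-8859-1 and zoom in)
Note in particular this supplementary remark in :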
- And here's the missing undocumented rule: If a closing " is followed immediately by another ", the 2nd " is accepted literally and added to the parameter.
and its algorithm:
5.10 The Microsoft C/C++ Command Line Parameter Parsing Algorithm

The following algorithm was reverse engineered by disassembling a small C program compiled using Microsoft Visual C++ and examining the disassembled code:

    1. Parse off parameter 0 (the program filename)
       * The entire parameter may be enclosed in double quotes (it handles double quoted parts)
         (Double quotes are necessary if there are any spaces or tabs in the parameter)
       * There is no special processing of backslashes (\)

    2. Parse off next parameter:
       a. Skip over multiple spaces/tabs between parameters
       LOOP
       b. Count the backslashes (\). Let m = number of backslashes. (m may be zero.)
       c. IF next character following m backslashes is a double quote:
              If m is even (or zero)
                  if currently in a double quoted part
                      IF next character is also a "
                          move to next character (the 2nd ". This character will be added to the parameter.)
                      ELSE
                          set flag to not add this " character to the parameter
                      ENDIF
                  else
                      set flag to not add this " character to the parameter
                  endif
                  toggle double quoted part flag
              Endif
              m = m/2 (floor divide e.g. 0/2=0, 1/2=0, 2/2=1, 3/2=1, 4/2=2, 5/2=2, etc.)
          ENDIF
       d. add m backslashes
       e. add this character to our parameter
       ENDLOOP
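Step 2 of this algorithm can be sketched in C++ roughly as follows. This is an illustrative reading of the pseudocode, not Microsoft's actual code. The function name `parseParameters` is hypothetical, the input is assumed to be the part of the command line that follows parameter 0, and the end-of-parameter test (end of input, or white space outside a double quoted part) is an assumption, since the pseudocode does not spell it out. The sketch toggles the double quoted part flag whenever an even number of backslashes precedes a quote, both when entering and when leaving a quoted part.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of step 2: splits the text after parameter 0 into
// parameters. Not a Microsoft API.
std::vector<std::string> parseParameters(const std::string& s) {
    std::vector<std::string> params;
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (true) {
        // a. Skip spaces/tabs between parameters.
        while (i < n && (s[i] == ' ' || s[i] == '\t')) ++i;
        if (i >= n) break;

        std::string param;
        bool inQuotes = false;  // "double quoted part" flag
        while (true) {
            // b. Count the backslashes.
            std::size_t m = 0;
            while (i < n && s[i] == '\\') { ++i; ++m; }

            // c. Handle a double quote that follows the backslashes.
            bool addChar = true;
            if (i < n && s[i] == '"') {
                if (m % 2 == 0) {
                    if (inQuotes && i + 1 < n && s[i + 1] == '"') {
                        ++i;              // the 2nd quote is added literally
                    } else {
                        addChar = false;  // this quote only delimits a string
                    }
                    inQuotes = !inQuotes;
                }
                m /= 2;  // floor divide
            }

            // d. Add m backslashes.
            param.append(m, '\\');

            // Assumed termination: end of input or unquoted white space.
            if (i >= n || (!inQuotes && (s[i] == ' ' || s[i] == '\t'))) break;

            // e. Add this character to the parameter.
            if (addChar) param += s[i];
            ++i;
        }
        params.push_back(param);
    }
    return params;
}
```

Under these assumptions, calling `parseParameters` on the four sample inputs from the table gives the argv[1] to argv[3] values listed there, and an input such as `"a b"" c` gives the two parameters `a b"` and `c`, which matches the undocumented rule quoted above.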