How path variables (;) are parsed, and where the classic ..;/ comes from

This one has been sitting in my drafts for a long time. I was too lazy to post it (lol), so here it is today.

The RFC on URI Templates (RFC 6570) gives the following template examples in its description of path expansion:

Type    Separator
        ","    (default)
+       ","
#       ","
.       "."
/       "/"
;       ";"
?       "&"
&       "&"

Example Template    Expansion
{count}             one,two,three
{count*}            one,two,three
{/count}            /one,two,three
{/count*}           /one/two/three
{;count}            ;count=one,two,three
{;count*}           ;count=one;count=two;count=three
{?count}            ?count=one,two,three
{?count*}           ?count=one&count=two&count=three
{&count*}           &count=one&count=two&count=three
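
To make the two ; rows concrete, here is a minimal Java sketch of that one operator for a list-valued variable. It is only an illustration: the class and method names are made up, percent-encoding is ignored, and nothing here comes from a real URI Template library.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any library: expands only the ";" operator
// for a list-valued variable and does no percent-encoding.
class SemicolonExpansionSketch {

    // {;name}  -> ;name=v1,v2,v3            (one pair, values joined by ",")
    // {;name*} -> ;name=v1;name=v2;name=v3  (one pair per value)
    static String expand(String name, List<String> values, boolean explode) {
        if (values.isEmpty()) {
            return ""; // nothing to expand
        }
        if (explode) {
            return values.stream()
                    .map(v -> ";" + name + "=" + v)
                    .collect(Collectors.joining());
        }
        return ";" + name + "=" + String.join(",", values);
    }

    public static void main(String[] args) {
        List<String> count = List.of("one", "two", "three");
        System.out.println(expand("count", count, false)); // ;count=one,two,three
        System.out.println(expand("count", count, true));  // ;count=one;count=two;count=three
    }
}
```

The exploded form is the one that matters for the rest of this post: it produces a run of ;key=value pairs, which is exactly the shape Tomcat looks for in a path.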

/path;key=value is one form of path variables, analogous to the far more common ?key=value. Tomcat parses it in CoyoteAdapter#parsePathParameters().

Key code:

protected void parsePathParameters(org.apache.coyote.Request req,
        Request request) {

    req.decodedURI().toBytes();
    ByteChunk uriBC = req.decodedURI().getByteChunk();

    // Find the first ';' in the URI
    int semicolon = uriBC.indexOf(';', 0);
    if (semicolon == -1) {
        return;
    }

    // Note: charset is declared in the original method; its declaration is
    // omitted from this excerpt.

    // Read each ;key=value in turn
    while (semicolon > -1) {
        // Locate the right boundary of ;key=value
        int start = uriBC.getStart();
        int end = uriBC.getEnd();
        int pathParamStart = semicolon + 1;
        // [!] The right boundary is delimited by either ';' or '/'
        int pathParamEnd = ByteChunk.findBytes(uriBC.getBuffer(),
                start + pathParamStart, end,
                new byte[] {';', '/'});

        String pv = null;
        if (pathParamEnd >= 0) {
            // Extract key=value from /path;key=value....
            if (charset != null) {
                pv = new String(uriBC.getBuffer(), start + pathParamStart,
                        pathParamEnd - pathParamStart, charset);
            }
            // [+] Take the "..." part of /path;key=value..., erase ;key=value
            // from the original URI, and splice "..." directly after /path
            byte[] buf = uriBC.getBuffer();
            for (int i = 0; i < end - start - pathParamEnd; i++) {
                buf[start + semicolon + i]
                    = buf[start + i + pathParamEnd];
            }
            uriBC.setBytes(buf, start,
                    end - start - pathParamEnd + semicolon);
        }
        // No boundary left: this is the last key=value, so cut the URI here
        else {
            if (charset != null) {
                pv = new String(uriBC.getBuffer(), start + pathParamStart,
                        (end - start) - pathParamStart, charset);
            }
            uriBC.setEnd(start + semicolon);
        }

        // Split key=value into key and value
        if (pv != null) {
            int equals = pv.indexOf('=');
            if (equals > -1) {
                String name = pv.substring(0, equals);
                String value = pv.substring(equals + 1);
                request.addPathParameter(name, value);
                if (log.isDebugEnabled()) {
                    log.debug(sm.getString("coyoteAdapter.debug", "equals",
                            String.valueOf(equals)));
                    log.debug(sm.getString("coyoteAdapter.debug", "name",
                            name));
                    log.debug(sm.getString("coyoteAdapter.debug", "value",
                            value));
                }
            }
        }

        // If there are more ;key=value pairs, keep looking
        semicolon = uriBC.indexOf(';', semicolon);

    }
}

Pay particular attention to the code under the [+] comment: it is the key to why Tomcat ends up treating ..;/ as ../.

First, the code accepts / as the right boundary of a path variable (the [!] comment), so we can construct a URI of the form /path;key=value/;key=value.

Next, ;key=value is erased and whatever follows it is spliced onto what came before, like this:

URI
/path;key=value/;key=value ->
/path/;key=value ->
/path/
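
The erase-and-splice steps can be reproduced with a small model of the loop. This is a simplification rather than Tomcat's code: it works on a String instead of a ByteChunk, skips the key=value extraction, and the names PathParamStripSketch and strip are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the stripping loop; operates on a String, not a ByteChunk.
class PathParamStripSketch {

    // Returns every intermediate URI, starting with the input and ending
    // with the fully stripped result.
    static List<String> strip(String uri) {
        List<String> steps = new ArrayList<>();
        steps.add(uri);
        int semicolon = uri.indexOf(';');
        while (semicolon > -1) {
            // The parameter ends at the next ';' or '/', whichever comes first.
            int end = -1;
            for (int i = semicolon + 1; i < uri.length(); i++) {
                char c = uri.charAt(i);
                if (c == ';' || c == '/') {
                    end = i;
                    break;
                }
            }
            if (end >= 0) {
                // Drop ";key=value" and splice the remainder onto the prefix.
                uri = uri.substring(0, semicolon) + uri.substring(end);
            } else {
                // No boundary left: truncate at the semicolon.
                uri = uri.substring(0, semicolon);
            }
            steps.add(uri);
            semicolon = uri.indexOf(';', semicolon);
        }
        return steps;
    }

    public static void main(String[] args) {
        // Prints the three URIs from the walkthrough, one per line.
        strip("/path;key=value/;key=value").forEach(System.out::println);
    }
}
```

Recording each intermediate string is only there to make the rewriting visible; the real method mutates the buffer in place.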

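Taken together, these two behaviors mean the URI goes through both a "replace" and a "splice" operation. Combining operations like these readily leads to bypasses, behavior that differs from expectations, and parsing inconsistencies. So when we pass in a URI like this: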

/path/..;/

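Because there is nothing between ; and /, the path variable is empty and only the ; itself gets erased. The URI finally becomes: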

/path/../
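
A rough sketch of how this turns into a parsing inconsistency is given here. Both halves are assumptions made for illustration: frontEndAllows stands in for some hypothetical upstream check that only rejects a segment that is exactly "..", and stripPathParams is a simplified String-based model of the stripping loop, not Tomcat's actual code.

```java
// Illustrative only: a hypothetical front-end check paired with a simplified
// model of path-parameter stripping.
class DotDotSemicolonSketch {

    // Hypothetical filter: rejects a path only if some segment is exactly "..".
    static boolean frontEndAllows(String path) {
        for (String segment : path.split("/")) {
            if (segment.equals("..")) {
                return false;
            }
        }
        return true;
    }

    // Removes every ";..." run up to the next ';' or '/' (or the end of the URI).
    static String stripPathParams(String uri) {
        int semicolon = uri.indexOf(';');
        while (semicolon > -1) {
            int end = semicolon + 1;
            while (end < uri.length()
                    && uri.charAt(end) != ';' && uri.charAt(end) != '/') {
                end++;
            }
            // If no boundary was found, end == uri.length() and the tail is empty.
            uri = uri.substring(0, semicolon) + uri.substring(end);
            semicolon = uri.indexOf(';', semicolon);
        }
        return uri;
    }

    public static void main(String[] args) {
        String raw = "/path/..;/";
        String stripped = stripPathParams(raw);
        System.out.println(frontEndAllows(raw));      // true: "..;" is not ".."
        System.out.println(stripped);                 // /path/../
        System.out.println(frontEndAllows(stripped)); // false
    }
}
```

The same string passes the check before stripping and fails it after, which is the whole problem when the check and the stripping happen in different components.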

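PS: When reading code, never let your thinking be confined by what the code is meant to do. Once you have the functionality sorted out, step back and think divergently about the context it runs in (here, for example, the fact that path variables live in the URI).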

Reference

https://www.youtube.com/watch?v=CIhHpkybYsY