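This article is shared from the Tianyi Cloud (天翼云) developer community: "Choosing Between the ngx.re and Lua string.re Regular Expression APIs in OpenResty". Author: 王****淋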
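0. Background
OpenResty provides two sets of pattern-matching APIs: ngx.re (PCRE regular expressions) and the Lua language's string library (Lua patterns). Both can match and search text against a pattern. So how do the two differ, and which one should you choose?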
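To make the difference concrete, here is a minimal sketch that extracts the two numbers of a Range header with each API. The subject string and both patterns are the ones used in this article's benchmarks. The `ngx` guard is an addition, not part of the original post, so that the snippet also runs under a plain Lua or LuaJIT interpreter outside OpenResty.

```lua
-- Subject string and patterns as used in this article's benchmarks.
local http_range = 'bytes=10-65535'

-- Lua pattern: character classes use '%', and '%-' escapes the hyphen.
local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
-- PCRE regex for ngx.re: '\\d' in a Lua string literal is the regex \d.
local ngx_re_p = '^bytes=([\\d]*)?\\-([\\d]*)$'

-- string.match returns the captures directly as multiple return values.
local first, last = string.match(http_range, string_re_p)
print(first, last)  -- first == "10", last == "65535"

-- ngx.re.match returns a table of captures (m[0] is the whole match),
-- nil when nothing matches, or nil plus an error string on failure.
-- The guard below is an assumption added so the sketch runs outside OpenResty.
if ngx and ngx.re then
    local m, err = ngx.re.match(http_range, ngx_re_p, "jo")
    if m then
        print(m[1], m[2])
    elseif err then
        print("regex error: ", err)
    end
end
```

Note the different result shapes: string.match hands back the captures themselves, while ngx.re.match hands back a table, so callers have to check for nil before indexing it.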
1. Performance tests
1.1 Simple loop test
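a) Short string and short pattern

    -- The helpers are not shown in the original post; these definitions are assumed.
    local string_match = string.match
    local ngx_re_match = ngx.re.match
    local function get_t()
        ngx.update_time()  -- refresh the cached time before reading it
        return ngx.now()   -- seconds, with the milliseconds as the decimal part
    end

    local http_range = 'bytes=10-65535'
    local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
    local ngx_re_p = '^bytes=([\\d]*)?\\-([\\d]*)$'
    local loop = 1000000

    -- Time string.match (t1 - t0).
    local t0 = get_t()
    for i = 1, loop do
        local _, _ = string_match(http_range, string_re_p)
    end
    local t1 = get_t()
    -- Time ngx.re.match (t2 - t1).
    for i = 1, loop do
        local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
    end
    local t2 = get_t()

Result: 0.247 s (string.match) vs. 0.32 s (ngx.re.match)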
b) Long string and complex pattern

    local http_range = 'dsfds65465fwef bytes=12345757860-4465458586465 ewfsd65sd4fg65fsd'
    local string_re_p = '.*bytes=([%d]*)%-([%d]*) .+'
    local ngx_re_p = '.*bytes=([\\d]*)?\\-([\\d]*) .+'
    local loop = 1000000
    -- The timing loops are the same as in a).

Result: 1.16 s (string.match) vs. 0.526 s (ngx.re.match)
The results show that as the subject string and the pattern become more complex, ngx.re gains a performance advantage.
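The loop tests above can be wrapped in a small reusable timer. The following is a minimal sketch, not the author's harness: `bench` is a hypothetical helper, it uses `os.clock()` (CPU time) instead of the post's undisclosed `get_t()`, and the loop count is reduced to keep it quick. Absolute numbers will therefore differ from those reported here.

```lua
-- Hypothetical helper (not from the original post): time `loops` calls of fn.
-- os.clock() reports the CPU time used by the process, in seconds.
local function bench(fn, loops)
    local t0 = os.clock()
    for _ = 1, loops do
        fn()
    end
    return os.clock() - t0
end

-- Long subject string and patterns from case b).
local http_range = 'dsfds65465fwef bytes=12345757860-4465458586465 ewfsd65sd4fg65fsd'
local string_re_p = '.*bytes=([%d]*)%-([%d]*) .+'
local ngx_re_p = '.*bytes=([\\d]*)?\\-([\\d]*) .+'
local loops = 100000  -- smaller than the article's 1000000 to keep the sketch quick

local t_string = bench(function()
    return string.match(http_range, string_re_p)
end, loops)
print(string.format("string.match: %.3f s", t_string))

-- ngx.re only exists inside OpenResty, so this part is guarded (an assumption
-- added here so the sketch also runs under a plain Lua interpreter).
local t_ngx
if ngx and ngx.re then
    t_ngx = bench(function()
        return ngx.re.match(http_range, ngx_re_p, "jo")
    end, loops)
    print(string.format("ngx.re.match: %.3f s", t_ngx))
end
```

Passing the work in as a closure keeps the two measurements symmetric, at the cost of one extra function call per iteration compared with the inline loops used in the tests.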
1.2 Adding JIT interference
a) Control group: ipairs does not break JIT (short pattern)

    -- string_match, ngx_re_match and get_t are helpers not shown in the post
    -- (presumably string.match, ngx.re.match and a timer).
    local http_range = 'bytes=10-65535'
    local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
    local ngx_re_p = '^bytes=([\\d]*)?\\-([\\d]*)$'
    local loop = 1000000
    local t0 = get_t()
    for i = 1, loop do
        for k, v in ipairs({1,2}) do end
        local _, _ = string_match(http_range, string_re_p)
    end
    local t1 = get_t()
    for i = 1, loop do
        for k, v in ipairs({1,2}) do end
        local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
    end
    local t2 = get_t()

jit-on: 0.369 s (string.match) vs. 0.326 s (ngx.re.match)
jit-off: 0.38 s (string.match) vs. 3.265 s (ngx.re.match)
b) pairs breaks JIT (short pattern)

    -- string_match, ngx_re_match and get_t are helpers not shown in the post
    -- (presumably string.match, ngx.re.match and a timer).
    local http_range = 'bytes=10-65535'
    local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
    local ngx_re_p = '^bytes=([\\d]*)?\\-([\\d]*)$'
    local loop = 1000000
    local t0 = get_t()
    for i = 1, loop do
        for k, v in pairs({a=1,b=2}) do end
        local _, _ = string_match(http_range, string_re_p)
    end
    local t1 = get_t()
    for i = 1, loop do
        for k, v in pairs({a=1,b=2}) do end
        local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
    end
    local t2 = get_t()

jit-on: 0.394 s (string.match) vs. 1.04 s (ngx.re.match)
jit-off: 0.395 s (string.match) vs. 3.216 s (ngx.re.match)
c) pairs + long complex string

    local http_range = 'dsfds65465fwef bytes=12345757860-4465458586465 ewfsd65sd4fg65fsd'
    local string_re_p = '.*bytes=([%d]*)%-([%d]*) .+'
    local ngx_re_p = '.*bytes=([\\d]*)?\\-([\\d]*) .+'
    local loop = 1000000
    -- The timing loops are the same as in b).

jit-on: 1.31 s (string.match) vs. 1.30 s (ngx.re.match)
jit-off: 1.307 s (string.match) vs. 2.94 s (ngx.re.match)
Extra-long string + jit-on:

    local http_range = 'dsfds6546vsdvsdfdsfsdfsdfwaasdasdasdas5fwef bytes=12354345345345757860-4465453453453453453453453458586465 ewfsd65safdknsalk;nlkasdnflksdajfhkldashjnfkl;ashfgjklahfg;jlsasd4fg65fsd'

Result: 2.775 s (string.match) vs. 1.739 s (ngx.re.match)
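The post does not show how JIT was switched on and off for these runs. One way to reproduce the jit-on and jit-off conditions is LuaJIT's built-in `jit` library (`jit.on`, `jit.off`, `jit.flush`, `jit.status`). The sketch below is an assumption about the setup, not the author's code: `with_jit` and `run_matches` are hypothetical helpers, and only the string.match side is exercised so that it runs outside OpenResty.

```lua
-- Hypothetical helper: run fn with the LuaJIT compiler enabled or disabled,
-- then restore the previous state. `jit` only exists under LuaJIT.
local function with_jit(enabled, fn)
    if not jit then
        return fn()  -- plain Lua: there is no JIT compiler to toggle
    end
    local was_on = jit.status()  -- first return value: true if the JIT is on
    if enabled then
        jit.on()
    else
        jit.off()
        jit.flush()  -- discard previously compiled traces
    end
    local ok, result = pcall(fn)
    if was_on then jit.on() else jit.off() end
    if not ok then error(result, 0) end
    return result
end

-- Workload modelled on case b): a pairs() loop followed by one match.
local function run_matches()
    local hits = 0
    for _ = 1, 100000 do
        for _ in pairs({a = 1, b = 2}) do end
        if string.match('bytes=10-65535', '^bytes=([%d]*)%-([%d]*)$') then
            hits = hits + 1
        end
    end
    return hits
end

for _, enabled in ipairs({true, false}) do
    local t0 = os.clock()
    local hits = with_jit(enabled, run_matches)
    print(enabled and "jit-on" or "jit-off", hits, os.clock() - t0)
end
```

Inside OpenResty the same wrapper could be applied to a loop that calls ngx.re.match, which is where the reported jit-off slowdown shows up.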
1.3 Summary of test results
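Case                               | string.match (s) | ngx.re.match (s) | Notes
-----------------------------------+------------------+------------------+--------
Short pattern                      | 0.247            | 0.32             | jit-hit
Short pattern + ipairs             | 0.369            | 0.326            | jit-hit
Short pattern + pairs              | 0.394            | 1.04             |
Long pattern + pairs               | 2.775            | 1.739            |
Short pattern + pairs + jit-off    | 0.395            | 3.216            | jit-off
Short pattern + ipairs + jit-off   | 0.38             | 3.265            | jit-off

2. Conclusion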
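The test results show that:
1) In general, ngx.re copes better with complex strings and complex patterns, and is the recommended choice for most cases.
2) For very simple strings the two differ little, with Lua string patterns slightly ahead, so use whichever is more convenient to write.
3) ngx.re is affected more by JIT: with JIT turned off, or when constructs such as pairs get in the way of JIT compilation, it can become a drag on performance.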
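The ratios behind these conclusions can be computed directly from the figures in section 1.3. The sketch below copies those timings into a table; the field names and the `slowdown` helper are hypothetical, introduced only for illustration.

```lua
-- Timings in seconds, copied from the summary table in section 1.3.
local results = {
    { case = "short pattern",                    string_match = 0.247, ngx_re = 0.32  },
    { case = "short pattern + ipairs",           string_match = 0.369, ngx_re = 0.326 },
    { case = "short pattern + pairs",            string_match = 0.394, ngx_re = 1.04  },
    { case = "long pattern + pairs",             string_match = 2.775, ngx_re = 1.739 },
    { case = "short pattern + pairs + jit-off",  string_match = 0.395, ngx_re = 3.216 },
    { case = "short pattern + ipairs + jit-off", string_match = 0.38,  ngx_re = 3.265 },
}

-- Hypothetical helper: how many times slower ngx.re.match was than string.match.
local function slowdown(row)
    return row.ngx_re / row.string_match
end

for _, row in ipairs(results) do
    print(string.format("%-34s ngx.re / string.match = %.2f", row.case, slowdown(row)))
end
```

A ratio above 1 means ngx.re.match was the slower of the two in that run; a ratio below 1 means it was faster.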