2013/03 归档 - Felix021 - So far so good

Mar 5

记坑

felix021 @ 2013-3-5 15:39 [IT » 其他] 评论(4) , 引用(0) , 阅读(16791) | Via 本站原创

1. Python的除法

线上有一个简单的函数，运行一年多了，作用是把"分"表示的字符串转成"元"：

def fen_to_yuan(str_fen):
fen = int(str_fen)
return '%d.%02d' % (int(fen / 100) , fen % 100)

看起来也的确是没有什么问题，但是就这么简单的一点代码，它还是错了，原因是fen_to_yuan("-270")居然返回了"-3.30"！坑爹啊。简单测试一下，原来是这样：

引用

>>> -270/100
-3
>>> -270%100
30

所以只好蛋疼地修改成这样：

def fen_to_yuan(str_fen):
fen = int(str_fen)
sign, fen = fen < 0 and ('-', -fen) or ('', fen)
return sign + '%d.%02d' % (int(fen / 100) , fen % 100)

2. 线上有一个脚本，要得到上个月的月份，bash的实现就是

引用

date -d "-1 month $date" +%m

看起来也的确是没有什么问题，但是就这么简单的一点代码，它还是错了，原因是对于10月31号居然返回了10！坑爹啊。简单测试一下，原来是这样：

引用

$ date -d "-1 month 20121031" +%Y-%m-%d
20121001
$ date -d "-1 month 20130331" +%Y-%m-%d
20130303

也就是说，先把月份减一，然后检查日期，超过当前月，再向上修正月份，再向上修正年份。

所以只好蛋疼地修改成这样：

引用

date -d "-1 month ${date:0:6}01" +%m

#update: @whusnoopy补充说向后查看1个月也会有这样的情况，总之记得用月来算是有坑的，千万注意。

3. crontab的小坑

crontab默认是不会读取.bashrc，需要自己去source一下.bashrc，~~并且不支持像bash一样用反引号来启动一个子命令（这个结论是错的，是因为%前面忘了加斜杠）~~。这个不展开细说了，有兴趣的试试吧。

Mar 3

常用find + grep查找封装

felix021 @ 2013-3-3 23:45 [IT » 其他] 评论(0) , 引用(0) , 阅读(10647) | Via 本站原创

看源码的时候经常要在某一类文件里面grep一些内容，用标准的find + grep写起来很辛苦：

$ find -name "*.c" -exec grep {} -Hne "hello world" \;

所以简单封装了下，保存成 ~/bin/xgrep 然后把 ~/bin 加入到 PATH 里去，以后就只需要

$ xgrep \*.c "hello world" #注意这个 \*.c 里可以用的是*和?的通配符，不是正则

#!/bin/bash

if [ -z "$1" -o -z "$2" ]; then
echo "Usage: xgrep FilePattern WordPattern"
exit
fi

filepat="$1"
greppat="$2"

shift
shift

set -x

find -name "$filepat" -exec grep {} -Hne "$greppat" $* \;

#后来才想起grep其实有个--exclude=PATTERN（可以去掉find），但是已经这么用了挺久，习惯了。。。

Mar 3

Yet Another False-Sharing Test

felix021 @ 2013-3-3 23:31 [IT » 硬件] 评论(0) , 引用(0) , 阅读(12272) | Via 本站原创

前天在 coolshell 里看到并发框架Disruptor译文以后，才感慨了CPU娘的傲娇，没一会儿就看到 Dutor 同学的 A False-Sharing Test ，发现差距好大（4线程4倍- ，16线程8倍+ ，我用dutor的代码实测16线程性能差距接近20倍），于是也写了段小代码来测试它。跟dutor同学不一样，我用的是 c 实现的，看起来可能没那么易读。

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/time.h>
#include <limits.h>

void *tester(void *arg)
{
long *nloop = (long *)arg; //这里之前笔误写成int了。
while ( (*nloop)-- );
return NULL;
}

int driver(int nthread, int nloop, int npad)
{
size_t size = npad + sizeof(long); //每个线程占用sizeof(long) + npad的空间
char buff[size * nthread];
pthread_t th[nthread];

struct timeval s, e;
gettimeofday(&s, NULL);

for (int i = 0; i < nthread; i++) {
int *arg = (int *)(buff + size * i);
*arg = nloop;
pthread_create(&th[i], NULL, tester, (void *)arg);
}

void *pret;
for (int i = 0; i < nthread; i++)
pthread_join(th[i], &pret);

gettimeofday(&e, NULL);

return (e.tv_sec - s.tv_sec) * 1000000 + e.tv_usec - s.tv_usec;
}

int main()
{
int nloop = 1024 * 1024 * 128, nthread = 16, npad, best_padding = 0, best_usage = INT_MAX;
printf("nloop = %d, nthread = %d\n\n", nloop, nthread);
for (npad = 64; npad >= 0; npad -= 8) { //之所以步长为8是为了避免非8字节对齐long可能有的性能损失
int i, usage = 0;;
for (i = 0; i < 3; i++)
usage += driver(nthread, nloop, npad);
usage /= 3;
if (usage < best_usage) {
best_usage = usage;
best_padding = npad;
}
printf("padding: %2d, time usage: %12d\n", npad, usage);
}
printf("\nbest padding: %2d, time usage: %12d\n", best_padding, best_usage);
return 0;
}

引用

$ gcc false_sharing.c -lpthread -std=c99
$ ./a.out
nloop = 134217728, nthread = 16

padding: 64, time usage: 491395
padding: 56, time usage: 477760
padding: 48, time usage: 853594
padding: 40, time usage: 834318
padding: 32, time usage: 905200
padding: 24, time usage: 940989
padding: 16, time usage: 991595
padding: 8, time usage: 1040412
padding: 0, time usage: 1112716

best padding: 56, time usage: 477760

该机器使用的是4颗4核8线程的Xeon E7520@1.87GHz （16个物理核心32个逻辑核心），64GB RAM，/proc/cpuinfo里的cache_alignment是64

可以看出来，padding=56（也就是正好对齐到一个cache行）的时候效率最高，是没有填充时的2倍+的效率，虽然明显，但是显著地没有dutor的测试那么夸张。

把dutor的代码稍微改了下，s[ith].n = NLOOP，且pthread_create的时候传入的参数改成 (void *)&(s[ith].n)，然后hook程序改成

size_t *n = (size_t *)args;
while ( (*n)-- );
return NULL;

其运行效率提升显著，padding=56的时候能快10%左右，而padding=0的时候能快达7倍之巨，最终的性能差距大约可以降至 3 倍的差距。这说明dutor的测试方法并不是测试裸的性能差距，带来的了一定的误差。

由于现在多数CPU都已经有了共享的L2或者L3 Cache，Cache Line失效的问题得到了相当的改善，不过不同物理CPU上仍然需要注意这个问题。

然而有一点我不能理解，这个修改对两种情况的影响竟相差这么大，这里头又有什么玄机呢...... #UPDATE: 后根据dutor的测试，我去掉了 for 循环中用到的循环变量 i 之后，性能差距立即将至2倍左右，修改循环的方向或者将for改成while则无效，因此这很可能是分支预测失效带来的问题了。

Mar 3

Python int缓存的那点事

felix021 @ 2013-3-3 02:42 [IT » Python] 评论(1) , 引用(0) , 阅读(15116) | Via 本站原创

早先翻python代码的时候是有注意到 intobject 里有缓存这档事，不过没细看。昨天有人在sf问起为什么有如下现象：

引用

>>> a = 1.0
>>> b = 1.0
>>> a is b
False
>>> a = 1
>>> b = 1
>>> a is b
True

于是又翻开python的源码 python2.6.8/Objects/intobject.c ，可以看到这些代码(略做简化)：

#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5
/* References to small integers are saved in this array so that they
can be shared.
The integers that are saved are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

PyObject *
PyInt_FromLong(long ival)
{
register PyIntObject *v;
//如果 -5 <= ival && ival < 257, 命中缓存~
if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) {
v = small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
return (PyObject *) v;
}
if (free_list == NULL) { //这个是另一个优化，相当于内存池，用链表实现
if ((free_list = fill_free_list()) == NULL)
return NULL;
}
/* Inline PyObject_New */
v = free_list;
free_list = (PyIntObject *)Py_TYPE(v);
PyObject_INIT(v, &PyInt_Type);
v->ob_ival = ival;
return (PyObject *) v;
}

而PyFloat_Object并没有（也不适合）实现这样的缓存，所以就可以解释上面的情况了。

更进一步，可以用257来验证一下，的确是超出了缓存的范围：

引用

>>> a = 257
>>> b = 257
>>> a is b
False

然后手贱做了另一个测试，蛋疼了：

引用

>>> a = 257; b = 257; a is b
True

也就是说如果让解释器一次执行的话，解释器又会再优化它，让a、b引用同一个对象。//注：这里对于float和str类型的常量效果是一样的。

为了搞清楚解释器到底是怎么实现这一点的，又把代码翻出来。之前翻的时候对解释器的执行流程已经有大致的了解了。make得到的python解释器是从 Module/python.c 里的 main() 函数开始的，调用链大约是这样：

引用

main() @Modules/python.c
Py_Main() @Modules/main.c
PyRun_AnyFileExFlags() @Python/pythonrun.c
PyRun_SimpleFileExFlags
PyRun_FileExFlags()

从 PyRun_FileExFlags 开始，才能看到底层代码正式登场：

引用

PyRun_FileExFlags()
mod_ty *mod = PyParser_ASTFromFile() //把文件转换成AST(Abstract Syntax Tree)
node *n = PyParser_ParseFileFlagsEx() //生成CST(Concrete Syntax Tree)
parsetoke() //逐个解析token
ps = PyParser_New()
for (;;)
PyTokenizer_Get() //获取下一个token
PyParser_AddToken(ps, ...) //将token加入到CST中
mod = PyAST_FromNode(n, ...) //将CST转换成AST
递归调用 ast_for_xxx 生成AST，同时过滤CST中的冗余信息
其中ast_for_atom中调用了parsenumber, 它调用PyInt_FromLong()
run_mod(mod, ...) //执行AST
co = PyAST_Compile(mod, ...) //将AST转换成CFG(Control Flow Graph) bytecode
PyFuture_FromAST()
PySymtable_Build() //创建符号表
co = compiler_mod() //编译ast为bytecode
PyEval_EvalCode(co, ...) //执行bytecode
PyEval_EvalCodeEx()

注：更详细的Python编译解释流程可参见这一系列： http://blog.csdn.net/atfield/article/category/256448

通过加入一些调试代码，可以窥探到内部的执行流。例如，在PyParser_AddToken中输出token的名称和类型编码；在PyParser_ParseFileFlagsEx()之后调用PyNode_ListTree()，可以看到生成的CST树（修改list1node()可以让它打印出更容易阅读的版本）；修改PyInt_FromLong()，让它在ival=257的时候输出创建的object的id（CPython实现中 id 其实就是指针的值）。加上一些代码以后，编译python，执行test.py可以看到如下输出：

引用

felix021@ubuntu-server:~/src/python2.7-2.7.3$ cat test.py
a = 257
b = 0x101
print a is b
felix021@ubuntu-server:~/src/python2.7-2.7.3$ ./python -d test.py
PyParser_ParseFileFlagsEx
type = 1, token: [a]
type = 22, token: [=]
type = 2, token: [257]
type = 4, token: []
type = 1, token: [b]
type = 22, token: [=]
type = 2, token: [0x101]
type = 4, token: []
type = 1, token: [print]
type = 1, token: [a]
type = 1, token: [is]
type = 1, token: [b]
type = 4, token: []
type = 4, token: []
type = 0, token: []
PyNode_ListTree:
<1>a //type=1表示是NAME
<22>=
<2>257 //type=2表示是NUMBER
<4> //这是NEWLINE
<1>b
<22>=
<2>0x101
<4>
<1>print
<1>a
<1>is
<1>b
<4>
<4>
<0> //这是ENDMARKER
Before PyAST_FromNode
name = a
ival = 257, id = 22699048 //注意这个id和下一个id不一样
name = b
ival = 257, id = 22698784
name = b
name = a
After PyAST_FromNode
True #这一行是print a is b的输出

从输出可以看到，解析源码生成CST的时候（输出的CST已经滤掉了非TERMINAL的node），每个token还保留着原始的字符串（例如0x101和257），而在CST到AST的转换过程中（PyAST_FromNode），解释器为每一个NUMBER都创建了一个PyIntObject。然而在程序运行的最终结果里可以看到，a is b的结果是True，也就是说，从AST转换到CFG并执行(run_mod)的过程中，解释器做了适量的优化，将 a 和 b 都指向了同一个 int 对象。

由于对CFG不熟，相应的代码还不太看得懂，所以暂时只能烂尾了，如果以后看懂了，再来补充。

续集：Python int缓存的那点事[续]

分页： 2/2

1 2

[ 显示模式：摘要 | 列表 ]

Felix021

记坑

常用find + grep查找封装

Yet Another False-Sharing Test

Python int缓存的那点事

联系我

最新评论

最新日志

分类

其他

统计

链接