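Over the past few days my colleagues and I have been discussing a technical design. The features and services already work, so the focus now is on code optimization and efficiency. During code review I found an efficiency bottleneck.
The story goes roughly like this. A colleague needs to narrow down a huge pile of data by some criterion: say the data pool holds 1 TB, and after filtering by the condition 1 GB qualifies. This smaller set is then processed further, for example to find its minimum or maximum. While narrowing the data down, the colleague stored every qualifying item in a buffer, which costs one memory assignment per item; then, as a second step, the code searched the buffer for the value it wanted.
In my view the memory assignments in the first step are unnecessary, because the second step can be done right there, on each item as it passes the filter. A memory assignment stores the value at one memory address into another memory address, so it goes through memory I/O and bus bandwidth, which can cost orders of magnitude more than arithmetic on the CPU, especially once the data no longer fits in cache. That is why this looks like an efficiency bottleneck to me. The colleague's implementation is comfortable to write but painful at run time.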
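As a rough illustration of the two strategies described above, here is a minimal C++ sketch. The names maxViaBuffer, maxFused and checkedMax, the std::vector<int> container and the fallback parameter are assumptions made for the example; this is not the colleague's actual code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: two steps. First copy every item that passes the
// filter into a temporary buffer, then scan the buffer for the maximum.
template <typename Pred>
int maxViaBuffer(const std::vector<int>& data, Pred keep, int fallback) {
    std::vector<int> buffer;
    buffer.reserve(data.size());          // one allocation, as with a preallocated array
    for (int v : data) {
        if (keep(v)) buffer.push_back(v); // the extra memory write per match
    }
    int best = fallback;
    for (int v : buffer) {
        if (v > best) best = v;
    }
    return best;
}

// Hypothetical helper: one step. Update the running maximum as soon as
// an item passes the filter, so no temporary buffer is written.
template <typename Pred>
int maxFused(const std::vector<int>& data, Pred keep, int fallback) {
    int best = fallback;
    for (int v : data) {
        if (keep(v) && v > best) best = v;
    }
    return best;
}

// Sanity check: both strategies must agree on the answer.
template <typename Pred>
int checkedMax(const std::vector<int>& data, Pred keep, int fallback) {
    const int viaBuffer = maxViaBuffer(data, keep, fallback);
    const int fused = maxFused(data, keep, fallback);
    assert(viaBuffer == fused);
    return fused;
}
```

Both functions return fallback when nothing passes the filter. The fused version only works this simply because a maximum can be maintained incrementally; a second step that needs the whole filtered set at once (sorting, for instance) would still need the buffer.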
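To check this idea I ran an experiment on my own machine with fake data. Concretely: allocate a large block of memory, initialize it with a series of integers, take all the numbers that are divisible by 3, and find the largest of them. The C++ code with the memory assignment is as follows: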
// Fragment from inside main(); needs <iostream>, <cstring>, <ctime> and "using namespace std;".
// ARRAYSIZE is a large integer constant defined elsewhere (its value is not given in this post).
// 1. create the fake data
int *pSource = new int [ARRAYSIZE];
for (int i=0; i<ARRAYSIZE; i++)
{
    if (0 == i%2) pSource[i] = i;
    else pSource[i] = ARRAYSIZE - i;
}
int *pTmpBuf = new int [ARRAYSIZE];
memset (pTmpBuf, 0, ARRAYSIZE*sizeof(int));
// 2. memory copy and compare, repeat n times
clock_t tBegin1 = clock();
double dCopyMemTotalClocks = 0.0;
double dComparingTotalClocks = 0.0;
for (int i=0; i<300; i++)
{
    if (0 == i%100) cout << "During " << i << " loops..." << endl;
    // 2.1 copy the values in memory
    clock_t tBegin1_1 = clock();
    int iTmpBufSize = 0;
    for (int j=0; j<ARRAYSIZE; j++)
    {
        if (0 == pSource[j]%3) pTmpBuf[iTmpBufSize++] = pSource[j];
    }
    clock_t tEnd1_1 = clock();
    dCopyMemTotalClocks += (double)(tEnd1_1 - tBegin1_1);
    // 2.2 compare the items in temp buffer
    clock_t tBegin1_2 = clock();
    double dMax = -1.0;
    for (int k=0; k<iTmpBufSize; k++)
    {
        if (pTmpBuf[k] > dMax) dMax = pTmpBuf[k];
    }
    clock_t tEnd1_2 = clock();
    dComparingTotalClocks += (double)(tEnd1_2 - tBegin1_2);
}
clock_t tEnd1 = clock();
// the total is printed in whole seconds (integer division of clock ticks)
cout << "Baseline time consuming " << (tEnd1 -tBegin1)/CLOCKS_PER_SEC << endl;
cout << "copying memory time consuming " << dCopyMemTotalClocks/CLOCKS_PER_SEC << endl;
cout << "comparing value time consuming " << dComparingTotalClocks/CLOCKS_PER_SEC << endl;
Output:
During 0 loops...
During 100 loops...
During 200 loops...
Baseline time consuming 161
copying memory time consuming 153.326
comparing value time consuming 8.261
For comparison, I then wrote the following code right below it:
// 3. no memory copy and compare it instantly
clock_t tBegin2 = clock();
for (int i=0; i<300; i++)
{
    if (0 == i%100) cout << "During " << i << " loops..." << endl;
    double dMax = -1.0;
    for (int j=0; j<ARRAYSIZE; j++)
    {
        // index with j, the inner loop variable, so that every element is visited
        if (0 == pSource[j]%3)
        {
            if (pSource[j] > dMax) dMax = pSource[j];
        }
    }
}
clock_t tEnd2 = clock();
// printed in whole seconds (integer division of clock ticks)
cout << "targeting time consuming " << (tEnd2 -tBegin2)/CLOCKS_PER_SEC << endl;
Output:
During 0 loops...
During 100 loops...
During 200 loops...
targeting time consuming 28
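The timings quoted above come from the author's machine and will differ with hardware, compiler and optimization level. For anyone who wants to re-measure, the following is a self-contained sketch of the same experiment, not the author's code. It assumes std::chrono::steady_clock (wall-clock time) in place of clock(), takes the array size and repeat count as parameters, and uses the hypothetical names BenchResult, makeFakeData and runBenchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result record: seconds spent and maximum found by each variant.
struct BenchResult {
    double twoPassSeconds;
    double fusedSeconds;
    int twoPassMax;
    int fusedMax;
};

// Same fake data as in the post: even index i holds i, odd index i holds n - i.
std::vector<int> makeFakeData(int n) {
    std::vector<int> data(n);
    for (int i = 0; i < n; ++i) {
        data[i] = (i % 2 == 0) ? i : n - i;
    }
    return data;
}

// Times both variants over `repeats` rounds on an array of n integers.
BenchResult runBenchmark(int n, int repeats) {
    using Clock = std::chrono::steady_clock;
    const std::vector<int> source = makeFakeData(n);
    std::vector<int> buffer(n);           // temporary buffer, allocated once
    BenchResult r{0.0, 0.0, -1, -1};

    // Variant 1: copy the multiples of 3 into the buffer, then scan the buffer.
    const auto t0 = Clock::now();
    for (int rep = 0; rep < repeats; ++rep) {
        std::size_t count = 0;
        for (int v : source) {
            if (v % 3 == 0) buffer[count++] = v;
        }
        int best = -1;
        for (std::size_t k = 0; k < count; ++k) {
            if (buffer[k] > best) best = buffer[k];
        }
        r.twoPassMax = best;
    }
    const auto t1 = Clock::now();

    // Variant 2: no buffer, update the maximum while filtering.
    for (int rep = 0; rep < repeats; ++rep) {
        int best = -1;
        for (int v : source) {
            if (v % 3 == 0 && v > best) best = v;
        }
        r.fusedMax = best;
    }
    const auto t2 = Clock::now();

    r.twoPassSeconds = std::chrono::duration<double>(t1 - t0).count();
    r.fusedSeconds = std::chrono::duration<double>(t2 - t1).count();
    assert(r.twoPassMax == r.fusedMax);   // both variants must agree
    return r;
}
```

Calling runBenchmark with a large n and a few hundred repeats from main() and printing the two fields in seconds gives numbers comparable in spirit to the ones above. An optimizing compiler may simplify loops whose results are identical on every round, so the measured times should be read with some caution.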
The end.
Please credit the source when reposting: http://blog.csdn.net/xceman1997/article/details/20053363