Channel: CodeGuru Forums - Visual C++ Programming

SSE programming - no speedup

Hi everyone,

I'm trying my hand at SSE programming, but so far using SSE does not give me any speedup.
This is in contrast to example code where I do see a speedup (for example http://supercomputingblog.com/optimi...e-programming/).

Could you please take a look at my code below and tell me if perhaps I'm overlooking something?

First the scalar code:

Code:

float result[1000];
float value[1000];
float weight[1000];

...

for (int i = 0; i < 1000; i++)
    result[i] += value[i] * weight[i];

I have turned this into the following SSE code:

Code:

float result[1000];
float value[1000];
float weight[1000];

...

__m128 * result128 = (__m128 *)result;
__m128 * value128 = (__m128 *)value;
__m128 * weight128 = (__m128 *)weight;

// 1000 floats = 250 vectors of 4 floats; use the __m128 views, not the float arrays
for (int i = 0; i < 250; i++)
    result128[i] = _mm_add_ps(result128[i], _mm_mul_ps(value128[i], weight128[i]));
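
Casting a float array to __m128 * only works if the array happens to be 16-byte aligned. For comparison, here is a minimal sketch of the same loop with explicit aligned loads and stores. The function name multiply_add_sse and the alignment checks are illustrative assumptions, not part of the code above, and the caller is assumed to declare the arrays with alignas(16).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ps, _mm_store_ps, ...

// Hypothetical helper: result[i] += value[i] * weight[i], four floats at a time.
// Assumes n is a multiple of 4 and all three pointers are 16-byte aligned.
void multiply_add_sse(float* result, const float* value, const float* weight,
                      std::size_t n)
{
    assert(n % 4 == 0);
    assert(reinterpret_cast<std::uintptr_t>(result) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(value) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(weight) % 16 == 0);

    for (std::size_t i = 0; i < n; i += 4) {
        __m128 r = _mm_load_ps(result + i);   // aligned load of 4 floats
        __m128 v = _mm_load_ps(value + i);
        __m128 w = _mm_load_ps(weight + i);
        _mm_store_ps(result + i, _mm_add_ps(r, _mm_mul_ps(v, w)));
    }
}
```

If alignment cannot be guaranteed, _mm_loadu_ps and _mm_storeu_ps are the unaligned counterparts.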

One explanation would of course be that the compiler is already vectorizing the scalar code, leaving no additional speedup to be found. However, the same code runs approximately as fast with doubles as with floats, whereas vectorized code should run at roughly half the speed with doubles, since a 128-bit register holds only two doubles instead of four floats.
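
To make the float versus double comparison concrete, here is a rough sketch of what the double-precision variant might look like with SSE2 intrinsics. The name multiply_add_sse2 is hypothetical, and unaligned loads are used so that no alignment assumption is needed. Each iteration handles two elements, so 1000 doubles take 500 iterations where 1000 floats take 250.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2: __m128d, _mm_loadu_pd, _mm_storeu_pd, ...

// Hypothetical helper: result[i] += value[i] * weight[i], two doubles at a time.
// Assumes n is a multiple of 2; pointers may be unaligned.
void multiply_add_sse2(double* result, const double* value, const double* weight,
                       std::size_t n)
{
    assert(n % 2 == 0);

    for (std::size_t i = 0; i < n; i += 2) {
        __m128d r = _mm_loadu_pd(result + i);  // unaligned load of 2 doubles
        __m128d v = _mm_loadu_pd(value + i);
        __m128d w = _mm_loadu_pd(weight + i);
        _mm_storeu_pd(result + i, _mm_add_pd(r, _mm_mul_pd(v, w)));
    }
}
```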

Is there possibly something I'm overlooking here?
Another explanation could be that too much time is spent moving the values to and from memory, which would reduce the speedup from vectorization to noise.
If that's the reason, how could that be fixed?
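
One way to test the memory hypothesis is to time both variants over many repetitions on the same small buffers. Three arrays of 1000 floats take about 12 KB, which would typically fit in the L1 data cache, so repeated passes should mostly measure computation. A rough sketch follows; kernel_scalar, kernel_sse and seconds_for are hypothetical names, and an optimizing compiler may vectorize or otherwise transform the scalar loop, so the numbers need careful interpretation.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_storeu_ps, ...

// Scalar reference: result[i] += value[i] * weight[i].
void kernel_scalar(float* result, const float* value, const float* weight,
                   std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        result[i] += value[i] * weight[i];
}

// SSE variant with unaligned loads; assumes n is a multiple of 4.
void kernel_sse(float* result, const float* value, const float* weight,
                std::size_t n)
{
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 r = _mm_loadu_ps(result + i);
        __m128 v = _mm_loadu_ps(value + i);
        __m128 w = _mm_loadu_ps(weight + i);
        _mm_storeu_ps(result + i, _mm_add_ps(r, _mm_mul_ps(v, w)));
    }
}

// Runs the kernel `repetitions` times and returns the elapsed wall time in seconds.
template <typename Kernel>
double seconds_for(Kernel kernel, float* result, const float* value,
                   const float* weight, std::size_t n, int repetitions)
{
    const auto start = std::chrono::steady_clock::now();
    for (int rep = 0; rep < repetitions; ++rep)
        kernel(result, value, weight, n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling seconds_for(kernel_scalar, ...) and seconds_for(kernel_sse, ...) with a large repetition count gives two numbers to compare. If they are close, inspecting the generated assembly of the scalar loop would show whether the compiler has already vectorized it.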

Thank you in advance!
