GCC 4.7 includes support for auto-vectorization, a method of automatically converting scalar programs into vectorized programs to increase performance. The most common form is loop vectorization, which transforms loops that iterate over multiple pairs of data so that several pairs are processed at once, each by an independent processing unit. Programs that rely heavily on loops have the most to gain from auto-vectorization.
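As a rough illustration of the kind of loop described above, the following sketch adds two arrays element by element. The function name `add_arrays` and its parameters are hypothetical and chosen only for this example; the `restrict` qualifiers are an assumption that the three arrays never overlap, which is what lets the compiler treat the iterations as independent. Whether a given GCC version actually vectorizes this loop depends on the target and the flags in use.

```c
#include <stddef.h>

/* Hypothetical example: element-wise addition of two arrays.
 * No iteration reads a value written by another iteration, so a
 * vectorizing compiler is free to handle several elements per
 * vector instruction. `restrict` (C99, so build with -std=c99 or
 * later) is our promise that dst, a and b do not overlap. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

The result is the same whether the loop runs one element at a time or several at once, which is why this kind of loop is a natural candidate for vectorization.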
2 Enabling Auto-Vectorization
Enabling auto-vectorization is simple and requires only a few CFLAGS. Two flags are required for basic auto-vectorization:
gcc -ftree-vectorize -maltivec
-ftree-vectorize is the flag that turns on auto-vectorization; it is also enabled automatically at -O3. The -maltivec flag is used on PowerPC architectures and tells gcc to use the AltiVec instruction set.
You may also wish to pass -ffast-math or -fassociative-math to enable vectorization of floating-point reductions or to allow re-ordering of operations. Make sure you understand what these flags do before using them: they are not IEEE compliant, so the auto-vectorization process may alter program behaviour.
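To make the reduction case concrete, here is a minimal sketch of a floating-point sum. The function name `sum_array` is hypothetical. Every iteration depends on the running total from the previous one, so processing several elements at once means adding them in a different order. Floating-point addition is not associative, so the re-ordered sum can differ slightly from the sequential one, which is why GCC only vectorizes such a loop when a flag like -ffast-math or -fassociative-math permits it.

```c
#include <stddef.h>

/* Hypothetical example: a floating-point reduction.
 * `sum` carries a dependency from one iteration to the next.
 * Vectorizing the loop means keeping several partial sums and
 * combining them at the end, which changes the order of the
 * additions and therefore, potentially, the rounded result. */
float sum_array(const float *values, size_t n)
{
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum;
}
```

For inputs whose partial sums are all exactly representable, the order makes no difference; the difference shows up with values of widely varying magnitude, where rounding depends on which additions happen first.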
3 Conclusion
While GCC 4.7 is fairly intelligent in vectorizing your code, there are ways to increase the performance gains even further. This article investigates how gcc behaves when dealing with different types of operations.
The GNU website contains a list of loops that are currently vectorizable.




