Saturday, January 19, 2013

Problems with AMD Catalyst 13.1 on Ubuntu

The AMD Catalyst 13.1 driver was released two days ago, but I ran into problems when I tried to generate Ubuntu packages for both the 10.04 and 12.04 releases.

In case anybody is interested, here are some brief workarounds.
First, extract the driver files using the "--extract" option.
In both cases the problem was caused by the "rules" file located under the "packages/Ubuntu/dists/{precise/lucid}" directory, so the following changes had to be made to that file.

In the case of Ubuntu 12.04, the following line:
dh_install -p$(PKG_driver) "arch/x86_64/usr/share/ati/lib" "$(datadir)/ati"
had to be replaced with:
dh_install -p$(PKG_driver) "arch/x86/usr/share/ati/lib" "$(datadir)/ati"
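
For anyone who prefers to script the 12.04 change rather than edit the file by hand, here is a minimal sketch in Python. The helper name patch_precise_rules is hypothetical (it is not part of the AMD installer), and the sketch assumes the rules file contains the dh_install line exactly as quoted above.

```python
# Hypothetical helper: swaps the source path on the dh_install line of the
# "precise" rules file, exactly as described in the post.
OLD_PATH = '"arch/x86_64/usr/share/ati/lib"'
NEW_PATH = '"arch/x86/usr/share/ati/lib"'


def patch_precise_rules(text: str) -> str:
    """Return the rules text with the dh_install source path replaced."""
    patched_lines = []
    for line in text.splitlines(keepends=True):
        # Only touch the dh_install line for the driver package.
        is_target = line.lstrip().startswith("dh_install -p$(PKG_driver)")
        if is_target and OLD_PATH in line:
            line = line.replace(OLD_PATH, NEW_PATH)
        patched_lines.append(line)
    return "".join(patched_lines)


if __name__ == "__main__":
    sample = (
        'dh_install -p$(PKG_driver) '
        '"arch/x86_64/usr/share/ati/lib" "$(datadir)/ati"\n'
    )
    print(patch_precise_rules(sample), end="")
```

To use it on the real file, one would read the "rules" file from the extracted driver tree, pass its contents through the function, and write the result back.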


In the case of Ubuntu 10.04, the following line had to be appended after line 69:
 SRC_other_arch := x86_64
and the following line had to be appended after line 151:
  -e "s|#SRCOTHERARCH#|$(SRC_other_arch)|g" \
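
The two 10.04 insertions can be scripted in the same spirit. The sketch below is only an illustration: the helper insert_after is hypothetical, and it assumes that both line numbers (69 and 151) refer to the original, unmodified "rules" file. The insertions are therefore applied from the bottom up so that the first one does not shift the target of the second. The indentation of the inserted lines is copied from this post and may need adjusting to match the neighbouring lines in the real file.

```python
# Lines to add to the "lucid" rules file, keyed by the ORIGINAL line number
# after which each one goes (assumption: numbers refer to the unmodified file).
LUCID_INSERTS = {
    69: " SRC_other_arch := x86_64",
    151: '  -e "s|#SRCOTHERARCH#|$(SRC_other_arch)|g" \\',
}


def insert_after(lines, inserts):
    """Return a copy of `lines` with new lines inserted after 1-based positions.

    `inserts` maps an original line number to the text placed right after it.
    """
    result = list(lines)
    # Go bottom-up so an insertion never shifts a target that is still pending.
    for line_no in sorted(inserts, reverse=True):
        if not 1 <= line_no <= len(lines):
            raise ValueError(f"line {line_no} is outside the file")
        # Index `line_no` is the slot immediately after 1-based line `line_no`.
        result.insert(line_no, inserts[line_no])
    return result


if __name__ == "__main__":
    # Stand-in for the real file contents, just to show where the lines land.
    fake_rules = [f"original line {i}" for i in range(1, 161)]
    patched = insert_after(fake_rules, LUCID_INSERTS)
    for text in patched[68:71] + patched[151:154]:
        print(text)
```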

The packages can then be built as usual by running:
 sudo ./ati-installer.sh 9.012 --buildpkg Ubuntu/precise
or
 sudo ./ati-installer.sh 9.012 --buildpkg Ubuntu/lucid




Monday, January 14, 2013

nbench on small linux devices

One of the benchmark programs that I find most convenient is nbench, because it runs on almost any device that can execute plain C code. This means it can run on a desktop computer as well as on a smartphone (nbench is freely available on Google Play) or on a router flashed with custom firmware (e.g. DD-WRT with Optware).

Here are three devices that I have tried it on:
Raspberry PI
Asus RT-N16
Linksys NSLU2 
The Raspberry PI and the NSLU2 are ARM-based, whereas the RT-N16 is MIPS-based.

Here are the results of running it on a Raspberry PI (Raspbian OS):

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          221.64  :       5.68  :       1.87
STRING SORT         :          31.709  :      14.17  :       2.19
BITFIELD            :      8.4099e+07  :      14.43  :       3.01
FP EMULATION        :          46.363  :      22.25  :       5.13
FOURIER             :          2372.8  :       2.70  :       1.52
ASSIGNMENT          :          2.4781  :       9.43  :       2.45
IDEA                :           696.1  :      10.65  :       3.16
HUFFMAN             :          424.38  :      11.77  :       3.76
NEURAL NET          :          3.0098  :       4.83  :       2.03
LU DECOMPOSITION    :           78.72  :       4.08  :       2.94
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 11.729
FLOATING-POINT INDEX: 3.761
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 :
L2 Cache            :
OS                  : Linux 3.2.27+
C compiler          : gcc-4.7
libc                : /lib/arm-linux-gnueabihf/libgcc_s.so.1
MEMORY INDEX        : 2.528
INTEGER INDEX       : 3.266
FLOATING-POINT INDEX: 2.086
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


Here are the results of running it on a Linksys NSLU2 file server (flashed with SlugOS):

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          74.271  :       1.90  :       0.63
STRING SORT         :          6.9679  :       3.11  :       0.48
BITFIELD            :      1.8159e+07  :       3.11  :       0.65
FP EMULATION        :          17.645  :       8.47  :       1.95
FOURIER             :          75.723  :       0.09  :       0.05
ASSIGNMENT          :         0.96228  :       3.66  :       0.95
IDEA                :          176.19  :       2.69  :       0.80
HUFFMAN             :          104.82  :       2.91  :       0.93
NEURAL NET          :         0.10509  :       0.17  :       0.07
LU DECOMPOSITION    :          3.3757  :       0.17  :       0.13
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 3.324
FLOATING-POINT INDEX: 0.136
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 :
L2 Cache            :
OS                  : Linux 2.6.27.8
C compiler          : gcc version 4.2.4
libc                :
MEMORY INDEX        : 0.668
INTEGER INDEX       : 0.976
FLOATING-POINT INDEX: 0.076
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


And finally, here are the results of running it on an Asus RT-N16 router (flashed with DD-WRT with Optware):

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :           160.6  :       4.12  :       1.35
STRING SORT         :          3.7864  :       1.69  :       0.26
BITFIELD            :      6.3597e+07  :      10.91  :       2.28
FP EMULATION        :            28.6  :      13.72  :       3.17
FOURIER             :          19.904  :       0.02  :       0.01
ASSIGNMENT          :           1.753  :       6.67  :       1.73
IDEA                :          670.35  :      10.25  :       3.04
HUFFMAN             :          40.453  :       1.12  :       0.36
NEURAL NET          :        0.015345  :       0.02  :       0.01
LU DECOMPOSITION    :         0.43656  :       0.02  :       0.02
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 5.017
FLOATING-POINT INDEX: 0.023
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 :
L2 Cache            :
OS                  : Linux 2.6.24.111
C compiler          : gcc version 4.1.1
libc                : ld-uClibc-0.9.28.so
MEMORY INDEX        : 1.011
INTEGER INDEX       : 1.470
FLOATING-POINT INDEX: 0.013
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.

It should be noted that the latter two devices do not feature a floating-point unit, and thus their performance on the floating-point-intensive tests is extremely low.
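
To put numbers on that gap, here is a small Python sketch that pulls the three Linux-baseline indexes out of an nbench report and expresses one device's scores relative to another's. The function names are my own and purely illustrative. The parser assumes the report layout shown above, with the "LINUX DATA BELOW" marker preceding the indexes, and the figures in RESULTS are copied from the three reports in this post.

```python
import re

# Matches e.g. "INTEGER INDEX       : 3.266" or "FLOATING-POINT INDEX: 2.086".
INDEX_RE = re.compile(r"^(MEMORY|INTEGER|FLOATING-POINT) INDEX\s*:\s*([0-9.]+)$")
LINUX_MARKER = "LINUX DATA BELOW"


def parse_linux_indexes(report: str) -> dict:
    """Extract the indexes that follow the 'LINUX DATA BELOW' marker."""
    indexes = {}
    in_linux_section = False
    for line in report.splitlines():
        if LINUX_MARKER in line:
            in_linux_section = True
            continue
        if not in_linux_section:
            continue  # skip the original BYTEmark (Pentium 90) indexes
        match = INDEX_RE.match(line.strip())
        if match:
            indexes[match.group(1)] = float(match.group(2))
    return indexes


def relative_to(reference: dict, other: dict) -> dict:
    """How many times faster `reference` is than `other`, per index."""
    return {name: reference[name] / other[name] for name in reference}


# Linux-baseline indexes copied from the three reports in this post.
RESULTS = {
    "Raspberry PI": {"MEMORY": 2.528, "INTEGER": 3.266, "FLOATING-POINT": 2.086},
    "NSLU2": {"MEMORY": 0.668, "INTEGER": 0.976, "FLOATING-POINT": 0.076},
    "RT-N16": {"MEMORY": 1.011, "INTEGER": 1.470, "FLOATING-POINT": 0.013},
}

if __name__ == "__main__":
    for device in ("NSLU2", "RT-N16"):
        ratios = relative_to(RESULTS["Raspberry PI"], RESULTS[device])
        summary = ", ".join(f"{k}: {v:.1f}x" for k, v in ratios.items())
        print(f"Raspberry PI vs {device} -> {summary}")
```

The integer and memory ratios stay within single digits, while the floating-point ratios are far larger, which is consistent with the missing FPU on the NSLU2 and the RT-N16.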

One of the drawbacks of nbench is that it is written as a single-threaded application, so it cannot exploit the extra cores of a multicore CPU. One of my future hobby projects could be porting nbench to OpenMP or even OpenCL in order to exploit the full capabilities of a contemporary CPU or even a GPU. It would be fun to compare a Raspberry PI with a GTX580 on nbench!

Saturday, January 5, 2013

A GPGPU comparison (K20, 7970, GTX680, M2050 & GTX580)

I found a nice GPGPU comparison on a blog. It's very interesting, as it presents practical benchmark results for the latest GPUs on the market across 4 problems of different nature (bandwidth-limited or compute-intensive).

The GPUs compared are:

  1. NVidia Tesla K20
  2. NVidia GTX 680
  3. NVidia Tesla M2050
  4. AMD HD 7970

The 4 problems are:
  1. Digital Hydraulics
  2. Ambient Occlusion
  3. Running Sum
  4. Geometry Sampling
The results as presented are illustrated below:

As can be seen, the Kepler architecture is not as great as was expected (at least the compute-optimized K20 chip). The older Fermi architecture seems to sustain decent performance. In addition, the AMD GPU seems to be a strong competitor, showing the benefits of the Southern Islands architecture in compute applications.

For the original full article click here:
http://wili.cc/blog/gpgpu-faceoff.html

Wednesday, January 2, 2013

Raspberry PI as a home server

Now, this is my very first post.
Here is my new low-energy server, the Raspberry PI. It serves web pages via lighttpd and VoIP telephony via Asterisk. It draws so little power that there is no need to bother turning it on and off in order to save energy.


It's also very cheap (~$35) and quite popular.