1
GPU, a @lobal processing unit??
  • A project that should have changed the world, and a short introduction to an exciting area of research
2
Parallelizing is difficult!
  • A real life example would be trying to brush your teeth, read Mickey Mouse and get dressed at the same time
  • Hopefully by the time you get to school, you'll notice the hanger still stuck in your jacket
  • On a supercomputer: The Laplace PDE is solved on a polygon, in parallel with an iterative Jacobi method
  • Keep the communication between the nodes as short as possible! Otherwise, the more processors are used, the slower the program runs...
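To make the Jacobi bullet concrete, here is a minimal serial sketch in Python. It assumes a square grid rather than a general polygon, with the top edge held at 1 and the other edges at 0; the function name and parameters are illustrative and are not taken from the talk.

```python
def jacobi_laplace(n=21, top=1.0, tol=1e-6, max_iters=10_000):
    """Solve Laplace's equation on an n x n grid with Jacobi sweeps.

    Assumed boundary condition: top edge fixed at `top`, other edges at 0.
    Returns the grid and the number of sweeps performed.
    """
    u = [[0.0] * n for _ in range(n)]
    u[0] = [top] * n
    for iteration in range(1, max_iters + 1):
        new = [row[:] for row in u]
        largest_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Each interior point becomes the average of its four neighbours,
                # always read from the previous sweep (that is what makes it Jacobi).
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
                largest_change = max(largest_change, abs(new[i][j] - u[i][j]))
        u = new
        if largest_change < tol:
            return u, iteration
    return u, max_iters


grid, sweeps = jacobi_laplace(n=5)
centre = grid[2][2]
```

In a parallel version each node would typically own a strip of rows and exchange only its boundary rows with its neighbours after every sweep, which is where the communication cost comes from.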


3
How do we program in parallel nowadays??
  • A set number (=p) of processors is defined before the program is run on the supercomputer.
  • OpenMP: The compiler takes over the whole parallelization task (the programmer only has to give directives).
  • MPI (Message Passing Interface): The programmer is the one who has to define how the nodes communicate.
  • But most of the time, a program is simply written, and is then started multiple times with different initial parameters!!
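The last bullet describes what is often called a parameter sweep. A rough Python sketch follows, under the assumption that each run is independent of the others. The `simulate` function is a hypothetical stand-in for a complete program run, and the thread pool merely stands in for launching separate jobs on a cluster.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def simulate(initial_value, steps=100):
    """Hypothetical stand-in for one program run: iterate x -> cos(x)."""
    x = initial_value
    for _ in range(steps):
        x = math.cos(x)
    return x


def run_sweep(initial_values, workers=4):
    """Start the same computation once per initial value; runs never communicate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns the results in the same order as the inputs
        return list(pool.map(simulate, initial_values))


results = run_sweep([0.0, 0.5, 1.0])
```

Because the runs share nothing, no MPI or OpenMP is needed; the only coordination is collecting the results at the end.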
4
Comparing Supercomputer Configurations
  • This laptop performs about half a billion floating-point operations per second = ½ gigaflop; this is equal to the performance of a supercomputer 10 years ago.
  • A shared-memory supercomputer (HP Superdome): about 470 gigaflops
  • Linux Beowulf Cluster with distributed memory and 502 processors (every processor has its own memory): about 266 gigaflops


5
Comparing Supercomputer Configurations
  • The Japanese supercomputer Earth Simulator runs at 35 teraflops (= 35 000 gigaflops)
  • The Seti@home project, the first successful grid-computing project, runs at 43 teraflops
6
Today's CPUs are almost unemployed!!
  • A CPU spends more than 80% of its time waiting for user input...
  • Current operating systems could be running processes in the background, without the user noticing anything
7
The Internet as Supercomputer
  • If all the computers in the world were clustered, what kind of performance could theoretically be achieved??
  • 400 million computers @ ½ gigaflop = 200 million gigaflops = 200 000 teraflops = approx. 2000 supercomputers!
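The estimate above is a plain unit conversion, which the following small Python sketch reproduces. The function name is illustrative; the inputs are the figures from the slide.

```python
def aggregate_teraflops(num_computers, gigaflops_each):
    """Theoretical peak of a cluster, ignoring all communication overhead."""
    total_gigaflops = num_computers * gigaflops_each
    return total_gigaflops / 1000.0  # 1 teraflop = 1000 gigaflops


total = aggregate_teraflops(400_000_000, 0.5)

# Size of the reference machine implied by "approx. 2000 supercomputers"
implied_teraflops_per_supercomputer = total / 2000
```

With these inputs `total` is 200 000 teraflops, and the figure of 2000 supercomputers corresponds to a reference machine of about 100 teraflops. This is a theoretical upper bound only: it assumes every machine contributes its full performance all the time.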
8
Comparing Supercomputer Configurations
9
Seti@home: searching for E.T.!!
  • An old supercomputer distributes data from a radio-telescope to normal computers.
  • A small program installed on these computers analyzes the data in the background.
  • The analysis is done using a computationally expensive Fourier transform. The results of the analysis are then sent back to the old supercomputer.
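As a toy illustration of why a Fourier transform is useful here, the Python sketch below finds the strongest frequency in a short signal with a naive discrete Fourier transform. It is only a sketch under simplifying assumptions (a clean sine wave, a direct O(n²) transform); the real Seti@home analysis is considerably more elaborate, and the function names are illustrative.

```python
import cmath
import math


def dft_power(samples):
    """Power spectrum of a real signal via a direct O(n^2) discrete Fourier transform."""
    n = len(samples)
    power = []
    for k in range(n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        power.append(abs(coefficient) ** 2)
    return power


def strongest_bin(samples):
    """Index of the frequency bin with the most power, skipping the constant bin 0."""
    power = dft_power(samples)
    return max(range(1, len(power)), key=power.__getitem__)


# A synthetic "signal": 5 full cycles over 64 samples
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
peak = strongest_bin(signal)
```

A narrowband signal shows up as a single dominant bin in the spectrum, which is far easier to detect than the same signal in the raw samples.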
  • Anyone can participate in the project! Even an 80486!
10
Cancer Research
  • Like Seti@home, United Devices distributes data sets, which are then processed by normal computers in the background.
11
Caution!
  • In these two projects, Seti@home and Cancer Research, both running over the Internet, the number of processors participating (=p) is not set in advance... Computers can come and go as they wish.
  • This is different from MPI and OpenMP!!
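The difference can be sketched as a work queue: the server hands a work unit to whichever computer asks for one, and puts the unit back in the queue if that computer disappears. The Python class below is a hypothetical illustration of this idea only; it is not how Seti@home or United Devices actually schedule their work.

```python
from collections import deque


class WorkServer:
    """Hands out work units to clients that may join or leave at any time."""

    def __init__(self, units):
        self.pending = deque(units)
        self.in_progress = {}  # unit -> name of the client working on it
        self.results = {}

    def request_unit(self, client):
        """Give the next pending unit to `client`, or None if nothing is left."""
        if not self.pending:
            return None
        unit = self.pending.popleft()
        self.in_progress[unit] = client
        return unit

    def submit_result(self, unit, result):
        if unit in self.in_progress:
            del self.in_progress[unit]
            self.results[unit] = result

    def client_left(self, client):
        """Re-queue every unit the departed client had not finished."""
        for unit, owner in list(self.in_progress.items()):
            if owner == client:
                del self.in_progress[unit]
                self.pending.append(unit)


server = WorkServer([1, 2, 3])
first = server.request_unit("alice")
server.request_unit("bob")
server.client_left("bob")  # bob vanishes; his unit goes back into the queue
server.submit_result(first, first * first)
while (unit := server.request_unit("alice")) is not None:
    server.submit_result(unit, unit * unit)
```

Nothing in this scheme depends on a processor count p fixed in advance, which is exactly what separates it from an MPI or OpenMP run.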
12
An Extension of the Client / Server Model??
  • Client / Server
  • Examples: Seti@home, Cancer Research
  • Peer-to-Peer network
  • Examples: Kazaa, Gnutella, GPU, a @lobal processing unit??
13
How does a Peer-to-Peer Network Function (roughly)??
14
How can this be improved?
  • An idea that came from simulations done by a team at Princeton University
15
The Idea behind GPU
  • Computers connected to the Internet run GPU. GPU automatically connects to a peer-to-peer network.
  • GPU makes a scientific library available.
  • Everyone who has GPU installed can also use other computers to perform their own distributed calculations.
16
The Idea Behind GPU (2)
  • GPU has 3 parts.
  • The routing layer forwards the calculation packets.
  • A virtual machine interprets the calculation packets with the help of a library of plugins.
  • Plugins are compiled DLLs, which extend node functionality.
17
The Idea Behind GPU (3)
  • Reverse Polish notation is introduced in order to simplify the virtual machine: 1 + 1 becomes 1,1,+.
  • Calculation packets are disguised as file searches, e.g., a file search for "GPU:1,1,+" is interpreted as a calculation task.
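A stack machine for this kind of packet can be very small, which is the point of writing operands first and the operator last. The Python sketch below assumes comma-separated tokens (as in 1,1,+) and a "GPU:" prefix on the search string. The function names and the set of four arithmetic operators are illustrative and do not reproduce the actual GPU virtual machine or its plugins.

```python
import operator

# Assumed operator set; the real system extends its commands through plugins.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

PREFIX = "GPU:"


def evaluate_rpn(expression):
    """Evaluate a comma-separated postfix expression such as '1,1,+'."""
    stack = []
    for token in expression.split(","):
        token = token.strip()
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


def handle_search(query):
    """Treat 'GPU:...' searches as calculations; return None for ordinary file searches."""
    if not query.startswith(PREFIX):
        return None
    return evaluate_rpn(query[len(PREFIX):])


result = handle_search("GPU:1,1,+")
```

Because every token is either pushed or applied to the top of the stack, the interpreter needs no parentheses and no operator precedence rules.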
18
GPU in Practice
  • Two libraries (calculating pi and distributed calculation of the discrete logarithm)
  • Further development as an Open Source project failed due to the complexity of the task
  • You can download the prototype from: http://sourceforge.net/projects/gpu
19
GPU in Practice (2)
  • Version 0.688 implemented everything described in the documentation... but it is very unstable.
  • Version 0.768 is very stable (thanks to the TGnutella components from Kamil Pogorzelski). Results are not (yet) sent back.
20
Screenshots
21
Screenshots (2)
22
Links
  • Global Grid Forum www.gridforum.org
  • EU Grid: http://eu-datagrid.web.cern.ch
  • Top 500 supercomputers: http://www.top500.org
  • Seti@home http://setiathome.berkeley.edu
  • Cancer Research http://members.ud.com/projects/cancer


23
http://gpu.sourceforge.net