The File Distributor (FD) is a simple way to distribute files (sometimes referred to as work units in this document) to clients (computers running a particular application). The clients are expected to process the downloaded file and to report back a result in the form of a string. It is also possible to distribute data to people, as FD provides a way to get and report data via the web interface, though the web interface is intended more as a simple way to explain how FD works.
GPU File Distributor runs on the Apache web server with the PHP module, backed by a MySQL database.
Therefore, you should first download XAMPP from the Apachefriends.org website. XAMPP is a package for Linux and Windows which bundles Apache, MySQL and phpMyAdmin (and more... :-). FD was developed with XAMPP 1.5.5, but any newer version might work as well. Then install XAMPP and, using the XAMPP Control Panel, start Apache and MySQL.
Download the File Distributor PHP scripts, unzip them and copy them to the htdocs directory created by XAMPP, so that you get a htdocs/file_distributor directory full of PHP scripts.
Go to the directory htdocs/file_distributor/config and copy example_config.inc.php to config.inc.php. Edit the variables in config.inc.php to match your hardware and network configuration.
Next, visit http://localhost/phpmyadmin to access phpMyAdmin, an interface to the MySQL database. In phpMyAdmin, create a database file_distributor and a user fd_user, whose password must match the one you set in htdocs/file_distributor/conf/config.inc.php.
If the File Distributor is published (reachable from other computers), you should also set the variable $dns_name to the IP address of your computer or to its DNS name.
Now take htdocs/file_distributor/db/script.sql and go back to http://localhost/phpmyadmin. Execute the SQL script with the phpMyAdmin interface inside the file_distributor database you created.
There are three types of users: Computational Users, Project Managers and Administrators. Computational Users can only browse through projects; in particular, they can view reported results and see the progress of a project in percent. Project Managers can create, start and stop their own projects, if they can access the $root_projects directory of the server. Administrators can create, start, stop and delete any project, and create new users.
First log in as Administrator with the username and password you specified in config.inc.php (variables $admin_user and $admin_pwd), so that you can create your own users.
In config.inc.php, the variable $root_projects sets the absolute path of the directory in which the project folders are stored.
If you do not modify the configuration, you will find a directory htdocs/file_distributor/projects. In this directory, you can create folders that represent your projects. Put the files (the work units you would like to distribute) inside these folders. As an example, a directory projects/test already exists, with some files in it.
Next, go to the FD web interface and choose Insert project. Give the project a name (mandatory, e.g. Test Project) and give the folder name (mandatory, test in our case).
Decide how many times the same work unit will be distributed (number of passes). In distributed computing projects, it is common to choose an odd number, for example 3. Then, if one client has a fault in its Floating Point Unit (FPU), you still have three results: most likely two of them agree, while the client with the faulty FPU reports something else. In the literature, this approach is often called the majority rule.
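The majority rule can be sketched in a few lines of Python. This is a hypothetical post-processing step you might run yourself on the results reported for one work unit across its passes; nothing in this manual says that FD applies the rule for you. The function name majority_result is an illustration, not part of FD.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_result(results: Sequence[str]) -> Optional[str]:
    """Return the result reported by a strict majority of passes, or None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # A strict majority means that more than half of the passes agree.
    return value if count * 2 > len(results) else None
```

With three passes, majority_result(["42.0", "42.0", "41.7"]) picks "42.0", while three different answers yield None, which is a hint that the work unit should be distributed again.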
Once you press Submit, the PHP scripts scan all files inside the folder and insert them into the database, so that they are ready to be distributed.
If you go to List projects, you will see that the project Test Project is in status Ready. You have to start it so that clients can begin getting work units.
When a new project is imported into FD, it gets the state Ready. Work units are delivered only if the project is in state Started; therefore you first need to start the project in the List projects window. Similarly, if you want to stop the delivery of work units, choose the action Stop, which puts the project in state Stopped. Once all work units are delivered, the project goes to state Delivered. Only when results have been reported for all work units does the project go to state Closed. If you want to restart a project, choose the action Reset. The project goes back to state Ready; however, you will lose all reported results.
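The project life cycle above can be summarized as a small state machine. The sketch below is a Python model of the prose, not FD's PHP code; the event names are hypothetical, and two transitions are assumptions: that a Stopped project can be started again, and that Reset is allowed from any state.

```python
from enum import Enum


class ProjectState(Enum):
    READY = "Ready"
    STARTED = "Started"
    STOPPED = "Stopped"
    DELIVERED = "Delivered"
    CLOSED = "Closed"


# (current state, event) -> next state. "start" and "stop" are actions in
# List projects; "all_delivered" and "all_reported" happen automatically.
TRANSITIONS = {
    (ProjectState.READY, "start"): ProjectState.STARTED,
    (ProjectState.STOPPED, "start"): ProjectState.STARTED,  # assumption
    (ProjectState.STARTED, "stop"): ProjectState.STOPPED,
    (ProjectState.STARTED, "all_delivered"): ProjectState.DELIVERED,
    (ProjectState.DELIVERED, "all_reported"): ProjectState.CLOSED,
}


def next_state(state: ProjectState, event: str) -> ProjectState:
    if event == "reset":
        # Assumption: Reset works from any state; reported results are lost.
        return ProjectState.READY
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.value}") from None


def delivers_work_units(state: ProjectState) -> bool:
    # Work units are handed out only while the project is Started.
    return state is ProjectState.STARTED
```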
Once a project is configured and in status Started, you can ask FD for work units. If you choose Get work unit on the main page of FD, you have to provide the folder name of the project and a string that identifies you (called processor in FD).
FD answers with four lines of text: the first and third lines are comments. The second line is the URL from which the work unit can be downloaded; the fourth line is the URL to visit in order to report the result. After downloading the work unit, compute the result (a string). Append the result to the fourth line, then visit the resulting URL with your browser. This reports the result, which is now stored permanently in the database.
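The handling of FD's four-line answer can be sketched in Python as follows. This is a minimal sketch assuming the answer arrives as plain text with one item per line; the names WorkAssignment, parse_fd_answer and build_report_url are hypothetical.

```python
from typing import NamedTuple


class WorkAssignment(NamedTuple):
    download_url: str  # second line of the FD answer
    report_url: str    # fourth line of the FD answer


def parse_fd_answer(text: str) -> WorkAssignment:
    """Split FD's four-line answer; lines 1 and 3 are comments and are ignored."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected four lines from the File Distributor")
    return WorkAssignment(download_url=lines[1].strip(), report_url=lines[3].strip())


def build_report_url(report_url: str, result: str) -> str:
    # As described above, the result string is simply appended to the fourth line.
    return report_url + result
```

This manual does not say whether FD expects the result to be URL-encoded; if your results may contain spaces or the & character, encoding them (for example with urllib.parse.quote) is probably prudent, but verify against your server first.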
You can look at reported results in the List Projects menu, if you choose the View action on the corresponding project.
Each work unit can be in one of three states: Ready, Delivered and Reported. Depending on how clients access FD, a work unit moves from one state to another. No logic is attached to these states; the state of a work unit only tells you what happened to it last.
Reporting a result manually is a little more difficult if you did not keep the URL given in the fourth line (see previous chapter).
First, find out the work unit id as follows: go to the List Projects menu and choose the action View. Find the work unit you have and note its id (the leftmost number in the table). Then go to the menu Report work unit and fill in the form with the Work Unit Id, Project Id, current pass and result. When you are finished, click the Submit button.
You can look at reported results in the List Projects menu, if you choose the View action on the corresponding project.
If you want to launch your application, try the Standalone File Distributor Client that you can download here. Read the included README.html for instructions on how to configure the Standalone Client so that it interfaces properly with the File Distributor Server.
Once you have downloaded and installed the GPU Lite Client, go to the menu Frontends - Console Frontend. Put a check in the checkbox Resend job each 30 seconds. The command you should send has the following syntax:
'[FileDistributor URL]', '[Folder Name]', '[Name of the node processing data]', file_distributor
An example looks like:
'www.gpu-grid.net/file_distributor','orsa',nodename, file_distributor
Once you press the Send to GPU button, the stack evaluator first substitutes nodename with the name of your computer, e.g. andromeda:
'www.gpu-grid.net/file_distributor','orsa', 'andromeda', file_distributor
The command will activate the file distributor plugin, exactly in the same manner as with the previous Standalone Client.
Every 30 seconds, the activation job is resent to all nodes on the network. The plugins on the nodes automatically connect to the File Distributor Server and download work units and the attached application. The application is then launched. After the application finishes, all produced files are sent back to a predefined FTP server (if one was defined on the 'Insert Project' page).
GPU Full Version ships with a plugin called downloader.dll. This plugin contains routines to interface with the File Distributor.
The routines are called download_fd and report_fd. They use the exponential wait mechanism described in the previous chapter.
If you have trouble understanding the next chapters, first familiarize yourself with the stack evaluation mechanism inside GPU. Reading the first page, with the explanation of the virtual machine, is more than enough.
download_fd takes as parameters the URL of the File Distributor, the folder and the node processing the data. It downloads the work unit into the local plugins/input subfolder. As a result, the URL for reporting the result and the filename of the work unit are pushed onto the stack.
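For example, the command
'spartacus.is-a-geek.net/file_distributor','orsa', nodename, download_fd
is first expanded by the stack evaluator, which substitutes nodename with the name of the computer:
'spartacus.is-a-geek.net/file_distributor','orsa', 'andromeda', download_fd
After download_fd has run, the stack contains the report URL and the local filename of the work unit:
'http://spartacus.is-a-geek.net/file_distributor/report_work.php?work_id=127&pass_id=23&processor=andromeda','C:\Program Files\GPU\plugins\input\pioneer.orb'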
Crunching the work unit means replacing the file name of the work unit on the stack with some meaningful result.
As a first example, you can append the sequence , pop, rnd, tostr to the GPU command above. These commands pop the filename from the stack, push a random number onto it and convert that number to a string parameter. Any other crunching command works as well. In particular, you could learn how to implement a GPU plugin that does exactly the work you have in mind.
Alternatively, you could use the launch command to start your own external executable, placed in the binexec subfolder of GPU. Your executable is launched with the absolute path of the work unit file to be crunched. More documentation on the Applauncher plugin is here.
Therefore, our example could become:
report_fd takes the report URL returned by download_fd and the result computed by some other plugin while crunching the work unit, concatenates the two and reports the result back to the File Distributor.
Therefore, it is enough to add report_fd at the end of our examples so that results are reported back to FD. You can view reported results in the List Projects menu, if you choose the View action on the corresponding project.
Our examples then become:
In summary, here is the syntax of the commands implemented by downloader.dll:
'[URL]', download_wait_exp
'[FileDistributor URL]', '[Folder Name]', '[Name of the node processing data]', download_fd
'[Report URL]', '[Result to be reported]', report_fd
Please note: if you want to report a float instead of a string, first convert it with a call to tostr, as in the earlier example.
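To make the stack effects of these commands concrete, here is a toy Python model of how a command line such as the download_fd example is evaluated. It is not GPU's real stack evaluator: the download and report callbacks stand in for the network work done by downloader.dll, random.random() is only a placeholder for GPU's rnd, and the assumption that report_fd leaves nothing on the stack is ours. All names in the sketch are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def tokenize(command: str) -> List[str]:
    # GPU commands are comma-separated; this naive split assumes no commas inside literals.
    return [tok.strip() for tok in command.split(",") if tok.strip()]


def evaluate(tokens: List[str],
             download: Callable[[str, str, str], Tuple[str, str]],
             report: Callable[[str], None],
             node: str) -> list:
    stack: list = []
    for tok in tokens:
        if len(tok) >= 2 and tok[0] == tok[-1] == "'":
            stack.append(tok[1:-1])          # string literal
        elif tok == "nodename":
            stack.append(node)               # GPU substitutes the computer name
        elif tok == "download_fd":
            processor = stack.pop()
            folder = stack.pop()
            fd_url = stack.pop()
            report_url, filename = download(fd_url, folder, processor)
            stack += [report_url, filename]  # report URL below, filename on top
        elif tok == "pop":
            stack.pop()
        elif tok == "rnd":
            stack.append(random.random())    # placeholder for GPU's rnd
        elif tok == "tostr":
            stack.append(str(stack.pop()))
        elif tok == "report_fd":
            result = stack.pop()
            report_url = stack.pop()
            report(report_url + result)      # concatenate and report
        else:
            raise ValueError(f"unknown word {tok!r}")
    return stack
```

Evaluating "'…','orsa', nodename, download_fd" alone leaves the report URL and the filename on the stack, as in the download_fd example; appending pop, rnd, tostr, report_fd consumes both and reports once.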
If you have written a client in the language of your choice, you should link it with a library that can fetch a URL and save the page locally.
To get a work unit, the client visits a URL composed of the File Distributor URL, followed by /get_work.php? and three parameters: the folder name (folder), the processor name (processor, the string that identifies the client) and some junk in the form of a random number (junk), to circumvent proxy caches. Separate the parameters with the ampersand character (&).
As an example, the client should visit the URL http://www.gpu-grid.net/file_distributor/get_work.php?folder=test&processor=yourname&junk=1234.
There is also an XML version, which the client gets by visiting the URL:
http://www.gpu-grid.net/file_distributor/get_work.php?folder=test&processor=yourname&xml=1&junk=1234.
This URL is the same as the previous one, but has an additional xml=1 parameter.
After accessing the URL, the client should store the answer in a text file. As with the manual procedure, the first line is the most important. If the client finds the string Error in the first line, it should stop and only do cleanup steps. Something went wrong; maybe the project is not in state Started. For example, it could be in state Delivered because all work units have already been delivered.
If the string Error does not appear in the first line, the client can drop the first and third lines, as they are only comments. The second line is the URL from which to download the file, and the fourth line is the URL at which to report the result.
The client now downloads the work unit using the URL in the second line and processes it.
The computed result is appended to the fourth line, and the client visits this newly created URL to report the result.
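The client procedure above can be sketched in Python using only the standard library. This is a minimal sketch, assuming the answer is plain text and that crunch is your own (hypothetical) function turning the downloaded bytes into a result string; it keeps the answer in memory instead of a text file, and the names http_get and run_client_once are illustrative. The get parameter lets you plug in a different HTTP function, for example one with retries.

```python
import random
import urllib.request
from typing import Callable, Optional
from urllib.parse import urlencode


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def run_client_once(fd_url: str, folder: str, processor: str,
                    crunch: Callable[[bytes], str],
                    get: Callable[[str], bytes] = http_get) -> Optional[str]:
    """Fetch, crunch and report one work unit; return the report URL, or None on Error."""
    query = urlencode({"folder": folder, "processor": processor,
                       "junk": random.randint(0, 99999)})
    answer = get(f"{fd_url}/get_work.php?{query}").decode("utf-8", errors="replace")
    lines = answer.splitlines()
    # "Error" in the first line: stop and only clean up (e.g. project not Started).
    if not lines or "Error" in lines[0]:
        return None
    if len(lines) < 4:
        raise ValueError("unexpected answer from the File Distributor")
    # Lines 1 and 3 are comments; line 2 is the download URL, line 4 the report URL.
    download_url, report_url = lines[1].strip(), lines[3].strip()
    result = crunch(get(download_url))
    final_url = report_url + result  # append the result to the fourth line
    get(final_url)                   # visiting this URL reports the result
    return final_url
```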
Again, you can look at reported results in the List Projects menu, if you choose the View action on the corresponding project.
If downloading from a URL over HTTP fails, the right strategy is to wait a random amount of time chosen within a certain time slot. Each time the download fails, double the size of the time slot from which the random wait time is picked. In the literature, this strategy is often referred to as exponential waiting (also known as exponential backoff). This simple strategy avoids overloading the web server with requests when all clients access it at the same time.
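Exponential waiting can be sketched in Python as a retry loop. The initial slot of 1 second, the cap of one hour and the limit of 10 attempts are hypothetical defaults, not values taken from downloader.dll. fetch can be any function that raises OSError on failure (urllib.error.URLError derives from OSError); sleep is a parameter so the waits can be observed in tests.

```python
import random
import time
from typing import Callable


def download_with_exponential_wait(url: str,
                                   fetch: Callable[[str], bytes],
                                   initial_slot: float = 1.0,
                                   max_slot: float = 3600.0,
                                   max_attempts: int = 10,
                                   sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Retry fetch(url), waiting a random time in [0, slot] and doubling slot after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    slot = initial_slot
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError:  # urllib.error.URLError and socket errors derive from OSError
            if attempt == max_attempts:
                raise
            sleep(random.uniform(0, slot))  # random wait inside the current slot
            slot = min(slot * 2, max_slot)  # double the slot for the next failure
    raise AssertionError("unreachable")
```

Picking the wait at random, rather than always waiting the full slot, is what spreads out clients that failed at the same moment; doubling the slot makes them back off further while the server stays overloaded.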
If you need help setting up your self-made distributed computing project with FD, you can contact us on the GPU mailing list. We appreciate your feedback.
Have fun!