The LLVM group has just got a new logo, a modernized version of the 'dragon book' dragon. But the really exciting news I saw on GPGPU.org is a project called 'GPUocelot'. This program will translate compiled PTX programs (produced by NVIDIA's CUDA) via the just-in-time LLVM compiler to any targeted backend, for example the PS3's Cell processor. All you need to do is add an AMD backend to LLVM, and hey presto, instant CUDA-for-ATI. That could potentially put a dent in OpenCL's plans.
The papers from the High Performance Graphics conference are also out. One that caught my eye was Understanding the Efficiency of Ray Traversal on GPUs, not because I thought the paper was fundamentally groundbreaking, but because it explained a few neat tricks on NVIDIA's part, in particular a good explanation of persistent threads in CUDA for breaking down non-uniform workloads from a global pool. (E.g. they used it on a per-ray basis, so that fast and slow rays don't "block" each other.)
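To see why pulling work from a global pool helps with non-uniform workloads, here is a rough CPU-side simulation in Python. It is not CUDA and not the paper's implementation; the ray costs and worker count are made up purely for illustration. It compares handing each worker a fixed slice of rays up front against letting each worker grab the next ray as soon as it is free.

```python
import heapq

def static_makespan(costs, workers):
    """Each worker gets a fixed, contiguous slice of the work up front."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))

def pool_makespan(costs, workers):
    """Each worker pulls the next item from a shared pool as soon as it is free."""
    finish_times = [0] * workers  # min-heap of per-worker finish times
    for cost in costs:
        earliest = heapq.heappop(finish_times)  # the worker that frees up first
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

# Hypothetical ray costs: a cluster of slow rays followed by many fast ones.
ray_costs = [10, 10, 10, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print("static slices:", static_makespan(ray_costs, 4))
print("global pool:  ", pool_makespan(ray_costs, 4))
```

With the static split, one worker is stuck with all the slow rays while the others sit idle; with the pool, the slow rays get spread out and the fast ones fill the gaps.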
Thursday, July 30, 2009
Virgin Airlines
Virgin airlines canceled my flight.
Needless to say I'm pretty annoyed.
They informed the passengers in the most elegant of ways: they changed the sign from "boarding" to "canceled".
Nice work.
Never flying Virgin again. I think the extra $30 for Qantas is well worth it. I mean, you get a free meal too, right?
Wednesday, July 22, 2009
Fracture Physics
Eric Parker from Pixelux and James O’Brien have just released their paper Real-Time Deformation and Fracture in a Game Environment. It's an interesting read, and reveals some of the details behind the impressive Pixelux engine. Basically they use tetrahedral finite elements and just make it look like it has more detail than the physics really has. Still, seems to work pretty well.
Of note from the demoscene world is h4vok by Archee which features fractures in 4k, nicely timed to music. Very impressive.
Groundwater Flow on GPU
I am presently developing a proof-of-concept groundwater flow GPU program. The algorithm works similarly to a finite differences algorithm, or to many kernel-based image processing systems.
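For anyone who hasn't seen this kind of algorithm, here is a minimal Python sketch of a generic explicit finite-difference update using a 5-point stencil. It is not the actual groundwater code; the function name, the grid and the coefficient alpha are all assumptions for illustration. The point is just that every cell is updated from its immediate neighbours, exactly like an image-processing kernel.

```python
def diffuse_step(head, alpha=0.2):
    """One explicit update of a 2D grid using a 5-point stencil.

    Boundary cells are held fixed. alpha is a hypothetical combined
    coefficient; it should be <= 0.25 for this explicit scheme to stay stable.
    """
    rows, cols = len(head), len(head[0])
    new = [row[:] for row in head]  # copy so every read sees the old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = (head[i - 1][j] + head[i + 1][j] +
                          head[i][j - 1] + head[i][j + 1])
            new[i][j] = head[i][j] + alpha * (neighbours - 4.0 * head[i][j])
    return new

# A single spike in the middle of a small grid spreads out over time.
grid = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
print(diffuse_step(grid))
```

Each cell only reads a small neighbourhood, which is why buffering the input in shared memory pays off so well on the GPU.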
Initially I just ported the code over to CUDA, and thanks to having done this a few times before, the CPU version I wrote was easy to transfer. A good time saver is to use macros to index arrays, etc. This makes it easy to swap in __mul24 or tex2D later.
The next step was to use shared memory to buffer the input data, which gave the biggest performance boost. Finally I did some arithmetic optimizations for another small gain.
On the advice of a friend I tried re-structuring the GPU program to use 'aprons' similar to the 'Separable Convolution' SDK sample, and tried restructuring read/writes. This all made almost no difference at all, so it seems that the 'sweet-spot' can be hit quite quickly as soon as you have done the obvious shared memory and arithmetic optimizations. The overall structure of the program seems to make little difference.
A common bit of advice is to leverage the texture units on the GPU, but a simple modification of the 'Texture-based Separable Convolution' sample program reveals it is in fact almost twice as slow as the non-texture version. For this kind of workload at least, the texturing unit speedups seem to be a bit of a myth.
Benchmarking the program has been a bit of a problem, since it is very dependent on the type of GPU you have, and the problem size. If I process a few hundred thousand nodes, the speedup is around 15x over the CPU, but when I move to processing tens of millions of nodes, the speedup is over 20x. Processing the same data on a slightly older GPU (Compute 1.0) will only give a 4x speedup.
All in all I've found it extremely difficult to give an overall answer on the performance gain of the GPU. It seems to be highly dependent on the problem size (the bigger the better - thank god!) and the GPU technology (this is going to make porting the software to multiple GPUs a pain!).
MAGIC 2010
Do you remember the DARPA Grand Challenge? Well, now the Australian DSTO is holding its own challenge, the Multi Autonomous Ground-robotic International Challenge (MAGIC 2010), with a total of US$1.6 million in prize money. The goal is to use a multi-robot team to perform an "intelligence, surveillance and reconnaissance mission in a dynamic urban environment."
The MAGIC requires entrants to complete the following tasks:
(i) Accurately and completely explore and map the challenge area;
(ii) correctly locate, classify and recognise all simulated threats; and
(iii) complete all phases within 3.5 hours.
The final event is scheduled to take place during the week of November 8, 2010, somewhere in South Australia.
I'm putting together a team from WA. If you are interested, let me know.
The participants' conference is in Adelaide next week. See you there!
Friday, July 10, 2009
15 ton Wiimote robot
Apparently the Transmin software team are famous now. Simon Wittber and Dan Adams modified our grapple control system to work with the Wiimote.
There is a video out on youtube:
I haven't seen it on the usual sites I watch, but I guess I'm reading the wrong stuff.
Tuesday, July 07, 2009
Simple bootloader
Writing your own OS is something every computing person should do at some point. The first step is of course writing a boot loader. This is actually very easy to do under Linux.
With a fresh install of Ubuntu, make sure you have GCC and related goodies. The other things you will probably want are nasm and a virtual machine like QEMU. Installing these in Ubuntu is as simple as:
sudo apt-get install nasm
sudo apt-get install qemu
(you may need to modify your /etc/apt/sources.list - any core ubuntu mirror will do, I used au.archive.ubuntu.com/ubuntu)
Now you probably want to try QEMU out before going further, so grab FreeDOS from
http://www.freedos.org/freedos/files/ by downloading an ISO; I got fdbasecd.iso.
Now we create a virtual hard drive to install FreeDOS onto (do this in the same directory as the ISO):
qemu-img create -f raw freedos.img 100M
And then we just boot up qemu:
qemu -localtime freedos.img -cdrom fdbasecd.iso -boot d
The FreeDOS install is a bit obtuse: you need to format the drive to FAT16, exit the formatter, then format again to FAT32. I pretty much just went with the defaults for everything; after all, this is just for testing. At the end, you should have a working DOS prompt.
Now that we have QEMU working and we know it, let's try our own boot loader:
[BITS 16] ; 16 bit code generation
[ORG 0x7C00] ; ORGin location is 7C00
;Main program
main: ; Main program label
mov ah,0x0E ; This number is the number of the function in the BIOS to run.
; This function is put character on screen function
mov bh,0x00 ; Page number (I'm not 100% sure of this myself but it is best
; to leave it as zero for most of the work we will be doing)
mov bl,0x07 ; Text attribute (Controls the background and foreground colour
; and possibly some other options)
; 07 = White text, black background.
; (Feel free to play with this value as it shouldn't harm
; anything)
mov al,65 ; Put an ASCII value into al to be displayed (65 = 'A').
; (This is not the normal way to do this)
int 0x10 ; Call the BIOS video interrupt.
jmp $ ; Put it into a continuous loop to stop it running off into
; the memory running any junk it may find there.
; End matter
times 510-($-$$) db 0 ; Fill the rest of the sector with zeros
dw 0xAA55 ; Boot signature
Save the code to loader.asm, and assemble it with:
nasm loader.asm
Then, we can run our wonderful loader with:
qemu loader
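If you want to see what the 'End matter' lines of the loader actually produce, here is a small Python sketch that builds the same 512-byte sector by hand: the code bytes, zero padding up to byte 510, then the two signature bytes. The opcode bytes are my own hand-assembly of the loader above, so treat them as an assumption and compare against what nasm really emits. The function names are made up for this example.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # dw 0xAA55 is stored little-endian: 55 then AA

def build_boot_sector(code: bytes) -> bytes:
    """Pad code with zeros to 510 bytes and append the boot signature."""
    room = SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(code) > room:
        raise ValueError("code does not fit in one boot sector")
    return code + b"\x00" * (room - len(code)) + BOOT_SIGNATURE

def looks_bootable(sector: bytes) -> bool:
    """Check the two things checked here: the sector size and the signature."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == BOOT_SIGNATURE

# Hand-assembled bytes for the loader (assumed, not taken from nasm output).
LOADER_CODE = bytes([
    0xB4, 0x0E,  # mov ah,0x0E
    0xB7, 0x00,  # mov bh,0x00
    0xB3, 0x07,  # mov bl,0x07
    0xB0, 0x41,  # mov al,65
    0xCD, 0x10,  # int 0x10
    0xEB, 0xFE,  # jmp $
])

sector = build_boot_sector(LOADER_CODE)
print(len(sector), looks_bootable(sector))
```

The same looks_bootable check can be pointed at the file nasm produces (read it in binary mode) if QEMU refuses to boot it.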
Monday, July 06, 2009
Flying Winged Robots
There is just something inherently cool about miniature robots with wings.
Take a look at these videos, the first from Shimoyama-Matsumoto Laboratory, University of Tokyo, the second from AeroVironment. DARPA is continuing funding for its Nano Air Vehicle Development Program, so I look forward to having miniature flying insect-like robots following my every move in the future.
Festo's AquaPenguins and AirPenguins are perhaps the most impressive, although I'm pretty sure I'd see them coming. Bioinspired robots seem to be doing well for these kinds of goals, although these videos do make me wonder sometimes if these things haven't been photoshopped.
Friday, July 03, 2009
Paris Game AI
I stumbled across the Paris Game AI conference, and there was some interesting stuff.
The most interesting was Mikko's open source automatic navmesh generator. The demo makes the technology look useable enough.
If you're not convinced by the navmesh approach yet, the AI blog has an excellent discussion of the benefits of a navmesh over waypoints, including some amusing failed videogame pathfinding algorithms.
The most interesting presentation was Coordinating Agents with Behavior Trees for Far Cry from Crytek. It's a different approach to behavior management than the standard emergent behavior approach using a blackboard. I'm not convinced it's really novel; all they have done is name the approach taken by your standard undergrad robotics students, but they make a good argument for it. The big advantage is that you have far more control over how the behaviors are activated, which is something you really need in games if you have a demanding director.
Another great presentation is Killzone 2 Multiplayer Bots, not because there is anything amazingly new in it, but because they cover their entire process: squads, 'strategic reasoning' by annotating special AI nodes with certain behavior, the waypoint network, and influence maps.
While I am on the topic of game AI, the botprize will be having its first-round playoffs at ECU in Perth next week. Should be fun!
Batch files and SVN
Batch files are useful for writing small automated scripts in windows.
You just type any command you like into a text-file with the extension ".bat" and it will do the magic for you.
Some extra commands/parameters will help:
%1,%2,%3 ... will let you pass additional parameters
echo will print something to the screen
@echo off will stop the bat file from printing everything to the screen
REM will make a line into a comment
> will let you redirect output to a file, and | will let you pipe the output between commands, just like on *nix.
set /p variable= [string] will let you prompt for input
exist will let you test if a file exists
You also have programming constructs including 'if' and 'goto'. (For an example see this batch file sorting routine)
That's pretty much all you could ever need.
So a simple example, named test.bat:
@echo off
echo hello %1
If I call this with "test Adrian" it will print "hello Adrian" to the screen.
@echo off
set /p name= What is your name?
echo hello %name%
This does the same, except prompts me to enter my name.
There are more than just the %1..%n variables for commands; there is also %0 to tell you the name of the bat file. Even better, you can extract the drive and path of the bat file with %~dp0
eg:
@echo off
echo %~dp0
So now you can invoke Tortoise SVN on the command line to download and install your favourite software.
For example:
tortoiseproc /command:checkout /path:%~dp0pal /url:https://pal.svn.sourceforge.net/svnroot/pal/pal /closeonend:1
tortoiseproc /command:checkout /path:%~dp0bullet /url:http://bullet.googlecode.com/svn/trunk /closeonend:1
Even better, we can extend this with exist to see if the code is already there, and do an update instead:
if not exist pal\NUL goto nopal
tortoiseproc /command:update /path:%~dp0pal /closeonend:1
goto NEXT
:nopal
tortoiseproc /command:checkout /path:%~dp0pal /url:https://pal.svn.sourceforge.net/svnroot/pal/pal /closeonend:1
:NEXT
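If goto-based batch logic makes your eyes glaze over, the same update-or-checkout decision can be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for the batch file: svn_command is a made-up helper, and it only builds the tortoiseproc command line (using the same switches as above) rather than running it.

```python
import os

def svn_command(base_dir, name, url):
    """Return the tortoiseproc command line for a working copy called name.

    If the directory already exists we update it, otherwise we check it out.
    """
    path = os.path.join(base_dir, name)
    if os.path.isdir(path):
        return f"tortoiseproc /command:update /path:{path} /closeonend:1"
    return f"tortoiseproc /command:checkout /path:{path} /url:{url} /closeonend:1"

# Example: decide what to do for 'pal' next to the current directory.
print(svn_command(os.getcwd(), "pal",
                  "https://pal.svn.sourceforge.net/svnroot/pal/pal"))
```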
Enjoy!
Thursday, July 02, 2009
Getting started on the PS3
It's been a while since I set up my PS3 with Linux, but I remember it being a lengthy process. Installing Yellow Dog Linux on the PS3 is relatively straightforward, but then getting the compiler tool chain up and running took a while, mostly in just figuring out what to do. If you only have RCA out on the PS3 you need to install Linux in text-mode, which I remember being a bit dramatic (all the instructions assume you have the GUI environment).
Yellow Dog has a package manager called 'yum' (Yellow Dog Updater, Modified).
After the install do a 'yum update'.
(Note, if you are behind a proxy you will need to set the http_proxy variable. For other networking issues 'ifconfig' up/down and 'dhclient' are your friends.)
export http_proxy="http://username:password@proxy.host.com:port"
Now do a search in yum for any SPU/PPU packages, e.g. 'yum search spu', and install the relevant ones. (I can't recall which ones, probably libspe2, spu-binutils, spu-gcc, spu-newlib, ppu-binutils.) At the end of this you should have spu-gcc, ppu-embedspu, ppu-ar and the usual suspects (gcc/g++).
You have these different compilers because the SPE and the PPE are completely different processors; it's like having two different computers in one box. So spu-gcc compiles code only for the SPE, and the 'normal' gcc compiles code for the PowerPC.
Obviously the first thing to try is 'hello world' for the PPE, but after that a little SPU/PPU program is what to try. You should be able to find some code on the web (try the GATECH STI website, or Jeremy's SPE hello world examples), so then you just need to build it:
spu-gcc spu-program.cpp -o spu-program
ppu-embedspu -m32 symbol_name binary_name output-embedded.o
ppu-ar -qcs spulib.a output-embedded.o
g++ ppu-program.cpp -lspe spulib.a -o output
Or, as a concrete example:
spu-gcc spumain.cpp -o spumain
ppu-embedspu -m32 hello_spu spumain hello_spu-embed32.o
ppu-ar -qcs hello_spu.a hello_spu-embed32.o
g++ ppu-program.cpp -lspe hello_spu.a -o helloworld
So in Step 1, we are compiling the SPE program, in Step 2 we are embedding it into an object file by providing the symbol name (eg: extern spe_program_handle_t hello_spu), the binary name (the result from spu-gcc), and the final object file.
In Step 3 we are creating a library from the object, and in Step 4 we are linking it all together.
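Since the four steps always follow the same pattern, a tiny Python helper can spell them out for any program. This is just a sketch: cell_build_commands is a made-up name, the naming of the intermediate files simply copies the concrete example above, and it only generates the command strings rather than running them.

```python
def cell_build_commands(spu_source, symbol, ppu_source, output):
    """Return the four build commands, in order, as strings."""
    spu_binary = spu_source.rsplit(".", 1)[0]  # spumain.cpp -> spumain
    embedded = f"{symbol}-embed32.o"           # object holding the SPE binary
    library = f"{symbol}.a"                    # library wrapping that object
    return [
        f"spu-gcc {spu_source} -o {spu_binary}",                # Step 1: compile SPE code
        f"ppu-embedspu -m32 {symbol} {spu_binary} {embedded}",  # Step 2: embed it
        f"ppu-ar -qcs {library} {embedded}",                    # Step 3: make a library
        f"g++ {ppu_source} -lspe {library} -o {output}",        # Step 4: link it all
    ]

for command in cell_build_commands("spumain.cpp", "hello_spu",
                                   "ppu-program.cpp", "helloworld"):
    print(command)
```

The symbol name passed in Step 2 has to match the extern spe_program_handle_t declaration in the PPU program, otherwise the final link will fail.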
The entire 'embedding' thing can be a bit confusing, but Alex Chow provides a great overview.
(Note: If you are working with Windows, putty and winscp will make your life a lot easier. If you're new to Linux, try the 'nano' editor (or 'pico'), as opposed to the more powerful, and difficult to master, vim. You can make something executable with 'chmod +x filename'. If you stuff up the console, typing 'reset' will save the day.)
Good luck. You might find this Cell Cheat Sheet useful.
Wednesday, July 01, 2009
Catchup post
The problem with blogging is that you need to find time to keep it moving even when you are very busy. Then, when you get the time you forget everything you wanted to post.
So just a quick catch-up post:
First of all, I've noticed a number of people trying to use CUDA for... silly things.
This includes attempting to accelerate databases with CUDA and accelerating downloading with CUDA. I'm somewhat reminded of the Intel marketing campaign that MMX speeds up communications. Someone somewhere needs to explain that you're not limited by CPU computations when you're downloading things...
Anyway, some more news, articles and programs:
The Netflix Prize probably has a winner.
Jeff Moser covers the process of an HTTP connection.
I always find this program useful, and I've posted on it before: Dependency Walker.
Parallels is a deeply integrated VM, which has some impressive demonstrations on their website, and which I'm hoping will be very handy for my new Mac.
I've finished working on the 18m grapples. They invited the Engineers Australia - Women in Engineering to Transmin during the open day, which was great.
Back to working on PAL, ModBench, the PS3 with Jeremy, and an explicit CUDA groundwater simulation with NTEC.