Thursday, December 31, 2009

Physics Engines Roundup '09

Physics engines are starting to consolidate into a few key, solid products. When I first began the Physics Abstraction Layer (PAL) project there were only four freely available physics engines: Dynamo, ODE, Tokamak, and Newton (AERO and Dynamechs were also available, but they weren't really stand-alone engines; I guess the same could be said about Dynamo). What followed was an explosion of physics engines. This year's big trends have been parallelization (usually in the form of GPU support) and porting to Mac OS X.

PAL had some big changes in 2009 as well, including the new CMake build system, generic bodies, soft bodies and more.

Here is a run-down on 2009's progress in the freely available physics engines:
  • Bullet Physics: Bullet went official in a big way, with a number of movies, games, DCC packages, etc. using and supporting the engine. Some big additions were on the parallelization front, with the CUDA/GPU code maturing, and support from AMD to add OpenCL to Bullet. The engine seems to be stabilizing and appears to have clearly taken the open-source physics engine crown.

  • Chrono Engine: Added support for CUDA/GPU, Linux, and Matlab (vaguely).
  • Havok: No major changes (faster raytracing), mostly bugfix updates. Public release of the Behavior tool.

  • Newton Game Dynamics: Newton 2.xx was made public in 2009, with a whole raft of big changes (too many to list fully here): parallelization, multiplatform support (Linux, Mac OS X, iPhone), Pascal, and a slowly maturing GPU/CUDA/OpenCL port.

  • nVidia PhysX: Official WHQL drivers for PhysX GPU acceleration.

  • OpenTissue: Improved SPH support, and OS X port.
  • PhysSim: Was renamed Moby, and has progressed with more examples and improved solvers.
  • Simple Physics Engine: Improved parallelization and Linux, Mac OS X, and iPhone support.
  • True Axis: Launched an iPhone game with their physics engine.

And that about wraps it up.

Saturday, December 19, 2009

iRiver


Encoding video content for the iRiver is extremely difficult; it would be better if the manufacturer didn't claim to support a whole bunch of file formats that they don't.

The challenge is made more difficult if you are not using Windows. There are a few tools out there, but all require MPlayer, and on a Mac you will need MPlayer for OS X - which is rarely updated. You will need to get the old version, which comes with mencoder, and copy it out of the application package.

This is the command I used, which worked fine for the iRiver X20:
./mencoder pls_sunflower.avi -quiet -vf scale=320:240,expand=320:240 -oac mp3lame -lameopts abr=128 -ovc xvid -xvidencopts bitrate=450 -of avi -af resample=44100 -srate 44100 -ofps 10 -o test3.avi

Unfortunately that didn't work too well on the E10. But here are some links to help with that:

Friday, December 18, 2009

MAGIC 2010 - Briefing


I attended the MAGIC 2010 briefing for our team at the DSTO offices in Sydney. It was great to meet the rest of team MAGICian/WAMbot that I hadn't met yet, and some of the other Australian teams:
  • Strategic Engineering, a very large group of students from a number of different Sydney universities, were there in full force. They seem like the team with the least experience, but with the most enthusiasm. A bit of a wildcard entry, but they have the funding they need and certainly have plenty of manpower and red t-shirts.
  • NUMINENCE, a team based at La Trobe with a strong commercial focus. The industry drive in this team could make it a more interesting entry compared to the other, more academically focused groups; hopefully the lack of funding won't be a strong concern.
  • University of New South Wales, a strong academic contender with a small, focused, hard-working team. As an unfunded entry they might feel like underdogs, but I'm confident that they will do well.

Our own team has grown in size (again), with our South Australian counterparts picking up some help from Lockheed Martin and gaining some sponsorship from D-Link.

It was a very friendly group and it was great to see some of the same people again from the Adelaide meeting. We grilled the organizers with questions for a couple of hours and kept the other teams waiting (sorry!), but now we have a clear(er) idea of what needs to be done and are looking forward to an exciting and challenging demonstration in June.

I wonder how the briefing in the US went, it would have been great to be there.

Monday, December 14, 2009

Gazebo with Ubuntu



Gazebo is an open source 3D robot simulator compatible with the Player/Stage project. (Other great simulators include OpenRAVE, USARsim, and of course our very own set of robot simulators: EyeSim, SubSim and AutoSim.)

For an introduction to Gazebo these two presentations give a good start: a player tutorial, and a gazebo tutorial.

Getting Gazebo running is a bit complex, but this is what I did (using Ubuntu):
  1. Download gazebo 0.8pre3
  2. apt-get install the following:
    libois-dev
    libode0-dev
    libogre-dev
    libfltk1.1-dev
    libxml2
    libxml2-dev
    scons
    libboost
    libboost-signals-dev


  3. export PATH=/usr/local/bin:$PATH
    export CPATH=/usr/local/include:$CPATH
    export LIBRARY_PATH=/usr/local/lib:$LIBRARY_PATH
    export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    

  4. Now type:
    sudo scons install
I had to run it about three times before it worked; if you get any error messages, just Google them. I tried 0.9 first with no luck (some issue with FreeImage). Brock Woolf modified these instructions to get Gazebo working with Ubuntu 9.10.

Monday, December 07, 2009

Larrabee and more

I predicted in a previous post that Larrabee would be another Intel flop:
Having a look at Larrabee, it seems immediately clear to me that it's just another Itanium (the wonder-chip that wasn't).
Unfortunately for Intel, I was right.
The rumor started floating around a month or two ago that Larrabee would be canceled, and now it has been confirmed: no more Larrabee. Why? Too hard to program, apparently. So instead of launching a doomed product, they canceled it. The news was hidden behind the announcement of a 48-core parallel processor. Coincidence? It seems the Larrabee concept will live on: 100 copies of the former GPU will still be produced for research purposes. It seems now that the Cell/Larrabee way of dealing with graphics rendering problems is dead again, and nVidia should be in the clear for a while. This does raise some interesting questions about the future of AMD/ATI's Fusion..

Some other bits of news: GameDev has a great article on 2D 'spring' physics, covering some intermediate concepts like the SAT. John Ratcliff finally made a convex decomposition library from his excellent convex decomposition code - something I've been meaning to do for about two years now.

Blackpawn made an interesting post about using SVG for debugging; his website has a number of gems, including cellular textures and circular harmonics.
HPMC, a GPU-based isosurface renderer, has been released, and I'll end this post with some videos of the work:


Wednesday, November 25, 2009

Catchup post

It's been a while since I updated; the reason being it is exam/assignment marking period, and I had two GPU industry projects due (a 3D groundwater-flow fluid simulation, and a pathfinding/travelling-salesman-with-profits project).

The biggest news item was that the ACM decided to start (over)enforcing its rule saying that you cannot link to preprint and author pages. Thankfully, it started a call-to-arms, and prominent pages like Ke-Sen Huang's SIGGRAPH links have been restored. I wonder how many less public pages have silently slipped away. Frankly, I can't wait until the concept of conferences and journals disappears. My websites have always had far more impact than my publications, and it can't be long until the same can be said universally.

A short update with some interesting things in the last while:


   

Monday, November 16, 2009

Learn CUDA - Perth, Western Australia

iVEC IGUP cordially invites you to a CUDA GPU tutorial tomorrow afternoon.


The iVEC Industry and Government Uptake program with Adrian Boeing from ECU will be hosting an introductory tutorial on CUDA GPU programming with a focus on graph algorithms and search trees.

17 November 2.30 - 4.30 pm

Edith Cowan University School Of Computer And Security Science

13.225 - Games & Simulation Lab

2 Bradford St

Mount Lawley WA 6050

The tutorial is free but places are limited.

Thursday, November 05, 2009

MAGIC 2010 - Team MAGICian - ECU, UWA, Flinders, Thales


Fantastic news: the team I put together for the DSTO MAGIC 2010 competition has been pre-selected into the top 10 teams, and won US$50,000 in seed funding! Now we just need to make it past phase 2, and then take away the million-dollar prize!

Our team consists of two parts: Edith Cowan University, School of Computer and Security Science, and the University of Western Australia, School of Electrical, Electronic and Computer Engineering, which make up the WA team that I am coordinating; and Flinders University in Adelaide, which David is coordinating.

The competition is quite tough; we're up against a number of veterans from the DARPA Grand Challenge. This was a series of events where robot cars raced through the Mojave Desert, and then navigated through an urban environment. There is a pretty good PBS/NOVA documentary on the original challenge, The Great Robot Race. I was involved on the sidelines of the Urban Challenge with the TUM/UniBW/Karlsruhe AnnieWAY entry.

Some of the key competitors include Carnegie Mellon University (who won the DARPA Urban Challenge, and came second in the DARPA Grand Challenge), Virginia Tech (3rd in the Urban Challenge), and the University of Pennsylvania and Cornell, who tied for 5th. You can read about the whole list of competitors in a Dr Dobb's Journal article covering the MAGIC 2010 selection event. Naturally, these teams will be lulled into a sense of complacency and won't stand a chance against our home-grown team.

We are building 7 robots in WA, and approximately 10 robots in SA. You can see the WA robot in the picture with this post; it is based on the Pioneer All-Terrain robotic platform. The WA robots will have DGPS, inertial measurement systems (gyroscopes & accelerometers) for relative positioning, various laser range finders for mapping and collision detection, stereo cameras for distance measurement and object recognition, and PTZ cameras for object tracking. This should be a very exciting project!

Some of the key areas for software work include building the simulator, developing the computer vision algorithms for object recognition and tracking (including tracking people), the LIDAR data processing, sensor fusion algorithms, SLAM algorithms, the multi-agent team coordination and planning, the data-communications, path planning, trajectory generation, influence mapping, the list goes on!

So if you have any interest in the project, please drop me a line! It's your chance to be famous, and to be part of this history-making event!

Thursday, October 29, 2009

Unity


One short quick post, Unity is now free. Of course, this isn't the full edition with all the tools you would need if you are developing anything bigger than a one-man game..

Still, the price is right.

If you're the free-as-in-speech kind, try the Blender game engine.

Wednesday, October 28, 2009

Boston Dynamics: PETMAN

Boston Dynamics, famous for BigDog (check out the BigDog YouTube video), are now working on a biped, PETMAN.

It's still early in the project, but they seem to be making good progress on the mechanical side of things. The noisy motor might prove to be a bit of an issue.


I wonder how long until the Japanese fighting-toy robots get to use this kind of equipment. It's probably still quite some time, since robot prices don't tend to drop much over the years. I'm predicting it will still be a very long wait until consumers see anything beyond toy bipedal robots.

Tuesday, October 27, 2009

Apple Mac OS X Utilities


Everyone needs tools to use their PC. I covered the essentials for a Windows PC previously. Here are some good (free) tools for your Mac:
  • StuffIt Expander, your OS X WinRAR/WinZip equivalent.
  • MacFUSE, an extension that enables file systems in user space on the Mac. Ever wanted to read NTFS?
  • NTFS-3G, the NTFS plug-in for MacFUSE; no more 'Items could not be moved because XXX cannot be modified'. Why OS X doesn't support NTFS is beyond me. I have been using the NTFS-3G plug-in with no problems for about a year now.
  • Opera, while not a Mac-only utility, having the Opera web browser is essential - especially when Safari acts up, or you want to use IRC, or BitTorrent, or RSS, or email, or anything really. It's an all-in-one solution.
  • Parallels, lets you run your Boot Camp Windows partition seamlessly inside the Mac, with graphics acceleration to boot. Nifty. While not free, I feel it's worthwhile. The free alternative is VirtualBox.
  • Nocturne, this lets you dim, and otherwise change your screen display. Very useful for late night computing.
  • Small Image, for quick image resizing.
  • Paintbrush for mac.
  • Perian and Flip4Mac WMV for extending the media file support of OS X.
Plus the usual assortment of VLC, MPlayer, FileZilla, Adobe Reader, etc. A great place to find nifty tools is I Use This, for OS X. Tools that get a lot of recommendations, but that I don't use much, are iStat Pro, VMware, Little Snitch and Transmission. I still haven't figured out whether MacPorts or Fink is better; so far, I've had poor experiences with both.

Tuesday, October 20, 2009

Timing square root on the GPU

Inspired by the post by Elan Ruskin (Valve) on x86 SQRT routines, I thought I would revisit this for my supercomputing platform of choice, the GPU. I left this kind of low-level trickery behind after finishing with RMM/Pixel-Juice some time around 2000, having decided that 3DNow! reciprocal square root routines were more than good enough..

Anyway, a brief overview of how we can do square roots:
  1. Calculate it with the FPU (however that was implemented by the chip manufacturer).
  2. Calculate it with Newton-Raphson iteration. This allows you to control the accuracy of the sqrt (or, typically, the rsqrt).
These approaches typically approximate the inverse square root, so this means we also need to:
  3. Calculate the sqrt from the inverse. This comes in two flavours:
    • Calculate the reciprocal, then invert it (1/rsqrt(x)); this gives you correct results.
    • Multiply it by the input value (x*rsqrt(x)); this gives you faulty results around 0, but saves you a costly divide.
      Note:
      1.0f / rsqrtf(0.0f) = 1.0f / infinity = 0.0f
      0.0f * rsqrtf(0.0f) = 0.0f * infinity = NaN
Elan's results indicated the x86 SSE unit's rsqrtss instruction was the fastest (no surprise - it is also a rough estimate), followed by SSE rsqrt with a Newton-Raphson iteration for improvement, then Carmack's magic-number rsqrt, and finally the x86 FPU's sqrt. Note that many people don't actually reach the slowest point on Elan's scale, since they don't enable intrinsics when compiling, meaning that the C compiler will use the C library sqrt routine, and not the FPU.
I decided to test three routines for the GPU:
  • native sqrt
  • native rsqrt
  • Carmack's rsqrt
Now benchmarking and timing on the lowest level has always been somewhat of a black art (see Ryan Geiss's article on timing), but that is even more true on the GPU - you need to worry about block sizes, as well as the type of timer, etc.
I did my best at generating reliable results by testing block sizes from 2..256 and performing 2.5 million sqrt operations. Here are the results from my nVidia 9800GX2:
Method                   Total time   Max. ticks per float   Avg. ticks per float   Std. dev.   Avg. error
GPU intrinsic SQRT       1.285ms      5.99                   3.99                   0.00        0.00%
GPU intrinsic RSQRT * x  1.281ms      5.99                   3.99                   0.00        0.00%
Carmack RSQRT * x        2.759ms      6.28                   4.26                   0.01        0.09%
Total time is the total time measured by the CPU that the GPU took to launch the kernel and calculate the results. The clock ticks are meant to be more accurate measurements using the GPU's internal clock, but I find that to be dubious.
The conclusions to take from these results are simple: Carmack's inverse and other trickery isn't going to help; using the GPU RSQRT function as opposed to the built-in SQRT function saves you about a clock tick or two (probably because nVidia's SQRT is implemented as 1/RSQRT, as opposed to X*RSQRT).
I'm happy to say, low level optimization tricks are still safely a thing of the past.
You can get the code for the CUDA benchmark here: GPU SQRT snippet.

Thursday, October 15, 2009

AMD OpenCL for GPU

Not one to be left behind by nVidia's news, AMD/ATI have released the AMD OpenCL beta v4, which now supports OpenCL on AMD GPUs! Some highlights:

  • First beta release of ATI Stream SDK with OpenCL GPU support.
  • ATI Stream SDK v2.0 OpenCL is certified OpenCL 1.0 conformant by Khronos.
  • Microsoft Windows 7 and native Microsoft Windows® 64-bit support


Fabulous news! Now you can do OpenCL on OS X and Windows (32 & 64-bit), for nVidia and ATI GPUs and AMD CPUs. It doesn't get any better than this - well, at least until next year when Intel enters the fray.

It's not all good news though, it seems some of AMD's GPUs don't support double precision:

Still, it's better than nVidia's lot, and I'm happy to see AMD finally making a serious effort in this space. (Not that the previous efforts weren't impressive, just not so focused...)

nVidia has also released a new version of Cg, v2.2. I wonder how much OpenCL will replace the use of Cg..

Wednesday, October 14, 2009

Google Building Maker

Google has just released their Building Maker; it looks like they finally found a use for the VideoTrace technology they acquired from the University of Adelaide.

This should help putting content together quickly for simulators, etc.

Check it out:

Nearest Neighbor

A bunch of links on the nearest-neighbor problem, for higher dimensions:
http://www.mit.edu/~andoni/LSH/
http://www.cs.umd.edu/~mount/ANN/
http://www.cs.sunysb.edu/~algorith/implement/ranger/implement.shtml

And, for good measure a set of comparisons of optical flow algorithms:
http://vision.middlebury.edu/flow/eval/

I'll probably stick to the OpenCV defaults anyway, but it's nice to know there are options. It would seem Bruhn et al. is the most accurate..

Monday, October 12, 2009

Talking Piano and AI

Daniel Wedge sent me this interesting link on a talking piano!


A great idea, I wonder if it had been done before..

In other news, the International Joint Conference on Artificial Intelligence archive has been made open to the public, covering everything from 1969 to 2007!

Thursday, October 08, 2009

nVidia: OpenCL, Nexus, Fermi

There's been a fair bit of news flowing out of nVidia, biggest first:
nVidia has released a GPU OpenCL implementation compatible with all devices that support CUDA (no surprise).
You can get the nVidia OpenCL from the nVidia OpenCL Developer Website.

Next in nVidia news, Nexus has been released. I haven't had a chance to try it, but apparently it allows you to debug GPU programs via Microsoft Visual Studio in the 'normal' way - this would certainly make GPU programming a little easier.

Finally, nVidia has released information on Fermi, their next-generation architecture. Basically it seems to be more of the same (which is always good) without all the bad bits for GPGPU programming (even better!). The biggest changes are allowing multiple kernels to execute in parallel, and decent double-precision support. This should really open up scientific & engineering computing to GPGPU, and will probably do good things for getting accelerated raytracing happening. AnandTech has a good write-up on Fermi, although it looks like we will see Larrabee before Fermi...

I had a chance to play with 3x C1060 Tesla boards for the GPU fluid simulation project on a 64-bit machine. This threw up a whole bunch of problems, since I was using the MSVC Express edition, which does not support 64-bit (apparently). The problem was solved by using the 32-bit CUDA wizard, redirecting the CUDA libraries to the 32-bit versions (ie: c:\CUDA\lib, not C:\CUDA\lib64), and some other tweaks.

Catchup

I've been quite occupied with ECU (exams and mid-semester marking), Transmin, GPU pathfinding, and the DSTO MAGIC 2010 competition. The WA team I've been organizing has teamed up with Flinders University and got some significant assistance from Thales Australia. The submission has been made; we'll have to see if we are successful...

Thursday, September 10, 2009

GPU Collision Detection

There is a new paper from the Pacific Graphics conference, Hybrid Parallel Continuous Collision Detection (HPCCD). I've been wanting to write a GPU collision detection system for a while, but was always held back by how to do this efficiently and easily (ie: not much effort on my part). The proposed hybrid approach sounds great: do all the hard (not-very-easily-parallelized) work on the CPU, and just use the GPU for the edge-edge and vertex-face primitive tests. This keeps the GPU doing what it does best, and the CPU doing what it does best.

The paper is available from the above link, and you can download the collision detection API. UNC maintains some collision detection benchmark scenes.

Thursday, September 03, 2009

Shader X2


The entire Shader X2 book and source code has now been released and can be downloaded for free! Hot off the press, get it while you can! It is still a worthwhile resource, even if it is DX9-era (2004) technology.

If you're in the process of looking for good graphics, games, and physics-simulation-related books, the Mr. Elusive website has a great list with some fantastic books, and also a few duds.

Monday, August 24, 2009

AI space

AI Space has a nice set of introductory JavaScript applets for a number of AI learning techniques. It covers graph searching, belief and decision networks, decision trees, constraint satisfaction problems, neural networks and more. It is a more 'traditional' AI site.

I also came across this great SLAM visualization tool for Player. A valuable resource on this is the Robotic Mapping and Exploration book (Springer Tracts in Advanced Robotics).

High Performance Graphics Slides


The slides for the High Performance Graphics conference are now available. I'm going to start on a GPU-based tree search project soon, so "Fast Minimum Spanning Tree for Large Graphs on the GPU" certainly looks interesting. The slides for Understanding the Efficiency of Ray Traversal on GPUs are also available, and you can now get the CUDA source code from Timo Aila's website.

Sunday, August 23, 2009

Apple Hard Drive Update

I recently got a Mac, and am just slowly starting to get used to it. One thing I noticed the other day was funny 'chirping' noises coming from my Mac, and the hard drive seemed to write like crazy.. I thought it was just 'something weird', but I wasn't the only one. Now Apple has released an update that is supposed to fix the issue. It just needed a reboot: let the firmware install, it shuts your computer down, power it back up, and you're up and running again. So far so good.

Saturday, August 22, 2009

Rigid Body Simulation & Deformables

The most brilliant idea I've seen in a long time for fast general rigid body simulation: Statistical Simulation of Rigid Bodies.

We begin by collecting statistical data regarding changes in linear and angular momentum for collisions of a given object. From this data we extract a statistical “signature” for the object, giving a compact representation of the object’s response to collision events. During object simulation, both the collision detection and the collision response calculations are replaced by simpler calculations based on the statistical signature.

I've always thought something like this could be done, now we know how. Genius.

In related rigid body news:
Anisotropic Friction for Deformable Surfaces and Solids, for the next time you need a friction model.

A Point-based Method for Animating Elastoplastic Solids is a paper that presents another method to combine fluid and solid simulation, using some of the same concepts from Müller's SPH paper. The code is built up from the Adaptively Sampled Particle Fluids source. No code for this method though (yet). I haven't had an in-depth look, but it seems like a nicer approach than the one I previously discussed.

Fluids & Volumes

Another catchup post:
Fluid Simulation with Articulated Bodies from Georgia Tech has some nice results, and could make a great basis for a new version of Karl Sims' swimmers.
Sony has released Field3D, an open file format for volume data, which seems to take a lot of the low-level work away from you when working with voxels; it supports slices, etc. as well.
Matthias Müller also has a new publication, Fast and Robust Tracking of Fluid Surfaces, which describes how to efficiently generate surface meshes from fluid simulations. I've not read it fully yet, but presumably it is worthwhile.
I also stumbled across some more open-source fluid projects: Fire Dynamics Simulator and Smokeview from NIST, which have some good-looking fire & smoke simulations of houses burning, etc., and a new project, nsproj, that has a parallel 3D Navier-Stokes solver.

Thursday, August 20, 2009

Normal Storage


Aras Pranckevičius has recently done a very complete post on different methods for compact normal storage.

'Obvious' approaches like storing X & Y and reconstructing Z, or using spherical coordinates, don't perform very well; on the other hand, stereographic projection and sphere mapping apparently perform very well. Just goes to show that even having the right idea (spherical coordinates) won't necessarily work well unless you find the right encoding.

Intel Buys RapidMind


Intel has just bought RapidMind. RapidMind provided a parallel language that could target CPUs (AMD & Intel), GPUs (nVidia & ATI/AMD), the Cell processor (Sony, Toshiba, IBM), and various DSPs.

Now that it is becoming an Intel-owned product, I wonder if all that wonderful support for other devices will just disappear. If they primarily want it to target their own Larrabee, then probably yes; but if they want to use it to establish a software platform for the next PlayStation, or as their OpenCL interface, then it might have a good chance of retaining good support for the Cell and perhaps still a few GPUs. I don't see why Intel would want to provide support for their competitors' DSPs, and if Larrabee is meant to be a serious competitor to Cell and GPUs then this might end up being a very disappointing purchase..

Goodbye, RapidMind.

Thursday, August 06, 2009

More on realtime raytracing


FurryBall is a realtime GPU renderer for Maya, and it certainly looks impressive. PICO, the realtime graphics engine for Maya, has been making progress, adding physics and a set of PICO tutorials.

Also, it sounds like Caustic isn't just hype anymore, but will actually be used. Geeks3D has a post on the Caustic API.

In more graphics-related news, TNG Viewer is a realtime OpenGL viewer for FBX, 3DS, OBJ, DXF, DAE, LWO and LWS scenes. Handy to have.

Wednesday, August 05, 2009

ATI Stream SDK includes OpenCL

The new ATI Stream SDK includes a CPU implementation of OpenCL.

AMD released a demo of a 2d fluid simulation using 24 cores.


I have to say, I'm not very impressed. I'd be keen to see how this holds up against an OpenCL GPU implementation; it certainly seems like CUDA has a strong position for now. They do have a good OpenCL introductory tutorial though.

In other news, nVidia released OptiX a realtime raytracing "engine", whatever that is supposed to mean..

Tuesday, August 04, 2009

Game Jam - Top Game


I just received an email from Josh; apparently the Game Jam entry I helped make placed in the top 4 games selected by PC PowerPlay magazine. Excellent!
Check out Zeppelin Escape!

Thursday, July 30, 2009

GPGPU & LLVM

The LLVM group has just got a new logo, a modernized version of the 'dragon book' dragon. But the really exciting news I saw on GPGPU.org is a project called 'GPU Ocelot'. This program translates compiled PTX programs (produced by nVidia's CUDA) via the just-in-time LLVM compiler to any targeted backend, meaning for example a PS3 Cell processor. All you need to do is add an AMD backend to LLVM and, hey presto, instant CUDA-for-ATI. That could potentially put a dent into OpenCL's plans..

The papers from the High Performance Graphics conference are also out. One that caught my eye was Understanding the Efficiency of Ray Traversal on GPUs - not really because I thought the paper was fundamentally groundbreaking, but because it explained a few neat tricks on nVidia's part; in particular, a good explanation of persistent threads in CUDA for breaking down non-uniform workloads from a global pool (eg: they used it on a per-ray basis, so that fast and slow rays don't "block" each other).

Virgin Airlines

Virgin airlines canceled my flight.
Needless to say I'm pretty annoyed.
They informed the passengers in the most elegant of ways, they changed the sign from "boarding" to "canceled".
Nice work.

Never flying virgin again, I think the extra $30 for Qantas is well worth it, I mean, you get a free meal too right?

Wednesday, July 22, 2009

Fracture Physics

Eric Parker from Pixelux and James O’Brien have just released their paper, Real-Time Deformation and Fracture in a Game Environment. It's an interesting read, and reveals some of the details behind the impressive Pixelux engine. Basically they use tetrahedral finite elements and just make it look like there is more detail than the physics really has. Still, it seems to work pretty well.

Of note from the demoscene world is h4vok by Archee which features fractures in 4k, nicely timed to music. Very impressive.

Groundwater Flow on GPU

I am presently developing a proof-of-concept groundwater-flow GPU program. The algorithm works similarly to a finite-differences algorithm, or many kernel-based image processing systems.

Initially I just ported the code over to CUDA, and thanks to having done this a few times before, the CPU version I wrote was easy to transfer. A good time saver is to use macros to index arrays, etc. This makes it easy to swap in __mul24, or tex2D, etc. later.

The next step was to use shared memory to buffer the input data, this gave the biggest performance boost. Finally I did some arithmetic optimizations for another small gain.

On the advice of a friend I tried re-structuring the GPU program to use 'aprons', similar to the 'Separable Convolution' SDK sample, and tried restructuring the reads/writes. This all made almost no difference, so it seems that the 'sweet spot' can be reached quite quickly once you have done the obvious shared memory and arithmetic optimizations. The overall structure of the program seems to make little difference.

A common bit of advice is to leverage the texture units on the GPU, but a simple modification of the 'Texture-based Separable Convolution' sample program reveals it is in fact almost twice as slow as the non-texture-based version. It seems the texturing-unit speedups are a bit of a myth.

Benchmarking the program has been a bit of a problem, since performance is very dependent on the type of GPU you have, and the problem size. If I process a few hundred thousand nodes, the speedup is around 15x over the CPU, but when I move to processing tens of millions of nodes, the speedup is over 20x. Processing the same data on a slightly older GPU (Compute 1.0) only gives a 4x speedup.

All in all I've found it extremely difficult to give an overall answer on the performance gain of the GPU. It seems to be highly dependent on the problem size (the bigger the better - thank god!) and the GPU technology (this is going to make porting the software to multiple GPUs a pain!).

MAGIC 2010


Do you remember the DARPA Grand Challenge? Well, now the Australian DSTO is holding its own challenge, the Multi Autonomous Ground-robotic International Challenge (MAGIC 2010), with a total of US$1.6 million in prize money. The goal is to use a multi-robot team to perform an "intelligence, surveillance and reconnaissance mission in a dynamic urban environment."

The MAGIC requires entrants to complete the following tasks:
(i) Accurately and completely explore and map the challenge area;
(ii) correctly locate, classify and recognise all simulated threats; and
(iii) complete all phases within 3.5 hours.

The final event is scheduled to take place during the week of November 8, 2010, somewhere in South Australia.

I'm putting together a team from WA, if you are interested let me know.
The participants conference is in Adelaide next week. See you there!

Friday, July 10, 2009

15 ton Wiimote robot

Apparently the Transmin software team are famous now. Simon Wittber and Dan Adams modified our grapple control system to work with the Wiimote.

There is a video out on youtube:


I haven't seen it on the usual sites I watch, but I guess I'm reading the wrong stuff..

Tuesday, July 07, 2009

Simple bootloader

Writing your own OS is something every computing person should do at some point. The first step is, of course, writing a boot loader. This is actually very easy to do under Linux.

With a fresh install of Ubuntu, make sure you have GCC and related goodies; the other things you will probably want are nasm and a virtual machine like QEMU. Installing these in Ubuntu is as simple as:

sudo apt-get install nasm
sudo apt-get install qemu

(you may need to modify your /etc/apt/sources.list; any core Ubuntu mirror will do, I used au.archive.ubuntu.com/ubuntu)

Now you probably want to try QEMU out before going further, so grab FreeDOS:
go to http://www.freedos.org/freedos/files/ and grab an ISO; I got fdbasecd.iso

Now we create a virtual hard drive to install FreeDOS onto (do this in the same dir as the iso):

qemu-img create -f raw freedos.img 100M


And then we just boot up qemu:

qemu -localtime freedos.img -cdrom fdbasecd.iso -boot d

The FreeDOS install is a bit obtuse: you need to format the drive to FAT16, exit the formatter, then format again to FAT32, but I pretty much went with the defaults for everything; after all, this is just for testing. At the end, you should have a working DOS prompt.

Now that we know QEMU works, let's try our own boot loader:

[BITS 16] ; 16 bit code generation
[ORG 0x7C00] ; ORGin location is 7C00

;Main program
main: ; Main program label

mov ah,0x0E ; This number is the number of the function in the BIOS to run.
; This function is put character on screen function
mov bh,0x00 ; Page number (I'm not 100% sure of this myself but it is best
; to leave it as zero for most of the work we will be doing)
mov bl,0x07 ; Text attribute (Controls the background and foreground colour
; and possibly some other options)
; 07 = White text, black background.
; (Feel free to play with this value as it shouldn't harm
; anything)
mov al,65 ; This should (in theory) put an ASCII value into al to be
; displayed. (This is not the normal way to do this)
int 0x10 ; Call the BIOS video interrupt.

jmp $ ; Put it into a continuous loop to stop it running off into
; the memory and running any junk it may find there.

; End matter
times 510-($-$$) db 0 ; Fill the rest of the sector with zeros
dw 0xAA55 ; Boot signature


Save the code to loader.asm, and assemble it with:

nasm loader.asm


Then, we can run our wonderful loader with:

qemu loader

Monday, July 06, 2009

Flying Winged Robots

There is just something inherently cool about miniature robots with wings.

Take a look at these videos, the first from Shimoyama-Matsumoto Laboratory, University of Tokyo, the second from AeroVironment. DARPA is continuing funding for its Nano Air Vehicle Development Program, so I look forward to having miniature flying insect-like robots following my every move in the future.





Festo's AquaPenguins, and AirPenguins are perhaps the most impressive, although I'm pretty sure I'd see them coming. Bioinspired robots seem to be doing so well for these kinds of goals. Although these videos do make me wonder sometimes if these things haven't been photoshopped.

Friday, July 03, 2009

Paris Game AI


I stumbled across the Paris Game AI conference, there was some interesting stuff.

The most interesting was Mikko's open source automatic navmesh generator. The demo makes the technology look usable enough.

If you're not convinced by the navmesh approach yet, the AI blog has an excellent discussion of the benefits of a navmesh over waypoints, including some amusing examples of failed videogame pathfinding algorithms.

The most interesting presentation was Coordinating Agents with Behavior Trees for Far Cry, from Crytek. It's a different approach to behavior management than the standard emergent-behavior approach using a blackboard. I'm not convinced it's really novel; all they have done is name the approach taken by your standard undergrad robotics students, but they make a good argument for it. The big advantage is that you have far more control over how the behaviors are activated, which is something you really need in games if you have a demanding director.

Another great presentation is Killzone 2 Multiplayer Bots. Not because there is anything amazingly new in it, but just because they cover their entire process, squads, 'strategic reasoning' by annotating special AI nodes with certain behavior, the waypoint network, and influence maps.

While I am on the topic of game AI, the botprize will be having its first round playoffs at ECU in Perth next week; should be fun!

Batch files and SVN


Batch files are useful for writing small automated scripts in windows.

You just type any command you like into a text-file with the extension ".bat" and it will do the magic for you.

Some extra commands/parameters will help:
%1, %2, %3 ... will let you pass additional parameters
echo will print something to the screen
@echo off will stop the bat file from echoing every command to the screen
REM will make a line into a comment
> will let you redirect output to a file, and | will pipe output between commands, just like on *nix.
set /p variable=[string] will let you prompt for input
exist will let you test if a file exists

You also have programming constructs including 'if' and 'goto'. (For an example see this batch file sorting routine)

That's pretty much all you could ever need.

So a simple example, named test.bat:

@echo off
echo hello %1

If I call this with "test Adrian" it will print "hello Adrian" to the screen.


@echo off
set /p name= What is your name?
echo hello %name%

This does the same, except prompts me to enter my name.

There are more variables than just %1..%n: there is also %0 to tell you the name of the bat file. Even better, you can extract the path with %~dp0
eg:

@echo off
echo %~dp0


So now you can invoke Tortoise SVN on the command line to download and install your favourite software.

For example:

tortoiseproc /command:checkout /path:%~dp0pal /url:https://pal.svn.sourceforge.net/svnroot/pal/pal /closeonend:1
tortoiseproc /command:checkout /path:%~dp0bullet /url:http://bullet.googlecode.com/svn/trunk /closeonend:1


Even better, we can extend this with exist to see if the code is already there, and do an update instead:

if not exist pal\NUL goto nopal
tortoiseproc /command:update /path:%~dp0pal /closeonend:1
goto NEXT
:nopal
tortoiseproc /command:checkout /path:%~dp0pal /url:https://pal.svn.sourceforge.net/svnroot/pal/pal /closeonend:1


Enjoy!

Thursday, July 02, 2009

Getting started on the PS3

It's been a while since I set up my PS3 with Linux, but I remember it being a lengthy process. Installing Yellow Dog Linux on the PS3 is relatively straightforward, but getting the compiler tool chain up and running took a while, mostly in just figuring out what to do. If you only have RCA out on the PS3 you need to install Linux in text mode, which I remember being a bit dramatic (all the instructions assume you have the GUI environment).

Yellow Dog has a package manager called 'yum' (the Yellow Dog Updater, Modified).

After the install do a 'yum update'.
(Note, if you are behind a proxy you will need to set the http_proxy variable. For other networking issues, 'ifconfig' up/down and 'dhclient' are your friends.)
export http_proxy="http://username:password@proxy.host.com:port"


Now do a search in yum for any SPU/PPU packages, i.e. 'yum search spu', and install the relevant ones (I can't recall exactly which; probably libspe2, spu-binutils, spu-gcc, spu-newlib, ppu-binutils). At the end of this you should have spu-gcc, ppu-embedspu, ppu-ar and the usual suspects (gcc/g++).

You have these different compilers because the SPE and the PPE are completely different processors; it's like having two different computers in one box. So spu-gcc compiles code only for the SPE, and the 'normal' gcc compiles code for the PowerPC.

Obviously the first thing to try is 'hello world' for the PPE, but after that a little SPU/PPU program is the thing to try. You should be able to find some code on the web (try the GATECH STI website, or Jeremy's SPE hello world examples), so then you just need to build it:
spu-gcc spu-program.cpp -o spu-program
ppu-embedspu -m32 symbol_name binary_name output-embedded.o
ppu-ar -qcs spulib.a output-embedded.o
g++ ppu-program.cpp -lspe spulib.a -o output

Or, as a concrete example:
spu-gcc spumain.cpp -o spumain
ppu-embedspu -m32 hello_spu spumain hello_spu-embed32.o
ppu-ar -qcs hello_spu.a hello_spu-embed32.o
g++ ppu-program.cpp -lspe hello_spu.a -o helloworld

So in Step 1, we are compiling the SPE program, in Step 2 we are embedding it into an object file by providing the symbol name (eg: extern spe_program_handle_t hello_spu), the binary name (the result from spu-gcc), and the final object file.
In Step 3 we are creating a library from the object, and in Step 4 we are linking it all together.

The entire 'embedding' thing can be a bit confusing, but Alex Chow provides a great overview.

(Note: If you are working from Windows, PuTTY and WinSCP will make your life a lot easier. If you're new to Linux, try the 'nano' editor (or 'pico'), as opposed to the more powerful, and difficult to master, vim. You can make something executable with 'chmod +x filename'. If you stuff up the console, typing 'reset' will save the day.)

Good luck. You might find this Cell Cheat Sheet useful.

Wednesday, July 01, 2009

Catchup post

The problem with blogging is that you need to find time to keep it moving even when you are very busy. Then, when you get the time you forget everything you wanted to post.

So just a quick catch-up post:

First of all, I've noticed a number of people trying to use CUDA for... silly things.
This includes attempting to accelerate databases with CUDA and accelerating downloading with CUDA. I'm somewhat reminded of the Intel marketing campaign claiming that MMX speeds up communications. Someone somewhere needs to explain that you're not limited by CPU computations when you're downloading things...

Anyway, some more news&articles&programs:
  • the Netflix prize probably has a winner.
  • Jeff Moser covers the process of an HTTPS connection.
  • I always find this program useful, I've posted on it before: Dependency Walker
  • Parallels is a deeply integrated VM, which has some impressive demonstrations on their website, and I'm hoping it will be very handy for my new Mac.

    I've finished working on the 18m grapples; they invited the Engineers Australia - Women in Engineering group to Transmin during the open day, which was great.

    Back to working on PAL, ModBench, the PS3 with Jeremy, and an explicit CUDA groundwater simulation with NTEC.

Friday, June 12, 2009

    Python Setup Tools 2.6


    Setting up "easy-setup" (haha, not so easy!) for Python 2.6 under Windows is a bit of a convoluted process.

    These are the steps:
    1. Install Python.
    2. Download the python egg file
    3. Download ez_setup.py (Takes a while till you find a working one!)

    You may wish to place it into your python 'scripts' folder first.

    Then run "python ez_setup.py setuptools-0.6c9-py2.6.egg".

    Now you can use any setup script that relies on easy setup.

    IT Salary Levels in Perth

    I was having a look at some IT salary survey data for Australia.

    It has always annoyed me that "less difficult" tasks were valued more than "more difficult" ones. For example, a C++ programmer in Perth contracts from $40-$75, a VB developer gets the same, and a .NET developer gets more ($80), with Java winning out at $90. Even web developers get more, starting at $55.

    This has always annoyed me about Perth. However, take a look at a different city, and you will see a different story. Take our nearest neighbor, Adelaide.


    .NET $40-$75, Java to $65, VB to $65, web developers to $65, with C++ topping it out at $85. Seems a lot more reasonable to me.

    In Adelaide, help desk staff start at $37k and get paid $58k, tops, unlike Perth, where desktop support staff start at $45k and top out at $65k. This is the same starting salary for a C++ developer in Perth. This certainly explains why people in Perth don't bother to stick around if they get a good education. If you can earn as much on help desk straight from school as you can with a university degree as a programmer, why bother?

    What is wrong with Perth?

    Simulating Sensors

    For many robotics tasks, getting good readings from the sensors to estimate your current state is very important. The system may come to rely on the sensors so much that any sensor hardware failure would have devastating consequences.

    A few simple noise simulations placed into your sensor readings can help you assert that your system can handle failed sensors. But what to simulate?

    There are a number of academic papers available that discuss sensor noise models, but they are typically highly detailed and specific to one sensor type.

    Here is a list of common faults that you should simulate for testing analog sensors:
  • Completely on/off, this will only happen if there is a problem with the power supply: you will either get 0 or full positive. This helps to detect shorts and other common wiring failures.
    In code: sensor = 0, or sensor = high;
  • An additional DC offset, this will simulate a slowly drifting ground, you may also wish to simulate this with a slow rise/drop over a long time.
    In code : sensor+=dc_offset;
  • Inversion fault, if someone installs a sensor the wrong way round, or wires something up the wrong way.
    In code : sensor = 1-sensor;
  • Random noise, white noise is not really a very common input from the real world, but it's a cheap test routine. Adding it to your signal can help determine that your filters have an appropriate noise cut-off level.
    In code : sensor = rand(); or sensor+= signed_rand();
  • Sine wave, you will find sinusoidal interference coming from a number of sources, such as a nearby radio unit, micro controller, or PWM. Often higher frequency signals will be aliased and presented as a low frequency sine wave.
    In code: sensor = sensor + sin(w*t)
  • Spikes/Ramps, these will sometimes occur due to the way an analog signal is read or converted to a digital form, or some rotary encoders will droop down. Poorly designed hardware filters can also be the cause.
    In code: sensor = sensor + spike, or sensor = sensor + ramp
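Most of the faults above are one-liners, so for a test harness they can be wrapped in a single switch. This is just a sketch: the names, and the assumption that readings are normalized to [0,1], are mine rather than from any particular system:

```c
#include <stdlib.h>

/* Hypothetical fault injector covering the simpler faults above;
   sensor readings are assumed normalized to the range [0,1]. */
typedef enum {
    FAULT_STUCK_LOW,   /* short to ground: always 0 */
    FAULT_STUCK_HIGH,  /* short to supply: always full positive */
    FAULT_DC_OFFSET,   /* drifting ground: add a constant offset */
    FAULT_INVERTED,    /* sensor installed/wired the wrong way round */
    FAULT_WHITE_NOISE  /* cheap noise test for filter cut-off levels */
} fault_t;

float inject_fault(float sensor, fault_t fault, float param) {
    switch (fault) {
    case FAULT_STUCK_LOW:   return 0.0f;
    case FAULT_STUCK_HIGH:  return 1.0f;
    case FAULT_DC_OFFSET:   return sensor + param;
    case FAULT_INVERTED:    return 1.0f - sensor;
    case FAULT_WHITE_NOISE: /* param scales a signed random sample */
        return sensor + param * (2.0f * rand() / (float)RAND_MAX - 1.0f);
    }
    return sensor;
}
```

A test run can then loop over each fault_t value and check that the rest of the system degrades gracefully.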

    The last one is a little trickier to implement, but some nice common curves can help. First, the smoothstep function:

    float smoothstep(float a, float b, float x) {
        if (x < a) return 0;
        if (x >= b) return 1;
        x = (x - a) / (b - a);
        return x*x * (3 - 2*x);
    }

    (I originally found this function in Texturing & Modeling: A Procedural Approach)


    The spike function:

    double spike(double t) {
    return 1/(1+150*(t-0.5)*(t-0.5));
    }

    You may wish to play with the factor of 150 to make it more 'spiky'.
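As a usage sketch (the helper name and the amplitude parameter are hypothetical), the spike can be superimposed on a steady reading, peaking mid-way through the 0..1 time range:

```c
/* spike() from above, repeated so this sketch is self-contained. */
double spike(double t) {
    return 1 / (1 + 150 * (t - 0.5) * (t - 0.5));
}

/* Superimpose a transient spike of the given amplitude on a steady
   sensor reading at normalized time t in [0,1]. */
double sensor_with_spike(double sensor, double t, double amplitude) {
    return sensor + amplitude * spike(t);
}
```

At t = 0.5 the disturbance peaks, so a steady 0.5 reading with amplitude 0.25 momentarily reads 0.75.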
Saturday, May 30, 2009

    Link post

    Some assorted interesting sounding tech:

    Thrust is an STL-like library for CUDA programming. Could be interesting, and it certainly looks easier to use than raw CUDA, but my feeling is it probably won't be worth much until nVidia lets multiple CUDA functions execute in parallel..

    5 optimization tips for CUDA, a nice succinct roundup of some good performance tips for CUDA, including Arithmetic Instruction optimization.

    DANCE, a framework for Dynamic Animation and Control; definitely something I want to check out for potential synergy with the Physics Abstraction Layer. Proper animation controllers are something PAL lacks.

    ReactPhysics3d, a new open source physics engine, very much in the early stages of development. I've seen a number of engines grow over time; I hope this one succeeds too, but it will need to find a niche if it is to survive..

    Predictive-Corrective Incompressible SPH is a paper on estimating the SPH state without explicitly evaluating it, thereby saving some CPU cycles. I actually had a similar idea, so maybe this is a path for further research. Also on the SPH front, Co-rotated SPH for Deformable Solids, the idea is great, not convinced by their particular implementation though..

    Perhaps these ideas might make it into the SPH routines for PAL..

    and on the lighter side;

    2D Boy's prototyping framework for World of Goo.