Thursday, October 29, 2009


One short quick post, Unity is now free. Of course, this isn't the full edition with all the tools you would need if you are developing anything bigger than a one-man game..

Still, the price is right.

If your the free as in speech kind, try the Blender game engine.

Wednesday, October 28, 2009

Boston Dynamics: PETMAN

Boston Dynamics, famous for Big Dog (check out the Big Dog youtube video), are now working on a Biped, PETMAN.

Its still early on in the project, but they seem to be making good progress on the mechanical side of things. The noisy motor might prove to be a bit of an issue.

I wonder how long until the japanese fighting-toy robots get to be using this kind of equipment. It's probably still quite some time, since robot prices don't tend to drop too much over the years. I'm predicting it will still be a very long wait till consumers see anything beyond toy bipedal robots.

Tuesday, October 27, 2009

Apple Mac OS X Utilities

Everyone needs tools to use their PC. I covered the essentials for a Windows PC previously. Some good (free) tools for your Mac:
  • StuffIt Expander, your OS X WinRAR/WinZip equivalent.
  • MacFUSE, this extension enables file systems in user space for the mac. Ever want to read NTFS?
  • NTFS 3g, the NTFS plug in for macFUSE, no more 'Items could not be moved because XXX cannot be modified'. Why OS X doesn't support NTFS is beyond me. I have been using the NTFS 3g plug in with no problems for about a year now.
  • Opera, while not a Mac-only utility, having the Opera web browser is essential. Especially when Safari acts up, or you want to use IRC, or bittorrent, or RSS, or email, or anything really. Its an all in one solution.
  • Parallels, lets you run your bootcamp Windows partition seamlessly inside the mac, and with graphics acceleration to boot. Nifty. While not free, I feel its worthwhile. The free alternative is virtualbox.
  • Nocturne, this lets you dim, and otherwise change your screen display. Very useful for late night computing.
  • Small Image, for quick image resizing.
  • Paintbrush for mac.
  • Perian and Flip4Mac WMVfor extending media file support by OS X.
Plus the usual assortment of VLC, Mplayer, Filezilla, Adobe Reader, etc. A great place to find nifty tools is I use this, for OS X. Tools that get a lot of recommendations, but I don't use much are iStat Pro, VMware, Little Snitch and Transmission. I still haven't figured out if MacPorts or Fink is better, so far, I've had poor experiences with both.

Tuesday, October 20, 2009

Timing square root on the GPU

Inspired by the post by Elan Ruskin (Valve) on x86 SQRT routines I thought I would visit this for my supercomputing platform of choice, the GPU. These kinds of low level trickery I left behind after finishing with RMM/Pixel-Juice some time around 2000, having decided that 3dNow! reciprocal square root routines were more than good enough..

Anyway, a brief overview of how we can do square roots:
  1. Calculate it with the FPU, (however that was implemented by the chip manafacturer).
  2. Calculate it from newton-raphson. This allows you to control the accuracy of the sqrt. (Or typicaly rsqrt) This comes in two flavours:
    These approaches typically approximate the inverse square root, so this means we need to:
  3. Calculate it from the inverse. This comes in two flavours:
    • Calculate the reciprical, then invert it (1/rsqrt(x)), this gives you correct results
    • Multiply it by the input value (x*rsqrt(x)), this gives you faulty results around 0, but saves you a costly divide.
      1.0f / rsqrtf(0.0f) = 1.0f / infinity = 0.0f
      0.0f * rsqrtf(0.0f) = 0.0f * infinity = NaN
Elan's results indicated the x86 SSE units rsqrtss instruction was the fastest (no suprise - it is also a rough estimate), followed by SSE rsqrt with newton-raphson iteration for improvement, then Carmack’s Magic Number rsqrt, and finally the x86 FPU's sqrt. Note that many people don't actually get to the slowest point on Elan's scale, since they don't enable intrinsics when compiling, meaning that the C compiler will use the C library sqrt routine, and not the FPU.
I decided to test three routines for the GPU:
  • native sqrt
  • native rsqrt
  • Carmack's rsqrt
Now benchmarking and timing on the lowest level has always been somewhat of a black art (see Ryan Geiss's article on timing), but that is even more true on the GPU - you need to worry about block sizes, as well as the type of timer, etc.
I did my best at generating reliable results by testing block sizes from 2..256 and performing 2.5 million sqrt operations. Here are the results from my nVidia 9800GX2:
Method Total time Max. ticks per float Avg. ticks per float Std. Dev. Avg. Error
GPU intrinsic SQRT 1.285ms 5.99 3.99 0.00 0.00%
GPU intrinsic RSQRT * x 1.281ms 5.99 3.99 0.00 0.00%
Carmack RSQRT * x 2.759ms 6.28 4.26 0.01 0.09%
Total time is the total time measured by the CPU that the GPU took to launch the kernel and calculate the results. The clock ticks are meant to be more accurate measurements using the GPU's internal clock, but I find that to be dubious.
The conclusions to take from these results are simple: Carmack's inverse and other trickery isn't going to help, using the GPU RSQRT function as opposed to the inbuilt SQRT function saves you about a clock tick or two. (Probably because nVidias SQRT is implemented as 1/RSQRT, as opposed to X*RSQRT)
I'm happy to say, low level optimization tricks are still safely a thing of the past.
You can get the code for the CUDA benchmark here: GPU SQRT snippet.

Thursday, October 15, 2009

AMD OpenCL for GPU

Not one to be left behind by nVidias news, AMD/ATI have released the AMD OpenCL beta v4 which now supports OpenCL for AMD GPU's! Some highlights:

  • First beta release of ATI Stream SDK with OpenCL GPU support.
  • ATI Stream SDK v2.0 OpenCL is certified OpenCL 1.0 conformant by Khronos.
  • Microsoft Windows 7 and native Microsoft Windows® 64-bit support

Fabulous news! Now you can do OpenCL on OSX, Windows (32&64bit), for nVidia and ATI GPU's and AMD CPU's. It doesn't get any better than this, well, at least util next year when Intel enters the fray.

It's not all good news though, it seems some of AMD's GPUs don't support double precision:

Still, it's better than nVidias lot, and I'm happy to see AMD finally making a serious effort in this space. (Not that the previous efforts weren't impresive, just not so focused...)

nVidia has also released the new version of CG, v 2.2. I wonder how much OpenCL will replace the use of Cg..

Wednesday, October 14, 2009

Google Building Maker

Google has just released their building maker, it looks like they finally found a use for the videotrace technology they acquired from the University of Adelaide.

This should help putting content together quickly for simulators, etc.

Check it out:

Nearest Neighbor

A bunch of links on the nearest-neighbor problem, for higher dimensions:

And, for good measure a set of comparisons of optical flow algorithms:

I'll probably stick to the OpenCV defaults anyway, but its nice to know there are options. It would seem Bruhn et al. is the most accurate..

Monday, October 12, 2009

Talking Piano and AI

Daniel Wedge sent me this interesting link on a talking piano!

A great idea, I wonder if it had been done before..

In other news, the International Joint Conference on Artificial Intelligence archive has been made open to the public covering all the way from 1969 - 2007!

Thursday, October 08, 2009

nVidia: OpenCL , Nexus, Fermi

There's been a fair bit of news flowing out of nVidia, biggest first:
nVidia has released a GPU OpenCL implementation compatible with all devices that support CUDA (no surprise).
You can get the nVidia OpenCL from the nVidia OpenCL Developer Website.

Next in nVidia news, Nexus has been released. I haven't had a chance to try it, but apparently it allows you to debug GPU programs via Microsoft Visual Studio in the 'normal' way - this would certainly make GPU programming a little easier.

Finally, nVidia has released information on Fermi their next-generation architecture. Basically it seems to be more of the same (which is always good) without all the bad bits for GPGPU programming (even better!). The biggest changes are allowing multiple kernels to execute in parallel, and having decent double-precision support. This should really open up the scientific&engineering computing to GPGPU, and will probably do good things for getting accelerated raytracing happening. AnandTech has a good write-up on Fermi, although it looks like we will see Larabee before Fermi...

I had a chance to play with a 3xC1060 Tesla boards for the GPU fluid simulation project on a 64bit machine. This threw up a whole bunch of problems, since I was using MSVC express edition, which does not support 64bit (apparently.) Problem was solved by using the 32 bit CUDA wizard, redirecting the CUDA libraries to the 32bit versions (ie: c:\CUDA\lib, not C:\CUDA\lib64), and some other tweaks.


I've been quite occupied with ECU (Exams and mid-semester marking), Transmin, GPU Pathfinding, and the DSTO MAGIC 2010 competition. The WA team I've been organizing has teamed up with Flinders university and got some significant assistance from Thales Australia. The submission has been made, will have to see if we are successful...