Adrian Boeing: Blog

Wednesday, November 25, 2009

Catchup post

It's been a while since I updated: it is exam and assignment marking period, and I had two GPU industry projects due (a 3D groundwater flow fluid simulation and a pathfinding/travelling salesman with profits project).

The biggest news item was that the ACM decided to start (over)enforcing its rules saying that you cannot link to preprint and author pages. Thankfully, this triggered a call to arms, and prominent pages like Ke-Sen Huang's SIGGRAPH links have been restored. I wonder how many less public pages have silently slipped away. Frankly, I can't wait until the concept of conferences and journals disappears. My websites have always had far more impact than my publications, and it can't be long until the same can be said universally.

A short update with some interesting things in the last while:

Obama has made a comment about robotics. It's great to hear the field is getting a lot more visible and more support, but I still feel we have a very long way to go to break Western culture's distrust of robotics.
Sony/Toshiba/IBM CELL is now officially dead.
There is a new GPU computing community website, just started up: http://gpucomputing.net/
nVidia's raytracing technology OptiX is finally available, but only for next-gen GPUs. The "RealityServer" was also released, to the great disappointment of many.
Tech Report has a more detailed report on nVidia's new Fermi-based GPUs.
Caustic claims to be selling raytracing accelerators, but they haven't been able to sell me one yet. Smells like vapourware.
Chromium OS was released. Not sure who wants this? Fast starts haven't been an issue for me since laptops could sleep. I guess they are aiming for very low prices.
Peteris Krumins posted a great blog post on the highlights of MIT's algorithms lectures.
Wolfire did an interesting post on generating vertex weights.
Felix von Leitner put together a very interesting presentation on compiler optimization tricks. It's good proof that compilers are much better than humans at most optimizations, and I learnt plenty of new things from it.
Finally, I saw a link to this paper: Aggregate Dynamics for Dense Crowd Simulation. I've not had a chance to read it yet, but the results look fantastic and the idea is novel (using fluid-simulation techniques to represent crowds).

Monday, November 16, 2009

Learn CUDA - Perth, Western Australia

iVEC IGUP cordially invites you to a CUDA GPU tutorial tomorrow afternoon.

The iVEC Industry and Government Uptake program with Adrian Boeing from ECU will be hosting an introductory tutorial on CUDA GPU programming with a focus on graph algorithms and search trees.

17 November 2.30 - 4.30 pm

Edith Cowan University School Of Computer And Security Science

13.225 - Games & Simulation Lab

2 Bradford St

Mount Lawley WA 6050

The tutorial is free but places are limited.

Thursday, November 05, 2009

MAGIC 2010 - Team MAGICian - ECU, UWA, Flinders, Thales

Fantastic news, the team I put together for the DSTO MAGIC 2010 competition has been pre-selected into the top 10 teams, and won $50,000 USD in seed funding! Now we just need to make it past phase 2, and then take away the million dollar prize!

Our team consists of two parts: the WA team that I am coordinating, made up of Edith Cowan University (School of Computer and Security Science) and the University of Western Australia (School of Electrical, Electronic and Computer Engineering), and Flinders University in Adelaide, which David is coordinating.

The competition is quite tough: we're up against a number of veterans from the DARPA Grand Challenge. This was a series of events where robot cars raced through the Mojave Desert, and then navigated through an urban environment. There is a pretty good PBS/NOVA documentary on the original challenge, The Great Robot Race. I was involved on the sidelines of the Urban Challenge with the TUM/UniBW/Karlsruhe AnnieWAY entry.

Some of the key competitors include Carnegie Mellon University (who won the DARPA Urban Challenge, and came second in the DARPA Grand Challenge), Virginia Tech (3rd in Urban), and the University of Pennsylvania and Cornell, who tied at 5th. You can read about the whole list of competitors in a Dr Dobb's Journal article covering the MAGIC 2010 selection event. Naturally, these teams will be lulled into a sense of complacency and they won't stand a chance against our home-grown team.

We are building 7 robots in WA, and approx. 10 robots in SA. You can see the WA robot in the picture with this post; it is based on the Pioneer All-Terrain robotic platform. The WA robots will have DGPS, inertial measurement systems (gyroscopes & accelerometers) for relative positioning, various laser range finders for mapping and collision detection, stereo cameras for distance measurement and object recognition, and PTZ cameras for object tracking. This should be a very exciting project!

Some of the key areas for software work include building the simulator, developing the computer vision algorithms for object recognition and tracking (including tracking people), the LIDAR data processing, sensor fusion algorithms, SLAM algorithms, the multi-agent team coordination and planning, the data-communications, path planning, trajectory generation, influence mapping, the list goes on!

So if you have any interest in the project, please drop me a line! It's your chance to be famous, and to be part of this history-making event!

Thursday, October 29, 2009

Unity

One short quick post: Unity is now free. Of course, this isn't the full edition with all the tools you would need if you are developing anything bigger than a one-man game.

Still, the price is right.

If you're the free-as-in-speech kind, try the Blender game engine.

Wednesday, October 28, 2009

Boston Dynamics: PETMAN

Boston Dynamics, famous for Big Dog (check out the Big Dog YouTube video), are now working on a biped, PETMAN.

It's still early on in the project, but they seem to be making good progress on the mechanical side of things. The noisy motor might prove to be a bit of an issue.

I wonder how long until the Japanese fighting-toy robots get to be using this kind of equipment. It's probably still quite some time, since robot prices don't tend to drop too much over the years. I'm predicting it will still be a very long wait till consumers see anything beyond toy bipedal robots.

Tuesday, October 27, 2009

Apple Mac OS X Utilities

Everyone needs tools to use their PC. I covered the essentials for a Windows PC previously. Some good (free) tools for your Mac:

StuffIt Expander, your OS X WinRAR/WinZip equivalent.
MacFUSE, this extension enables file systems in user space for the Mac. Ever want to read NTFS?
NTFS-3G, the NTFS plug-in for MacFUSE: no more 'Items could not be moved because XXX cannot be modified'. Why OS X doesn't support NTFS is beyond me. I have been using the NTFS-3G plug-in with no problems for about a year now.
Opera, while not a Mac-only utility, having the Opera web browser is essential. Especially when Safari acts up, or you want to use IRC, or bittorrent, or RSS, or email, or anything really. It's an all-in-one solution.
Parallels, lets you run your Boot Camp Windows partition seamlessly inside the Mac, and with graphics acceleration to boot. Nifty. While not free, I feel it's worthwhile. The free alternative is VirtualBox.
Nocturne, this lets you dim, and otherwise change your screen display. Very useful for late night computing.
Small Image, for quick image resizing.
Paintbrush for Mac.
Perian and Flip4Mac WMV, for extending media file support in OS X.

Plus the usual assortment of VLC, Mplayer, Filezilla, Adobe Reader, etc. A great place to find nifty tools is "I use this" for OS X. Tools that get a lot of recommendations, but that I don't use much, are iStat Pro, VMware, Little Snitch and Transmission. I still haven't figured out whether MacPorts or Fink is better; so far, I've had poor experiences with both.

Tuesday, October 20, 2009

Timing square root on the GPU

Inspired by the post by Elan Ruskin (Valve) on x86 SQRT routines, I thought I would visit this for my supercomputing platform of choice, the GPU. I left this kind of low-level trickery behind after finishing with RMM/Pixel-Juice some time around 2000, having decided that 3DNow! reciprocal square root routines were more than good enough.

Anyway, a brief overview of how we can do square roots:

Calculate it with the FPU (however that was implemented by the chip manufacturer).
Calculate it with Newton-Raphson. This allows you to control the accuracy of the sqrt (or, typically, rsqrt). This comes in two flavours:
- Use an initial estimate, and refine, à la the Greg Walsh / John Carmack / Quake 3 approach.
- Use a lookup table, then refine. This is probably an obvious approach, but I think AMD did a lot of pioneering work on it. (Well, at least, I learned these tricks from them.) See nVidia's lookup table sqrt code.
These approaches typically approximate the inverse square root, so this means we need to:
Calculate it from the inverse. This comes in two flavours:
- Calculate the reciprocal square root, then invert it (1/rsqrt(x)). This gives you correct results, at the cost of a divide.
- Multiply it by the input value (x*rsqrt(x)). This gives you faulty results around 0, but saves you a costly divide.
  Note:
  1.0f / rsqrtf(0.0f) = 1.0f / infinity = 0.0f
  0.0f * rsqrtf(0.0f) = 0.0f * infinity = NaN
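
To make the note above concrete, here is a minimal C sketch of the two ways to recover a square root from a reciprocal square root. The helper names (`sqrt_by_divide`, `sqrt_by_multiply`, `sqrt_by_multiply_guarded`, `is_nan`) are made up for illustration, and the rsqrt value `r` is passed in by the caller rather than computed, so the snippet assumes nothing about any particular hardware rsqrt beyond IEEE 754 float arithmetic with no fast-math compiler flags. The guarded variant is just one possible workaround for the zero case, not something the benchmark used.

```c
#include <assert.h>
#include <math.h>   /* INFINITY */

/* In all helpers, r is assumed to be rsqrt(x), e.g. a hardware estimate. */

/* sqrt(x) = 1 / rsqrt(x): correct at zero, but costs a divide. */
float sqrt_by_divide(float r) {
    assert(r > 0.0f);   /* a valid rsqrt result is positive (or +infinity) */
    return 1.0f / r;
}

/* sqrt(x) = x * rsqrt(x): no divide, but 0 * infinity is NaN. */
float sqrt_by_multiply(float x, float r) {
    return x * r;
}

/* Hypothetical guard: return 0 for non-positive input instead of NaN. */
float sqrt_by_multiply_guarded(float x, float r) {
    return (x > 0.0f) ? x * r : 0.0f;
}

/* NaN is the only float value that compares unequal to itself. */
int is_nan(float v) {
    return v != v;
}
```

Calling `sqrt_by_divide(INFINITY)` gives 0.0f, while `sqrt_by_multiply(0.0f, INFINITY)` gives NaN, matching the two lines in the note. The guard costs a compare and select, which is usually far cheaper than the divide it avoids.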

Elan's results indicated the x86 SSE unit's rsqrtss instruction was the fastest (no surprise, since it is also only a rough estimate), followed by SSE rsqrt with a Newton-Raphson iteration for improvement, then Carmack's Magic Number rsqrt, and finally the x86 FPU's sqrt. Note that many people end up even slower than the slowest point on Elan's scale, since they don't enable intrinsics when compiling, meaning that the C compiler will call the C library sqrt routine rather than use the FPU directly.
I decided to test three routines for the GPU:

native sqrt
native rsqrt
Carmack's rsqrt
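
For readers who have not seen it, the third routine is usually written along the following lines. This is a plain C sketch of the widely circulated Quake 3 style approach (a magic-constant initial estimate followed by one Newton-Raphson step), not the CUDA kernel that was benchmarked. The function names are illustrative, `memcpy` is used for the bit reinterpretation to stay within defined C behaviour, and 32-bit IEEE 754 floats are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for x > 0. */
float fast_rsqrt(float x) {
    assert(x > 0.0f);                    /* the bit trick is only valid for positive input */

    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float's bits as an integer */
    bits = 0x5f3759dfu - (bits >> 1);    /* magic-number initial estimate */
    memcpy(&y, &bits, sizeof y);         /* back to float */

    y = y * (1.5f - 0.5f * x * y * y);   /* one Newton-Raphson refinement step */
    return y;
}

/* sqrt(x) recovered as x * rsqrt(x), the "Carmack RSQRT * x" variant. */
float fast_sqrt(float x) {
    return x * fast_rsqrt(x);
}
```

With a single refinement step, `fast_sqrt(4.0f)` lands within a fraction of a percent of 2. Each extra Newton-Raphson step roughly doubles the number of correct bits, which is how this family of routines lets you trade accuracy for speed.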

Now benchmarking and timing on the lowest level has always been somewhat of a black art (see Ryan Geiss's article on timing), but that is even more true on the GPU - you need to worry about block sizes, as well as the type of timer, etc.
I did my best at generating reliable results by testing block sizes from 2..256 and performing 2.5 million sqrt operations. Here are the results from my nVidia 9800GX2:

Method	Total time	Max. ticks per float	Avg. ticks per float	Std. Dev.	Avg. Error
GPU intrinsic SQRT	1.285ms	5.99	3.99	0.00	0.00%
GPU intrinsic RSQRT * x	1.281ms	5.99	3.99	0.00	0.00%
Carmack RSQRT * x	2.759ms	6.28	4.26	0.01	0.09%

Total time is the total time measured by the CPU that the GPU took to launch the kernel and calculate the results. The clock ticks are meant to be more accurate measurements using the GPU's internal clock, but I find that to be dubious.
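
The "Avg. Error" column is not defined any further here. One common choice, sketched below in C as an assumption rather than as the benchmark's actual code, is the mean relative error of each approximate result against a reference value, expressed as a percentage. The function name `avg_error_percent` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mean relative error, in percent, of approx[i] against exact[i].
   Assumes n > 0 and that every exact[i] is positive. */
double avg_error_percent(const float *approx, const float *exact, size_t n) {
    assert(n > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double diff = (double)approx[i] - (double)exact[i];
        if (diff < 0.0) {
            diff = -diff;                  /* absolute error */
        }
        sum += diff / (double)exact[i];    /* relative to the reference value */
    }
    return 100.0 * sum / (double)n;
}
```

Here `exact` would hold the reference roots and `approx` the values read back from the routine under test.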
The conclusions to take from these results are simple: Carmack's inverse and other trickery isn't going to help, and using the GPU RSQRT function as opposed to the inbuilt SQRT function saves you about a clock tick or two. (Probably because nVidia's SQRT is implemented as 1/RSQRT, as opposed to X*RSQRT.)
I'm happy to say, low level optimization tricks are still safely a thing of the past.
You can get the code for the CUDA benchmark here: GPU SQRT snippet.
