Thursday, June 07, 2012

GPU Technology Conference 2012

nVidia's GPU Technology Conference is over, and a number of presentation slides have been uploaded. There were a quite a few interesting talks relating to graphics, robotics and simulation:
  • Simon Green from nVidia and Christopher Horvath from Pixar presented 'Flame On: Real-Time Fire Simulation for Video Games'. It starts with a recent history of research on CG fluid systems, and gives five tips on better looking fire: 1. Get the colors right (e.g. radiation model), 2. Use high quality advection (not just bilinear filtering), 3. Post process with glow and motion blur. 4. Add noise. 5. Add light scattering and embers. They then go into more detail on Tip #1 looking at the physics behind the black-body radiation in a fire, and the color spectrum.
  • Elmar Westphal of PGI/JCNS-TA Scientific IT-Systems presented 'Multiparticle Collision Dynamics on one or more GPUs', about multiparticle collision dynamics GPU code. He starts by explaining the overall algorithm, and explaining step-by-step what performs well on the GPU. Specific GPU optimisations explained include spatial subdivision lists, reordering particles in memory, hash collisions, and finally dividing workload between multiple GPU's. An interesting read.
  • Michal Januszewski from the University of Silesia in Katowice introduces 'Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python'. He explains lattice boltzmann fluid simulation, and some of the different configurations of lattice connectivity and collision operators. Moves into code generation examples, and gives a brief explanation of how the GPU implementation works.
  • Nikos Sismanis, Nikos Pitsianis and Xiaobai Sun (Aristotle University, Duke University) cover 'Efficient k-NN Search Algorithms on GPUs'. Starts with an overview of sorting and K-Nearest Neighbour (KNN) search algorithm solutions, including ANN (approximate NN) and lshkit and moves into results including a comparison of thrust::sort with Truncated Bitonic sort. Software is available at
  • Thomas True of nVidia explains 'Best Practices in GPU-Based Video Processing' and covers overlapping copy-to-host and copy-to-device operations, and an example of processing bayer pattern images.
  • Scott Rostrup, Shweta Srivastava, and Kishore Singhal from Synopsys Inc. explain 'Tree Accumulations on GPU' using parallel scatter, parallel reduce and parallel scan algorithms.
  • Wil Braithwaite from nVidia presents an interesting talk on 'Interacting with Huge Particle Simulations in Maya using the GPU'. He begins with a brief runthrough of the workings of the CUDA SPH example, and then moves onto the particle system including Maya's body forces (uniform, radial, vortex), shape representations (implicit, covex hull, signed distance fields, displacement maps), collision response, SPH equations, and finally data transfer. Ends with a brief overview of rendering the particles in screen space. Neat.
  • David McAllister and James Bigler (nVidia) cover the OptiX internals in 'OptiX Out-of-Core and CPU Rendering' including PTX code generation and optimisation, and converting the OptiX backend to support CPU's via Ocelot and LLVM. An interesting result, LLVM does better at optimising "megafunctions" than small functions, but not entirely unexpected given how LLVM works. The presentation finishes with an overview of paging and a tip on bounding volume heirarchies. Good to see Ocelot in the mainstream.
  • Eric Enderton and Morgan McGuire from nVidia explain 'Stochastic Rasterization' (ala 'screen door transparency' rendering) via MSAA for motion blur, depth of field and order-independent transparency, by using a geometry shader to bound the shape and motion of each tri in screen space, and setting up the MSAA masks. Nice.
  • Cliff Woolley presents 'Profiling and Tuning OpenACC Code' (by adding pragmas to C / Fortran code, ala OpenMP) using an example of Jacobi iteration, and there were a number of other talks on the topic.
  • Christopher Bergström introduced 'PathScale ENZO' the alternative to CUDA and OpenCL.
  • Phillip Miller from nVidia did an broad coverage of 'GPU Ray Tracing'. He starts with a myths and claimed facts on GPU raytracing, highlights some commercial GPU raytracers (and the open source OpenCL LuxRenderer) and goes into some details that are better explained in the OptiX Out-of-Core presentation.
  • Phillip Miller follows with 'Advanced Rendering Solutions' where he takes a look at nVidia's iray, and where they believe they can introduce new capabilities for design studios and find a middle ground with re-lighting and physcially based rendering.
  • Peter Messmer presents 'CUDA Libraries and Ecosystem Overview', where he provides an overview of the linear algebra cuBLAS and cuSPARSE libraries performance, then moves to signal processing with cuFFT and NPP/VSIP for image processing, next is random numbers via cuRAND and finally ties things up with Thrust.
  • Jeremie Papon and Alexey Abramov discuss the 'Oculus real-time modular cognitive visual system' including GPU accelerated stereo disparity matching, likelihood maps and image segmentation with a parallel metropolis algorithm.
  • Jérôme Graindorge and Julien Houssay from Alyotech present 'Real Time GPU-Based Maritime Scenes Simulation' beginning with ocean simulation and rendering from FFT based wave simulation using HF and LF heightmap components. They then cover rendering the mesh, scene illumination and tone mapping, and a sneak peak at boat interaction. The ocean simulation video is neat.
  • Dan Negrut from the Simulation-Based Engineering Lab at the University of Wisconsin–Madison gives an overview of the labs multibody dynamics work in 'From Sand Dynamics to Tank Dynamics' including friction, compliant bodies, multi-physics (fluid/solid interactions), SPH, GPU solution to the cone complementary problem, ellipsoid-ellipsoid CCD, multi-CPU simulation, and finally vehicle track simulation in sand. Wow. Code is available on the Simulation-Based Engineering Lab website.
  • Max Rietmann of USI Lugano looks at seismology (earthquake simulation) in 'Faster Finite Elements for Wave Propagation Codes' and describes parallising FEM methods for GPUs in SPECFEM3D.
  • Dustin Franklin from GE introduces GE's MilSpec ruggedised Kepler-based GPU solutions and Concurrent Redhawk6 in 'Sensor Processing with Rugged Kepler GPUs'. Looks at some example applications including hyperspectral imaging, mosaicing, 360 degree vision, synthetic aperture radar processing, and space-time adaptive processing for moving target identification.
  • Graham Sanborn of FunctionBay presents 'Particle Dynamics with MBD and FEA Using CUDA' and gives a brief overview of their combined CPU/GPU multi-body FEA system and briefly describes the contact, contact force, and integration steps.
  • Ritesh Patel and Jason Mak of University of California-Davis cover the Burrows-Wheeler Transform, Move-to-Front Transform and Huffman Coding in 'Lossless Data Compression on GPUs'. They find merge sort for BWT performs best on the GPU, explain the parallel MTF transform and Huffman in illustrative detail and tie things up with benchmarks, unfortunately GPU is 2.78x slower than CPU.
  • Nikolai Sakharnykh and Nikolay Markovskiy from NVIDIA provide an indepth explanation of their GPU implementation of solving ADI with tridiagonal systems in '3D ADI Method for Fluid Simulation on Multiple GPUs'.
  • Enrico Mastrostefano, Massimo Bernaschi, and Massimiliano Fatica investigate breadth first search in 'Large Graph on multi-GPUs' and describe how best to parallelise it across multiple GPU's by using adjacency lists and level frontiers to minimise the data exchange.
  • Bob Zigon from Beckman Coulter presents '1024 bit Parallel Rational Arithmetic Operators for the GPU' and covers exact 1024 bit rational arithmetic (add,sub,mul,div) for the GPU. Get the 1024 bit arithmetic code here.
  • Roman Sokolov and Andrei Tchouprakov of D4D Technologies discuss 'Warped parallel nearest neighbor searches using kd-trees' where they take a SIMD style approach by grouping tree searches via voting (ballot)
  • David Luebke from nVidia takes a broad look at CG in 'Computational Graphics: An Overview of Graphics Research @ NVIDIA' and provides an overview of research which is featured in a number of previous talks and other GTC talks including edge aware shading, ambient occlusion via volumes and raycasting, stochastic rendering, improved image sampling and reconstruction, global illumination, and CUDA based rasterization.
  • Johanna Beyer and Markus Hadwiger from King Abdullah University of Science and Technology discuss 'Terascale Volume Visualization in Neuroscience' where each cubic mm of the brain scanned with an electron microscope generates 800 tereabytes of data. The idea here is to leverage the virtual memory manager to do all the intelligent caching work, rather than a specialised spatial datastructure for the volume rendering.
  • Mark Kilgard introduces the NV_path_rendering extension in 'GPU-Accelerated Path Rendering', and demonstrates using the GPU to render PDF, flash, clipart, etc. Contains some sample code.
  • Janusz Będkowski from the Warsaw University of Technology presented 'Parallel Computing In Mobile Robotics For RISE' a full GPGPU solution for processing mobile robot laser scan data through to navigation. Starts with data registration into a decomposed grid which is then used for scan matching with point-to-point Iterative Closest Point. Next is estimating surface normals using principle component analysis, demonstrated on velodyne datasets. This is used to achieve point-to-plane ICP and he demonstrates a 6D SLAM loop-closure. Finishes it all off with a simple gradient based GPU path planner.
Note that in recent days more presentation PDF's have been uploaded so there is still plenty to look through, and with all the content it's difficult to look through it all - take a look yourself! I'll leave you with a video from the GTC 2012 keynote on rendering colliding galaxies: