Showing posts with label GPGPU. Show all posts

Thursday, June 07, 2012

GPU Technology Conference 2012

nVidia's GPU Technology Conference is over, and a number of presentation slides have been uploaded. There were quite a few interesting talks relating to graphics, robotics and simulation:
  • Simon Green from nVidia and Christopher Horvath from Pixar presented 'Flame On: Real-Time Fire Simulation for Video Games'. It starts with a recent history of research on CG fluid systems, and gives five tips on better looking fire: 1. Get the colors right (e.g. radiation model), 2. Use high quality advection (not just bilinear filtering), 3. Post process with glow and motion blur, 4. Add noise, 5. Add light scattering and embers. They then go into more detail on Tip #1, looking at the physics behind the black-body radiation in a fire, and the color spectrum.
  • Elmar Westphal of PGI/JCNS-TA Scientific IT-Systems presented 'Multiparticle Collision Dynamics on one or more GPUs', about multiparticle collision dynamics GPU code. He starts by explaining the overall algorithm, then walks step by step through what performs well on the GPU. Specific GPU optimisations explained include spatial subdivision lists, reordering particles in memory, hash collisions, and finally dividing workload between multiple GPUs. An interesting read.
  • Michal Januszewski from the University of Silesia in Katowice introduces 'Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python'. He explains lattice Boltzmann fluid simulation, and some of the different configurations of lattice connectivity and collision operators. Moves into code generation examples, and gives a brief explanation of how the GPU implementation works.
  • Nikos Sismanis, Nikos Pitsianis and Xiaobai Sun (Aristotle University, Duke University) cover 'Efficient k-NN Search Algorithms on GPUs'. Starts with an overview of sorting and K-Nearest Neighbour (KNN) search algorithm solutions, including ANN (approximate NN) and lshkit and moves into results including a comparison of thrust::sort with Truncated Bitonic sort. Software is available at http://autogpu.ee.auth.gr/.
  • Thomas True of nVidia explains 'Best Practices in GPU-Based Video Processing' and covers overlapping copy-to-host and copy-to-device operations, and an example of processing Bayer pattern images.
  • Scott Rostrup, Shweta Srivastava, and Kishore Singhal from Synopsys Inc. explain 'Tree Accumulations on GPU' using parallel scatter, parallel reduce and parallel scan algorithms.
  • Wil Braithwaite from nVidia presents an interesting talk on 'Interacting with Huge Particle Simulations in Maya using the GPU'. He begins with a brief runthrough of the workings of the CUDA SPH example, and then moves onto the particle system including Maya's body forces (uniform, radial, vortex), shape representations (implicit, convex hull, signed distance fields, displacement maps), collision response, SPH equations, and finally data transfer. Ends with a brief overview of rendering the particles in screen space. Neat.
  • David McAllister and James Bigler (nVidia) cover the OptiX internals in 'OptiX Out-of-Core and CPU Rendering' including PTX code generation and optimisation, and converting the OptiX backend to support CPUs via Ocelot and LLVM. An interesting result: LLVM does better at optimising "megafunctions" than small functions, which is not entirely unexpected given how LLVM works. The presentation finishes with an overview of paging and a tip on bounding volume hierarchies. Good to see Ocelot in the mainstream.
  • Eric Enderton and Morgan McGuire from nVidia explain 'Stochastic Rasterization' (ala 'screen door transparency' rendering) via MSAA for motion blur, depth of field and order-independent transparency, by using a geometry shader to bound the shape and motion of each tri in screen space, and setting up the MSAA masks. Nice.
  • Cliff Woolley presents 'Profiling and Tuning OpenACC Code' (by adding pragmas to C / Fortran code, ala OpenMP) using an example of Jacobi iteration, and there were a number of other talks on the topic.
  • Christopher Bergström introduced 'PathScale ENZO' the alternative to CUDA and OpenCL.
  • Phillip Miller from nVidia gave a broad overview of 'GPU Ray Tracing'. He starts with myths and claimed facts on GPU raytracing, highlights some commercial GPU raytracers (and the open source OpenCL LuxRender) and goes into some details that are better explained in the OptiX Out-of-Core presentation.
  • Phillip Miller follows with 'Advanced Rendering Solutions' where he takes a look at nVidia's iray, and where they believe they can introduce new capabilities for design studios and find a middle ground with re-lighting and physically based rendering.
  • Peter Messmer presents 'CUDA Libraries and Ecosystem Overview', where he provides an overview of the linear algebra cuBLAS and cuSPARSE libraries performance, then moves to signal processing with cuFFT and NPP/VSIP for image processing, next is random numbers via cuRAND and finally ties things up with Thrust.
  • Jeremie Papon and Alexey Abramov discuss the 'Oculus real-time modular cognitive visual system' including GPU accelerated stereo disparity matching, likelihood maps and image segmentation with a parallel metropolis algorithm.
  • Jérôme Graindorge and Julien Houssay from Alyotech present 'Real Time GPU-Based Maritime Scenes Simulation' beginning with ocean simulation and rendering from FFT based wave simulation using HF and LF heightmap components. They then cover rendering the mesh, scene illumination and tone mapping, and a sneak peek at boat interaction. The ocean simulation video is neat.
  • Dan Negrut from the Simulation-Based Engineering Lab at the University of Wisconsin–Madison gives an overview of the lab's multibody dynamics work in 'From Sand Dynamics to Tank Dynamics' including friction, compliant bodies, multi-physics (fluid/solid interactions), SPH, GPU solution to the cone complementarity problem, ellipsoid-ellipsoid CCD, multi-CPU simulation, and finally vehicle track simulation in sand. Wow. Code is available on the Simulation-Based Engineering Lab website.
  • Max Rietmann of USI Lugano looks at seismology (earthquake simulation) in 'Faster Finite Elements for Wave Propagation Codes' and describes parallelising FEM methods for GPUs in SPECFEM3D.
  • Dustin Franklin from GE introduces GE's MilSpec ruggedised Kepler-based GPU solutions and Concurrent Redhawk6 in 'Sensor Processing with Rugged Kepler GPUs'. Looks at some example applications including hyperspectral imaging, mosaicing, 360 degree vision, synthetic aperture radar processing, and space-time adaptive processing for moving target identification.
  • Graham Sanborn of FunctionBay presents 'Particle Dynamics with MBD and FEA Using CUDA' and gives a brief overview of their combined CPU/GPU multi-body FEA system and briefly describes the contact, contact force, and integration steps.
  • Ritesh Patel and Jason Mak of University of California-Davis cover the Burrows-Wheeler Transform, Move-to-Front Transform and Huffman Coding in 'Lossless Data Compression on GPUs'. They find merge sort for BWT performs best on the GPU, explain the parallel MTF transform and Huffman in illustrative detail and tie things up with benchmarks. Unfortunately the GPU version is 2.78x slower than the CPU.
  • Nikolai Sakharnykh and Nikolay Markovskiy from NVIDIA provide an in-depth explanation of their GPU implementation of solving ADI with tridiagonal systems in '3D ADI Method for Fluid Simulation on Multiple GPUs'.
  • Enrico Mastrostefano, Massimo Bernaschi, and Massimiliano Fatica investigate breadth first search in 'Large Graph on multi-GPUs' and describe how best to parallelise it across multiple GPUs by using adjacency lists and level frontiers to minimise the data exchange.
  • Bob Zigon from Beckman Coulter presents '1024 bit Parallel Rational Arithmetic Operators for the GPU' and covers exact 1024 bit rational arithmetic (add,sub,mul,div) for the GPU. Get the 1024 bit arithmetic code here.
  • Roman Sokolov and Andrei Tchouprakov of D4D Technologies discuss 'Warped parallel nearest neighbor searches using kd-trees' where they take a SIMD style approach by grouping tree searches via voting (ballot).
  • David Luebke from nVidia takes a broad look at CG in 'Computational Graphics: An Overview of Graphics Research @ NVIDIA' and provides an overview of research which is featured in a number of previous talks and other GTC talks including edge aware shading, ambient occlusion via volumes and raycasting, stochastic rendering, improved image sampling and reconstruction, global illumination, and CUDA based rasterization.
  • Johanna Beyer and Markus Hadwiger from King Abdullah University of Science and Technology discuss 'Terascale Volume Visualization in Neuroscience' where each cubic mm of the brain scanned with an electron microscope generates 800 terabytes of data. The idea here is to leverage the virtual memory manager to do all the intelligent caching work, rather than a specialised spatial data structure for the volume rendering.
  • Mark Kilgard introduces the NV_path_rendering extension in 'GPU-Accelerated Path Rendering', and demonstrates using the GPU to render PDF, Flash, clipart, etc. Contains some sample code.
  • Janusz Będkowski from the Warsaw University of Technology presented 'Parallel Computing In Mobile Robotics For RISE', a full GPGPU solution for processing mobile robot laser scan data through to navigation. Starts with data registration into a decomposed grid which is then used for scan matching with point-to-point Iterative Closest Point. Next is estimating surface normals using principal component analysis, demonstrated on Velodyne datasets. This is used to achieve point-to-plane ICP and he demonstrates a 6D SLAM loop-closure. Finishes it all off with a simple gradient based GPU path planner.
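As a small aside on the 'Lossless Data Compression on GPUs' talk in the list above: the Move-to-Front step is easy to try on the CPU. This is only a minimal serial sketch in Python to show what the transform does. It is not the parallel GPU formulation from the slides, and the function names are my own.

```python
def mtf_encode(data):
    """Move-to-Front: replace each byte by its current position in a
    recency list, then move that byte to the front of the list."""
    alphabet = list(range(256))
    indices = []
    for byte in data:
        index = alphabet.index(byte)
        indices.append(index)
        alphabet.insert(0, alphabet.pop(index))
    return indices


def mtf_decode(indices):
    """Inverse transform: replay the same list updates."""
    alphabet = list(range(256))
    out = bytearray()
    for index in indices:
        byte = alphabet.pop(index)
        out.append(byte)
        alphabet.insert(0, byte)
    return bytes(out)


# Recently seen symbols turn into small numbers, which Huffman codes well.
print(mtf_encode(b"banana"))  # [98, 98, 110, 1, 1, 1]
```

The serial dependency on the recency list is what makes this step awkward to parallelise, which is presumably why the talk spends time on it.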
Note that in recent days more presentation PDFs have been uploaded, so there is still plenty to look through. With so much content it's difficult to cover it all, so take a look yourself! I'll leave you with a video from the GTC 2012 keynote on rendering colliding galaxies:

Saturday, April 07, 2012

Game Developers Conference 2012 - Technical summary

GDC2012 is over, and this year there are a huge number of available presentations. You can download the Game Developers Conference 2012 presentations from the GDC vault; Jare / Iguana has also kept a link collection from GDC 2012. I've looked over all the technical publications available and put together this summary post. (Edit: I've updated this to cover some maths, physics, and graphics material I missed on the first pass - thanks Johan & Eric)

I'll start with graphics.

Louis Bavoil / nVidia and Johan Andersson / DICE have a presentation on "Stable SSAO in Battlefield 3 with Selective Temporal Filtering". Ambient occlusion is a well established technique now, but they apply a quick way to use past data and the differences in Z buffer states between frames to intelligently reuse the AO results. They also look at filters and optimising blur functions. Similar to established tricks in the realtime raytracing demoscene.

Eban Cook / Naughty Dog presented "Creating Flood Effects in Uncharted 3", a technical artist look at water effects. Unfortunately realtime fluid simulation wasn't used, instead Houdini was used to pre-generate the game content. An overview of the shaders for water, water particles, froth particles, and lighting is given.

Robert Cupisz / Unity discussed light probes in "Light probe interpolation using tetrahedral tessellations", in terms of choosing the appropriate probes and weights using Delaunay triangulation / tetrahedra and barycentric coordinates by dividing scenes into convex hulls. He also covers projecting onto the nearest convex hull, all with a fair bit of maths; this would be of interest to physics / collision detection programmers too. There is a collection of nice links and some sample code at the end.
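To make the barycentric part concrete, here is a minimal Python sketch of computing the four weights of a point inside one tetrahedron of probes and blending a value with them. It uses Cramer's rule and hypothetical names of my own; it is not the code from the talk, and it leaves out the search for the containing tetrahedron and the projection onto the hull.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def barycentric(p, a, b, c, d):
    """Weights (wa, wb, wc, wd) with p = wa*a + wb*b + wc*c + wd*d and
    wa + wb + wc + wd = 1, solved with Cramer's rule."""
    ab, ac, ad, ap = sub(b, a), sub(c, a), sub(d, a), sub(p, a)
    volume = dot(ab, cross(ac, ad))  # six times the signed volume
    if volume == 0:
        raise ValueError("degenerate tetrahedron")
    wb = dot(ap, cross(ac, ad)) / volume
    wc = dot(ab, cross(ap, ad)) / volume
    wd = dot(ab, cross(ac, ap)) / volume
    return (1.0 - wb - wc - wd, wb, wc, wd)


def interpolate(weights, probe_values):
    """Blend one scalar per probe (e.g. a single SH coefficient)."""
    return sum(w * v for w, v in zip(weights, probe_values))


tetra = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
weights = barycentric((0.25, 0.25, 0.25), *tetra)
inside = all(w >= 0 for w in weights)  # a negative weight means p is outside
```

A negative weight also tells you which face the point has left through, which is handy when walking to a neighbouring tetrahedron.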

Matthijs De Smedt / Nixxes covers "Deus Ex is in the Details" using DX11 tech. Covers AA (FXAA DLAA MLAA), SSAO, DOF (Gaussian blur), tessellation and soft-shadows.

Colt McAnlis / Google investigates post-compressing DXT textures in his talk "DXT is not enough", trying to out-do zipped DXTs with delta encoding. More info at this blog post or skip it all and download the DXT CRUNCH compressor here.

Matt Swoboda / Sony & Fairlight delves into Signed Distance Fields, a demoscene hot-topic last year, with the talk "Advanced Procedural Rendering in DirectX 11". Investigates converting polygon mesh data and particle data into signed distance fields. Takes an in-depth look into an optimised marching cubes implementation for a fluid simulation with smoothed particle hydrodynamics (SPH), and how to use signed distance fields to do ambient occlusion.

Yoshiharu Gotanda / tri-Ace research makes a case for physically based rendering with a Blinn-Phong model in the presentation "Practical Physically Based Rendering in Real-time". An in-depth look at the BRDF formulation they use.

Wolfgang Engel, Igor Lobanchikov and Timothy Martin / Confetti present "Dynamic Global Illumination from many Lights", just a bunch of pictures, not much information.

Carlos Gonzalez Ochoa / Naughty Dog covers "Water Technology of Uncharted". Covers the shader, animating the normal map flow, and simulating the ocean water with Gerstner waves, b-spline waves, and wave particles. They go on to look at LOD with "Irregular Geometry Clipmaps" including fixing T joints, and then culling, skylights and underwater fog. Next up is physics: attaching (buoyant) objects, and point queries. Finally, SPU optimization. Quite comprehensive.
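For anyone who hasn't played with Gerstner waves, this is roughly what evaluating a single one looks like. It is a minimal Python sketch using the common textbook formulation with the deep-water dispersion relation; the parameter names and the `steepness` convention are my own assumptions, not taken from the Uncharted slides.

```python
import math

GRAVITY = 9.81  # m/s^2


def gerstner_point(x0, z0, t, amplitude, wavelength, direction, steepness):
    """Displace the rest position (x0, z0) by one Gerstner wave.

    steepness 0 gives a plain sine wave, 1 gives circular particle orbits.
    Keep steepness * k * amplitude <= 1 to avoid loops at the crests.
    Returns (x, height, z).
    """
    dx, dz = direction
    length = math.hypot(dx, dz)
    dx, dz = dx / length, dz / length        # unit travel direction
    k = 2.0 * math.pi / wavelength           # wavenumber
    omega = math.sqrt(GRAVITY * k)           # deep-water dispersion relation
    theta = k * (dx * x0 + dz * z0) - omega * t
    x = x0 + steepness * amplitude * dx * math.cos(theta)
    z = z0 + steepness * amplitude * dz * math.cos(theta)
    height = amplitude * math.sin(theta)
    return x, height, z


# Sample one wave along the x axis at time t = 0.5 s.
samples = [gerstner_point(x * 0.5, 0.0, 0.5, 0.3, 4.0, (1.0, 0.0), 0.8)
           for x in range(8)]
```

An ocean surface is then the sum of the displacements from several such waves with different wavelengths and directions; unlike a heightfield, the vertices also move sideways, which is what sharpens the crests.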

Ben Hanke / SlantSixGames describes the bone code in "Rigging a Resident Evil". Transforms are described with 9 functions and processed with an optimising compiler, allowing fast retargeting of animations.

Scott Kircher / Volition Inc expands on Inferred Lighting in "Lighting & Simplifying Saints Row: The Third" by looking into lighting for rain, foliage, dynamic decals, and radial ambient occlusion. Then moves on to automated mesh simplification using iterative edge contraction and takes an in-depth look at selecting an appropriate error metric.

Nathan Reed / Sucker Punch Productions discusses "Ambient Occlusion Fields and Decals in Infamous 2", going into depth on how to solve the artifacts of this approach.

Marshall Robin / Naughty Dog covers the effects system tools in "Effects Techniques Used in Uncharted 3: Drake’s Deception".

Niklas Smedberg / Epic Games looks at the PowerVR GPU processing pipeline and capabilities in "Bringing AAA graphics to mobile platforms" and provides a number of tricks'n'tips on optimising the performance of the mobile GPU, and highlights the cheap operations. In short: AA (fast), hidden surface removal (fast), alpha test (slow), render targets (slow), texture lookups (slow). Takes a more detailed look at material shaders, god rays, and character shadows. All in all, pretend it's ~2002, and you'll be right.

Mickael Gilabert / Ubisoft and Nikolay Stefanov / Massive cover the GI system in Far Cry 3 in "Deferred Radiance Transfer Volumes". Light probes get precomputed directional radiance transfer data from a custom raytracer stored using spherical harmonics. Source code for the relighting system is presented, along with optimisations by using volume textures.

John McDonald / nVidia explains CPU/GPU synching for buffers in "Don’t Throw it all Away: Efficient Buffer Management" and provides advice on buffer creation flags.

Bryan Dudash / nVidia suggests using average normals to overcome tessellation issues in "My Tessellation Has Cracks!".

Renaldas Zioma / Unity and Simon Green / nVidia present "Mastering DirectX 11 with Unity". Starts by looking into Unity's physically based shaders (Oren-Nayar, Cook-Torrance, and energy conservation, then blurry reflections and combining normal maps). Next up, Catmull-Clark subdivision, tetrahedral light probes (see Robert Cupisz's talk), HBAO, APEX destruction, hair simulation with guide hairs, explosions using signed distance fields with noise and color gradients, and finally velocity buffer motion blur.

Tobias Persson / Bitsquid discusses lighting billboards in "Practical Particle Lighting". Looks at normal generation and per-pixel lighting for billboards (including code snippets), applying shadow maps with a domain shader, and shadow casting.

Karl Hillesland / AMD investigates realtime Ptex (per-face texture mapping) in "Ptex and Vector Displacement in AMD Demos", and efficient retrieval from the texture atlas, including all MIPs.

Jay McKee / AMD presents the "Technology Behind AMD’s Leo Demo". He details some of the code behind the forward rendering of 3000 dynamic light sources using a depth pre-pass, light culling (tile-based compute shader to output light list), and light accumulation with materials phase. Basically moves the light management code from CPU to GPU.

Mattias Widmark / DICE presents "Terrain in Battlefield 3: A modern, complete and scalable system". Begins with an overview of the features for the terrain system (heightfield based, procedurally generated, spline decals, decoration (tree, rock, grass), destruction, water), and presents their quadtree terrain data structure, paying particular attention to LOD. Next, CPU/GPU performance is investigated, and a clip-map based virtual texture system is presented. The large terrain data set is managed by intelligently streaming data to the required detail ('blurriness'), and co-locating data (heightfield/color/mask lumped together, next to the next level of relevant LOD data), which is also compressed (RLE/DXT1). Nodes are then prioritized based on distance, culling, and updates (e.g. destruction). Finally, mesh generation, stitching and tessellation with displacement on the GPU.




Moving on to physics.

Erin Catto covers "Diablo 3 Ragdolls", including representing ragdoll bones, initialising ragdolls from animations, and interacting kinematic and dynamic objects.

François Antoine / Epic talks about Gears of War 3 destruction physics in "Pushing for Large Scale Destruction FX" and suggests using particles for dust and debris, and simplifying meshes for destruction.

Stephen Frye / EA looks at ragdolls in the presentation "Tackling Physics". Highlights aspects of ragdolls that look unrealistic, and suggests adding joint limits and motorized constraints at joints to simulate muscles. Gives two approaches to solving the control problem, first using external forces, second calculating the appropriate torque from world space.

Graham Rhodes / Applied Research Associates presents "Computational Geometry" where he looks at half-edge data structures for triangulating a polygon, splitting a face, splitting an edge, intersection of an edge and a plane and generating a convex hull.

Richard Tonge / nVidia covers "Solving Rigid Body Contacts" and starts with a gentle introduction to rigid body state space and progressively builds a signal block diagram of solving a single contact constraint. Then looks at each block in the diagram and deciphers the physics behind it. He then looks at solving multiple contacts, explains why you can't apply a linear solver to the problem (contacts break), and presents the LCP and an alternative approach: sequential impulses. He then gives a whirlwind tour of GPU solvers.
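The "contacts break" point is easier to see with numbers. Below is a minimal Python sketch of projected Gauss-Seidel on a tiny LCP (find x >= 0 with w = A x + b >= 0 and x * w = 0 per row). Sequential impulses is usually described as this same iteration carried out on the bodies' velocities rather than on an assembled matrix. The matrix and vectors here are made up for illustration and are not from the talk.

```python
def projected_gauss_seidel(A, b, iterations=50):
    """Solve the LCP  w = A x + b,  x >= 0,  w >= 0,  x_i * w_i = 0.

    x plays the role of the contact impulses and w the resulting
    separating velocities. Assumes a positive diagonal in A.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Row residual using the most recent values of x.
            w_i = b[i] + sum(A[i][j] * x[j] for j in range(n))
            # Gauss-Seidel step, then clamp: contacts push, never pull.
            x[i] = max(0.0, x[i] - w_i / A[i][i])
    return x


A = [[2.0, 1.0],
     [1.0, 2.0]]

# Both contacts approaching: both impulses end up positive.
both_active = projected_gauss_seidel(A, [-1.0, -1.0])

# Second contact already separating: its impulse clamps to zero.
one_breaks = projected_gauss_seidel(A, [-1.0, 1.0])
```

A plain linear solve of A x = -b in the second case would give the separating contact a negative (pulling) impulse, which is exactly what the clamp prevents.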

Gino van den Bergen / DTECTA presents "Collision Detection", first covering shapes, then configuration space, distance tests, Separating Axis Tests, and takes a closer look at the GJK algorithm.
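As a quick illustration of the Separating Axis Test, here is a minimal 2D Python sketch for convex polygons: if the projections of the two shapes onto any edge normal do not overlap, the shapes are disjoint. The names are mine and this is the plain textbook test, not code from the slides.

```python
def edge_normals(polygon):
    """One (unnormalised) perpendicular per edge of a convex polygon."""
    normals = []
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        normals.append((y1 - y2, x2 - x1))
    return normals


def project(polygon, axis):
    """Interval covered by the polygon when projected onto the axis."""
    dots = [x * axis[0] + y * axis[1] for x, y in polygon]
    return min(dots), max(dots)


def polygons_overlap(a, b):
    """True unless some edge normal of a or b separates the two polygons."""
    for axis in edge_normals(a) + edge_normals(b):
        min_a, max_a = project(a, axis)
        min_b, max_b = project(b, axis)
        if max_a < min_b or max_b < min_a:
            return False  # found a separating axis
    return True


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
diamond = [(2.1, 1.4), (1.4, 2.1), (0.7, 1.4), (1.4, 0.7)]
# Their bounding boxes overlap, but the diamond's diagonal edge separates them.
print(polygons_overlap(square, diamond))
```

In 3D the candidate axes are the face normals of both shapes plus the cross products of edge pairs, which is where GJK starts to look attractive for general convex shapes.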

Jim Van Verth / Insomniac gives a nice introduction to Navier-Stokes in "Fluid Techniques", breaking down the terms for external forces, viscosity, advection and pressure visually. Then looks at three major representations for fluids: grid, particle and surface (wave) based.

Takahiro Harada / AMD examines how heterogeneous compute architectures can achieve large scale dynamic simulations in "Toward A Large Scale Simulation". Begins with an overview of GPU architecture, and GPU rigid body simulation in three key phases: broad-phase, narrow-phase and constraint solving for a system of 128,000 particles and 12,000 convex bodies. He presents a design for overcoming data transfer and minimising synchronisation points whilst dividing the workload between CPU and GPU.

Erwin Coumans / AMD investigates destructive physics in the aptly titled "Destruction". He begins with generating Voronoi diagrams for shattering geometry and boolean operations, and moves into generating collision shapes with convex decomposition and tetrahedralization. Then moves on to realtime approaches with real-time booleans, breakable constraints and finite element methods.



Looking at AI.

Bobby Anguelov / IO Interactive, Gabriel Leblanc / Eidos-Montréal and Shawn Harris / Big Huge Games present "Animation-Driven Locomotion For Smoother Navigation". They start with the standard motion graphs and transitioning/blending between animation cycles. Then take an in-depth look at footstep planning (IK, foot sliding) and come up with a system for deciding where steps should be taken to fulfil the navigation goal. They then investigate modifying navigation paths to better fit the animation cycles, and finish by looking into collision avoidance.

Daniel Brewer / Digital Extremes looks at agent perception, reaction, combat chatter, buddy systems and collision avoidance using velocity space Optimal Reciprocal Collision Avoidance in "Building Better Baddies".

Brian Magerko / Georgia Tech covers "How to Teach Game AI from Scratch" including competitions (Mario AI, Google Ants, Poker AI, Starcraft AI).

Dave Mark / Intrinsic and Kevin Dill / Lockheed Martin investigate some examples (snipers, guards) of Utility-Based AI in "Embracing The Dark Art of Mathematical Modeling in Game AI".

Kasper Fauerby / IO Interactive explains "Crowds in Hitman:Absolution" including cell maps, boids, animation and PS3 implementation details. The crowd AI uses a state machine with steering behaviours (pending walk, walk, panic), and behaviour 'zones' with information from the navigation system to select behaviours. Near-optimal Character Animation with Continuous Control was used for animation.

Elan Ruskin / Valve looks at empowering writers and dialog in TF2, Left4Dead, etc, in "Rule Databases for Contextual Dialog and Game Logic". Begins with player triggered lines (extended by environment, memory etc.) and avoiding fill-in-the-blank dialog by using databases. Rules, queries, responses and writers' tools are examined next, and he ties things off with database query optimisations.

Mike Robbins / Gas Powered Games examines "Neural Networks in Supreme Commander 2", with 34 inputs and 15 output actions and a single hidden layer (98 neurons), with a fitness function composed from 17 inputs trained to control combat platoons.

Ben Sunshine-Hill / Havok investigates LOD for AI in "Perceptually Driven Simulation", and makes a case for using probability of noticing a difference instead of distance as a LOD measure, and presents a market-based "LOD trader" for selecting the appropriate LOD given the constraints on hand.

Moving along to programming and math.

Adisak Pochanayon / Netherrealm covers debugging and timing issues in "Runtime CPU Spike Detection using Manual and Compiler-Automated Instrumentation". First up, manual instrumentation and wrapper functions, then detours, and automated instrumentation (compiler flags) with an in-depth look at the 360. Finally, profiling with threshold functions.

Pete Isensee / Microsoft details how rvalue references in C++11 (T&&) can eliminate temporaries in "Faster C++: Move Construction and Perfect Forwarding".

Scott Selfon / Microsoft reviews audio compression technologies in "The State of Ady0 Cmprshn", starting with time-domain compression with PCM (raw, A-Law, U-Law, ADPCM), then frequency-domain compression and discusses the artifacts generated by both, then evaluates the performance of different codecs.

Robin Green / Microsoft and Manny Ko / Dreamworks present "Frames, Quadratures and Global Illumination: New Math for Games". Begins with a review of spherical harmonics, Haar wavelets, and radial basis functions. Builds up to 'Spherical Needlet' wavelets, by exploring different basis functions ('frames').

Gino van den Bergen / DTECTA presents dual numbers in "Math for Game Programmers: Dual Numbers", beginning with a look at complex numbers. Automatic differentiation with dual numbers is then described, with code, and examined in curve tangents, directed line geometry (triangle/ray intersections, Plücker coordinates, angles), and rigid body transforms/skinning (dual quaternions).
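If you haven't seen the dual number trick before, it fits in a few lines. This is a minimal Python sketch of forward-mode automatic differentiation (the talk's own code is not reproduced here, and the class and function names are mine): carry a value a + b·ε with ε² = 0 through the arithmetic, and the ε coefficient comes out as the derivative.

```python
class Dual:
    """Number of the form real + dual * eps, where eps * eps = 0."""

    def __init__(self, real, dual=0.0):
        self.real = real
        self.dual = dual

    @staticmethod
    def lift(value):
        """Treat plain numbers as constants (zero derivative)."""
        return value if isinstance(value, Dual) else Dual(value)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.real + other.real, self.dual + other.dual)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps; the eps^2 term vanishes.
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate f at x + eps and read the derivative off the eps part."""
    return f(Dual(x, 1.0)).dual


def curve(x):
    return 3 * x * x + 2 * x + 1


print(derivative(curve, 2.0))  # 14.0, matching 6x + 2 at x = 2
```

The same idea gives curve tangents for free: evaluate the curve with a dual parameter and the ε part of the resulting position is the tangent.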

Jim Van Verth / Insomniac explains rotation formats in "Understanding Rotations", including angle (2D), Euler angles, axis-angle, matrix (2D/3D), complex (2D) and quaternion (3D). Interpolation is considered for each case (including slerp).
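For reference, slerp between two unit quaternions can be sketched as below. This is a minimal Python version of the standard formula, assuming (w, x, y, z) ordering and unit-length inputs; the fallback threshold is an arbitrary choice of mine and none of it is taken from the slides.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    cos_theta = sum(a * b for a, b in zip(q0, q1))
    if cos_theta < 0.0:
        # q and -q are the same rotation; flip one to take the shorter arc.
        q1 = tuple(-c for c in q1)
        cos_theta = -cos_theta
    if cos_theta > 0.9995:
        # Nearly parallel: sin(theta) is tiny, so lerp and renormalise.
        mixed = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in mixed))
        return tuple(c / norm for c in mixed)
    theta = math.acos(cos_theta)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
halfway = slerp(identity, quarter_turn_z, 0.5)  # a 45 degree turn about z
```

Unlike a straight lerp of the components, this keeps the result on the unit sphere and rotates at constant angular speed as t goes from 0 to 1.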

Eric Lengyel / Terathon presents exterior (Grassmann) algebra in "Fundamentals of Grassmann Algebra". This includes the wedge product, bivectors, trivectors and multivectors. Moves on to cross product transforms, dual-basis 'anti-vectors', regressive 'antiwedge' product, and demonstrates how these can be used in homogeneous and Plücker coordinate systems. This leads on to basic intersections (line, plane, point) and distances (point plane, two lines) and finally ray-triangle intersection using bivectors to avoid barycentric coordinates.

Squirrel Eiserloh / TrueThought presents "Interpolation and Splines". Takes us back to basics by looking at averaging and blending, and moves onto interpolation. Begins with quadratic and cubic Bézier curves, then moves into splines and discusses continuity. Cubic Hermite splines are up next, and how to convert between Bézier and Hermite, then Catmull-Rom splines, and finishes with the more general Cardinal splines.
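The chain from blending to Bézier to Hermite to Catmull-Rom is short enough to write out. This is a minimal Python sketch with my own function names, using the standard conversions rather than anything copied from the slides.

```python
def lerp(a, b, t):
    """Blend two points (tuples of any dimension)."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def cubic_bezier(b0, b1, b2, b3, t):
    """de Casteljau: a cubic Bezier is just three rounds of lerps."""
    q0, q1, q2 = lerp(b0, b1, t), lerp(b1, b2, t), lerp(b2, b3, t)
    r0, r1 = lerp(q0, q1, t), lerp(q1, q2, t)
    return lerp(r0, r1, t)


def hermite_to_bezier(p0, m0, p1, m1):
    """Hermite (endpoints + tangents) to Bezier control points.
    The Bezier tangent at the start is 3 * (b1 - b0), hence the thirds."""
    b1 = tuple(p + m / 3.0 for p, m in zip(p0, m0))
    b2 = tuple(p - m / 3.0 for p, m in zip(p1, m1))
    return p0, b1, b2, p1


def catmull_rom(p0, p1, p2, p3, t):
    """Segment from p1 to p2; tangents come from the neighbouring points.
    The factor 1/2 is the Catmull-Rom choice; Cardinal splines replace it
    with a tension-dependent scale."""
    m1 = tuple((c - a) / 2.0 for a, c in zip(p0, p2))
    m2 = tuple((d - b) / 2.0 for b, d in zip(p1, p3))
    return cubic_bezier(*hermite_to_bezier(p1, m1, p2, m2), t)


points = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]
path = [catmull_rom(*points, i / 10.0) for i in range(11)]
```

The nice property for gameplay code is that the curve passes through the control points themselves, so a designer-placed waypoint list can be fed in directly.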

John O’Brien / Insomniac covers "Math for Gameplay / AI". Starts with object intersection tests (sphere-sphere, sphere-plane, AABB-AABB, AABB-ray, capsule-capsule, capsule-ray) and projecting onto a plane in a gun turret AI example. Next up, Bayes' Theorem and conditional probability, followed by fuzzy logic.
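Two of the simpler intersection tests and Bayes' Theorem look like this in practice. This is a minimal Python sketch with names and example numbers of my own; the tests are the textbook versions, not code from the presentation.

```python
def spheres_overlap(center_a, radius_a, center_b, radius_b):
    """Compare squared distance with squared radius sum (no sqrt needed)."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    return dist_sq <= (radius_a + radius_b) ** 2


def aabbs_overlap(min_a, max_a, min_b, max_b):
    """Boxes overlap only if their intervals overlap on every axis."""
    return all(min_a[i] <= max_b[i] and min_b[i] <= max_a[i]
               for i in range(len(min_a)))


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) = P(E | H) P(H) / P(E), where
    P(E) = P(E | H) P(H) + P(E | not H) P(not H)."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical AI question: a guard hears a noise (E). How likely is it
# that the player is nearby (H)? All three numbers are made up.
belief = bayes_posterior(prior=0.5, likelihood=0.8, false_positive_rate=0.2)
```

With these made-up numbers the guard's belief rises from 0.5 to 0.8, and feeding the posterior back in as the next prior gives a simple way to accumulate evidence over time.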

The Web up next

Corey Clark and Daniel Montgomery present "Building a Multi-threaded Web-Based Game Engine" covering both client side (WebGL, WebSockets, etc) and server side (NodeJS, Hosting, etc).

Michael Weilbacher / Microsoft looks at server issues in "Dedicated Servers in Gears of War 3".

Michael Goddard covers "Developing a Javascript Game Engine" using a component based architecture. Takes an in-depth look at events/promises and loading content.

Mike Dailly / YoYo investigates packing textures and command list execution for improving performance in "The Voodoo Art of Dynamic WebGL".

Marc O’Morain / Swrve takes a look at a number of issues (including iOS multitouch) in "Building Browser Based Games Using HTML5".

And rounding up everything else

Caruso Daniel explains the "Forza Motorsport Pipeline". Importing assets into the game.

Jean-Francois Guay investigates sound diffraction and absorption in "Real-time sound propagation".

Mike Lewis presents the challenges of multithreading for MMOs in "Managing the Masses".

Sean Ahern looks at building better game engine tools in "It stinks and I don't like it"

Clara Fernández-Vara, Jesper Juul, and Noah Wardrip-Fruin make a case that "Game Education Needs Game History"

Chris Jurney presents his idea "Motion Blobs", a fast and crude Kinect data "gesture" system, essentially an extension of the typical 2D approach to 3D. Steps are to calculate motion via background subtraction, filtering (open/close), labeling, and then correlation.

Alexander Lucas explains automated testing at Bioware in "The Automation Trap And How Bioware Engineers Quality"

Alex Mejia looks at camera movement in "Saints Row : The Third real time capture tools".

Scott Philips presents "Designing Over the Top SAINTS ROW: THE THIRD Postmortem", and highlights the importance of pre-visualization and playtesting.

Ron Pieket / Insomniac looks at eliminating downtime in "Developing Imperfect Software" via a 'Structured Binary' approach to building engine data by taking advantage of a Data Definition Language.

Benson Russell takes a look at Naughty Dog's approach to polish in "The last 10, going from good to awesome", in essence longer alpha and beta tests.

Luke Muscat takes a look at the lessons learnt while updating Fruit Ninja in "Iterating Design And Fighting Fires: Updating Fruit Ninja And Jetpack Joyride"

Tatyana Dyshlova talks about managing 300+ artists working on 500 car models in "Racing to the Finish"

Summary
Quite a collection this year, but overall there seems to be less exciting content than in previous years. For graphics, signed distance fields and physically based rendering seem to be the new themes. AI is still playing catch-up, and character animation cycles are still a hot topic. Following that theme, physics is also looking at characters and ragdolls, with destruction being the hot topic. The web is focusing on WebGL.

Monday, December 12, 2011

GPU and Graphics catchup post

It has been a long while since I've done a graphics related post, so here is a bit of a backlog from the last few months of graphics and GPU links:
Finally, I'll leave you with a fantastic 64kb demo by Fairlight and Alcatraz that placed 2nd at Assembly 2011.

Friday, July 29, 2011

Half year catchup on Graphics, GPUs, Compilers, etc.

Another slow month on the blog. More than half way through the year, so it's time to catch up on the backlog of news. Only covering graphics, games, physics, GPGPU and compilers. Expect a number of posts on robotics soon!
Finally, here is the SIGGRAPH 2011 technical papers highlights video, it contains a number of interesting advances in physics simulations and modeling.

Friday, January 07, 2011

Future CPU/GPU architectures and OpenCL

For a while x86 CPUs were just aimed at getting faster, more and more ticks. Then, there was a shift from faster to more efficient (in terms of power used / heat generated vs. performance) in the Pentium 3/4 era. Then, another shift towards multiple cores. Now, the next shift has occurred: CPU and GPU fusion. (Despite AMD talking about it for longer than I can recall. I think I was talking to an AMD VP shortly before they purchased ATI about how GPU and CPU fusion was the future.)

Recently Intel released Sandy Bridge, their CPU/GPU processor, and AMD's fusion APUs have been around. Interestingly, both are targeted at laptops (the bulk of the market), and still expect discrete GPUs for high-end performance. The fusion is just a bit of a bonus.

Interestingly, all those 2009 rumours about nVidia building an x86 chip turned out to be wrong, with the real product being far more interesting: an nVidia fusion product combining an ARM CPU with nVidia GPUs. This might prove to be a very interesting result considering the amount of effort being put into improving the ARM architecture and the power usage concerns in both smartphones/tablets and supercomputing. With the next version of Windows running on ARM there will certainly be interesting times ahead.

Intel/AMD aren't standing still on the CPU front and are pushing ahead with AVX, but the really interesting part is that OpenCL is being pushed across the board. Recent publications from the UK GPU computing conference demonstrate that ARM are pushing OpenCL as their platform of choice (for both CPU and their Mali GPU, it is Clang/LLVM based), and AMD and Intel have been strong supporters of OpenCL too. Did you know Samsung supports OpenCL too?

It would seem that OpenCL has a very strong support base, and is likely the platform of choice for developers on Intel, AMD, ARM, Apple, IBM, etc. What hope does CUDA stand? It seems nVidia will eventually be forced to drop CUDA, or invest heavily into CPU-based CUDA (Hello Ocelot!) or OpenCL translation. Eventually people will tire of writing and re-writing CUDA programs as they swap between platforms.

Envision the not-so-far future where you have a workstation with multiple CPU/GPU fusion cores [non-CUDA], and a discrete fusion GPU (GPU [CUDA]/ARM [non-CUDA] core) - are you really willing to write a specific CUDA routine for just the nVidia GPU, or are you more likely to try a less optimal OpenCL routine that will then also run on the CPU, CPU SIMD, CPU/GPU fusion and ARM cores?

It seems that nVidia's argument, that you should write everything in CUDA for optimal performance, will not hold: you will gain more from all the other devices helping out in the computation than you lose by using OpenCL over CUDA.

Of course there are other choices out there such as Accelerator and RapidMind.

The UK conference reveals that Microsoft's GPGPU language "Accelerator" has made progress, and is no longer limited to GPUs (it now supports SSE3 on modern CPUs). I'm still not aware of anyone using this for anything practical though, which seems to be a bit of a shame. And ever since Intel bought RapidMind (now Array Building Blocks) the tech has become very Intel-specific. So neither of those choices sounds too promising.

It will be interesting to see what nVidia decide to do with CUDA, how developers will adjust to fusion CPU / GPUs and what higher-level language (or language extension?) will become the de facto standard in the future. Stay tuned..

Wednesday, December 29, 2010

Catchup Post: Graphics

ShaderToy
Another set of interesting links:

Thursday, November 25, 2010

GPGPU catchup

Another month, another catchup post on the world of GPUs. A few interesting things have happened:

Sunday, September 12, 2010

SIGGRAPH 2010 Course Papers Overview

I've managed to work through the SIGGRAPH 2010 course content that relates to realtime rendering. I quite liked the Toy Story 3 rendering techniques and the realtime rendering survey by nVidia, just because they give a nice overview. As always, there are a number of great presentations, and I've listed the ones that I found interesting below.
  • Toy Story 3 : The Video Game Rendering Techniques from Avalanche Software gives a great (211 page!) overview of a number of lighting issues for the game including SSAO (various optimizations/approximations for how/where to sample, faking more samples and dealing with large differences in depth), ambient lighting (without needing to bake it or do GI) and various aspects on shadows. A great read!
  • Surveying Real-Time Rendering Algorithms by David Luebke from nVidia gives an excellent short overview of a number of recent developments in realtime rendering algorithms including stochastic transparency (i.e. transparency via random sampling), sample distribution for shadow maps (partitioning the scene in light-space), alias-free shadow maps, append-consume order-independent-transparency (sorting per-pixel & linked-lists), progressive photon mapping, image-space photon mapping, ambient occlusion volumes (how to speed it up with bitwise occlusion mask for each triangle - one per edge, and one for the triangle plane), and stochastic rasterization (of 4d triangles).
  • Keeping Many Cores Busy: Scheduling the Graphics Pipeline by Jonathan Ragan-Kelley from MIT gives a great overview of the graphics pipeline stages (from Input Assembler, Vertex Shader, Primitive Assembler, Tessellation, Geometry Shader, Rasterizer, Pixel Shader, and finally Output Blending) and load balancing.
  • Uncharted 2 - Character Lighting and Shading by John Hable from Naughty Dog gives a fabulous overview of rendering issues with skin (in a lot of detail!), hair and clothes.
  • Bending the Graphics Pipeline by Johan Andersson from DICE describes tile-based deferred shading (for GPU and SPU), morphological antialiasing and analytical ambient occlusion.
  • A real-time Radiosity Architecture for Video Games by Sam Martin and Per Einarsson from DICE/Geomerics introduces the 'Enlighten' system for realtime GI - it gives a nice overview.
  • Evolving the Direct3D Pipeline for Real-time Micropolygon Rendering by Kayvon Fatahalian from Stanford University gives an interesting insight into micropolygon rendering on current GPU pipelines.
  • Water Flow in Portal 2 by Alex Vlachos - I've already written about this previously, just another realtime technique for faking the simulation and rendering of water flow.
  • Making Concept Real For Borderlands by Gearbox Software contains some nice examples of their concept art, the development change from photorealistic to stylistic rendering and art (and the code/artist balance), and the sobel edge filter they used.
  • The notes from the volumetric course were broken into parts:

    1. "Resolution Independent Volumes" - which describes the "Field Expression Language Toolkit", cloud modelling (via density displacement), morphing objects (by using the Nacelle Algorithm to generate warp fields), cutting models, fluid dynamics, gridless advection, and semi-lagrangian mapping (balancing between grids and non-grids).
    2. "Mantra Volume Rendering" - this describes the volume rendering engine for Houdini.
    3. "Volumetric Modeling and Rendering" describes the volumetrics library/API developed at Dreamworks.
    4. "Double Negative Renderer B" (used in Batman Begins, Harry Potter, etc.) describes the workflow and various shaders (Fluid, Particle, Voxel, Fire, Smoke) in DNB.
    5. "Volume Rendering at Sony Pictures Imageworks". The section from Sony Imageworks included an overview of their pipeline and content on their open source field/fluid tools.

Wednesday, September 08, 2010

Catchup Post: Graphics & GPU's & Physics

A long overdue catchup post for various interesting things I've spotted over the last two or three months. SIGGRAPH recently finished, so it really deserves a round-up, although I haven't had time to review all the interesting things.
The annual tutorial on realtime collision had some interesting presentations. I quite liked the one from Erwin Coumans (Bullet) this year - it gives a good overview of the recent advances in the Bullet engine including the GPU optimizations. Simon Green also has a presentation on CUDA SPH rendering (also see Jihun Yu's particle fluid surface reconstruction) and there is another open source GPU SPH simulation from HPC lab.
The realtime graphics tutorial, stylized rendering, volumetrics, and programmable shaders courses have some great stuff that I'll look into in more detail in a future post. Of course there is also always the SIGGRAPH 2010 papers list, and SIGGRAPH Asia papers (again more on this later..).

Some GPGPU software/links:
and MPTL is a parallel version of the STL algorithms.


Some physics links:
Some graphics links:
And finally a documentary on the history of the Future Crew demo group and Second Reality. Brings back the memories.

Friday, August 06, 2010

Computing events in Perth

GameJam is on again in Perth - I didn't see much about it this time, but it is in Joondalup again. Also, there is a GPU computing workshop on the 19th of August in the ARRC auditorium. More information on the workshop is available here.

Wednesday, August 04, 2010

Using Parallel Reduction to Optimize Dijkstra

In a previous blog post I wrote about Dijkstra's algorithm. One optimisation to the naive implementation is to calculate the minimum cost node in parallel - since this can often be the most time consuming operation for large graphs. This is a perfect candidate for a parallel reduction operation.

In serial, if we wished to sum up an array (x) of numbers, the code would be:
int sum = 0;
for(int i=0; i < N; ++i) 
 sum += x[i];

However, in parallel we could speed this up by summing parts of the array and then adding those parts together. For example, to add four numbers a,b,c,d we could first add a+b and c+d in parallel, then sum ab+cd. This is called a 'reduction' operation, and given enough processors (N/2 for the first step) it allows us to complete an otherwise O(N) operation in O(log(N)) parallel steps.
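As a rough illustration of the pairing described above, here is a minimal serial sketch in C. The function name `tree_sum` and the `steps` counter are my own additions for illustration, and it assumes the array length is a power of two. Each pass of the inner loop only touches independent pairs, so on parallel hardware a whole pass would count as a single step.

```c
#include <assert.h>
#include <stddef.h>

/* Sum x[0..n-1] in place with a pairwise (tree) reduction.
 * Assumption for this sketch: n is a power of two.
 * *steps receives the number of passes, i.e. log2(n). */
int tree_sum(int *x, size_t n, int *steps)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    *steps = 0;
    for (size_t stride = 1; stride < n; stride *= 2) {
        /* Every addition in this pass is independent of the others,
         * so all of them could run at the same time. */
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            x[i] += x[i + stride];
        ++*steps;
    }
    return x[0];   /* the total ends up in the first element */
}
```

For four numbers this takes two passes (a+b and c+d, then ab+cd), and for eight numbers it takes three.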

One particularly nifty pattern is the "Butterfly" pattern of adding elements. Best described with an animation:


In code this can be done as:
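/* n must be a power of two; sum[] holds the inputs, t[] is scratch space */
for (b = n/2; b > 0; b /= 2) {
    for (i = 0; i < n; i++) {   /* each i can be handled in parallel */
        t[i] = OP(sum[i], sum[i^b]);
    }
    for (i = 0; i < n; i++) {
        sum[i] = t[i];
    }
}
(Where 'OP' can be any associative and commutative operation, e.g. add, multiply, min, max, etc. Subtraction does not qualify, since the order of the operands matters.)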


To see how this works, assume we are adding 8 elements (n=8). For the first step:
b is 0100
and each element i will be combined with element i^b, as follows:
0000^0100:0100
0001^0100:0101
0010^0100:0110
0011^0100:0111
0100^0100:0000
0101^0100:0001
0110^0100:0010
0111^0100:0011
or in decimal:
t[0] = [0] + [4]
t[1] = [1] + [5] 
t[2] = [2] + [6]
t[3] = [3] + [7]
t[4] = [4] + [0]
t[5] = [5] + [1]
t[6] = [6] + [2]
t[7] = [7] + [3]
Now, for the next step:
b is 0010
0000^0010:0010
0001^0010:0011
0010^0010:0000
0011^0010:0001
0100^0010:0110
0101^0010:0111
0110^0010:0100
0111^0010:0101
Again, in decimal:
t[0] = [0] + [2]
t[1] = [1] + [3]
t[2] = [2] + [0]
t[3] = [3] + [1]
t[4] = [4] + [6]
t[5] = [5] + [7]
t[6] = [6] + [4]
t[7] = [7] + [5]
Finally,
b is 0001
0000^0001:0001
0001^0001:0000
0010^0001:0011
0011^0001:0010
0100^0001:0101
0101^0001:0100
0110^0001:0111
0111^0001:0110
And in decimal:
t[0] = [0] + [1]
t[1] = [1] + [0]
t[2] = [2] + [3]
t[3] = [3] + [2]
t[4] = [4] + [5]
t[5] = [5] + [4]
t[6] = [6] + [7]
t[7] = [7] + [6]

Thus, in only 3 parallel operations we have added 8 elements together. We also have the additional bonus of having broadcast the result to all of the eight parallel adding elements - allowing future operations for each processor to use the result, or in the case of a SIMD system, to mask the appropriate processors.
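To check the walkthrough above, here is a small serial sketch of the butterfly sum in C. The name `butterfly_sum` and the `MAX_N` scratch limit are assumptions made for this example only. It returns the number of butterfly steps and leaves the total in every element, which is the broadcast effect just mentioned.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 64   /* arbitrary scratch-buffer limit for this sketch */

/* Butterfly sum over v[0..n-1]; n must be a power of two.
 * Afterwards every v[i] holds the total. Returns the step count. */
int butterfly_sum(int *v, size_t n)
{
    int t[MAX_N];
    int steps = 0;
    assert(n > 0 && n <= MAX_N && (n & (n - 1)) == 0);
    for (size_t b = n / 2; b > 0; b /= 2) {
        for (size_t i = 0; i < n; i++)
            t[i] = v[i] + v[i ^ b];   /* partner index differs in one bit */
        for (size_t i = 0; i < n; i++)
            v[i] = t[i];              /* copy back once all reads are done */
        steps++;
    }
    return steps;
}
```

With the inputs 1 to 8 this takes three steps and every element ends up holding 36.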

Here is the code in CUDA:

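int i = threadIdx.x;            // thread i holds value x_i, one thread per element
__shared__ int sum[blocksize];  // blocksize: compile-time constant, power of two
sum[i] = x_i;                   // copy to shared memory
__syncthreads();                // wait until every element has been stored
for (int bit = blocksize/2; bit > 0; bit /= 2) {
    int t = OP(sum[i], sum[i^bit]);
    __syncthreads();            // all reads finish before any write
    sum[i] = t;
    __syncthreads();            // all writes finish before the next round
}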
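
To tie this back to Dijkstra, the OP needs to carry the node index along with the cost, so that the reduction yields which node is cheapest rather than just the minimum cost. The following is a serial C sketch of that idea, not the code from the earlier post: the `Candidate` struct, `cheaper`, `min_cost_node`, and the `MAX_NODES` limit are all hypothetical names introduced here, and it assumes the node count is a power of two.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_NODES 64   /* arbitrary limit for this sketch */

typedef struct { int cost; int node; } Candidate;

/* OP for the reduction: keep the cheaper candidate. Ties go to the
 * lower node id so that every lane arrives at the same answer. */
static Candidate cheaper(Candidate a, Candidate b)
{
    if (a.cost != b.cost)
        return a.cost < b.cost ? a : b;
    return a.node < b.node ? a : b;
}

/* Returns the unvisited node with the lowest tentative cost,
 * or -1 if every node has been visited. n must be a power of two. */
int min_cost_node(const int *cost, const int *visited, size_t n)
{
    Candidate c[MAX_NODES], t[MAX_NODES];
    assert(n > 0 && n <= MAX_NODES && (n & (n - 1)) == 0);
    for (size_t i = 0; i < n; i++) {
        /* Visited nodes are masked out with the largest possible key. */
        c[i].cost = visited[i] ? INT_MAX : cost[i];
        c[i].node = visited[i] ? INT_MAX : (int)i;
    }
    for (size_t b = n / 2; b > 0; b /= 2) {
        for (size_t i = 0; i < n; i++)
            t[i] = cheaper(c[i], c[i ^ b]);
        for (size_t i = 0; i < n; i++)
            c[i] = t[i];
    }
    return c[0].node == INT_MAX ? -1 : c[0].node;
}
```

In a CUDA version the two inner loops would disappear, with each thread handling one index and the copy-back guarded by `__syncthreads()` as in the kernel above.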

Nifty!

Monday, March 01, 2010

Catchup: Graphics Links Post

Again, a set of interesting links from the last few weeks:
And, as per usual, I'll finish off the catchup post with some neat videos. First, one that I think does a great job of explaining the animation process, and then a neat video showing just how far CG has come in film. (I recall speaking to Paul Debevec about Spiderman, and he said the director decided to redo all the actors in all scenes in CG since they looked better and they had more control over facial expressions, etc.)
Cirkus Animation ABC: Stargate studios reel:

Thursday, February 11, 2010

Drishti

Drishti is a real-time interactive volume rendering and animation tool. Paul Bourke organized a tutorial at WASP/iVEC. The tool is developed by Ajay, a very friendly guy, and very open to user feature-requests.

Drishti has three parts, the renderer, the importer, and the painter. We only covered the renderer and importer.


The importer can import from various file formats, including standard image stacks and raw data (unsigned characters, Z=1 .. ns, Y=1 .. wd, X=1 .. ht).

To use the importer you just drag and drop (raw) data, then you can adjust the top slider knob to alter contrast, and left click to add an additional point that you can move to compress the range. You can view the data in different color spaces, and use the sliders to inspect the data. When you save you have a number of additional options including sub-sampling and filtering.



Once you have finished with importing your data and generated the pvl.nc you can drag and drop this into the renderer. Pressing F2 swaps you between high and low-resolution mode.

You can edit the transfer functions to explore the volume. The 2D version depicts the gradients of the data set, and takes a bit of playing around with before you get used to it. You can left/right click to shift the points, add points, make curves, etc. throughout the selected volume. Space will bring up additional color maps. You can add new transfer functions to highlight different parts of the volume. The two sliders on the side can be used to set the alpha, or set to 0.5 each for a Gaussian influence instead.

In low-resolution mode you can alter the bounds for the volume by dragging on the sides of the box, or using the arrow keys for fine adjustments.

Under the preferences tab you can set the step size (ie: quality of the render), or add an axis and labels, etc. Strange things seem to happen when you set the steps too low (< 0.2).

The final thing we discussed was creating keyframe animations. Selecting View, Keyframe editor displays the dialog. You can then click anywhere on the keyframe line, and set the viewport however you like (ie: rotate/zoom) and then click 'add keyframe'. Select another keyframe position, move the camera and add another keyframe, etc, etc. until you have the animation you like. You can move individual keyframes, or shift-left mouse to select an entire region and drag/reposition a whole group of keyframes.



To rotate the camera in another axis or to manually modify the camera positions, etc. use the brick-editor. Press 'a' to show the axis, and you can modify the axis of rotation (eg: alter 1,0,0 to 0,0,1, etc.).

That's a short, fast introduction to Drishti. Take a look at the gallery for more screenshots and videos - unfortunately few of the fanciest features have videos.

Thursday, February 04, 2010

Catchup Post

A short collection of interesting links/articles recently:
Finally, a video of the top Tron AI from the Google AI Challenge: