
Thursday, June 07, 2012

GPU Technology Conference 2012

nVidia's GPU Technology Conference is over, and a number of presentation slides have been uploaded. There were quite a few interesting talks relating to graphics, robotics and simulation:
  • Simon Green from nVidia and Christopher Horvath from Pixar presented 'Flame On: Real-Time Fire Simulation for Video Games'. It starts with a recent history of research on CG fluid systems, and gives five tips on better looking fire: 1. Get the colors right (e.g. radiation model), 2. Use high quality advection (not just bilinear filtering), 3. Post process with glow and motion blur, 4. Add noise, 5. Add light scattering and embers. They then go into more detail on Tip #1, looking at the physics behind the black-body radiation in a fire, and the color spectrum.
  • Elmar Westphal of PGI/JCNS-TA Scientific IT-Systems presented 'Multiparticle Collision Dynamics on one or more GPUs', about multiparticle collision dynamics GPU code. He starts by explaining the overall algorithm, and explaining step-by-step what performs well on the GPU. Specific GPU optimisations explained include spatial subdivision lists, reordering particles in memory, hash collisions, and finally dividing the workload between multiple GPUs. An interesting read.
  • Michal Januszewski from the University of Silesia in Katowice introduces 'Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python'. He explains Lattice Boltzmann fluid simulation and some of the different configurations of lattice connectivity and collision operators, moves into code generation examples, and gives a brief explanation of how the GPU implementation works.
  • Nikos Sismanis, Nikos Pitsianis and Xiaobai Sun (Aristotle University, Duke University) cover 'Efficient k-NN Search Algorithms on GPUs'. It starts with an overview of sorting and k-Nearest Neighbour (KNN) search algorithm solutions, including ANN (approximate NN) and lshkit, and moves into results, including a comparison of thrust::sort with Truncated Bitonic sort. Software is available at http://autogpu.ee.auth.gr/.
  • Thomas True of nVidia explains 'Best Practices in GPU-Based Video Processing' and covers overlapping copy-to-host and copy-to-device operations, and an example of processing Bayer pattern images.
  • Scott Rostrup, Shweta Srivastava, and Kishore Singhal from Synopsys Inc. explain 'Tree Accumulations on GPU' using parallel scatter, parallel reduce and parallel scan algorithms.
  • Wil Braithwaite from nVidia presents an interesting talk on 'Interacting with Huge Particle Simulations in Maya using the GPU'. He begins with a brief run-through of the workings of the CUDA SPH example, and then moves on to the particle system including Maya's body forces (uniform, radial, vortex), shape representations (implicit, convex hull, signed distance fields, displacement maps), collision response, SPH equations, and finally data transfer. Ends with a brief overview of rendering the particles in screen space. Neat.
  • David McAllister and James Bigler (nVidia) cover the OptiX internals in 'OptiX Out-of-Core and CPU Rendering' including PTX code generation and optimisation, and converting the OptiX backend to support CPUs via Ocelot and LLVM. An interesting result: LLVM does better at optimising "megafunctions" than small functions, but not entirely unexpected given how LLVM works. The presentation finishes with an overview of paging and a tip on bounding volume hierarchies. Good to see Ocelot in the mainstream.
  • Eric Enderton and Morgan McGuire from nVidia explain 'Stochastic Rasterization' (ala 'screen door transparency' rendering) via MSAA for motion blur, depth of field and order-independent transparency, by using a geometry shader to bound the shape and motion of each tri in screen space, and setting up the MSAA masks. Nice.
  • Cliff Woolley presents 'Profiling and Tuning OpenACC Code' (by adding pragmas to C / Fortran code, ala OpenMP) using an example of Jacobi iteration, and there were a number of other talks on the topic.
  • Christopher Bergström introduced 'PathScale ENZO' the alternative to CUDA and OpenCL.
  • Phillip Miller from nVidia gave a broad overview of 'GPU Ray Tracing'. He starts with myths and claimed facts about GPU raytracing, highlights some commercial GPU raytracers (and the open source OpenCL LuxRenderer) and goes into some details that are better explained in the OptiX Out-of-Core presentation.
  • Phillip Miller follows with 'Advanced Rendering Solutions' where he takes a look at nVidia's iray, and where they believe they can introduce new capabilities for design studios and find a middle ground with re-lighting and physically based rendering.
  • Peter Messmer presents 'CUDA Libraries and Ecosystem Overview', where he provides an overview of the linear algebra cuBLAS and cuSPARSE libraries performance, then moves to signal processing with cuFFT and NPP/VSIP for image processing, next is random numbers via cuRAND and finally ties things up with Thrust.
  • Jeremie Papon and Alexey Abramov discuss the 'Oculus real-time modular cognitive visual system' including GPU accelerated stereo disparity matching, likelihood maps and image segmentation with a parallel metropolis algorithm.
  • Jérôme Graindorge and Julien Houssay from Alyotech present 'Real Time GPU-Based Maritime Scenes Simulation' beginning with ocean simulation and rendering from FFT based wave simulation using HF and LF heightmap components. They then cover rendering the mesh, scene illumination and tone mapping, and a sneak peek at boat interaction. The ocean simulation video is neat.
  • Dan Negrut from the Simulation-Based Engineering Lab at the University of Wisconsin–Madison gives an overview of the lab's multibody dynamics work in 'From Sand Dynamics to Tank Dynamics' including friction, compliant bodies, multi-physics (fluid/solid interactions), SPH, a GPU solution to the cone complementarity problem, ellipsoid-ellipsoid CCD, multi-CPU simulation, and finally vehicle track simulation in sand. Wow. Code is available on the Simulation-Based Engineering Lab website.
  • Max Rietmann of USI Lugano looks at seismology (earthquake simulation) in 'Faster Finite Elements for Wave Propagation Codes' and describes parallelising FEM methods for GPUs in SPECFEM3D.
  • Dustin Franklin from GE introduces GE's MilSpec ruggedised Kepler-based GPU solutions and Concurrent Redhawk6 in 'Sensor Processing with Rugged Kepler GPUs'. Looks at some example applications including hyperspectral imaging, mosaicing, 360 degree vision, synthetic aperture radar processing, and space-time adaptive processing for moving target identification.
  • Graham Sanborn of FunctionBay presents 'Particle Dynamics with MBD and FEA Using CUDA' and gives a brief overview of their combined CPU/GPU multi-body FEA system and briefly describes the contact, contact force, and integration steps.
  • Ritesh Patel and Jason Mak of University of California-Davis cover the Burrows-Wheeler Transform, Move-to-Front Transform and Huffman Coding in 'Lossless Data Compression on GPUs'. They find merge sort for BWT performs best on the GPU, explain the parallel MTF transform and Huffman coding in illustrative detail and tie things up with benchmarks; unfortunately the GPU is 2.78x slower than the CPU.
  • Nikolai Sakharnykh and Nikolay Markovskiy from NVIDIA provide an in-depth explanation of their GPU implementation of solving ADI with tridiagonal systems in '3D ADI Method for Fluid Simulation on Multiple GPUs'.
  • Enrico Mastrostefano, Massimo Bernaschi, and Massimiliano Fatica investigate breadth-first search in 'Large Graph on multi-GPUs' and describe how best to parallelise it across multiple GPUs by using adjacency lists and level frontiers to minimise the data exchange.
  • Bob Zigon from Beckman Coulter presents '1024 bit Parallel Rational Arithmetic Operators for the GPU' and covers exact 1024 bit rational arithmetic (add, sub, mul, div) for the GPU. Get the 1024 bit arithmetic code here.
  • Roman Sokolov and Andrei Tchouprakov of D4D Technologies discuss 'Warped parallel nearest neighbor searches using kd-trees', where they take a SIMD-style approach by grouping tree searches via voting (ballot).
  • David Luebke from nVidia takes a broad look at CG in 'Computational Graphics: An Overview of Graphics Research @ NVIDIA' and provides an overview of research which is featured in a number of previous talks and other GTC talks including edge aware shading, ambient occlusion via volumes and raycasting, stochastic rendering, improved image sampling and reconstruction, global illumination, and CUDA based rasterization.
  • Johanna Beyer and Markus Hadwiger from King Abdullah University of Science and Technology discuss 'Terascale Volume Visualization in Neuroscience', where each cubic mm of the brain scanned with an electron microscope generates 800 terabytes of data. The idea here is to leverage the virtual memory manager to do all the intelligent caching work, rather than a specialised spatial data structure for the volume rendering.
  • Mark Kilgard introduces the NV_path_rendering extension in 'GPU-Accelerated Path Rendering', and demonstrates using the GPU to render PDF, flash, clipart, etc. Contains some sample code.
  • Janusz Będkowski from the Warsaw University of Technology presented 'Parallel Computing In Mobile Robotics For RISE', a full GPGPU solution for processing mobile robot laser scan data through to navigation. It starts with data registration into a decomposed grid which is then used for scan matching with point-to-point Iterative Closest Point. Next is estimating surface normals using principal component analysis, demonstrated on Velodyne datasets. This is used to achieve point-to-plane ICP, and he demonstrates a 6D SLAM loop-closure. He finishes it all off with a simple gradient based GPU path planner.
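As a refresher on the transforms covered in the lossless compression talk, here is a minimal sequential Python sketch of the Burrows-Wheeler and Move-to-Front transforms. The talk's versions are parallel GPU implementations; this naive BWT just sorts all rotations explicitly:

```python
def bwt(s):
    # Append a sentinel so the transform is invertible, sort all
    # rotations of the string, and take the last column.
    s = s + "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def mtf(data, alphabet):
    # Emit each symbol's current position in the table, then move it
    # to the front; runs of equal symbols become runs of zeros,
    # which a Huffman coder then compresses well.
    table = list(alphabet)
    out = []
    for c in data:
        i = table.index(c)
        out.append(i)
        table.insert(0, table.pop(i))
    return out
```

BWT tends to group equal characters together, which is why the MTF output skews towards small values before the entropy-coding stage.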
Note that in recent days more presentation PDFs have been uploaded, so there is still plenty to work through - take a look yourself! I'll leave you with a video from the GTC 2012 keynote on rendering colliding galaxies:

Sunday, June 05, 2011

Catchup Post: Robotics and Physical Simulations

A number of robotics related bits of interest from the last few months:
Finally, Heat-1, the space rocket built by Danish amateurs Copenhagen Suborbitals, successfully launched (TED talk here). See the video below.

Monday, February 07, 2011

Ripple effect in WebGL




There is really nothing special about the ripple effect; in essence it is just an evaluation of the sombrero function, which is simply the 2D version of the sinc function.




Below is a graph of the sin function, followed by sinc.
Left: sin(x);   Right: sin(x) / x

Extending this to 2D we have:
r = sqrt (x*x + y*y);
z = sin (r) / r;

Plotting this as (x,y,z) we have the sombrero function:

To turn it into a ripple-like effect we just need to scale down the "height" of the function, and animate it by adding time into the sin term to generate a repeating pulse. The complete GLSL fragment shader is:

#ifdef GL_ES
precision highp float;
#endif

uniform float time;
uniform vec2 resolution;
uniform sampler2D tex;

void main(void) {
    vec2 cPos = -1.0 + 2.0 * gl_FragCoord.xy / resolution.xy;
    float cLength = length(cPos);

    vec2 uv = gl_FragCoord.xy / resolution.xy + (cPos / cLength) * cos(cLength * 12.0 - time * 4.0) * 0.03;
    vec3 col = texture2D(tex, uv).xyz;

    gl_FragColor = vec4(col, 1.0);
}
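The shader arithmetic is easy to check outside GLSL; here is a small numpy mirror of the per-pixel UV displacement (the 12.0, 4.0 and 0.03 constants are the same ones used in the shader above):

```python
import numpy as np

def ripple_uv(frag_xy, resolution, time):
    # Normalise the fragment position to [-1, 1], then displace the
    # texture coordinate radially by a travelling cosine pulse,
    # scaled by 0.03 so it reads as a gentle ripple.
    c_pos = -1.0 + 2.0 * frag_xy / resolution
    c_length = np.linalg.norm(c_pos)
    offset = (c_pos / c_length) * np.cos(c_length * 12.0 - time * 4.0) * 0.03
    return frag_xy / resolution + offset
```

Since the radial direction is a unit vector, the displacement magnitude can never exceed the 0.03 scale factor, which is why the effect stays subtle.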


Wednesday, December 29, 2010

Catchup Post: Graphics

ShaderToy
Another set of interesting links:

Sunday, September 12, 2010

SIGGRAPH 2010 Course Papers Overview

I've managed to work through the SIGGRAPH 2010 course content that relates to realtime rendering. I quite liked the Toy Story 3 rendering techniques and the realtime rendering survey by nVidia, just because they give a nice overview. As always, there are a number of great presentations, and I've listed the ones that I found interesting below.
  • Toy Story 3 : The Video Game Rendering Techniques from Avalanche Software gives a great (211 page!) overview of a number of lighting issues for the game including SSAO (various optimizations/approximations for how/where to sample, faking more samples and dealing with large differences in depth), ambient lighting (without needing to bake it or do GI) and various aspects on shadows. A great read!
  • Surveying Real-Time Rendering Algorithms by David Luebke from nVidia gives an excellent short overview of a number of recent developments in realtime rendering algorithms including stochastic transparency (ie: transparency via random sampling), sample distribution for shadow maps (partitioning the scene in light-space), alias-free shadow maps, append-consume order-independent-transparency (sorting per-pixel & linked-lists), progressive photon mapping, image-space photon mapping, ambient occlusion volumes (how to speed it up with a bitwise occlusion mask for each triangle - one per edge, and one for the triangle plane), and stochastic rasterization (of 4D triangles).
  • Keeping Many Cores Busy: Scheduling the Graphics Pipeline by Jonathan Ragan-Kelly from MIT gives a great overview of the graphics pipeline stages (from Input Assembler, Vertex Shader, Primitive Assembler, Tessellation, Geometry Shader, Rasterizer, Pixel Shader, and finally Output Blending) and load balancing.
  • Uncharted 2 - Character Lighting and Shading by John Hable from Naughty Dog gives a fabulous overview of rendering issues with skin (in a lot of detail!), hair and clothes.
  • Bending the Graphics Pipeline by Johan Andersson from DICE describes tile-based deferred shading (for GPU and SPU), morphological antialiasing and analytical ambient occlusion.
  • A real-time Radiosity Architecture for Video Games by Sam Martin and Per Einarsson from DICE/Geomerics introduce the 'Enlighten' system for realtime GI - it gives a nice overview.
  • Evolving the Direct3D Pipeline for Real-­time Micropolygon Rendering by Kayvon Fatahalian from Stanford University gives an interesting insight on Micropolygon rendering on current GPU pipelines.
  • Water Flow in Portal 2 by Alex Vlachos - I've already written about this previously, just another realtime technique for faking the simulation and rendering of water flow.
  • Making Concept Real For Borderlands by Gearbox Software contains some nice examples of their concept art, the development change from photorealistic to stylistic rendering and art (and the code/artist balance), and the Sobel edge filter they used.
  • The notes from the volumetric course were broken into parts:

    1. "Resolution Independent Volumes" - which describes the "Field Expression Language Toolkit", cloud modelling (via density displacement), morphing objects (by using the Nacelle Algorithm to generate warp fields), cutting models, fluid dynamics, gridless advection, and semi-Lagrangian mapping (balancing between grids and non-grids).
    2. "Mantra Volume Rendering" - this describes the volume rendering engine for Houdini.
    3. "Volumetric Modeling and Rendering" describes the volumetrics library/API developed at Dreamworks.
    4. "Double Negative Renderer B" (used in Batman Begins, Harry Potter, etc.) describes the workflow and various shaders (Fluid, Particle, Voxel, Fire, Smoke) in DNB.
    5. "Volume Rendering at Sony Pictures Imageworks". The section from Sony Imageworks included an overview of their pipeline and content on their open source field/fluid tools.

Thursday, February 11, 2010

Drishti

Drishti is a real-time interactive volume rendering and animation tool. Paul Bourke organized a tutorial at WASP/iVEC. The tool is developed by Ajay, a very friendly guy who is very open to user feature requests.

Drishti has three parts, the renderer, the importer, and the painter. We only covered the renderer and importer.


The importer can import from various file formats, including standard image stacks and raw data (unsigned characters, Z=1..ns, Y=1..wd, X=1..ht).

To use the importer you just drag and drop (raw) data, then adjust the top slider knob to alter contrast, and left click to add an additional point that you can move to compress the range. You can view the data in different color spaces, and use the sliders to inspect the data. When you save you have a number of additional options, including sub-sampling and filtering.



Once you have finished with importing your data and generated the pvl.nc you can drag and drop this into the renderer. Pressing F2 swaps you between high and low-resolution mode.

You can edit the transfer functions to explore the volume. The 2D version depicts the gradients of the data set, and takes a bit of playing around with before you get used to it. You can left/right click to shift the points, add points, make curves, etc. throughout the selected volume. Space will bring up additional color maps. You can add new transfer functions to highlight different parts of the volume. The two sliders on the side can be used to set the alpha, or set 0.5 each for a Gaussian influence instead.

In low-resolution mode you can alter the bounds of the volume by dragging on the sides of the box, or using the arrow keys for fine movements.

Under the preferences tab you can set the step size (ie: quality of the render), or add an axis and labels, etc. Strange things seem to happen when you set the steps too low (< 0.2).

The final thing we discussed was creating keyframe animations. Selecting View, Keyframe editor displays the dialog. You can then click anywhere on the keyframe line, and set the viewport however you like (ie: rotate/zoom) and then click 'add keyframe'. Select another keyframe position, move the camera and add another keyframe, etc, etc. until you have the animation you like. You can move individual keyframes, or shift-left mouse to select an entire region and drag/reposition a whole group of keyframes.



To rotate the camera in another axis or to manually modify the camera positions, etc. use the brick-editor. Press 'a' to show the axis, and you can modify the axis of rotation (eg: alter 1,0,0 to 0,0,1, etc.).

That's a short, fast introduction to Drishti. Take a look at the gallery for more screenshots and videos - unfortunately few of the fanciest features have videos.

Sunday, January 10, 2010

GPU update

A few updates in the GPU world for the last month:
Tony Albrecht from Sony released his slides on Pitfalls of Object Oriented Programming, which describe how to write optimal C++ code. (I'm a strong believer in C++ for all problems). Tony demonstrates how to convert 'typical' C++ code for managing a scene graph into optimal code, getting a 6x speedup by using a custom allocator to ensure linear SoA data, removing conditionals (doing more calculations, since compute is faster than branching), flattening the scene hierarchy, and prefetching data. It's a great read! (a bit light on some details though!)
Finally, this excellent video of a DirectCompute fluid simulation was released by Jan, you can get the fluid source code from his website.

Wednesday, November 25, 2009

Catchup post

It's been a while since I updated, the reason being that it's exam/assignment marking period, and I had two GPU industry projects due (a 3D groundwater flow fluid simulation and a pathfinding/travelling salesman with profits project).

The biggest news item was that the ACM decided to start (over)enforcing its rules saying that you cannot link to preprint and author pages. Thankfully, it started a call-to-arms, and prominent pages like Ke-Sen Huang's SIGGRAPH links have been restored. I wonder how many less public pages have silently slipped away. Frankly, I can't wait until the concept of conferences and journals disappears. My websites have always had far more impact than my publications, and it can't be long until the same can be said universally.

A short update with some interesting things in the last while:


   

Saturday, August 22, 2009

Rigid Body Simulation & Deformables

The most brilliant idea I've seen in a long time for fast general rigid body simulation: Statistical Simulation of Rigid Bodies.

We begin by collecting statistical data regarding changes in linear and angular momentum for collisions of a given object. From this data we extract a statistical “signature” for the object, giving a compact representation of the object’s response to collision events. During object simulation, both the collision detection and the collision response calculations are replaced by simpler calculations based on the statistical signature.

I've always thought something like this could be done, now we know how. Genius.

In related rigid body news:
Anisotropic Friction for Deformable Surfaces and Solids, for the next time you need a friction model.

A Point-based Method for Animating Elastoplastic Solids is a paper that presents another method to combine fluid and solid simulation using some of the same concepts from Müller's SPH paper. The code is built up from the Adaptively Sampled Particle Fluids source. No code for this method though (yet). I haven't had an in-depth look, but it seems like a nicer approach than the one I previously discussed.

Fluids & Volumes

Another catchup post:
Fluid Simulation with Articulated Bodies from gatech has some nice results, which could make a great basis for a new version of Karl Sims' swimmers.
Sony has released Field3d, an open file format for volume data, which seems to take a lot of the low-level work away from you when working with voxels; it supports slices, etc. as well.
Matthias Müller also has a new publication, Fast and Robust Tracking of Fluid Surfaces, which describes how to efficiently generate surface meshes from fluid simulations. I've not read it fully yet, but presumably it is worthwhile.
I also stumbled across some more open source fluid projects: Fire Dynamics Simulator and Smokeview from NIST, which have some good looking fire & smoke simulations of houses burning, etc., and a new project, nsproj, that has a parallel 3D Navier-Stokes solver.

Wednesday, August 05, 2009

ATI Stream SDK includes OpenCL

The new ATI stream SDK includes a CPU implementation of OpenCL.

AMD released a demo of a 2D fluid simulation using 24 cores.


I have to say, I'm not very impressed. I'd be keen to see how this holds up to an OpenCL GPU implementation; it certainly seems like CUDA has a strong position for now. They do have a good OpenCL introductory tutorial though.

In other news, nVidia released OptiX a realtime raytracing "engine", whatever that is supposed to mean..

Wednesday, July 22, 2009

Groundwater Flow on GPU

I am presently developing a proof-of-concept groundwater flow GPU program. The algorithm works similarly to a finite differences algorithm, or many kernel-based image processing systems.

Initially I just ported the code over to CUDA, and thanks to having done this a few times before, the CPU version I wrote was easy to transfer. A good time saver is to use macros to index arrays, etc. This makes it easy to swap in __mul24, or tex2D, etc. later.

The next step was to use shared memory to buffer the input data, this gave the biggest performance boost. Finally I did some arithmetic optimizations for another small gain.
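To give a rough idea of the kernel's shape (an illustrative sketch, not the actual project code), a finite-differences style update reads each node's four neighbours, which is exactly the access pattern that makes shared-memory tiling worthwhile on the GPU:

```python
import numpy as np

def relax_step(h):
    # One explicit finite-difference relaxation step over a 2D field
    # (e.g. hydraulic head); boundary nodes are held fixed here for
    # simplicity. Each interior node becomes the average of its four
    # neighbours.
    out = h.copy()
    out[1:-1, 1:-1] = 0.25 * (h[:-2, 1:-1] + h[2:, 1:-1] +
                              h[1:-1, :-2] + h[1:-1, 2:])
    return out
```

In the CUDA version each thread block stages its tile of the field into shared memory once, and every thread then computes one output node from that tile instead of issuing four separate global reads.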

On the advice of a friend I tried re-structuring the GPU program to use 'aprons' similar to the 'Separable Convolution' SDK sample, and tried restructuring reads/writes. This made almost no difference at all, so it seems that the 'sweet spot' can be hit quite quickly once you have done the obvious shared memory and arithmetic optimizations. The overall structure of the program seems to make little difference.

A common bit of advice is to leverage the texture units on the GPU, but a simple modification of the 'Texture-based Separable Convolution' sample program reveals it is in fact almost twice as slow as the non-texture-based version. Seems like the texturing unit speedups are a bit of a myth.

Benchmarking the program has been a bit of a problem, since it is very dependent on the type of GPU you have, and the problem size. If I process a few hundred thousand nodes, the speedup is around 15x over the CPU, but when I move to processing tens of millions of nodes, the speedup is over 20x. Processing the same data on a slightly older GPU will only give a 4x speedup (Compute 1.0).

All in all I've found it extremely difficult to give an overall answer on the performance gain of the GPU. It seems to be highly dependent on the problem size (the bigger the better - thank god!) and the GPU technology (this is going to make porting the software to multiple GPUs a pain!).

Saturday, May 30, 2009

Link post

Some assorted interesting sounding tech:

Thrust is an STL-like library for CUDA programming. Could be interesting, and it certainly looks easier to use than raw CUDA, but just from my feelings it probably won't be worth anything until nVidia lets multiple CUDA functions execute in parallel..

5 optimization tips for CUDA, a nice succinct roundup of some good performance tips for CUDA, including Arithmetic Instruction optimization.

DANCE framework for Dynamic Animation and Control, definitely something I want to check out for potential synergy with the Physics engine Abstraction Layer. Proper animation controllers are something PAL lacks.

ReactPhysics3d is a new open source physics engine, very much in the early stages of development. I've seen a number of engines grow in time, and I hope this one succeeds too, but it will need to find a niche if it is to survive..

Predictive-Corrective Incompressible SPH is a paper on estimating the SPH state without explicitly evaluating it, thereby saving some CPU cycles. I actually had a similar idea, so maybe this is a path for further research. Also on the SPH front, Co-rotated SPH for Deformable Solids - the idea is great, but I'm not convinced by their particular implementation..

Perhaps these ideas might make it into the SPH routines for PAL..

and on the lighter side;

2D boy's prototyping framework for the world of goo.

Wednesday, May 20, 2009

Waterboard


Designer Mike Burton has developed a wall of water, called 'the Waterboard'. It is an interactive whiteboard that allows users to manipulate the flow of water. Nifty!

Check out the video on youtube, and some more info here.

Friday, May 08, 2009

Particle Systems

When implementing a large particle system, chances are it will have to interact with something - often with other particles (as in the case of SPH).

The most obvious approach is a uniform grid. As soon as you think you might want a large range, the idea of a hashed grid will cross your mind. It turns out these are great data structures for a GPU. If you're new to the topic, Christer Ericson's Real-Time Collision Detection book offers a nice introduction and some optimizations.

My first implementation for my particle system was a straightforward uniform grid. To avoid dynamic memory allocation I just had a static maximum-sized array (which turned out only to need to store 8 elements max), so this isn't too bad. If you start having larger cell sizes, or start hashing the cells, then you might end up with a problem if you limit how many particles each cell can store. Otherwise, from information I've gathered in the CUDA forums, this approach should be relatively fast on the GPU (trading a lot of GPU memory for speed).

So this approach basically goes like:
1. Clear, then populate a grid
(calculate the particles grid position, add it to the 'list' for this cell in the grid)
2. For each particle, look up its grid position, pull out the list of other particles in this cell, and do the same for all neighboring cells.
3. Run through this list, and do your comparisons with your current particle.

You might need to sort this list and eliminate duplicates, depending on how you handle corner/border cases. (Since this list is usually <8 elements, you don't need a fancy sort routine.)

Some nice websites on sorting routines: animated sorting and not-animated sorting. (ie: Bubble/Insert should do just fine.)
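The three steps above can be sketched on the CPU in a few lines; the cell size here is an arbitrary illustrative choice, and Python lists stand in for the fixed-size GPU buckets:

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0

def cell_of(p):
    # Quantize a 2D position to an integer grid cell.
    return (math.floor(p[0] / CELL_SIZE), math.floor(p[1] / CELL_SIZE))

def build_grid(positions):
    # Step 1: clear, then append each particle id to its cell's list.
    grid = defaultdict(list)
    for i, p in enumerate(positions):
        grid[cell_of(p)].append(i)
    return grid

def candidates(i, positions, grid):
    # Steps 2-3: gather particles from this cell and the 8 surrounding
    # cells, skipping the current particle itself.
    cx, cy = cell_of(positions[i])
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if j != i:
                    out.append(j)
    return out
```

On the GPU the per-cell list would be a fixed-size array (8 slots was enough in my case), which is where the memory-for-speed trade comes from.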

If you start hashing cells and having larger numbers of particles occupying one cell (or from now on, 'bucket'), you might want a different approach. This is where the technique nVidia uses shines.

The best resource on this I've found is Simon Green's GDC08 presentation. Also, Mark Harris' CUDA fluids presentation is worthwhile as it goes into a bit more depth on the CUDA/sorting front. (well, actually the best resource was Mark Harris himself, but I didn't actually know he worked on this specifically (PhysX Fluids) until after asking him stupid questions. I probably should have googled a bit more first to save the embarrassment).

Anyway, the approach goes like this:
First you build the data structures you need:
1. Populate a list with a pair of values, corresponding to the particle ID and the cell it occupies.
2. Sort this list (so that you can figure out which particles are in the same cell)
3. Create a grid structure which contains, for each cell, the index into this list where the cell's first particle entry appears

Then to find neighbors for each particle:
1. For each neighboring cell
2. Calculate the cell id, look it up in the grid to get the start index in the list
3. Step through the list doing your particle-particle calculations until a new cell id occurs.

Again, with pictures:

For each particle, we can figure out a cell id.
eg: Particle 0 has a cell id of 9.

Now we can add this to a list which contains (cell id, particle) pairs.


Then we sort this list, and populate our grid structure with -1 if it is empty, or the earliest index into this list if it contains a particle.



Great, again, how to use this?

For each particle, we look it up in the grid. This gives us the starting index in our list. Then, all the particles in this cell can be referenced from the pairs in the list.

For example, particle 4 is in cell 6, so we look up cell 6 in the grid. This gives us a list index of 2.

Looking into our list at index 2, we see particle 1, and following along until we no longer have a cell id of 6, we get particles 1, 2 and 4. (Not particle 0 - it has cell id 9!)
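The whole build-and-lookup scheme fits in a few lines on the CPU. Here is a sketch with a hypothetical assignment of particles to cells (particle 3's cell is made up; the others match the walkthrough above, though the list indices differ because this version sorts the pairs up front):

```python
NUM_CELLS = 16

def build(cell_ids):
    # Step 1: make (cell, particle) pairs. Step 2: sort by cell so
    # particles sharing a cell become contiguous. Step 3: record, per
    # cell, the index of its first pair (-1 marks an empty cell).
    pairs = sorted((cell, pid) for pid, cell in enumerate(cell_ids))
    grid = [-1] * NUM_CELLS
    for idx, (cell, _) in enumerate(pairs):
        if grid[cell] == -1:
            grid[cell] = idx
    return pairs, grid

def particles_in_cell(cell, pairs, grid):
    # Start at the recorded index and walk the sorted list until the
    # cell id changes.
    out, idx = [], grid[cell]
    if idx == -1:
        return out
    while idx < len(pairs) and pairs[idx][0] == cell:
        out.append(pairs[idx][1])
        idx += 1
    return out
```

With cells [9, 6, 6, 8, 6], looking up cell 6 yields particles 1, 2 and 4 - and not particle 0, which sits in cell 9.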

So that's how we get a nice, efficiently growing grid on the GPU with one big hash bucket. All we need to do is have a GPU-friendly sorting routine.

Conveniently, we already have a fast radix GPU sort in the CUDA SDK:
CUDA Radix sort (again, from Mark Harris & co.). (No comparison to GPU QuickSort - who knows which is faster..)
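The radix idea itself is simple enough to show sequentially; here is a least-significant-digit sketch in Python (the CUDA SDK sort makes the same digit passes, but builds each pass from parallel prefix sums instead of sequential bucketing):

```python
def radix_sort(values, bits_per_pass=4, key_bits=32):
    # LSD radix sort for non-negative integers: bucket by one digit
    # per pass, least significant digit first. Each pass is stable,
    # which is what makes the overall sort correct.
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, key_bits, bits_per_pass):
        buckets = [[] for _ in range(mask + 1)]
        for v in values:
            buckets[(v >> shift) & mask].append(v)
        values = [v for bucket in buckets for v in bucket]
    return values
```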

Wednesday, May 06, 2009

GPU Fluids - iVEC report


iVEC has released its student project reports, including the report by Niel Osborne on our work on GPU fluid simulation.

Monday, February 16, 2009

Smoothed Particle Hydrodynamics on the GPU

I've been working together with Neil Osbourne on a GPU version of my SPH code. He tried a number of things and managed to get a significant speedup. He presented his results at the iVEC eResearch Forum 2009. The main lessons learnt were to keep the number of CUDA threads approximately equal to the number of hardware processors, not to do memory optimizations too early on, and of course, to do shared memory optimizations later on.

Never managed to get a hashed grid going within the time we had, but it was still a cool project. With a bit more effort it should find it's way into PAL or Bullet.