
Thursday, June 07, 2012

GPU Technology Conference 2012

nVidia's GPU Technology Conference is over, and a number of presentation slides have been uploaded. There were quite a few interesting talks relating to graphics, robotics and simulation:
  • Simon Green from nVidia and Christopher Horvath from Pixar presented 'Flame On: Real-Time Fire Simulation for Video Games'. It starts with a recent history of research on CG fluid systems, and gives five tips on better looking fire: 1. Get the colors right (e.g. radiation model), 2. Use high quality advection (not just bilinear filtering), 3. Post process with glow and motion blur, 4. Add noise, 5. Add light scattering and embers. They then go into more detail on Tip #1, looking at the physics behind the black-body radiation in a fire, and the color spectrum.
  • Elmar Westphal of PGI/JCNS-TA Scientific IT-Systems presented 'Multiparticle Collision Dynamics on one or more GPUs', about multiparticle collision dynamics GPU code. He starts by explaining the overall algorithm, and explaining step-by-step what performs well on the GPU. Specific GPU optimisations explained include spatial subdivision lists, reordering particles in memory, hash collisions, and finally dividing the workload between multiple GPUs. An interesting read.
  • Michal Januszewski from the University of Silesia in Katowice introduces 'Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python'. He explains Lattice Boltzmann fluid simulation and some of the different configurations of lattice connectivity and collision operators, moves into code generation examples, and gives a brief explanation of how the GPU implementation works.
  • Nikos Sismanis, Nikos Pitsianis and Xiaobai Sun (Aristotle University, Duke University) cover 'Efficient k-NN Search Algorithms on GPUs'. It starts with an overview of sorting and k-Nearest Neighbour (KNN) search algorithm solutions, including ANN (approximate NN) and lshkit, and moves into results, including a comparison of thrust::sort with Truncated Bitonic sort. Software is available at http://autogpu.ee.auth.gr/.
  • Thomas True of nVidia explains 'Best Practices in GPU-Based Video Processing' and covers overlapping copy-to-host and copy-to-device operations, and an example of processing Bayer pattern images.
  • Scott Rostrup, Shweta Srivastava, and Kishore Singhal from Synopsys Inc. explain 'Tree Accumulations on GPU' using parallel scatter, parallel reduce and parallel scan algorithms.
  • Wil Braithwaite from nVidia presents an interesting talk on 'Interacting with Huge Particle Simulations in Maya using the GPU'. He begins with a brief run-through of the workings of the CUDA SPH example, and then moves on to the particle system including Maya's body forces (uniform, radial, vortex), shape representations (implicit, convex hull, signed distance fields, displacement maps), collision response, SPH equations, and finally data transfer. Ends with a brief overview of rendering the particles in screen space. Neat.
  • David McAllister and James Bigler (nVidia) cover the OptiX internals in 'OptiX Out-of-Core and CPU Rendering' including PTX code generation and optimisation, and converting the OptiX backend to support CPUs via Ocelot and LLVM. An interesting result: LLVM does better at optimising "megafunctions" than small functions, but not entirely unexpected given how LLVM works. The presentation finishes with an overview of paging and a tip on bounding volume hierarchies. Good to see Ocelot in the mainstream.
  • Eric Enderton and Morgan McGuire from nVidia explain 'Stochastic Rasterization' (ala 'screen door transparency' rendering) via MSAA for motion blur, depth of field and order-independent transparency, by using a geometry shader to bound the shape and motion of each tri in screen space, and setting up the MSAA masks. Nice.
  • Cliff Woolley presents 'Profiling and Tuning OpenACC Code' (by adding pragmas to C / Fortran code, ala OpenMP) using an example of Jacobi iteration, and there were a number of other talks on the topic.
  • Christopher Bergström introduced 'PathScale ENZO' the alternative to CUDA and OpenCL.
  • Phillip Miller from nVidia gave a broad overview of 'GPU Ray Tracing'. He starts with myths and claimed facts about GPU raytracing, highlights some commercial GPU raytracers (and the open source OpenCL LuxRenderer) and goes into some details that are better explained in the OptiX Out-of-Core presentation.
  • Phillip Miller follows with 'Advanced Rendering Solutions' where he takes a look at nVidia's iray, and where they believe they can introduce new capabilities for design studios and find a middle ground with re-lighting and physically based rendering.
  • Peter Messmer presents 'CUDA Libraries and Ecosystem Overview', where he provides an overview of the linear algebra cuBLAS and cuSPARSE libraries performance, then moves to signal processing with cuFFT and NPP/VSIP for image processing, next is random numbers via cuRAND and finally ties things up with Thrust.
  • Jeremie Papon and Alexey Abramov discuss the 'Oculus real-time modular cognitive visual system' including GPU accelerated stereo disparity matching, likelihood maps and image segmentation with a parallel metropolis algorithm.
  • Jérôme Graindorge and Julien Houssay from Alyotech present 'Real Time GPU-Based Maritime Scenes Simulation' beginning with ocean simulation and rendering from FFT based wave simulation using HF and LF heightmap components. They then cover rendering the mesh, scene illumination and tone mapping, and a sneak peek at boat interaction. The ocean simulation video is neat.
  • Dan Negrut from the Simulation-Based Engineering Lab at the University of Wisconsin–Madison gives an overview of the lab's multibody dynamics work in 'From Sand Dynamics to Tank Dynamics' including friction, compliant bodies, multi-physics (fluid/solid interactions), SPH, a GPU solution to the cone complementarity problem, ellipsoid-ellipsoid CCD, multi-CPU simulation, and finally vehicle track simulation in sand. Wow. Code is available on the Simulation-Based Engineering Lab website.
  • Max Rietmann of USI Lugano looks at seismology (earthquake simulation) in 'Faster Finite Elements for Wave Propagation Codes' and describes parallelising FEM methods for GPUs in SPECFEM3D.
  • Dustin Franklin from GE introduces GE's MilSpec ruggedised Kepler-based GPU solutions and Concurrent Redhawk6 in 'Sensor Processing with Rugged Kepler GPUs'. Looks at some example applications including hyperspectral imaging, mosaicing, 360 degree vision, synthetic aperture radar processing, and space-time adaptive processing for moving target identification.
  • Graham Sanborn of FunctionBay presents 'Particle Dynamics with MBD and FEA Using CUDA' and gives a brief overview of their combined CPU/GPU multi-body FEA system and briefly describes the contact, contact force, and integration steps.
  • Ritesh Patel and Jason Mak of University of California-Davis cover the Burrows-Wheeler Transform, Move-to-Front Transform and Huffman Coding in 'Lossless Data Compression on GPUs'. They find merge sort for BWT performs best on the GPU, explain the parallel MTF transform and Huffman coding in illustrative detail and tie things up with benchmarks; unfortunately the GPU is 2.78x slower than the CPU.
  • Nikolai Sakharnykh and Nikolay Markovskiy from NVIDIA provide an in-depth explanation of their GPU implementation of solving ADI with tridiagonal systems in '3D ADI Method for Fluid Simulation on Multiple GPUs'.
  • Enrico Mastrostefano, Massimo Bernaschi, and Massimiliano Fatica investigate breadth-first search in 'Large Graph on multi-GPUs' and describe how best to parallelise it across multiple GPUs by using adjacency lists and level frontiers to minimise the data exchange.
  • Bob Zigon from Beckman Coulter presents '1024 bit Parallel Rational Arithmetic Operators for the GPU' and covers exact 1024 bit rational arithmetic (add, sub, mul, div) for the GPU. Get the 1024 bit arithmetic code here.
  • Roman Sokolov and Andrei Tchouprakov of D4D Technologies discuss 'Warped parallel nearest neighbor searches using kd-trees', where they take a SIMD-style approach by grouping tree searches via voting (ballot).
  • David Luebke from nVidia takes a broad look at CG in 'Computational Graphics: An Overview of Graphics Research @ NVIDIA' and provides an overview of research which is featured in a number of previous talks and other GTC talks including edge aware shading, ambient occlusion via volumes and raycasting, stochastic rendering, improved image sampling and reconstruction, global illumination, and CUDA based rasterization.
  • Johanna Beyer and Markus Hadwiger from King Abdullah University of Science and Technology discuss 'Terascale Volume Visualization in Neuroscience', where each cubic mm of the brain scanned with an electron microscope generates 800 terabytes of data. The idea here is to leverage the virtual memory manager to do all the intelligent caching work, rather than a specialised spatial data structure for the volume rendering.
  • Mark Kilgard introduces the NV_path_rendering extension in 'GPU-Accelerated Path Rendering', and demonstrates using the GPU to render PDF, flash, clipart, etc. Contains some sample code.
  • Janusz Będkowski from the Warsaw University of Technology presented 'Parallel Computing In Mobile Robotics For RISE', a full GPGPU solution for processing mobile robot laser scan data through to navigation. It starts with data registration into a decomposed grid which is then used for scan matching with point-to-point Iterative Closest Point. Next is estimating surface normals using principal component analysis, demonstrated on Velodyne datasets. This is used to achieve point-to-plane ICP, and he demonstrates a 6D SLAM loop-closure. He finishes it all off with a simple gradient based GPU path planner.
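As a refresher on the transforms covered in the lossless compression talk, here is a minimal sequential Python sketch of the Burrows-Wheeler and Move-to-Front transforms. The talk's versions are parallel GPU implementations; this naive BWT just sorts all rotations explicitly:

```python
def bwt(s):
    # Append a sentinel so the transform is invertible, sort all
    # rotations of the string, and take the last column.
    s = s + "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def mtf(data, alphabet):
    # Emit each symbol's current position in the table, then move it
    # to the front; runs of equal symbols become runs of zeros,
    # which a Huffman coder then compresses well.
    table = list(alphabet)
    out = []
    for c in data:
        i = table.index(c)
        out.append(i)
        table.insert(0, table.pop(i))
    return out
```

BWT tends to group equal characters together, which is why the MTF output skews towards small values before the entropy-coding stage.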
Note that in recent days more presentation PDFs have been uploaded, so there is still plenty to work through - take a look yourself! I'll leave you with a video from the GTC 2012 keynote on rendering colliding galaxies:

Sunday, June 05, 2011

Catchup Post: Robotics and Physical Simulations

A number of robotics related bits of interest from the last few months:
Finally, Heat-1, the space rocket built by Danish amateurs Copenhagen Suborbitals, successfully launched (TED talk here). See the video below.

Monday, February 07, 2011

Ripple effect in WebGL




There is really nothing special about the ripple effect; in essence it is just an evaluation of the sombrero function, which is simply the 2D version of the sinc function.




Below is a graph of the sin function, followed by sinc.
Left: sin(x);   Right: sin(x) / x

Extending this to 2D we have:
r = sqrt (x*x + y*y);
z = sin (r) / r;

Plotting this as (x,y,z) we have the sombrero function:

To turn it into a ripple-like effect we just need to scale down the "height" of the function, and animate it by adding time into the sin term to generate a repeating pulse. The complete GLSL fragment shader is:

#ifdef GL_ES
precision highp float;
#endif

uniform float time;
uniform vec2 resolution;
uniform sampler2D tex;

void main(void) {
    vec2 cPos = -1.0 + 2.0 * gl_FragCoord.xy / resolution.xy;
    float cLength = length(cPos);

    vec2 uv = gl_FragCoord.xy / resolution.xy + (cPos / cLength) * cos(cLength * 12.0 - time * 4.0) * 0.03;
    vec3 col = texture2D(tex, uv).xyz;

    gl_FragColor = vec4(col, 1.0);
}
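The shader arithmetic is easy to check outside GLSL; here is a small numpy mirror of the per-pixel UV displacement (the 12.0, 4.0 and 0.03 constants are the same ones used in the shader above):

```python
import numpy as np

def ripple_uv(frag_xy, resolution, time):
    # Normalise the fragment position to [-1, 1], then displace the
    # texture coordinate radially by a travelling cosine pulse,
    # scaled by 0.03 so it reads as a gentle ripple.
    c_pos = -1.0 + 2.0 * frag_xy / resolution
    c_length = np.linalg.norm(c_pos)
    offset = (c_pos / c_length) * np.cos(c_length * 12.0 - time * 4.0) * 0.03
    return frag_xy / resolution + offset
```

Since the radial direction is a unit vector, the displacement magnitude can never exceed the 0.03 scale factor, which is why the effect stays subtle.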


Wednesday, December 29, 2010

Catchup Post: Graphics

ShaderToy
Another set of interesting links:

Sunday, September 12, 2010

SIGGRAPH 2010 Course Papers Overview

I've managed to work through the SIGGRAPH 2010 course content that relates to realtime rendering. I quite liked the Toy Story 3 rendering techniques and the realtime rendering survey by nVidia, just because they give a nice overview. As always, there are a number of great presentations, and I've listed the ones that I found interesting below.
  • Toy Story 3 : The Video Game Rendering Techniques from Avalanche Software gives a great (211 page!) overview of a number of lighting issues for the game including SSAO (various optimizations/approximations for how/where to sample, faking more samples and dealing with large differences in depth), ambient lighting (without needing to bake it or do GI) and various aspects on shadows. A great read!
  • Surveying Real-Time Rendering Algorithms by David Luebke from nVidia gives an excellent short overview of a number of recent developments in realtime rendering algorithms including stochastic transparency (ie: transparency via random sampling), sample distribution for shadow maps (partitioning the scene in light-space), alias-free shadow maps, append-consume order-independent-transparency (sorting per-pixel & linked-lists), progressive photon mapping, image-space photon mapping, ambient occlusion volumes (how to speed it up with a bitwise occlusion mask for each triangle - one per edge, and one for the triangle plane), and stochastic rasterization (of 4D triangles).
  • Keeping Many Cores Busy: Scheduling the Graphics Pipeline by Jonathan Ragan-Kelly from MIT gives a great overview of the graphics pipeline stages (from Input Assembler, Vertex Shader, Primitive Assembler, Tessellation, Geometry Shader, Rasterizer, Pixel Shader, and finally Output Blending) and load balancing.
  • Uncharted 2 - Character Lighting and Shading by John Hable from Naughty Dog gives a fabulous overview of rendering issues with skin (in a lot of detail!), hair and clothes.
  • Bending the Graphics Pipeline by Johan Andersson from DICE describes tile-based deferred shading (for GPU and SPU), morphological antialiasing and analytical ambient occlusion.
  • A real-time Radiosity Architecture for Video Games by Sam Martin and Per Einarsson from DICE/Geomerics introduce the 'Enlighten' system for realtime GI - it gives a nice overview.
  • Evolving the Direct3D Pipeline for Real-­time Micropolygon Rendering by Kayvon Fatahalian from Stanford University gives an interesting insight on Micropolygon rendering on current GPU pipelines.
  • Water Flow in Portal 2 by Alex Vlachos - I've already written about this previously, just another realtime technique for faking the simulation and rendering of water flow.
  • Making Concept Real For Borderlands by Gearbox Software contains some nice examples of their concept art, the development change from photorealistic to stylistic rendering and art (and the code/artist balance), and the Sobel edge filter they used.
  • The notes from the volumetric course were broken into parts:

    1. "Resolution Independent Volumes" - which describes the "Field Expression Language Toolkit", cloud modelling (via density displacement), morphing objects (by using the Nacelle Algorithm to generate warp fields), cutting models, fluid dynamics, gridless advection, and semi-Lagrangian mapping (balancing between grids and non-grids).
    2. "Mantra Volume Rendering" - this describes the volume rendering engine for Houdini.
    3. "Volumetric Modeling and Rendering" describes the volumetrics library/API developed at Dreamworks.
    4. "Double Negative Renderer B" (used in Batman Begins, Harry Potter, etc.) describes the workflow and various shaders (Fluid, Particle, Voxel, Fire, Smoke) in DNB.
    5. "Volume Rendering at Sony Pictures Imageworks". The section from Sony Imageworks included an overview of their pipeline and content on their open source field/fluid tools.

Thursday, February 11, 2010

Drishti

Drishti is a real-time interactive volume rendering and animation tool. Paul Bourke organized a tutorial at WASP/iVEC. The tool is developed by Ajay, a very friendly guy who is very open to user feature requests.

Drishti has three parts, the renderer, the importer, and the painter. We only covered the renderer and importer.


The importer can import from various file formats, including standard image stacks and raw data (unsigned characters, Z=1..ns, Y=1..wd, X=1..ht).

To use the importer you just drag and drop (raw) data, then adjust the top slider knob to alter contrast, and left click to add an additional point that you can move to compress the range. You can view the data in different color spaces, and use the sliders to inspect the data. When you save you have a number of additional options, including sub-sampling and filtering.



Once you have finished with importing your data and generated the pvl.nc you can drag and drop this into the renderer. Pressing F2 swaps you between high and low-resolution mode.

You can edit the transfer functions to explore the volume. The 2D version depicts the gradients of the data set, and takes a bit of playing around with before you get used to it. You can left/right click to shift the points, add points, make curves, etc. throughout the selected volume. Space will bring up additional color maps. You can add new transfer functions to highlight different parts of the volume. The two sliders on the side can be used to set the alpha, or set 0.5 each for a Gaussian influence instead.

In low-resolution mode you can alter the bounds of the volume by dragging on the sides of the box, or using the arrow keys for fine movements.

Under the preferences tab you can set the step size (ie: quality of the render), or add an axis and labels, etc. Strange things seem to happen when you set the steps too low (< 0.2).

The final thing we discussed was creating keyframe animations. Selecting View, Keyframe editor displays the dialog. You can then click anywhere on the keyframe line, and set the viewport however you like (ie: rotate/zoom) and then click 'add keyframe'. Select another keyframe position, move the camera and add another keyframe, etc, etc. until you have the animation you like. You can move individual keyframes, or shift-left mouse to select an entire region and drag/reposition a whole group of keyframes.



To rotate the camera in another axis or to manually modify the camera positions, etc. use the brick-editor. Press 'a' to show the axis, and you can modify the axis of rotation (eg: alter 1,0,0 to 0,0,1, etc.).

That's a short, fast introduction to Drishti. Take a look at the gallery for more screenshots and videos - unfortunately few of the fanciest features have videos.

Sunday, January 10, 2010

GPU update

A few updates in the GPU world for the last month:
Tony Albrecht from Sony released his slides on Pitfalls of Object Oriented Programming, which describe how to write optimal C++ code. (I'm a strong believer in C++ for all problems). Tony demonstrates how to convert 'typical' C++ code for managing a scene graph into optimal code, getting a 6x speedup by using a custom allocator to ensure linear SoA data, removing conditionals (doing more calculations, since compute is faster than branching), flattening the scene hierarchy, and prefetching data. It's a great read! (a bit light on some details though!)
Finally, this excellent video of a DirectCompute fluid simulation was released by Jan, you can get the fluid source code from his website.

Wednesday, November 25, 2009

Catchup post

It's been a while since I updated, the reason being that it's exam/assignment marking period, and I had two GPU industry projects due (a 3D groundwater flow fluid simulation and a pathfinding/travelling salesman with profits project).

The biggest news item was that the ACM decided to start (over)enforcing its rules saying that you cannot link to preprint and author pages. Thankfully, it started a call-to-arms, and prominent pages like Ke-Sen Huang's SIGGRAPH links have been restored. I wonder how many less public pages have silently slipped away. Frankly, I can't wait until the concept of conferences and journals disappears. My websites have always had far more impact than my publications, and it can't be long until the same can be said universally.

A short update with some interesting things in the last while:


   

Saturday, August 22, 2009

Rigid Body Simulation & Deformables

The most brilliant idea I've seen in a long time for fast general rigid body simulation: Statistical Simulation of Rigid Bodies.

We begin by collecting statistical data regarding changes in linear and angular momentum for collisions of a given object. From this data we extract a statistical “signature” for the object, giving a compact representation of the object’s response to collision events. During object simulation, both the collision detection and the collision response calculations are replaced by simpler calculations based on the statistical signature.

I've always thought something like this could be done, now we know how. Genius.

In related rigid body news:
Anisotropic Friction for Deformable Surfaces and Solids, for the next time you need a friction model.

A Point-based Method for Animating Elastoplastic Solids is a paper that presents another method to combine fluid and solid simulation using some of the same concepts from Müller's SPH paper. The code is built up from the Adaptively Sampled Particle Fluids source. No code for this method though (yet). I haven't had an in-depth look, but it seems like a nicer approach than the one I previously discussed.

Fluids & Volumes

Another catchup post:
Fluid Simulation with Articulated Bodies from gatech has some nice results, which could make a great basis for a new version of Karl Sims' swimmers.
Sony has released Field3d, an open file format for volume data, which seems to take a lot of the low-level work away from you when working with voxels; it supports slices, etc. as well.
Matthias Müller also has a new publication, Fast and Robust Tracking of Fluid Surfaces, which describes how to efficiently generate surface meshes from fluid simulations. I've not read it fully yet, but presumably it is worthwhile.
I also stumbled across some more open source fluid projects: Fire Dynamics Simulator and Smokeview from NIST, which have some good looking fire & smoke simulations of houses burning, etc., and a new project, nsproj, that has a parallel 3D Navier-Stokes solver.

Wednesday, August 05, 2009

ATI Stream SDK includes OpenCL

The new ATI stream SDK includes a CPU implementation of OpenCL.

AMD released a demo of a 2D fluid simulation using 24 cores.


I have to say, I'm not very impressed. I'd be keen to see how this holds up to an OpenCL GPU implementation; it certainly seems like CUDA has a strong position for now. They do have a good OpenCL introductory tutorial though.

In other news, nVidia released OptiX a realtime raytracing "engine", whatever that is supposed to mean..

Wednesday, July 22, 2009

Groundwater Flow on GPU

I am presently developing a proof-of-concept groundwater flow GPU program. The algorithm works similarly to a finite differences algorithm, or many kernel-based image processing systems.

Initially I just ported the code over to CUDA, and thanks to having done this a few times before, the CPU version I wrote was easy to transfer. A good time saver is to use macros to index arrays, etc. This makes it easy to swap in __mul24, or tex2D, etc. later.

The next step was to use shared memory to buffer the input data, this gave the biggest performance boost. Finally I did some arithmetic optimizations for another small gain.
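To give a rough idea of the kernel's shape (an illustrative sketch, not the actual project code), a finite-differences style update reads each node's four neighbours, which is exactly the access pattern that makes shared-memory tiling worthwhile on the GPU:

```python
import numpy as np

def relax_step(h):
    # One explicit finite-difference relaxation step over a 2D field
    # (e.g. hydraulic head); boundary nodes are held fixed here for
    # simplicity. Each interior node becomes the average of its four
    # neighbours.
    out = h.copy()
    out[1:-1, 1:-1] = 0.25 * (h[:-2, 1:-1] + h[2:, 1:-1] +
                              h[1:-1, :-2] + h[1:-1, 2:])
    return out
```

In the CUDA version each thread block stages its tile of the field into shared memory once, and every thread then computes one output node from that tile instead of issuing four separate global reads.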

On the advice of a friend I tried re-structuring the GPU program to use 'aprons' similar to the 'Separable Convolution' SDK sample, and tried restructuring reads/writes. This made almost no difference at all, so it seems that the 'sweet spot' can be hit quite quickly once you have done the obvious shared memory and arithmetic optimizations. The overall structure of the program seems to make little difference.

A common bit of advice is to leverage the texture units on the GPU, but a simple modification of the 'Texture-based Separable Convolution' sample program reveals it is in fact almost twice as slow as the non-texture-based version. Seems like the texturing unit speedups are a bit of a myth.

Benchmarking the program has been a bit of a problem, since it is very dependent on the type of GPU you have, and the problem size. If I process a few hundred thousand nodes, the speedup is around 15x over the CPU, but when I move to processing tens of millions of nodes, the speedup is over 20x. Processing the same data on a slightly older GPU will only give a 4x speedup (Compute 1.0).

All in all I've found it extremely difficult to give an overall answer on the performance gain of the GPU. It seems to be highly dependent on the problem size (the bigger the better - thank god!) and the GPU technology (this is going to make porting the software to multiple GPUs a pain!).

Saturday, May 30, 2009

Link post

Some assorted interesting sounding tech:

Thrust is an STL-like library for CUDA programming. Could be interesting, and it certainly looks easier to use than raw CUDA, but just from my feelings it probably won't be worth anything until nVidia lets multiple CUDA functions execute in parallel..

5 optimization tips for CUDA, a nice succinct roundup of some good performance tips for CUDA, including Arithmetic Instruction optimization.

DANCE framework for Dynamic Animation and Control, definitely something I want to check out for potential synergy with the Physics engine Abstraction Layer. Proper animation controllers are something PAL lacks.

ReactPhysics3d is a new open source physics engine, very much in the early stages of development. I've seen a number of engines grow in time, and I hope this one succeeds too, but it will need to find a niche if it is to survive..

Predictive-Corrective Incompressible SPH is a paper on estimating the SPH state without explicitly evaluating it, thereby saving some CPU cycles. I actually had a similar idea, so maybe this is a path for further research. Also on the SPH front, Co-rotated SPH for Deformable Solids - the idea is great, but I'm not convinced by their particular implementation..

Perhaps these ideas might make it into the SPH routines for PAL..

and on the lighter side;

2D boy's prototyping framework for the world of goo.

Wednesday, May 20, 2009

Waterboard


Designer Mike Burton has developed a wall of water, called 'the Waterboard'. It is an interactive whiteboard that allows users to manipulate the flow of water. Nifty!

Check out the video on youtube, and some more info here.

Friday, May 08, 2009

Particle Systems

When implementing a large particle system, chances are it will have to interact with something - often with other particles (as in the case of SPH).

The most obvious approach is a uniform grid. As soon as you think you might want a large range, the idea of a hashed grid will cross your mind. It turns out these are great data structures for a GPU. If you're new to the topic, Christer Ericson's Real-Time Collision Detection book offers a nice introduction and some optimizations.

My first implementation for my particle system was a straightforward uniform grid. To avoid dynamic memory allocation I just had a static maximum-sized array (which turned out only to need to store 8 elements max), so this isn't too bad. If you start having larger cell sizes, or start hashing the cells, then you might end up with a problem if you limit how many particles each cell can store. Otherwise, from information I've gathered in the CUDA forums, this approach should be relatively fast on the GPU (trading a lot of GPU memory for speed).

So this approach basically goes like:
1. Clear, then populate a grid
(calculate the particles grid position, add it to the 'list' for this cell in the grid)
2. For each particle, look up its grid position, pull out the list of other particles in this cell, and do the same for all neighboring cells.
3. Run through this list, and do your comparisons with your current particle.

You might need to sort this list and eliminate duplicates, depending on how you handle corner/border cases. (Since this list is usually <8 elements, you don't need a fancy sort routine.)

Some nice websites on sorting routines: animated sorting and not-animated sorting. (ie: Bubble/Insert should do just fine.)
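The three steps above can be sketched on the CPU in a few lines; the cell size here is an arbitrary illustrative choice, and Python lists stand in for the fixed-size GPU buckets:

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0

def cell_of(p):
    # Quantize a 2D position to an integer grid cell.
    return (math.floor(p[0] / CELL_SIZE), math.floor(p[1] / CELL_SIZE))

def build_grid(positions):
    # Step 1: clear, then append each particle id to its cell's list.
    grid = defaultdict(list)
    for i, p in enumerate(positions):
        grid[cell_of(p)].append(i)
    return grid

def candidates(i, positions, grid):
    # Steps 2-3: gather particles from this cell and the 8 surrounding
    # cells, skipping the current particle itself.
    cx, cy = cell_of(positions[i])
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if j != i:
                    out.append(j)
    return out
```

On the GPU the per-cell list would be a fixed-size array (8 slots was enough in my case), which is where the memory-for-speed trade comes from.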

If you start hashing cells and having larger numbers of particles occupying one cell (or from now on, 'bucket'), you might want a different approach. This is where the technique nVidia uses shines.

The best resource on this I've found is Simon Green's GDC08 presentation. Also, Mark Harris' CUDA fluids presentation is worthwhile as it goes into a bit more depth on the CUDA/sorting front. (well, actually the best resource was Mark Harris himself, but I didn't actually know he worked on this specifically (PhysX Fluids) until after asking him stupid questions. I probably should have googled a bit more first to save the embarrassment).

Anyway, the approach goes like this:
First you build the data structures you need:
1. Populate a list with a pair of values, corresponding to the particle ID and the cell it occupies.
2. Sort this list (so that you can figure out which particles are in the same cell)
3. Create a grid structure which contains, for each cell, the index into this list where the cell's first particle entry appears

Then to find neighbors for each particle:
1. For each neighboring cell
2. Calculate the cell id, look it up in the grid to get the start index in the list
3. Step through the list doing your particle-particle calculations until a new cell id occurs.

Again, with pictures:

For each particle, we can figure out a cell id.
eg: Particle 0 has a cell id of 9.

Now we can add this to a list which contains (cell id, particle) pairs.


Then we sort this list, and populate our grid structure with -1 if it is empty, or the earliest index into this list if it contains a particle.



Great, again, how to use this?

For each particle, we look it up in the grid. This gives us the starting index in our list. Then, all the particles in this cell can be referenced from the pairs in the list.

For example, particle 4 is in cell 6, so we look up cell 6 in the grid. This gives us a list index of 2.

Looking into our list at index 2, we see particle 1, and following along until we no longer have a cell id of 6, we get particles 1, 2 and 4. (Not particle 0 - it has cell id 9!)
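The whole build-and-lookup scheme fits in a few lines on the CPU. Here is a sketch with a hypothetical assignment of particles to cells (particle 3's cell is made up; the others match the walkthrough above, though the list indices differ because this version sorts the pairs up front):

```python
NUM_CELLS = 16

def build(cell_ids):
    # Step 1: make (cell, particle) pairs. Step 2: sort by cell so
    # particles sharing a cell become contiguous. Step 3: record, per
    # cell, the index of its first pair (-1 marks an empty cell).
    pairs = sorted((cell, pid) for pid, cell in enumerate(cell_ids))
    grid = [-1] * NUM_CELLS
    for idx, (cell, _) in enumerate(pairs):
        if grid[cell] == -1:
            grid[cell] = idx
    return pairs, grid

def particles_in_cell(cell, pairs, grid):
    # Start at the recorded index and walk the sorted list until the
    # cell id changes.
    out, idx = [], grid[cell]
    if idx == -1:
        return out
    while idx < len(pairs) and pairs[idx][0] == cell:
        out.append(pairs[idx][1])
        idx += 1
    return out
```

With cells [9, 6, 6, 8, 6], looking up cell 6 yields particles 1, 2 and 4 - and not particle 0, which sits in cell 9.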

So that's how we get a nice, efficiently growing grid on the GPU with one big hash bucket. All we need to do is have a GPU-friendly sorting routine.

Conveniently, we already have a fast radix GPU sort in the CUDA SDK:
CUDA Radix sort (again, from Mark Harris & co.). (No comparison to GPU QuickSort - who knows which is faster..)
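The radix idea itself is simple enough to show sequentially; here is a least-significant-digit sketch in Python (the CUDA SDK sort makes the same digit passes, but builds each pass from parallel prefix sums instead of sequential bucketing):

```python
def radix_sort(values, bits_per_pass=4, key_bits=32):
    # LSD radix sort for non-negative integers: bucket by one digit
    # per pass, least significant digit first. Each pass is stable,
    # which is what makes the overall sort correct.
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, key_bits, bits_per_pass):
        buckets = [[] for _ in range(mask + 1)]
        for v in values:
            buckets[(v >> shift) & mask].append(v)
        values = [v for bucket in buckets for v in bucket]
    return values
```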

Wednesday, May 06, 2009

GPU Fluids - iVEC report


iVEC has released its student project reports, including the report by Niel Osborne on our work on GPU fluid simulation.

Monday, February 16, 2009

Smoothed Particle Hydrodynamics on the GPU

I've been working together with Neil Osbourne on a GPU version of my SPH code. He tried a number of things and managed to get a significant speedup. He presented his results at the iVEC eResearch Forum 2009. The main lessons learnt were to keep the number of CUDA threads approximately equal to the number of hardware processors, not to do memory optimizations too early on, and of course, to do shared memory optimizations later on.

Never managed to get a hashed grid going within the time we had, but it was still a cool project. With a bit more effort it should find it's way into PAL or Bullet.