When implementing a large particle system, chances are that will have to interact with something. Often, with other particles (as in the case of SPH).
The most obvious approach is a uniform grid. As soon as you think you might want a large range, the idea of a hashed grid will cross your mind. Turns out, these are great data structures for a GPU. If your new to the topic, Christer Ericson's Real-time collision detection book offers a nice introduction to the topic and some optimizations.
My first implementation for my particle system was a straight froward uniform grid. To avoid dynamic memory allocation I just had a static maximum sized array, (which turned out only to need to be able to store 8 elements max), so this isn't too bad. If you start having larger cell sizes, or start hashing the cells then you might end up with a problem if you limit how many cells you can store. Otherwise from information I've gathered in the CUDA forums, this approach should be relatively fast on the GPU. (Trading a lot of GPU memory for speed)
So this approach basically goes like:
1. Clear, then populate a grid
(calculate the particles grid position, add it to the 'list' for this cell in the grid)
2. For each particle, look up its grid position, and pull out a list of other particles in this cell, do the same for all neighboring cells.
3. Run through this list, and do your comparisons with your current particle.
You might need to sort this list and eliminate duplicates, depending on how you handle corner/border cases. (Since this list is usually <8 elements, you don't need a fancy sort routine.)
Some nice websites on sorting routines, animated sorting and not-animated sorting. (ie : Bubble/Insert should do just fine.)
If you start hashing cells and having larger numbers of particles occupying one cell, (or from now on, 'bucket') you might want a different approach. This is where the technique nVidia uses shines.
The best resource on this I've found is Simon Green's GDC08 presentation. Also, Mark Harris' CUDA fluids presentation is worthwhile as it goes into a bit more depth on the CUDA/sorting front. (well, actually the best resource was Mark Harris himself, but I didn't actually know he worked on this specifically (PhysX Fluids) until after asking him stupid questions. I probably should have googled a bit more first to save the embarrassment).
Anyway, the approach goes like this:
First you build the data structures you need:
1. Populate a list with a pair of values, corresponding to the particle ID and the cell it occupies.
2. Sort this list (so that you can figure out which particles are in the same cell)
3. Create a grid structure which contains the index to this list where the first particle entry appears
Then to find neighbors for each particle:
1. For each neighboring cell
2. Calculate the cell id, look it up in the grid to get the start index in the list
3. Step through the list doing your particle-particle calculations until a new cell id occurs.
Again, with pictures:
For each particle, we can figure out a cell id.
eg: Particle 0, has a cell id of 9.
Now we can add this to a list, which contains the cell id, and particle pairs.
Then we sort this list, and populate our grid structure with -1 if it is empty, or the earliest index into this list if it contains a particle.
Great, again, how to use this?
For each particle, we look it up into the grid. This gives us the starting index in our list. Then, all the particles in this cell can be referenced from the pairs in the list.
For example, particle 4 is in cell 6, so we look up cell 6 in the grid. This gives us a list index of 2.
Looking into our list at index 2, we see particle 1, and following until we no longer have a cell id of 6, we get particles 1,2,4. (Not particle 0, it has cell index 9!)
So thats how we have a nice efficiently growing grid on the GPU with one big hash bucket. All we need to do is have a GPU friendly sorting routine.
How convenient, that we already have a fast radix GPU sort in the CUDA sdk:
CUDA Radix sort (again, from Mark Harris & co.). (No comparison to GPU QuickSort, who knows which is faster..)