Wednesday, August 25, 2010

Kalman filters - Part 2

In a previous post I described Kalman filters and gave a very simple example of a 1D filter implemented in C. In the real world, having a filter with three or more inputs is common. A standard 3rd order example would be a filter that considers the position, velocity and acceleration of a system.

In this case you would have a 3x3 measurement matrix (H) (with the diagonal set to what you can measure), a 3x3 process noise matrix (Q), a 3x3 measurement noise matrix (R), and a 3x3 state transition matrix (F). The standard transition matrix would simply encode v += a*dt and x += v*dt, but we can use a 2nd order leapfrog integrator to get better accuracy of our position from our acceleration data. So in total we have:

F = | 1   dt   0.5*dt*dt |
    | 0    1   dt        |
    | 0    0    1        |

with H, Q and R as diagonal 3x3 matrices (the diagonal of H simply flags which of position, velocity and acceleration are actually measured).

You can solve this using the standard Kalman matrix maths, but you may find that an SVD-based approach gives more numerically stable solutions. OpenCV is one library that already has the Kalman matrix maths built in. To set up a Kalman filter with OpenCV we just need to set up the matrices described above and call the appropriate cvKalman functions.

(ugly) Code snippet follows:
#ifndef KALMAN_H
#define KALMAN_H

/** OpenCV Kalman filter for 1st, 2nd and 3rd order position, velocity and acceleration data.
    (c) Adrian Boeing
    www.adrianboeing.com
 */
#include <cv.h>

typedef enum {
    K_P = 1, //position only
    K_V = 2, //velocity only
    K_A = 4, //acceleration only
    K_PV = 3, //position & velocity
    K_PA = 5, //position & acceleration
    K_VA = 6, //velocity & acceleration
    K_PVA = 7 //position, velocity and acceleration
} Kalman_e;

class KalmanPVA {
public:
    KalmanPVA(int order, Kalman_e type, float initial_pos, float initial_vel, float initial_accel, float dt, float process_noise[3], float measurement_noise[3]) {
        m_order = order;
        //create the OpenCV kalman structures
        kalman = cvCreateKalman(order, 3, 0 );
        z = cvCreateMat(3,1,CV_32FC1);
        
        //construct the transition matrix
        SetDT(dt);
        
        //enable only the measurements we have!
        for (int i=0;i<m_order;i++) {
            cvmSet(kalman->measurement_matrix,i,i,((type&7)>>i)&1);
        }
        //configure the process noise matrix
        cvSetIdentity( kalman->process_noise_cov, cvRealScalar(1) ); //initialise to the identity matrix
        for (int i=0;i<m_order;i++) {
            cvmSet( kalman->process_noise_cov,i,i, process_noise[i] ); 
        }
        //configure the measurement noise matrix
        cvSetIdentity( kalman->measurement_noise_cov, cvRealScalar(1) ); //initialise to the identity matrix
        for (int i=0;i<3;i++) {
            cvmSet(kalman->measurement_noise_cov,i,i,measurement_noise[i]);
        }
        cvSetIdentity( kalman->error_cov_post,cvRealScalar(10) ); //large initial uncertainty
        
        //setup the initial state
        cvmSet(kalman->state_post,0,0,initial_pos );
        if (m_order>1)
            cvmSet(kalman->state_post,1,0,initial_vel);
        if (m_order>2)
            cvmSet(kalman->state_post,2,0,initial_accel);
        
        
    };
    void SetDT(float dt) {
        //3rd, 2nd, and 1st order transition matrix models
        
        //3rd order uses 2nd order leapfrog integration
        float F3[] = { 1, dt, 0.5f*dt*dt,
                       0,  1, dt,
                       0,  0,  1 };

        float F2[] = { 1, dt,
                       0,  1 };

        float F1[] = { 1 };
        
        switch (m_order) {
            case 3:
                memcpy( kalman->transition_matrix->data.fl, F3, sizeof(F3));
                break;
            case 2:
                memcpy( kalman->transition_matrix->data.fl, F2, sizeof(F2));
                break;
            case 1:
                memcpy( kalman->transition_matrix->data.fl, F1, sizeof(F1));
                break;
        }
    }
    void SetMeasurements(float pos, float vel, float accel) {
        //z is always 3x1; measurements that are not enabled in H are simply ignored
        cvmSet(z,0,0,pos);
        cvmSet(z,1,0,vel);
        cvmSet(z,2,0,accel);
    }
    
    void Update(){
        cvKalmanPredict( kalman, 0 ); //predict the next state
        cvKalmanCorrect( kalman, z ); //correct it with the latest measurements
    }
    float GetPositionEstimate() {
        float v = cvmGet(kalman->state_post,0,0);
        return v;
    }
    float GetVelocityEstimate() {
        if (kalman->state_post->rows>1)
            return cvmGet(kalman->state_post,1,0);
        return 0;
    }
    float GetAccelerationEstimate() {
        if (kalman->state_post->rows>2)
            return cvmGet(kalman->state_post,2,0);
        return 0;
    }
    float GetPositionError() {
        return cvmGet(kalman->error_cov_post,0,0);
    }
    CvKalman* kalman; //openCV Kalman filter
    CvMat* z; //measurement matrix
private:

    void cvKalmanNoObservation() { 
        cvCopy(kalman->error_cov_pre, kalman->error_cov_post); 
        cvCopy(kalman->state_pre, kalman->state_post); 
    } 
    int m_order;
    
};

#endif
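
To give an idea of how the class might be driven, here is a small usage sketch. The noise values and measurements are made up for the example, and it assumes the header above is saved as kalman.h:

#include <cstdio>
#include "kalman.h" // assuming the class above is saved as kalman.h

int main() {
    // illustrative noise values only - tune these for your own sensors
    float process_noise[3]     = {1e-4f, 1e-4f, 1e-2f};
    float measurement_noise[3] = {0.1f,  0.1f,  0.5f};

    // 3rd order filter measuring position, velocity and acceleration, dt = 0.01s
    KalmanPVA filter(3, K_PVA, 0.0f, 0.0f, 0.0f, 0.01f, process_noise, measurement_noise);

    for (int step = 0; step < 100; step++) {
        // in a real application these would come from your sensors
        float pos   = 0.5f * step; // dummy position measurement
        float vel   = 0.5f;        // dummy velocity measurement
        float accel = 0.0f;        // dummy acceleration measurement

        filter.SetMeasurements(pos, vel, accel);
        filter.Update();
        printf("position estimate: %f (error %f)\n",
               filter.GetPositionEstimate(), filter.GetPositionError());
    }
    return 0;
}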

Tuesday, August 17, 2010

CAD packages

Some basic knowledge of CAD packages never hurt anyone. Two of the most common ones used in industry are AutoCAD and SolidEdge. Most of the time you will just need to take measurements from some diagrams rather than build a complete model.

AutoCAD has a command line interface (from way back in the DOS days). Two useful commands for measurement are:
  • di - Measure a distance between two points
  • id - Print the coordinates of a point of interest

Other things you might find yourself doing include drawing lines and circles, which are all relatively self-explanatory. Occasionally you need to measure something relative to a reference line or base line that you want to project through the whole drawing. You can do this by using the 'Extend' option to lengthen the line: choose 'Extend' from the Modify menu, draw a line to represent the boundary of the line's extension, right click, and select the line you wish to lengthen. AutoCAD will lengthen the selected line up to the boundary you specified.

Like Photoshop, AutoCAD also has layers you can work with. You can turn layers on/off by clicking on the lightbulb icon.


With SolidEdge you can use the Dimension menu to make measurements (make sure you are working on the correct sheet!). These can be either horizontal/vertical, between two points, or you can select an element in the drawing to use as the axis (easier than in AutoCAD!).

Friday, August 06, 2010

Computing events in Perth

GameJam is on again in Perth - I didn't see much about it this time, but it is in Joondalup again. Also, there is a GPU computing workshop on the 19th of August in the ARRC auditorium. More information on the workshop is available here.

Thursday, August 05, 2010

A star (A*) path planning

Previously I wrote about Dijkstra's algorithm; this time I'm going to look at a heuristic or "AI" style approach: the A* algorithm. This is a single-source, single-goal pathfinding algorithm. In essence this algorithm "guesses" or estimates the distance that remains to get from a node to the destination node. These estimates guide the direction of the path (i.e. the search), with each node that is visited updating the true cost of the path, until a solution is found.

A* considers what to do at each node depending on an overall heuristic score. The total score is a function of the cost of the path that has already been traversed (the "g" cost) and the estimated cost of travelling from the node to the destination (the "h" cost). There are many ways to estimate the "h" cost; typically for pathfinding problems the Euclidean distance is used. For A* to guarantee the shortest path the heuristic must not over-estimate the remaining cost (it must be 'admissible'); an over-estimating heuristic tends to make the search greedier and faster, at the risk of returning a sub-optimal path. On grid maps a common and cheap measure is the Manhattan distance.
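
As a small illustration, the two common estimates and the overall score could be computed like this (grid coordinates and these particular function names are just for the example):

#include <cmath>
#include <cstdlib>

//estimated remaining cost ("h") from (x,y) to the goal (gx,gy)
float euclidean_h(int x, int y, int gx, int gy) {
    return std::sqrt(float((x - gx)*(x - gx) + (y - gy)*(y - gy)));
}

float manhattan_h(int x, int y, int gx, int gy) {
    return float(std::abs(x - gx) + std::abs(y - gy));
}

//overall score used to order the open list: cost so far ("g") plus the estimate ("h")
float f_score(float g, float h) {
    return g + h;
}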

The A* algorithm works on two lists, a "Closed" list of already explored nodes, and an "Open" list of nodes yet-to-be-explored.

In essence the algorithm is:
  • Select a source node to operate with (eg: the start node, or the node with the lowest overall cost)
  • For each node connected to the source node:
    • Calculate the already-traversed cost (the "g" cost)
    • If the node is already on the Open or Closed list and the newly calculated traversed cost is no better than the existing one, skip the node; otherwise remove it from the Closed list.
    • Calculate the estimated remaining cost (the "h" cost) to find the total cost, and store all these costs with the node.
    • Add the node to the Open list
  • Add the source node to the Closed list
In Pseudo-Code (from Bryan Stout):
priorityqueue Open
 list Closed 
s.g = 0 // s is the start node 
s.h = GoalDistEstimate( s ) 
s.f = s.g + s.h 
s.parent = null 
push s on Open 
while Open is not empty 
  pop node n from Open // n has the lowest f 
  if n is a goal node 
    construct path 
    return success 
  for each successor n' of n 
    newg = n.g + cost(n,n') 
    if n' is in Open or Closed, and n'.g <= newg 
      skip 
    n'.parent = n 
    n'.g = newg 
    n'.h = GoalDistEstimate( n' ) 
    n'.f = n'.g + n'.h 
    if n' is in Closed 
      remove it from Closed 
    if n' is not yet in Open 
      push n' on Open 
  push n onto Closed 
return failure // if no path found 
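For reference, here is a minimal C++ sketch of the same pseudocode on a 4-connected grid. It uses a std::priority_queue for the Open list and a per-cell closed flag; the grid size, unit edge costs and Manhattan heuristic are assumptions for the example, and the parent pointers needed to reconstruct the actual path are omitted to keep it short:

#include <cstdlib>
#include <queue>
#include <vector>

struct Node { int x, y; float f; };
struct CompareF { bool operator()(const Node &a, const Node &b) const { return a.f > b.f; } };

const int W = 8, H = 8;
int grid[H][W] = {0}; // 0 = free cell, 1 = blocked cell (example map, all free)

float heuristic(int x, int y, int gx, int gy) {
    return float(std::abs(x - gx) + std::abs(y - gy)); // Manhattan distance
}

bool astar(int sx, int sy, int gx, int gy) {
    std::vector<float> g(W * H, 1e30f);  // best traversed ("g") cost found so far
    std::vector<bool> closed(W * H, false);
    std::priority_queue<Node, std::vector<Node>, CompareF> open;

    g[sy * W + sx] = 0.0f;
    Node start = {sx, sy, heuristic(sx, sy, gx, gy)};
    open.push(start);

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};

    while (!open.empty()) {
        Node n = open.top(); open.pop();
        if (n.x == gx && n.y == gy) return true;   // goal reached
        if (closed[n.y * W + n.x]) continue;       // stale duplicate entry - discard
        closed[n.y * W + n.x] = true;

        for (int k = 0; k < 4; ++k) {
            int nx = n.x + dx[k], ny = n.y + dy[k];
            if (nx < 0 || ny < 0 || nx >= W || ny >= H || grid[ny][nx]) continue;
            float newg = g[n.y * W + n.x] + 1.0f;  // unit edge cost
            if (newg >= g[ny * W + nx]) continue;  // not an improvement - skip
            g[ny * W + nx] = newg;
            Node next = {nx, ny, newg + heuristic(nx, ny, gx, gy)};
            open.push(next);
        }
    }
    return false; // no path found
}

Because the priority queue cannot update entries in place, the sketch simply pushes duplicates and discards stale ones when they are popped. With a consistent heuristic like the Manhattan distance on this grid, closed cells never need to be reopened, so the 'remove it from Closed' step in the pseudocode is not needed here.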
A great interactive demonstration program can be found here: http://www.policyalmanac.org/games/aStarTutorial.htm and some source code for A* is available from the GrinningLizard website (the same author as the fantastic tinyXML library!). Keep in mind that waypoint-based AI navigation for games is a very mid-nineties approach - nav-meshes are certainly the way to go! For an example of the issues with waypoint approaches take a look at this video. Your best bet for navmesh research is Mikko Mononen's blog.

Wednesday, August 04, 2010

Using Parallel Reduction to Optimize Dijkstra

In a previous blog post I wrote about Dijkstra's algorithm. One optimisation to the naive implementation is to calculate the minimum cost node in parallel - since this can often be the most time consuming operation for large graphs. This is a perfect candidate for a parallel reduction operation.

In serial, if we wished to sum up an array (x) of numbers, the code would be:
int sum = 0;
for(int i=0; i < N; ++i) 
 sum += x[i];

However, in parallel we could speed this up by summing parts of the array and then adding those parts together. For example, to add four numbers a,b,c,d we could first compute a+b and c+d in parallel, then sum the two partial results. This is called a 'reduction' operation, and it allows us to complete an otherwise O(N) operation in O(log N) steps.

One particularly nifty pattern is the "Butterfly" pattern of adding elements. Best described with an animation:


In code this can be done as:
for (b=n/2;b>0;b/=2) {
 for (i=0;i<n;i++) { 
         t[i] = OP(sum[i],sum[i^b]); 
 } 
 for (i=0;i<n;i++) { 
         sum[i]=t[i]; 
 } 
}
(Where 'OP' can be any associative and commutative operation, e.g. add, multiply, min, or max.)
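
To make that concrete, here is a self-contained serial version of the butterfly loop, summing n = 8 values with OP defined as addition (the array contents are just example data):

#include <stdio.h>

#define OP(a,b) ((a)+(b))

int main(void) {
    int n = 8;                              /* must be a power of two */
    int sum[8] = {1, 2, 3, 4, 5, 6, 7, 8};  /* example data */
    int t[8];

    for (int b = n/2; b > 0; b /= 2) {
        for (int i = 0; i < n; i++)
            t[i] = OP(sum[i], sum[i^b]);    /* pair element i with its butterfly partner */
        for (int i = 0; i < n; i++)
            sum[i] = t[i];
    }

    printf("%d\n", sum[0]);                 /* every element of sum now holds 36 */
    return 0;
}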


To see how this works, assume we are adding 8 elements (n=8). Then for the first step:
b is 0100
and each subsequent operation will be operating on the following elements:
0000^0100:0100
0001^0100:0101
0010^0100:0110
0011^0100:0111
0100^0100:0000
0101^0100:0001
0110^0100:0010
0111^0100:0011
or in decimal:
t[0] = [0] + [4]
t[1] = [1] + [5] 
t[2] = [2] + [6]
t[3] = [3] + [7]
t[4] = [4] + [0]
t[5] = [5] + [1]
t[6] = [6] + [2]
t[7] = [7] + [3]
Now, for the next step:
b is 0010
0000^0010:0010
0001^0010:0011
0010^0010:0000
0011^0010:0001
0100^0010:0110
0101^0010:0111
0110^0010:0100
0111^0010:0101
Again, in decimal:
t[0] = [0] + [2]
t[1] = [1] + [3]
t[2] = [2] + [0]
t[3] = [3] + [1]
t[4] = [4] + [6]
t[5] = [5] + [7]
t[6] = [6] + [4]
t[7] = [7] + [5]
Finally,
b is 0001
0000^0001:0001
0001^0001:0000
0010^0001:0011
0011^0001:0010
0100^0001:0101
0101^0001:0100
0110^0001:0111
0111^0001:0110
And in decimal:
t[0] = [0] + [1]
t[1] = [1] + [0]
t[2] = [2] + [3]
t[3] = [3] + [2]
t[4] = [4] + [5]
t[5] = [5] + [4]
t[6] = [6] + [7]
t[7] = [7] + [6]

Thus, in only 3 parallel operations we have added 8 elements together. We also have the additional bonus of having broadcast the result to all of the eight parallel adding elements - allowing future operations for each processor to use the result, or in the case of a SIMD system, to mask the appropriate processors.

Here is the code in CUDA:

//a complete kernel wrapper is assumed here; blocksize must be a power of two
//and equal to the number of threads per block
#define blocksize 256
#define OP(a,b) ((a)+(b))

__global__ void butterflyReduce(const int* x, int* out) {
    int i = threadIdx.x;              // thread i holds value x[i], one thread per element
    __shared__ int sum[blocksize];
    sum[i] = x[i];                    // copy to shared memory
    __syncthreads();                  // synchronise
    for (int bit = blocksize/2; bit > 0; bit /= 2) {
        int t = OP(sum[i], sum[i^bit]);
        __syncthreads();              // make sure everyone has read before anyone writes
        sum[i] = t;
        __syncthreads();              // make sure everyone has written before the next pass
    }
    out[i] = sum[i];                  // every thread now holds the full result
}
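
For completeness, a hypothetical launch of the kernel above would use one block of blocksize threads, where d_x and d_out are device buffers of blocksize ints (the buffer names are illustrative only):

butterflyReduce<<<1, blocksize>>>(d_x, d_out);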

Nifty!

Tuesday, August 03, 2010

Dijkstra

Dijkstra's algorithm is a well known algorithm for solving the single-source shortest path problem in a graph. It is a "global" pathfinding algorithm, in that it will find the optimal path from the source to every other node in the graph (provided the graph has no negative cost edges). Unfortunately the algorithm is not very easy to run in parallel.

The algorithm requires two lists, a visited list and a pending list, plus an array of the current distances to each node and an array of the predecessors.

In essence the algorithm begins at the specified starting node, and all other nodes are added to a 'pending' list. The distances are initialised to 'infinite'.

1. From the current node we calculate the distance to each neighbouring node (the cost of getting to the current node plus the cost of the edge to the neighbour). If this distance is less than the neighbour's current record, we update the distance and set the predecessor of the neighbour node to be the present node.

2. The node with the minimum distance on the pending list is selected as the next node to visit. This node is then added to the visited list and removed from the pending list.

It's quite simple, but an animation will really help get the point across:


In code, the algorithm is as follows:
(This code stores the graph in the "dist" array, with dist[i][j] representing an edge cost between i and j. If it contains '0' then there is no connection).

The initial loop sets the distances to infinity. The first inner loop finds the minimum cost node, the second inner loop updates the distances based on the current node.
The minimum distance to each node from the starting node is stored in the 'd' array. The visited array is used to record which nodes have already been processed.
//graph storage and results (the sizes and values here are just examples)
#define GRAPHSIZE 1024
#define INFINITY  999999999

int dist[GRAPHSIZE][GRAPHSIZE]; //dist[i][j] is the edge cost between i and j ('0' means no edge)
int d[GRAPHSIZE];               //d[i] holds the minimum distance found from the start node to i
int n;                          //number of nodes in the graph (nodes are numbered 1..n)

void dijkstra(int s) {
    int i, k, mini;
    int visited[GRAPHSIZE];

    //initialise all distances to infinite, and mark every node as unvisited
    for (i = 1; i <= n; ++i) {
        d[i] = INFINITY;
        visited[i] = 0;
    }

    //set the distance to the starting node 's' to zero
    d[s] = 0;

    //outer loop - iterate through all nodes
    for (k = 1; k <= n; ++k) {
        //find the unvisited node with the minimum current distance
        mini = -1;
        for (i = 1; i <= n; ++i)
            if (!visited[i] && ((mini == -1) || (d[i] < d[mini])))
                mini = i;

        //now we have the current minimum node, mark it as visited
        visited[mini] = 1;

        //for each node with a valid edge from the current minimum node,
        //calculate the cost of reaching that neighbour via the current node.
        //if the cost is less than the current record of minimum costs, update the record.
        for (i = 1; i <= n; ++i)
            if (dist[mini][i]) //update distances
                if (d[mini] + dist[mini][i] < d[i])
                    d[i] = d[mini] + dist[mini][i];
    }
}
In future blog posts I'll look at the A* algorithm, a parallel reduction for Dijkstra and some SIMD/GPU/CUDA optimisations for an 8-connected Dijkstra.

Monday, August 02, 2010

C++ debugging tools: Gdb, CppCheck, Valgrind

One of the most important parts of programming is the debugging phase, and for development, the testing phase. Some of the tools that can be used under OSX are the standard Linux tools - but of course, there are always a few tweaks.

CppCheck is a static analysis tool; it can find formatting issues, unused or sub-optimally used variables, pointer/memory issues, etc.

The install procedure for CppCheck is:
git clone git://github.com/danmar/cppcheck.git
cd cppcheck
make
sudo make install

sudo easy_install Pygments-1.3.1-py2.6.egg 
http://pypi.python.org/pypi/Pygments

cd htmlreport
sudo ./setup.py install
(You only need the Python part for the nice HTML report.)


Note: I had to check out v1.44 for it to work with OSX, so after the git clone command add the following:
git checkout 1.44
git checkout -b 144

Here is an example shell script to run CppCheck (thanks John):
#!/bin/bash
cppcheck --enable=all --xml --verbose -f -I math -I boost/boost -I src/framework src 2> err.xml
cppcheck-htmlreport --file err.xml --report-dir="CodeAnalysisReport" --title="My Project"
Just alter it to match your own include directories and source.

If you need to do any runtime debugging, GDB is the standard tool. It can be quite complex, but here is its basic usage:
  • gdb (program name) [this will start GDB with your program loaded]
  • set args (arguments) [this will set the command line parameters to your program]
  • break (function/bp) [eg: break lib/preprocessor.cpp:610]
  • r [run]
  • bt [backtrace]
  • q [quit]
Not as good as the MS debug tools, but it's still far better than not having a debugger at all! Finally, Valgrind is a tool for memory checking. It normally works out of the box, but I had to make some changes for OSX. Luckily Greg Parker already wrote a patch, and Cristian Draghici wrote up a blog post on compiling and installing Valgrind on OSX.
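
Once Valgrind is installed, a typical memory-check run looks something like the following (./myprogram is just a placeholder for your own binary):

valgrind --leak-check=full --track-origins=yes ./myprogram

The first flag reports the details of each leaked block, and the second tracks where uninitialised values originate, which makes the reports far easier to act on.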