## Monday, January 31, 2011

### Twist Effect in WebGL

The twist/twirl effect is achieved by combining two techniques I've already discussed: the circle (radius) equation and the angle (tangent) equation. Combining these will give us a 'twisting' effect. Unfortunately, this is almost impossible to explain without pictures.

Unlike the previous examples where I used shaders to explain each component, it will be easier to explain this with some graphs. In the graph each color represents a different height, so from our circle-equation we get:
` float r = sqrt(dot(p,p))/sqrt(2.0); `

From the tangent-equation we get:
` float a = atan(p.y,p.x) / (2.0*3.1416); `
 Angle (a)

Twisting is essentially a spiral, and we can generate a spiral by stepping outwards along the radius by a constant amount as we progress around a circle. For example, we start at 0 degrees at 0% of the radius, move to 45 degrees at 12.5% of the radius, then to 90 degrees at 25% of the radius, and so on.
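As a quick numeric sketch of that stepping (plain Python, not part of the shader itself), one full turn of such a spiral grows the radius fraction linearly with the angle:

```python
# One full turn of the spiral: the radius fraction grows linearly with the
# angle, stepping 12.5% of the radius for every 45 degrees of rotation.
points = [(deg, deg / 360.0) for deg in range(0, 361, 45)]
print(points[:3])  # [(0, 0.0), (45, 0.125), (90, 0.25)]
```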

This is shown in the image below:

Now, to generate a twist we can simply add our constant step outwards (in the form of the 'r' (radius) graph/image) to our rotation around a circle (in the form of the 'a' (angle) graph/image). This generates our final graph:

We use this equation to sample from our texture.
In GLSL shader code, we have:
```
uniform vec2 resolution;
uniform sampler2D tex;

void main(void)
{
    vec2 p = -1.0 + 2.0 * gl_FragCoord.xy / resolution.xy;
    vec2 uv;
    float a = atan(p.y,p.x) / (2.0*3.1416);
    float r = sqrt(dot(p,p))/sqrt(2.0);
    uv.x = r;
    uv.y = a+r;
    vec3 col = texture2D(tex,uv).xyz;
    gl_FragColor = vec4(col,1.0);
}
```
 Square color texture

To animate our spiral and make it spin, we just need to add an offset to the "a+r" value. To make the spiral scroll the texture "inwards" we can add an offset to the "r" value.
 Twist with the square color texture

This will give us our final animating spiral:

Your browser doesn't appear to support the HTML5 `<canvas>` element.

## Friday, January 28, 2011

### Raycasting two planes in WebGL

I had always implemented this effect by writing a complete raytracer/caster; however, the version of this effect on ShaderToy greatly simplifies it using some old-fashioned trigonometry. With some help from my old democoding partner Fractoid, I put together a mathematical explanation of the routine.

In essence we use the properties of similar triangles to find the intersection distance from the viewer to the plane (see image below).

The camera is shown in red, and the ray from the viewer to the plane is shown in green. The height of the plane above the viewer is 'h', and the distance along the plane to the intersection point is 'd'. We can use our friend from last time, the tangent function, to find the intersection point.

For the lower triangle, if we assume one unit of travel (x = 1), then the tangent of the angle (theta) will be y / 1.
```
tan(theta) = y/1;
```
For the upper triangle, the tangent of theta will be h / d.
```
tan(theta) = h/d;
```
Since the triangles are similar, both theta values are the same, so we can equate these two equations. That is:
```y / 1 = h / d
y = h / d
d = h / y
```

Now, to solve the same problem for the horizontal axis (the same setup rotated 90 degrees), we have the same formulation, except we no longer have a unit travel distance, but 'x' instead. This gives:
```y / x = h / d
y = h.x / d
d = h.x / y
```
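The similar-triangle argument is easy to check numerically. A small Python sketch (variable names are mine, chosen to match the figure) picks an arbitrary ray angle and confirms that the derived distance d = h·x / y makes both tangents agree:

```python
import math

# Arbitrary ray angle and plane height, matching the figure's labels.
theta = 0.6   # ray angle in radians
h = 0.25      # height of the plane above the viewer
x = 1.0       # horizontal travel distance (1.0 reproduces the unit-travel case)

y = x * math.tan(theta)   # lower triangle: tan(theta) = y / x
d = h * x / y             # derived intersection distance along the plane

# Upper triangle must give the same tangent: tan(theta) = h / d
assert abs(h / d - math.tan(theta)) < 1e-12
print(d)
```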

Now we can plug these into our texture map to give us the overall effect:
```
uniform vec2 resolution;
uniform float time;
uniform sampler2D tex;

void main(void)
{
    vec2 p = -1.0 + 2.0 * gl_FragCoord.xy / resolution.xy;
    vec2 uv;
    float h = 0.25;
    uv.x = h*p.x / p.y;
    uv.y = 0.1*time + h / abs(p.y);
    gl_FragColor = vec4(texture2D(tex,uv).xyz * p.y*p.y, 1.0);
}
```

You can see the result below. (You will need a WebGL enabled browser)
Your browser doesn't appear to support the HTML5 `<canvas>` element.

### Disruptive Innovations

Everyone wants to build a disruptive technology or a disruptive business. But what does that really mean? An article by Clayton Christensen, Mark Johnson and Darrell Rigby 'Foundations for Growth: How To Identify and Build Disruptive New Businesses' provides a description of disruptive, and identifies key requirements for disruptive businesses.

Christensen et al. distinguish disruptive innovations from sustaining innovations in that the latter only target existing customers in established markets. In contrast, disruptive innovations create new markets and generate new ways of doing business that affect established markets. Further clarifications:
• Industry incumbents aren't always the first to market with a sustaining innovation, but they almost always end up on top due to superior resources and high stakes.
• Innovations that help incumbent companies sell better products to their best customers are sustaining, not disruptive.
• Established businesses move towards their more profitable customers (see above), providing an asymmetric motivation for entrants to attack the less profitable customers (down-market)
• Down-market segments are typically small and ill-defined, but hold the best prospects for growth
According to the article, a new growth business has a 10x better chance of success with a disruptive strategy. How do you know if you can follow a disruptive strategy? In essence, the recommended requirement is an open, simple, low-cost product.
1. Can you create a new market base?
• Does it enable customers who could not previously DIY for lack of money or skills? That is, can it become mainstream rather than specialised?
• Is it targeted at customers who want a simple product?
• Will it allow customers to do more easily and effectively what they are already trying to do? i.e. Is it fulfilling an existing need/want, rather than creating a new one?
2. Can you disrupt from the low end?
• Is prevailing performance good enough? If not, then a simpler product will not succeed.
• Do you have a business model that targets thin profit margins and high net asset turns? Do competitors already outsource fabrication and assembly to the lowest-cost sources?
Now that you know your innovation has the potential to be disruptive, do you have the resources, processes and values you require for success?
• Resources: Do you have the best people and sufficient cash? A manager who is radically different from those established in the core business, and who has the correct experience? Are you aware of the challenges that manager will face? Too much cash can allow bloat that weakens the product (remember our thin profit margins); too little, and the product may never see the light of day.
• Processes and Values: Traditional and established processes can hamper a disruptive enterprise; an independent business unit that emphasises the key values of simplicity and low cost is best.
For established companies you will want to:
• Start before you need to grow, while you can afford it (ie: while your company is still growing)
• Create an aggregate project plan with strategic objectives
• Train employees to scout for well positioned small companies that will help accelerate and improve your disruptive product
A successful disruptive product should generate profits almost immediately, but be careful not to out-grow your capabilities: new markets take time to emerge.

## Wednesday, January 26, 2011

### Executable compression

Packing an executable to a smaller size is a common task, and the compression algorithms involved have a lot in common with good text compression systems. Most good executable compressors have some method of interpreting, filtering or transforming the program instructions into a more compressible format. The most well-known compression algorithms are:
• Run Length Encoding - runs of identical, consecutive data are represented in a compressed form. For example, the word "beekeeper" may be represented as "b(2x e)k(2x e)per"; with an appropriate coding this results in a reduction in size. This kind of compression works well on 8-bit images and is found in image formats such as BMP.
• Huffman compression - each code is represented with a number of bits inversely related to its frequency of occurrence: frequent symbols get short codes, rare symbols get long ones. For example, we could represent vowels with short bit sequences and consonants with longer ones. If "e" is represented with "1" and every other character with a 0 flag plus the usual 8 bits, then "beekeeper" comes to 4x 9 bits and 5x 1 bit, totalling 41 bits rather than 72 bits.
• Lempel-Ziv (LZ) - a dictionary of all previous data is kept and references are made back into the history of the data. For example, "beekeeper" could be represented as "be(-1,1)k(-3,2)p(-2,1)r", which translates to "be", plus the same character as one position before (-1) copied for one (1) byte, resulting in "bee"; then "k", followed by the data starting three characters back (-3) copied for two (2) bytes, resulting in "beekee"; and so on.
• Arithmetic coding - similar to Huffman coding (and explained very concisely by Charles Bloom); in essence it also assigns bits according to the probability of occurrence (as a "floating point" number), and then re-scales/encodes the probabilities to fixed-point. Arturo ("Dario Phong") Campos has a concise explanation.
• PAQ - this is generally considered the state-of-the-art compression algorithm. It uses conditional probabilistic models to achieve optimal compression.
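To make the "beekeeper" examples above concrete, here is a small Python sketch (my own toy code, not taken from any of the libraries mentioned) that run-length encodes the word, counts the bits for the 1-bit-'e' Huffman-style scheme, and decodes the LZ token stream back to the original word:

```python
word = "beekeeper"

# Run Length Encoding: collapse runs of equal characters into [char, count] pairs.
def rle(s):
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return runs

# Bit count for the toy Huffman-style scheme in the text:
# 'e' costs 1 bit, any other character costs 9 bits (a 0 flag plus 8 literal bits).
def toy_huffman_bits(s):
    return sum(1 if ch == "e" else 9 for ch in s)

# LZ decode: literal characters are copied; (offset, length) pairs copy
# 'length' characters starting 'offset' back in the already-decoded history.
def lz_decode(tokens):
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)
        else:
            offset, length = tok
            for _ in range(length):
                out.append(out[offset])  # offset is negative: indexes from the end
    return "".join(out)

print(rle(word))               # runs: b, ee, k, ee, p, e, r
print(toy_huffman_bits(word))  # 41 bits instead of 72
print(lz_decode(["b", "e", (-1, 1), "k", (-3, 2), "p", (-2, 1), "r"]))
```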
There is plenty of source code around the web for these algorithms, but the Basic Compression Library provides routines for RLE (Run Length Encoding), Huffman, Rice, Lempel-Ziv (LZ77) and Shannon-Fano compression algorithms, and Matt Mahoney maintains the PAQ algorithms. There are also quite a few general executable compressors available, including UPX and Mpress; both zip and 7zip can generate self-extracting executables, and Microsoft CAB also performs very well (see trimmit for Mac). But the state-of-the-art in self-extracting executable compressors for small programs are Crinkler and kkrunchy, for < 4kb and < 128kb programs respectively. (A good resource for various tools is available at in4k.)

These compressors use a standard compression algorithm but also transform the input into a more compressible format. The most common transform algorithm is the Burrows-Wheeler Transform (BWT), which attempts to re-order data so that similar codes are close to each other, increasing the ability of the data to be compressed.

For executable compression, patterns in the instruction stream can be leveraged to increase the compression ratio. For example, instructions often address similar areas of memory, and simple delta-encoding of the address values can typically greatly increase the compression ratio.
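A sketch of why delta-encoding helps (the address values below are made up for illustration): the raw addresses are all distinct, but their differences repeat heavily, which any entropy coder can exploit.

```python
# Hypothetical operand addresses: instructions tend to touch nearby memory,
# so the raw values are all distinct but the deltas repeat heavily.
addrs = [0x1000, 0x1004, 0x1008, 0x100C, 0x2000, 0x2004, 0x2008]

# Delta-encode: keep the first address, then store successive differences.
deltas = [addrs[0]] + [b - a for a, b in zip(addrs, addrs[1:])]

print(len(set(addrs)), len(set(deltas)))  # 7 distinct raw values vs only 3 distinct deltas
```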

Recently, both kkrunchy and Crinkler were described in detail. The author of kkrunchy Fabian "ryg" Giesen wrote a great blog post describing how kkrunchy works. Since kkrunchy works with larger file sizes some more intelligent tricks can be applied. Alexandre Mutel decomposed the Crinkler compressor in his blog post and posted some fabulous pictures of before-vs-after entropy of a compressed executable. Unfortunately, both of these programs are targeted at Windows PE files, and so don't work for other platforms (See here, here, here and here for tips on PE).

The usual approach to exe compression on Linux/OS X is just to do the modern-day version of com/cab dropping: compress your program with gzip and have a script extract and execute it. This lacks a bit of flair, but works. Articles describing ELF and Mach-O are available by Brian Raiter and Amit Singh respectively, but unfortunately they are a little out of date.

A few tools have been written for linux and osx including the byte-optimized linker (x86/64) and obj stripping for linux, and iPakk and muncho for OSX (again, a bit dated). An older elf stripper is also around, and Timo Wiren gives a good overview too. Unfortunately, a similar page on OSX is no longer available online (maybe one day). All in all, I've still found your best bet on a platform other than Windows is just to follow Timo Wiren's first step of advice and do a simple dropper:
1. Write your program, and compile it: g++ helloworld.cpp
```
#include <cstdio>

int main() {
    printf("hello world!\n");
    return 0;
}
```
2. Optional: Strip it
`strip a.out`
3. Compress it:
`gzip a.out`
4. Create the unpack script (unpack.header):
```
a=/tmp/I;tail -n+2 $0|zcat>$a;chmod +x $a;$a;rm $a;exit
```
5. Combine the script and zip into a final executable:
```cat unpack.header a.out.gz > program
chmod +x program
```
6. Enjoy!

My a.out went from 8768 to 1153 bytes. Nice. (Using strip saves me 160 bytes on the original and 136 bytes on the final, but I'm sure you can save more by being clever about use of the gcc crt, gcc flags and throwing a few tools at this process).

## Tuesday, January 11, 2011

### Tunnel effect explained in WebGL

The tunnel effect builds on the circle equation we covered in the introductory XOR bitplasma. However, this time we need to add two more concepts. First, we need to calculate the angle of each point on the screen, relative to the origin (center of screen). To find this we can use the properties of right-angle triangles, namely the tangent function (tan). That is, the tangent of the angle is equal to the opposite over the adjacent.

 Visualization of the angle for each pixel
By plugging in our pixel x/y coordinates we get angles ranging from -pi to pi. Dividing by pi gives us a range from -1 to 1. In code this is:
```
float a = atan(p.y,p.x);
result = a/(3.1416);
```
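A quick check of the normalization in Python (note that atan2 returns values across the full -pi to pi range, so the normalized result spans -1 to 1; the texture wrap takes care of the negative part):

```python
import math

# Normalize the angle of a few screen positions, as the shader does.
for x, y in [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]:
    a = math.atan2(y, x) / math.pi
    print((x, y), a)  # 0.0, 0.5, 1.0, -0.5 respectively
```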

 Tunnel depth
Second, we need to find the "depth" of each point in the tunnel. Luckily this is easy to do, as projecting a 3d point to a 2d viewer is just a "divide by z" operation (objects that are further away appear smaller than objects that are closer - a nicer explanation will follow in future posts). So we can just generate a gradient of circles which we can treat as the distance from the viewer (our "z" value) and invert this (ie: 1/z).

Finally, to get a good effect we need to apply a texture map. So, combining the two coordinates, we use the angular coordinate for the "v" coordinate and the depth coordinate as our "u" value.

So the complete code for our tunnel-shader is:
```
uniform vec2 resolution;
uniform float time;
uniform sampler2D tex;

void main(void)
{
    vec2 p = -1.0 + 2.0 * gl_FragCoord.xy / resolution.xy;
    vec2 uv;
    float a = atan(p.y,p.x);
    float r = sqrt(dot(p,p));
    uv.x = 0.1/r;
    uv.y = a/(3.1416);
    vec3 col = texture2D(tex,uv).xyz;
    gl_FragColor = vec4(col,1.0);
}
```

You can see the result below! (You will need a WebGL enabled browser)
Your browser doesn't appear to support the HTML5 `<canvas>` element.

As a small extension to this we can progressively change the power of the square function in the circle equation to generate a less and less round "curve", progressively changing the circular tunnel into a square tunnel.
 Graph of increasing squares

So starting from:
```
float power = 1.0;
float r = pow( pow(p.x*p.x,power) + pow(p.y*p.y,power), 1.0/(2.0*power) );
```

We can slowly increase the power until we get a square-like tunnel. The image shows the effect of increasing the power term; the final bottom-right two images show the contrast between the round and square-like tunnels.
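This behaviour is easy to verify outside the shader. A Python transcription of the r equation (pow replaced by the ** operator) shows the radius approaching max(|x|, |y|), i.e. square contours, as the power grows:

```python
def r(x, y, power):
    # Same formula as the shader: pow(pow(x*x,p) + pow(y*y,p), 1.0/(2.0*p))
    return ((x * x) ** power + (y * y) ** power) ** (1.0 / (2.0 * power))

print(r(0.6, 0.8, 1.0))   # power 1: the ordinary Euclidean radius, ~1.0
print(r(0.6, 0.8, 32.0))  # high power: approaches max(|x|, |y|) = 0.8
```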

## Sunday, January 09, 2011

### XOR Demoeffect in WebGL

I wanted to do a series on a set of demoeffects, inspired by Shader Toy (and Denthor!). The majority of the shader toy effects are either raytraced, or polar/circle-based. So I thought I would begin with polar/circles and work my way from there. Most of the effects are based on a number of "oldschool" democoder tricks that are easy to understand once you have the background knowledge.

I won't be covering the basics behind writing shaders, but I might do so at a later point.

All of iq's shaders are rendered using a pixel shader (or fragment shader in OpenGL terms). The first step is to calculate the position of the pixel on the screen (or canvas, in the case of WebGL), normalised from -1 to 1. In a picture this is:

To visualise this in the shader we could write:
```
uniform vec2 resolution;

void main(void)
{
    vec2 p = -1.0 + 2.0 * gl_FragCoord.xy / resolution.xy;
    gl_FragColor = vec4(p.x,p.y,0.0,1.0);
}
```
Now each horizontal position gets an increasing value of red, and each vertical position an increasing value of green (we are using the RGBA color space).

We can use the cartesian equation of a circle to generate a gradient of circles.

Again, to visualise this in the shader we could write:
```
uniform vec2 resolution;

void main(void)
{
    vec2 p = -1.0 + 2.0 * gl_FragCoord.xy / resolution.xy;
    float radius = sqrt(p.x*p.x + p.y*p.y);
    gl_FragColor = vec4(radius,0.0,0.0,1.0);
}
```

If we want to generate concentric circles we can toggle a boolean value on or off depending on a modulo operation. If we take the modulo of a value and test whether the result is above the half-way point, we get an on-off pulse. For example, given a value ranging from 0 to 1, taken modulo 0.1, we can generate an on/off pulse by testing whether the result is greater than 0.05.

In shader code this would be:
```
uniform vec2 resolution;

void main(void)
{
    vec2 p = -1.0 + 2.0 * gl_FragCoord.xy / resolution.xy;
    float radius = sqrt(p.x*p.x + p.y*p.y);
    float modulo = mod(radius,0.1);
    bool toggle = false;
    if (modulo > 0.05) toggle = true;
    if (toggle)
        gl_FragColor = vec4(1.0,0.0,0.0,1.0);
    else
        gl_FragColor = vec4(0.0,0.0,0.0,1.0);
}
```

This is a bit long-handed; we can shorten it by setting the toggle value directly, and using the dot product to perform the square operations.
ie:
```
float radius = sqrt(dot(p,p));
bool toggle = mod(radius,0.1)>0.05;
```
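The same on/off pulse is easy to play with outside a shader; a minimal Python transcription of the mod-based toggle:

```python
def toggle(radius):
    # On/off pulse: fold the radius into [0, 0.1) and test the half-way mark,
    # mirroring mod(radius, 0.1) > 0.05 in the shader.
    return (radius % 0.1) > 0.05

print(toggle(0.06), toggle(0.04))  # True False: alternating rings
```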

Great! Now we have all the background knowledge we need to make our first interesting effect. This will be based on the XOR operation:
XOR Truth Table

| A | B | Output |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

First, let's generate two different circles.
```
uniform vec2 resolution;

void main(void)
{
    vec2 p = -1.0 + 2.0 * gl_FragCoord.xy / resolution.xy;
    vec2 offset2 = vec2(0.3,0.3);
    float radius1 = sqrt(dot(p,p));
    float radius2 = sqrt(dot(p-offset2,p-offset2));
    bool toggle1 = mod(radius1,0.1)>0.05;
    bool toggle2 = mod(radius2,0.1)>0.05;
    gl_FragColor = vec4(toggle1,toggle2,0.0,1.0);
}
```

Wonderful! Now if we add in the XOR truth table:
```
//xor via if statements
float col = 0.0;
if (toggle1) col = 1.0;
if (toggle2) col = 1.0;
if ((toggle1) && (toggle2)) col = 0.0;
gl_FragColor = vec4(col,0.0,0.0,1.0);
```
And we get a wonderful overlapping pattern.
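For the curious, the if-statement cascade really is an XOR; a quick Python transcription of the shader logic checks it against the ^ operator over all four truth-table rows:

```python
def xor_via_ifs(toggle1, toggle2):
    # Direct transcription of the shader's if-statement cascade.
    col = 0.0
    if toggle1:
        col = 1.0
    if toggle2:
        col = 1.0
    if toggle1 and toggle2:
        col = 0.0
    return col

for a in (False, True):
    for b in (False, True):
        assert xor_via_ifs(a, b) == float(a ^ b)
print("matches the XOR truth table")
```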

Add in a bit of animation and we have our first demoeffect! Click the button below in a WebGL-enabled browser (Firefox, Chrome, Safari). (View source for the full code.)

Your browser doesn't appear to support the HTML5 `<canvas>` element.

## Friday, January 07, 2011

### Future CPU/GPU architectures and OpenCL

For a while x86 CPUs were aimed simply at getting faster: more and more clock ticks. Then there was a shift from faster to more efficient (in terms of power used / heat generated vs. performance) in the Pentium 3/4 era. Then another shift towards multiple cores. Now the next shift has occurred: CPU and GPU fusion. (AMD has been talking about it for longer than I can recall; I think I was talking to an AMD VP shortly before they purchased ATI about how GPU and CPU fusion was the future.)

Recently Intel released Sandy Bridge, their CPU/GPU processor, and AMD's Fusion APUs have arrived. Interestingly, both are targeted at laptops (the bulk of the market), and both still expect discrete GPUs for high-end performance. The fusion is just a bit of a bonus.

Interestingly, all those 2009 rumours about nVidia building an x86 chip turned out to be wrong, with the real product being far more interesting: an nVidia fusion product pairing an ARM CPU with nVidia GPUs. This might prove to be a very interesting result, considering the amount of effort being put into improving the ARM architecture and the power-usage concerns in both smartphones/tablets and supercomputing. With the next version of Windows running on ARM, there will certainly be interesting times ahead.

Intel/AMD aren't standing still on the CPU front and are pushing ahead with AVX, but the really interesting part is that OpenCL is being pushed across the board. Recent publications from the UK GPU computing conference demonstrate that ARM is pushing OpenCL as its platform of choice (for both its CPUs and its Mali GPU; it is Clang/LLVM based), and AMD and Intel have been strong supporters of OpenCL too. Did you know Samsung supports OpenCL as well?

It would seem that OpenCL has a very strong support base, and is likely the platform of choice for developers on Intel, AMD, ARM, Apple, IBM, etc. What hope does CUDA have? It seems nVidia will eventually be forced to drop CUDA, or invest heavily in CPU-based CUDA (hello Ocelot!) or OpenCL translation. Eventually people will tire of writing and re-writing CUDA programs as they swap between platforms.

Envision the not-so-far future where you have a workstation with multiple CPU/GPU fusion cores [non-CUDA], and a discrete fusion GPU (GPU [CUDA]/ARM [non-CUDA] core). Are you really willing to write a specific CUDA routine for just the nVidia GPU, or are you more likely to write a less optimal OpenCL routine that will also run on the CPU, CPU SIMD, CPU/GPU fusion and ARM cores?

It seems that nVidia's argument to write everything in CUDA for optimal performance will not hold: you will gain more from all the other devices helping out in the computation than you lose from using OpenCL over CUDA.

Of course there are other choices out there such as Accelerator and RapidMind.

The UK conference reveals that Microsoft's GPGPU language "Accelerator" has made progress and is no longer GPU-only (it now supports SSE3 on modern CPUs). I'm still not aware of anyone using it for anything practical, though, which seems a bit of a shame. And ever since Intel bought RapidMind (now Array Building Blocks) the tech has become very Intel-specific. So neither of those choices sounds too promising.

It will be interesting to see what nVidia decide to do with CUDA, how developers will adjust to fusion CPU / GPUs and what higher-level language (or language extension?) will become de-facto in the future. Stay tuned..

## Sunday, January 02, 2011

### 2010 Summary

Looking back, 2010 was a big year. My life was dominated by three factors:
A bit of a departure from 2009, which was mostly focused on GPGPU, fluid simulations, physics engines, and computer graphics. I still have a backlog of content to write about from my GPGPU work in 2009, but it's starting to be less relevant as the GPGPU world pushes ahead. I'm aiming to post more in 2011 than I managed in 2010. Should be a good year!