Thursday 16 January 2014

Tech Feature: SSAO and Temporal Blur

Screen space ambient occlusion (SSAO) is the standard solution for approximating ambient occlusion in video games. Ambient occlusion is used to represent how exposed each point is to the indirect lighting from the scene. Direct lighting is light emitted from a light source, such as a lamp or a fire. The direct light then illuminates objects in the scene. These illuminated objects make up the indirect lighting. Making each object in the scene cast indirect lighting is very expensive. Ambient occlusion is a way to approximate this by using a light source with constant color and information from nearby geometry to determine how dark a part of an object should be. The idea behind SSAO is to get geometry information from the depth buffer.

There are many published algorithms for high-quality SSAO. This tech feature will instead focus on improvements that can be made after the SSAO has been generated.

SSAO Algorithm
SOMA uses a fast and straightforward algorithm for generating medium frequency AO. The algorithm runs at half resolution which greatly increases the performance. Running at half resolution doesn’t reduce the quality by much, since the final result is blurred.

For each pixel on the screen, the shader calculates the position of the pixel in view space and then compares that position with the view space position of nearby pixels. How occluded the pixel gets is based on how close the points are to each other and whether the nearby point lies in front of the surface, on the side its normal points towards. The occlusion from each nearby pixel is then added together for the final result.
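
As a rough CPU-side illustration of that sum, here is a minimal Python sketch. It mirrors the falloff and angle terms from the shader listing at the end of this post, but the function names are made up for this example and the real work happens per pixel on the GPU.

```python
import math

def sample_occlusion(pos, normal, neighbor, radius=1.5):
    """Occlusion that one nearby point adds to the point at `pos`.

    All positions are view-space coordinates in meters. `radius` is the
    1.5m search radius mentioned in the article.
    """
    delta = [n - p for n, p in zip(neighbor, pos)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist == 0.0:
        return 0.0
    # Closer points occlude more; points at or beyond the radius add nothing.
    falloff = max(0.0, 1.0 - dist / radius)
    # Only points in front of the surface (positive side of the normal) count.
    angle = max(0.0, sum(n * d for n, d in zip(normal, delta)) / dist)
    return angle * falloff

def ambient_occlusion(pos, normal, neighbors, radius=1.5):
    """Average the occlusion of all samples; 1.0 means fully unoccluded."""
    total = sum(sample_occlusion(pos, normal, nb, radius) for nb in neighbors)
    return max(0.0, 1.0 - total / len(neighbors))
```

A point directly above the surface at half the radius contributes 0.5, while a point behind the surface or outside the radius contributes nothing.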

SOMA uses a radius of 1.5m to look for nearby points that might occlude. Sampling points that are outside of the 1.5m range is a waste of resources, since they will not contribute to the AO. Our algorithm samples 16 points in a growing circle around the main pixel. The size of the circle is determined by how close the main pixel is to the camera and how large the search radius is. For pixels that are far away from the camera, a radius of just a few pixels can be used. The closer the point gets to the camera the more the circle grows - it can grow up to half a screen. Using only 16 samples to cover up to half a screen of pixels produces a grainy image that flickers when the camera is moving.
Grainy result from the SSAO algorithm
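
To make the relationship between depth and circle size concrete, here is a small Python sketch. It assumes a simple pinhole projection with a hypothetical focal length in pixels (`focal_px`) and reads "half a screen" as half the screen width. Neither detail comes from the SOMA source.

```python
def search_radius_pixels(depth_m, radius_m=1.5, focal_px=960.0, screen_width_px=1920):
    """Project the world-space search radius to a screen-space radius in pixels.

    `focal_px` and `screen_width_px` are illustrative assumptions.
    """
    if depth_m <= 0.0:
        raise ValueError("depth must be positive")
    # Perspective projection: on-screen size shrinks linearly with distance.
    radius_px = radius_m * focal_px / depth_m
    # The circle is not allowed to grow beyond half the screen.
    return min(radius_px, screen_width_px / 2.0)
```

With these numbers a pixel 100m away searches a circle of about 14 pixels, while a pixel 1m away hits the half-screen limit.
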
Bilateral Blur
Blurring can be used to remove the grainy look of the SSAO. Blur combines the values of a large number of neighboring pixels. The further away a neighboring pixel is, the less impact it has on the final result. Blur is run in two passes, first in the horizontal direction and then in the vertical direction.
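
The two-pass idea can be sketched on the CPU like this. It is a minimal Python illustration, not the shader used in SOMA. The binomial weights in `WEIGHTS_5` are an assumption, since the article does not list the actual kernel.

```python
def blur_1d(row, weights):
    """Blur one row with a symmetric kernel; samples past the edge are clamped."""
    half = len(weights) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = min(max(i + k - half, 0), len(row) - 1)
            acc += w * row[j]
        out.append(acc)
    return out

def blur_2d(image, weights):
    """Horizontal pass over every row, then vertical pass over every column."""
    horizontal = [blur_1d(row, weights) for row in image]
    columns = [blur_1d(list(col), weights) for col in zip(*horizontal)]
    # Transpose back so the result is indexed [row][column] again.
    return [list(row) for row in zip(*columns)]

# Assumed 5-tap kernel: the center counts most, distant neighbors count least.
WEIGHTS_5 = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
```

Running the two 1D passes gives the same result as a full 5x5 kernel for this kind of separable weighting, but needs 10 samples per pixel instead of 25.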

The issue with blurring SSAO this way quickly becomes apparent. AO from different geometry leaks across boundaries, causing a bright halo around objects. Bilateral weighting can be used to fix the leaks between objects. It works by comparing the depth of the main pixel to the depth of the neighboring pixel. If the difference in depth between the main pixel and the neighbor exceeds a limit, the neighbor is skipped. In SOMA this limit is set to 2cm.
To get a good-looking blur, the number of neighboring pixels to sample needs to be large. Getting rid of the grainy artifacts requires over 17x17 pixels to be sampled at full resolution.
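
The depth test described above could look roughly like this for a single 1D pass. This is a hedged Python sketch with a made-up function name. Only the 2cm limit comes from the article.

```python
def bilateral_blur_1d(values, depths, weights, depth_limit=0.02):
    """Blur `values`, skipping neighbors whose depth differs too much.

    `depths` are in meters; `depth_limit` is the 2cm limit from the article.
    """
    half = len(weights) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        total = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if j < 0 or j >= len(values):
                continue
            # Neighbors on another surface are skipped so AO cannot leak.
            if abs(depths[j] - depths[i]) > depth_limit:
                continue
            acc += w * values[j]
            total += w
        # The center sample always passes the test, so total is never zero
        # as long as the center weight is positive.
        out.append(acc / total)
    return out
```

Dividing by the sum of the weights that were actually used keeps the brightness stable when neighbors are skipped.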

Temporal Filtering 
Temporal Filtering is a method for reducing the flickering caused by the low number of samples. The result from the previous frame is blended with the current frame to create smooth transitions. Blending the images directly would lead to a motion-blur-like effect. Temporal Filtering removes the motion blur effect by reverse reprojecting the view space position of a pixel to the view space position it had in the previous frame and then using that position to sample the previous result. The SSAO algorithm runs on screen space data, but AO is applied to world geometry. An object that is visible in one frame may not be seen in the next frame, either because it has moved or because the view has been blocked by another object. When this happens the result from the previous frame has to be discarded. The distance between the points in world space determines how much of the result from the previous frame should be used.
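
To show what reverse reprojection does with actual numbers, here is a small Python sketch. The helper names (`mat_vec`, `translation`, `reproject`, `distance`) are invented for this example, and the camera motion is reduced to a pure translation to keep the matrices readable.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) with a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """4x4 translation matrix."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def reproject(view_pos, view_inv, prev_view):
    """Current view space -> world space -> previous frame's view space."""
    world = mat_vec(view_inv, list(view_pos) + [1.0])
    return mat_vec(prev_view, world)[:3]

def distance(a, b):
    """Euclidean distance, used to decide how much history can be trusted."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

If the camera moved one meter to the right since the last frame, a point straight ahead at 5m now was one meter to the right of center in the previous frame. When the position stored at that location in the previous frame is far from the reprojected position, the history belongs to another surface and should be discarded.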

Explanation of Reverse Reprojection used in Frostbite 2 [2]
Temporal Filtering introduces a new artifact. When dynamic objects move close to static objects they leave a trail of AO behind. Frostbite 2’s implementation of Temporal Filtering solves this by disabling the Temporal Filter for stable surfaces that don’t get flickering artifacts. I found another way to remove the trailing while keeping Temporal Filter for all pixels.

Shows the trailing effect that happens when a dynamic object is moved. The Temporal Blur algorithm is then applied and most of the trailing is removed.

Temporal Blur 

(A) Implementation of Temporal Filtered SSAO (B) Temporal Blur implementation 
I came up with a new way to use Temporal Filtering when trying to remove the trailing artifacts. By combining two passes of cheap blur with Temporal Filtering all flickering and grainy artifacts can be removed without leaving any trailing. 

When the SSAO has been rendered, a cheap 5x5 bilateral blur pass is run on the result. Then the blurred result from the previous frame is applied using Temporal Filtering. A second 5x5 bilateral blur is then applied to the image. In addition to using geometry data to calculate the blending amount for the Temporal Filtering, the difference in SSAO between the frames is used, which removes all trailing artifacts.
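
The order of the steps can be sketched in a few lines of Python. The constants 9.0 and 5.0 are taken from the Temporal Blur listing at the end of this post. Everything else (the function names, passing the blur in as a parameter, assuming a static camera so the reprojection error is zero) is a simplification for illustration.

```python
def temporal_blend(cur_ao, prev_ao, reprojection_error_m, frame_time):
    """Blend the current AO with last frame's AO for one pixel."""
    # Geometry term: history from a different surface is not trusted.
    weight = reprojection_error_m * 9.0
    # AO term: a large change in (blurred) AO also rejects history,
    # which is what removes the trail behind moving objects.
    weight += abs(prev_ao - cur_ao) * 5.0
    # Always blend in at least a small part of the current frame.
    weight = min(max(weight, frame_time), 1.0)
    return prev_ao + (cur_ao - prev_ao) * weight

def temporal_blur_frame(raw_ao, prev_result, frame_time, blur):
    """Blur, blend with last frame's final result, then blur again."""
    pre = blur(raw_ao)
    blended = [temporal_blend(c, p, 0.0, frame_time) for c, p in zip(pre, prev_result)]
    return blur(blended)
```

The result of `temporal_blur_frame` is what gets stored as `prev_result` for the next frame, so the history is already blurred when it is read back.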

Applying a blur before and after the Temporal Filtering and using the blurred image from the previous frame results in a very smooth image that becomes more blurred for each frame. It also removes any flickering. Even a 5x5 blur will cause the resulting image to look as smooth as a 64x64 blur after a few frames.
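
A quick way to see why the blur keeps widening is to apply a small kernel to its own output repeatedly. The Python sketch below does that in 1D with an assumed binomial 5-tap kernel. It ignores the blending weights of the Temporal Filter, so it only shows how far the blur can reach, not the exact shape you get in the game.

```python
def convolve(a, b):
    """Full discrete convolution of two 1D kernels."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def repeated_kernel(kernel, passes):
    """Effective kernel after applying `kernel` to its own output `passes` times."""
    result = [1.0]
    for _ in range(passes):
        result = convolve(result, kernel)
    return result

# Assumed 5-tap weights; the article does not list the real ones.
KERNEL_5 = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
```

Each pass widens the footprint by four pixels, so 16 accumulated passes of a 5-tap blur reach across 65 pixels.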

Because the image gets so smooth the upsampling can be moved to after the blur. This leads to Temporal Blur being faster, since running four 5x5 blur passes in half resolution is faster than running two 17x17 passes in full resolution. 
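
A back-of-the-envelope count of texture samples shows where the saving comes from. This Python sketch assumes that every pass is a 1D pass with 5 or 17 taps, which is my reading of the two-pass blur described earlier. Sample count is only a rough stand-in for GPU time.

```python
def blur_samples(width, height, passes, taps_per_pass):
    """Total texture samples for a number of 1D blur passes over an image."""
    return width * height * passes * taps_per_pass

# Two 17-tap passes at full resolution (1920x1080).
full_res = blur_samples(1920, 1080, passes=2, taps_per_pass=17)
# Four 5-tap passes at half resolution (960x540).
half_res = blur_samples(960, 540, passes=4, taps_per_pass=5)
```

Under these assumptions the half resolution variant reads about 10.4 million samples per frame against about 70.5 million for the full resolution blur, which is 6.8 times fewer.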

All of the previous steps are performed in half resolution. To get the final result it has to be scaled up to full resolution. Stretching the half resolution image to twice its size will not look good. Near the edges of geometry there will be visible bleeding; non-occluded objects will have a bright pixel halo around them. This can be solved using the same idea as the bilateral blurring. Normal linear filtering is combined with a weight calculated by comparing the distance in depth between the main pixel and the depth value of the four closest half resolution pixels.

Combining SSAO with the Temporal Blur algorithm produces high quality results for a large search radius at a low cost. The total cost of the algorithm is 1.1ms (1920x1080 AMD 5870). This is more than twice as fast as a normal SSAO implementation.

SOMA uses high frequency AO baked into the diffuse texture in addition to the medium frequency AO generated by the SSAO.

Temporal Blur could be used to improve many other post effects that need to produce smooth-looking results.

Ambient Occlusion is only one part of the rendering pipeline, and it should be combined with other lighting techniques to give the final look.


// SSAO Main loop

// Scale the radius based on how close to the camera it is
float fStepSize = afStepSizeMax * afRadius / vPos.z;
float fStepSizePart = 0.5 * fStepSize / (2.0 + 16.0);

for(float d = 0.0; d < 16.0; d += 4.0)
{
        // Sample four points at the same time
        vec4 vOffset = (d + vec4(2.0, 3.0, 4.0, 5.0)) * fStepSizePart;

        // Rotate the samples
        vec2 vUV1 = mtxRot * vUV0;
        vUV0 = mtxRot * vUV1;

        vec3 vDelta0 = GetViewPosition(gl_FragCoord.xy + vUV1 * vOffset.x) - vPos;
        vec3 vDelta1 = GetViewPosition(gl_FragCoord.xy - vUV1 * vOffset.y) - vPos;
        vec3 vDelta2 = GetViewPosition(gl_FragCoord.xy + vUV0 * vOffset.z) - vPos;
        vec3 vDelta3 = GetViewPosition(gl_FragCoord.xy - vUV0 * vOffset.w) - vPos;

        vec4 vDistanceSqr = vec4(dot(vDelta0, vDelta0),
                                 dot(vDelta1, vDelta1),
                                 dot(vDelta2, vDelta2),
                                 dot(vDelta3, vDelta3));

        vec4 vInvertedLength = inversesqrt(vDistanceSqr);

        // vDistanceSqr * vInvertedLength is the distance to each sample
        vec4 vFalloff = vec4(1.0) + vDistanceSqr * vInvertedLength * fNegInvRadius;

        vec4 vAngle = vec4(dot(vNormal, vDelta0),
                            dot(vNormal, vDelta1),
                            dot(vNormal, vDelta2),
                            dot(vNormal, vDelta3)) * vInvertedLength;

        // Calculates the sum based on the angle to the normal and distance from point
        fAO += dot(max(vec4(0.0), vAngle), max(vec4(0.0), vFalloff));
}

// Get the final AO by dividing by the number of samples
fAO = max(0.0, 1.0 - fAO / 16.0);


// Upsample Code
vec2 vClosest = floor(gl_FragCoord.xy / 2.0);
vec2 vBilinearWeight = vec2(1.0) - fract(gl_FragCoord.xy / 2.0);

float fTotalAO = 0.0;
float fTotalWeight = 0.0;

for(float x = 0.0; x < 2.0; ++x)
for(float y = 0.0; y < 2.0; ++y)
{
       // Sample depth (stored in meters) and AO for the half resolution
       float fSampleDepth = textureRect(aHalfResDepth, vClosest + vec2(x,y));
       float fSampleAO = textureRect(aHalfResAO, vClosest + vec2(x,y));

       // Calculate bilinear weight
       // (abs() keeps the weight positive for all four samples)
       float fBilinearWeight = abs((x - vBilinearWeight.x) * (y - vBilinearWeight.y));
       // Calculate upsample weight based on how close the depth is to the main depth
       float fUpsampleWeight = max(0.00001, 0.1 - abs(fSampleDepth - fMainDepth)) * 30.0;

       // Apply weight and add to total sum
       fTotalAO += (fBilinearWeight + fUpsampleWeight) * fSampleAO;
       fTotalWeight += (fBilinearWeight + fUpsampleWeight);
}

// Divide by total sum to get final AO
float fAO = fTotalAO / fTotalWeight;


// Temporal Blur Code

// Get current frame depth and AO
vec2 vScreenPos = floor(gl_FragCoord.xy) + vec2(0.5);
float fAO = textureRect(aHalfResAO, vScreenPos.xy);
float fMainDepth = textureRect(aHalfResDepth, vScreenPos.xy);

// Convert to view space position
vec3 vPos = ScreenCoordToViewPos(vScreenPos, fMainDepth);

// Convert the current view position to the view position it
// would represent the last frame and get the screen coords
vPos = (a_mtxPrevFrameView * (a_mtxViewInv * vec4(vPos, 1.0))).xyz;

vec2 vTemporalCoords = ViewPosToScreenCoord(vPos);

// Get the AO from the last frame
float fPrevFrameAO = textureRect(aPrevFrameAO, vTemporalCoords.xy);
float fPrevFrameDepth = textureRect(aPrevFrameDepth, vTemporalCoords.xy);

// Get the view space position of the temporal coords
vec3 vTemporalPos = ScreenCoordToViewPos(vTemporalCoords.xy, fPrevFrameDepth);

// Get weight based on distance to last frame position (removes ghosting artifact)
float fWeight = distance(vTemporalPos, vPos) * 9.0;

// And weight based on how different the amount of AO is (removes trailing artifact)
// Only works if both fAO and fPrevFrameAO are blurred
fWeight += abs(fPrevFrameAO - fAO) * 5.0;

// Clamp to make sure at least 1.0 / FPS of a frame is blended
fWeight = clamp(fWeight, afFrameTime, 1.0);
fAO = mix(fPrevFrameAO, fAO, fWeight);


  1. Good to see you are improving the rendering capabilities.

    If you ever consider changing the development engine altogether to a 3rd party one, I recommend the Unreal Engine for you, as it suits Frictional Games' dark artistic style well.

  2. This comment has been removed by the author.

    1. Isn't a non "proper" AA at full HD resolution enough?
      I assume the last render shown here doesn't have any AA, as it's demonstrating the SSAO technique used. Still, it's hard to notice any aliasing, but I might be wrong.

      I think that what should bother graphics developers more is the banding you get in color grading, especially when playing dark and monochrome games. Not sure if it's a limitation somewhere in the APIs, screen support or due to too many effect passes...

  3. I am guessing they will be using SMAA or FXAA as post processing and multisampled buffers do not work so well together.

    I like the temporal aspect of the algorithm, have you guys read TSSAO article in GPU Pro 2?

    Will you let us edit some ini files to run the AO pass at full resolution? and more samples? I want to stress my pc to the max :D

    I am also interested in what you guys think about this SSAO technique:
    It uses a 'dual depth' linear extrapolation, I use it in my project without a blur pass and it looks very good for a depth only SSAO, very little halo artifacts even.
    Just wondering what your thoughts are on this algorithm.

    1. You will be able to run the AO at full resolution if you want.

      That algorithm looks really good for one that doesn't use a blur. The only negative thing about that shader is that it uses 36 samples per pixel and seems to be run in fullscreen.

  4. Awesome article.
    A question. Is it possible/feasible to use the difference in depth between the current fragment and the samples to introduce additional weight factor, so that objects that are relatively far from the background contribute little or nothing to the AO (of the background)? This would avoid a black halo that follows objects that the player can pick up and carry around. As you move an object away from a background surface, it blocks less and less ambient light w/ respect to a point on that surface, so it should contribute less AO.
    Have you experimented with something like that? If so, what did the results look like?

    1. The algorithm already does that. The halo that follows the object is from the temporal filtering and not the SSAO generation.

    2. Oh! Should have read the article/code more carefully. It would be great if the halo could be avoided, though; then again, in (visually) dark, horror-themed games, I suppose it's less of a problem.

    3. The algorithm already fixes the halo by applying the temporal blur. Look at the video 12 seconds in. But I'm not sure we are talking about the same thing :)

  5. To be honest I'm a little confused. The picture in the "Temporal Blur" article section indicates that there are two additional steps to the base algorithm - the temporal filter and temporal blur but you posted only one additional shader, called "Temporal Blur Code". Moreover, this shader doesn't do any 5x5 filtering but looks more like the "Temporal Filter" block from the "Temporal Blur" article section. Could you elaborate more on this, please? :)

