Screen
space ambient occlusion (SSAO) is the standard solution for approximating
ambient occlusion in video games. Ambient occlusion represents how exposed each point is to the indirect lighting in the scene. Direct lighting is light emitted from a light source, such as a lamp or a fire. This direct light illuminates objects in the scene, and the light that bounces off these illuminated objects makes up the indirect lighting. Simulating indirect lighting from every object in the scene is very expensive. Ambient occlusion approximates it by using a light source with constant color and information from nearby geometry to determine how dark each part of an object should be. The idea behind SSAO is to get that geometry information from the depth buffer.
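As a rough sketch of how the AO value is used (this is not SOMA's actual lighting code; vAmbientColor and vAlbedo are made-up names), the occlusion factor simply darkens a constant ambient term:

// Illustrative sketch only: fAO is 1.0 for fully exposed points and approaches
// 0.0 for heavily occluded ones, so it scales the constant ambient light.
vec3 vAmbientLight = vAmbientColor * vAlbedo * fAO;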
There are many published algorithms for high quality SSAO. This tech feature will instead focus on improvements that can be made after the SSAO has been generated.
SSAO Algorithm
SOMA uses a
fast and straightforward algorithm for generating medium frequency AO. The
algorithm runs at half resolution which greatly increases the performance.
Running at half resolution doesn’t reduce the quality by much, since the final
result is blurred.
For each
pixel on the screen, the shader calculates the position of the pixel in view
space and then compares that position with the view space position of nearby
pixels. How occluded the pixel gets is based on how close the points are to each other and whether the nearby point is in front of the surface normal. The occlusion from each nearby pixel is then summed to get the final result.
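A condensed, scalar version of that per-sample test could look like the following; the full, vectorized loop from SOMA is listed at the end of the post, and vSampleCoord here is a made-up name standing in for one of the rotated sample positions:

// Assumes vPos and vNormal are the view space position and normal of the main
// pixel, and afRadius is the search radius in meters.
vec3 vDelta = GetViewPosition(vSampleCoord) - vPos;    // main pixel -> sample point
float fDist = max(length(vDelta), 0.0001);
float fFalloff = max(0.0, 1.0 - fDist / afRadius);     // fades out towards the search radius
float fAngle = max(0.0, dot(vNormal, vDelta) / fDist); // only points in front of the surface occlude
fAO += fAngle * fFalloff;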
SOMA uses a
radius of 1.5m to look for nearby points that might occlude. Sampling points that are outside
of the 1.5m range is a waste of resources, since they will not contribute to the AO. Our algorithm samples 16 points in
a growing circle around the main pixel. The size of the circle is determined by
how close the main pixel is to the camera and how large the search radius is.
For pixels that are far away from the camera, a radius of just a few pixels can
be used. The closer the point gets to the camera, the more the circle grows - it can cover as much as half the screen. Picking only 16 samples from half a screen of pixels produces a grainy image that flickers when the camera moves.
Grainy result from the SSAO algorithm
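The screen space size of the sample circle described above follows directly from the view space depth. This mirrors the first lines of the SSAO listing at the end of the post, where afStepSizeMax presumably caps the step size in pixels and afRadius is the 1.5m search radius:

// The radius is divided by the view space depth, so nearby pixels get a large
// circle of samples while distant pixels only search a few pixels wide.
float fStepSize = afStepSizeMax * afRadius / vPos.z;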
Bilateral Blur
Blurring
can be used to remove the grainy look of the SSAO. Blur combines the value of a
large number of neighboring pixels. The further away a neighboring pixel is, the less impact it has on the final result. The blur is run in two passes, first in
the horizontal direction and then in the vertical direction.
The issue
with blurring SSAO this way quickly becomes apparent: AO from different geometry leaks across object boundaries, causing a bright halo around objects. Bilateral weighting can be used to fix the leaks between objects. It works by comparing the depth of the main pixel to the depth of the neighboring pixel. If the difference in depth between the main pixel and the neighbor exceeds a limit, the neighbor is skipped. In SOMA this limit is set to 2cm.
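A minimal sketch of that depth test, assuming the same textureRect helper and half resolution buffers that appear in the listings at the end of the post (vScreenPos, vOffset and fBlurWeight are made-up names for the current pixel, the neighbor offset and the spatial blur weight):

// Reject neighbors whose depth differs too much from the main pixel so that
// AO does not leak across object boundaries. Depth is stored in meters.
float fMainDepth = textureRect(aHalfResDepth, vScreenPos);
float fSampleDepth = textureRect(aHalfResDepth, vScreenPos + vOffset);
if(abs(fSampleDepth - fMainDepth) < 0.02) // the 2cm limit used in SOMA
{
    fTotalAO += textureRect(aHalfResAO, vScreenPos + vOffset) * fBlurWeight;
    fTotalWeight += fBlurWeight;
}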
To get a good-looking blur, a large number of neighboring pixels needs to be sampled. Getting rid of the grainy artifacts requires a kernel of more than 17x17 pixels at full resolution.
Temporal Filtering
Temporal
Filtering is a method for reducing the flickering caused by the low number of
samples. The result from the previous frame is blended with the current frame
to create smooth transitions. Blending the images directly would lead to a
motion-blur-like effect. Temporal Filtering avoids this by reverse reprojecting the view space position of a pixel to the view space position it had the previous frame and then using that position to sample the previous result. The SSAO algorithm
runs on screen space data but AO is applied on world geometry. An object that
is visible in one frame may not be seen in the next frame, either because it
has moved or because the view has been blocked by another object. When this
happens the result from the previous frame has to be discarded. The distance
between the points in world space determines how much of the result from the
previous frame should be used.
Explanation of Reverse Reprojection used in Frostbite 2 [2]
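In shader terms the reprojection boils down to a matrix transform and a texture fetch. This is condensed from the Temporal Blur listing at the end of the post (the listing reuses vPos instead of a separate vPrevViewPos variable):

// Transform the current view space position into the previous frame's view
// space, convert it to screen coordinates and fetch last frame's AO there.
vec3 vPrevViewPos = (a_mtxPrevFrameView * (a_mtxViewInv * vec4(vPos, 1.0))).xyz;
vec2 vTemporalCoords = ViewPosToScreenCoord(vPrevViewPos);
float fPrevFrameAO = textureRect(aPrevFrameAO, vTemporalCoords.xy);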
Temporal
Filtering introduces a new artifact. When dynamic objects move close to static
objects they leave a trail of AO behind. Frostbite 2’s implementation of
Temporal Filtering solves this by disabling the Temporal Filter for stable surfaces that don't get flickering artifacts. I found another way to remove the trailing while keeping the Temporal Filter enabled for all pixels.
Shows the trailing effect that happens when a dynamic object is moved. The Temporal Blur algorithm is then applied and most of the trailing is removed.
Temporal Blur
(A) Implementation of Temporal Filtered SSAO (B) Temporal Blur implementation
I came up with a
new way to use Temporal Filtering when trying to remove the trailing artifacts. By combining two passes of cheap blur with
Temporal Filtering, all flickering and grainy artifacts can be removed without leaving
any trailing.
When the SSAO has been rendered, a cheap 5x5 bilateral blur pass is run on the result. Then the blurred result from the previous frame is applied
using Temporal Filtering. A second 5x5 bilateral blur is then applied to the image. In addition to using geometry data to calculate the blending amount for the Temporal Filtering, the difference in SSAO between the frames is also used, which removes all trailing artifacts.
Applying a blur before and after the
Temporal Filtering and using the blurred image from the previous frame results
in a very smooth image that becomes more blurred with each frame; it also
removes any flickering. Even a 5x5 blur will cause the resulting image to look
as smooth as a 64x64 blur after a few frames.
Because the
image gets so smooth, the upsampling can be moved to after the blur. This leads
to Temporal Blur being faster, since running four 5x5 blur passes in half
resolution is faster than running two 17x17 passes in full resolution.
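As a rough tap-count estimate, assuming each blur is separable into a horizontal and a vertical pass: four 5-tap passes at half resolution cost about 4 x 5 x 0.25 = 5 texture reads per full resolution pixel, while two 17-tap passes at full resolution cost 2 x 17 = 34 reads per pixel.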
Upsampling
All of the previous steps are performed at half resolution. To get the final result, the image has to
be scaled up to full resolution. Stretching the half resolution image to twice
its size will not look good. Near the edges of geometry there will be visible
bleeding; non-occluded objects will have a bright pixel halo around them. This
can be solved using the same idea as the bilateral blurring. Normal linear
filtering is combined with a weight calculated by comparing the depth of the main pixel with the depth values of the four closest half resolution pixels.
Summary
Combining SSAO with the Temporal Blur algorithm produces high quality results for a large search radius at a low cost. The total cost of the algorithm is 1.1ms (1920x1080, AMD 5870). This is more than twice as fast as a normal SSAO implementation.
SOMA uses high frequency AO baked into the diffuse texture in addition to the medium frequency AO generated by the SSAO.
Temporal Blur could be used to improve many other post effects that need to produce smooth-looking results.
Ambient Occlusion is only one part of the rendering pipeline, and it should be combined with other lighting techniques to give the final look.
References
- [1] http://gfx.cs.princeton.edu/pubs/Nehab_2007_ARS/NehEtAl07.pdf
- [2] http://dice.se/wp-content/uploads/GDC12_Stable_SSAO_In_BF3_With_STF.pdf
// SSAO Main loop
//Scale the radius based on how close to the camera it is
float fStepSize = afStepSizeMax * afRadius / vPos.z;
float fStepSizePart = 0.5 * fStepSize / ((2 + 16.0));
for(float d = 0.0; d < 16.0; d+=4.0)
{
//////////////
// Sample four points at the same time
vec4 vOffset = (d + vec4(2, 3, 4, 5))* fStepSizePart;
//////////////////////
// Rotate the samples
vec2 vUV1 = mtxRot * vUV0;
vUV0 = mtxRot * vUV1;
vec3 vDelta0 = GetViewPosition(gl_FragCoord.xy + vUV1 * vOffset.x) - vPos;
vec3 vDelta1 = GetViewPosition(gl_FragCoord.xy - vUV1 * vOffset.y) - vPos;
vec3 vDelta2 = GetViewPosition(gl_FragCoord.xy + vUV0 * vOffset.z) - vPos;
vec3 vDelta3 = GetViewPosition(gl_FragCoord.xy - vUV0 * vOffset.w) - vPos;
vec4 vDistanceSqr = vec4(dot(vDelta0, vDelta0),
dot(vDelta1, vDelta1),
dot(vDelta2, vDelta2),
dot(vDelta3, vDelta3));
vec4 vInvertedLength = inversesqrt(vDistanceSqr);
vec4 vFalloff = vec4(1.0) + vDistanceSqr * vInvertedLength * fNegInvRadius;
vec4 vAngle = vec4(dot(vNormal, vDelta0),
dot(vNormal, vDelta1),
dot(vNormal, vDelta2),
dot(vNormal, vDelta3)) * vInvertedLength;
////////////////////
// Calculates the sum based on the angle to the normal and distance from point
fAO += dot(max(vec4(0.0), vAngle), max(vec4(0.0), vFalloff));
}
//////////////////////////////////
// Average the occlusion over the 16 samples and invert to get the final AO value
fAO = max(0.0, 1.0 - fAO / 16.0);
------------------------------------------------------------------------------
// Upsample Code
vec2 vClosest = floor(gl_FragCoord.xy / 2.0);
vec2 vBilinearWeight = vec2(1.0) - fract(gl_FragCoord.xy / 2.0);
float fTotalAO = 0.0;
float fTotalWeight = 0.0;
for(float x = 0.0; x < 2.0; ++x)
for(float y = 0.0; y < 2.0; ++y)
{
// Sample depth (stored in meters) and AO for the half resolution
float fSampleDepth = textureRect(aHalfResDepth, vClosest + vec2(x,y));
float fSampleAO = textureRect(aHalfResAO, vClosest + vec2(x,y));
// Calculate bilinear weight
float fBilinearWeight = (x - vBilinearWeight.x) * (y - vBilinearWeight.y);
// Calculate upsample weight based on how close the depth is to the main depth
float fUpsampleWeight = max(0.00001, 0.1 - abs(fSampleDepth - fMainDepth)) * 30.0;
// Apply weight and add to total sum
fTotalAO += (fBilinearWeight + fUpsampleWeight) * fSampleAO;
fTotalWeight += (fBilinearWeight + fUpsampleWeight);
}
// Divide by total sum to get final AO
float fAO = fTotalAO / fTotalWeight;
-------------------------------------------------------------------------------------
// Temporal Blur Code
//////////////////
// Get current frame depth and AO
vec2 vScreenPos = floor(gl_FragCoord.xy) + vec2(0.5);
float fAO = textureRect(aHalfResAO, vScreenPos.xy);
float fMainDepth = textureRect(aHalfResDepth, vScreenPos.xy);
//////////////////
// Convert to view space position
vec3 vPos = ScreenCoordToViewPos(vScreenPos, fMainDepth);
/////////////////////////
// Convert the current view position to the view position it
// had the previous frame and get the screen coords
vPos = (a_mtxPrevFrameView * (a_mtxViewInv * vec4(vPos, 1.0))).xyz;
vec2 vTemporalCoords = ViewPosToScreenCoord(vPos);
//////////////
// Get the AO from the last frame
float fPrevFrameAO = textureRect(aPrevFrameAO, vTemporalCoords.xy);
float fPrevFrameDepth = textureRect(aPrevFrameDepth, vTemporalCoords.xy);
/////////////////
// Get the view space position of the temporal coords
vec3 vTemporalPos = ScreenCoordToViewPos(vTemporalCoords.xy, fPrevFrameDepth);
///////
// Get weight based on distance to last frame position (removes ghosting artifact)
float fWeight = distance(vTemporalPos, vPos) * 9.0;
////////////////////////////////
// And weight based on how different the amount of AO is (removes trailing artifact)
// Only works if both fAO and fPrevFrameAO are blurred
fWeight += abs(fPrevFrameAO - fAO) * 5.0;
////////////////
// Clamp to make sure at least 1.0 / FPS of a frame is blended
fWeight = clamp(fWeight, afFrameTime, 1.0);
fAO = mix(fPrevFrameAO, fAO, fWeight);
------------------------------------------------------------------------------
Good to see you are improving the rendering capabilities.
If you ever consider changing the development engine altogether to a 3rd party one, I recommend the Unreal Engine for you, as it suits Frictional Games' dark artistic style well.
Isn't non-"proper" AA at full HD resolution enough?
I assume the last render shown here doesn't have any AA, as it's demonstrating the SSAO technique used. Still, it's hard to notice any aliasing, but I might be wrong.
I think what should bother graphics developers more is the banding you get in color grading, especially when playing dark and monochrome games. Not sure if it's a limitation somewhere in the APIs, screen support, or due to too many effect passes...
I am guessing they will be using SMAA or FXAA, as post processing and multisampled buffers do not work so well together.
I like the temporal aspect of the algorithm. Have you guys read the TSSAO article in GPU Pro 2?
Will you let us edit some ini files to run the AO pass at full resolution? and more samples? I want to stress my pc to the max :D
I am also interested in what you guys think about this SSAO technique:
http://blenderartists.org/forum/showthread.php?234822-new-and-fast-SSAO
It uses a 'dual depth' linear extrapolation. I use it in my project without a blur pass and it looks very good for a depth-only SSAO, with very few halo artifacts even.
Just wondering what your thoughts are on this algorithm.
You will be able to run the AO at full resolution if you want.
That algorithm looks really good for one that doesn't use a blur. The only negative thing about that shader is that it uses 36 samples per pixel and seems to run at full resolution.
Awesome article.
A question. Is it possible/feasible to use the difference in depth between the current fragment and the samples to introduce an additional weight factor, so that objects that are relatively far from the background contribute little or nothing to the AO (of the background)? This would avoid a black halo that follows objects that the player can pick up and carry around. As you move an object away from a background surface, it blocks less and less ambient light with respect to a point on that surface, so it should contribute less AO.
Have you experimented with something like that? If so, what did the results look like?
The algorithm already does that. The halo that follows the object is from the temporal filtering and not the SSAO generation.
Oh! Should have read the article/code more carefully. It would be great if the halo could be avoided, though; then again, in (visually) dark, horror-themed games, I suppose it's less of a problem.
The algorithm already fixes the halo by applying the temporal blur. Look at the video 12 seconds in. But I'm not sure we are talking about the same thing :)
To be honest I'm a little confused. The picture in the "Temporal Blur" article section indicates that there are two additional steps to the base algorithm - the temporal filter and the temporal blur - but you posted only one additional shader, called "Temporal Blur Code". Moreover, this shader doesn't do any 5x5 filtering but looks more like the "Temporal Filter" block from the "Temporal Blur" article section. Could you elaborate more on this, please? :)