Showing posts with label tech. Show all posts
Showing posts with label tech. Show all posts

Friday, 22 October 2010

Pre-pass lighting redux

Introduction
After writing the previous post on pre-pass lighting I started doing some tests, to see how it compares to the old deferred renderer. The results that I got where pretty interesting, so thought I might as well share them. Also note that this post might be a bit more technical than the previous.

The good thing with these renderers is that they both share the basic material data. So I can use the same data for both HPl2 and HPl3. HPL3 comes with the few more features for decals but for tests, it is easy to just skip them. When setting up the test I went with a very simple scene, it just the same box model rendered several times, a floor and lights. Some times it is best to test with proper game scenes, but I wanted to something that could be easily tweaked and gave simpler output. This means that the tests are not 100% accurate of in-game performance, but even testing a level in game is not that, as framerate varies a lot depending on where in a level one looks. So usually benchmarking has some kind of fly-through, but that is of the scope from what I intended to do.

Note that HPl2 test was built in Visual Studio 2003, while HPL3 uses the 2010 version. I do not think this should matter much though, even if the optimization routines differ, simply because pretty much all of the work is done on the GPU. The graphics card I did all my testing on is a Radeon 5850 HD (and others where tried for some tests). And as a final note, all of the data is given as average frame time (in milliseconds!) and not as frames per second. As Emil Persson points out, FPS is not a very good way to compare performance.

Test #1
Now with my setup details out of the way, let's get down to the details. I first started out with a scene like this:
1 x box, xz-plane floor, 1x spot light + shadow
which game me the following results:
HPL2: 0.78ms
HPL3: 0.84ms
Difference: +7.7%
This means, that given a simple scene like this the old renderer is actually faster! This is not that strange though since the scene does not have many lit screen pixels, most of the image being sky. Thus, the extra pass extra made with the pre-pass renderer matters more than an lighting speed-ups. Also, the decrease in draw buffer (3 to 2) in the g-buffer does not make up for the extra pass.

Test #2
4000 x boxes, 1 x point light, x-z plane floor
HPL2: 14.9
HPL3: 18.5
Difference: +24%
As expected when there is a lot of things to render, the pre-pass lighting is even slower. That extra pass shows on the performance. Remember though that 4000 objects is quite much and an important thing for good performance on GPUs is to have as few draw calls as possible.


Test #3
1 x boxes, 1000 x point light, x-z plane floor
HPL2: 30.0
HPL3: 29.2
Difference: -2.7%
As noticed, once the scene is filled with lights, pre-pass lighting is faster, but only so by a slight amount. Especially considering the large amount of lights. (I later realised that the actual lit screen pixels where quite few, something fixed later on in test #5).


Test #4
4000 x boxes, 1000 x point light, x-z plane floor
HPL2: 47.5
HPL3: 52.0
Difference: +10%
Doing a really stressful test (the number of lights and objects are really large) it seems like the old deferred renderer wins out. This was actually a bit unexpected and dissappointing to me as I thought that the pre-pass lighting should not be this far behind. But taking the little difference in test 3 into account, it is not that suprising. Still, after these tests it is clearly shown that pre-pass lighting is far from a giant speed up compared to deferred shading and it actually seems slower in most cases.

I also tried to skip the early-z pass for pre-pass lighting (I use early-z in both renderers on all other tests). This is basically a pass where the z-buffer is set up, and makes sure later passes only draws visible pixels. From reading Crytech papers, it does not seem like the the Crysis 2 engine has this though (and same seems true for other engines), so I tried to do a quick and dirty test of not using it and got this data: 48.7 (+2.5%)
This means that even without the early z test, the pre-pass was still slower. However, I did not attempts to reduce overdraw (like sorting front to back) and it might be possible for optimizations here. However, when rendering front to back, there will be a lot more state switching as you cannot sort according to texture, etc as efficiently, so I wonder if the data might not even be worse in a more realistic scenario.

I also tried this test out on a few other other cards (again with full early-z testing):
Geforce 240gt: 125, 137 (+9.6%)
Geforce 320M: 240, 240 (+/- 0%)
This gave the indication that on some cards pre-pass might actually be better, and that it might not be as clear-cut as the first tests seemed to show.

As a final variation on this test, I added illumination maps to all textures, a feature that requires an extra pass in the old engine. I also removed the height map rendering. This gave me: 50.6, 50.0 (-1.2%)
This is a very tiny speed up considering that the methods now have the same amount of passes and that pre-pass lighting has faster light rendering and a smaller g-buffer.

Test #5
488 x boxes, 30 x point light, x-z plane floor
Radeon 5850 HD: 7.4, 7.8 (+5.4%)
Geforce 240gt: 18, 19 (+5.5%)
Geforce 320M: 50.0, 45.5 (-9%)
Geforce 9800gtx: 9.5, 9.5 (0%)

In this test I change to a more realistic number of lights and draw calls. I also aligned the lights so the lit pixels covered the entire screen, which I did not do above. As can be seen, on my computer (the 5850) deferred shading still wins, but on a less powerful card the pre-pass lighting is much faster. This difference might be a bandwidth issue and some cards might have problems pushing the data amounts required for deferred shading.

I also did a tweak to this test and turned down the number of draw calls a bit:
316x boxes, 30 x point light, x-z plane floor
Giving: 6.4, 6.6 (+3%)
This further reduced the difference and if I did the hackish removal of early z, pre-pass lighting plunged down to: 5,2 (-18%)
Even though this removal of early z is not very realistic, the results show that I need to investigate it. Something I will do once I get a more proper scene up and running.

Finally, I also tried to give all the boxes illumination (and turning back on early z test):
6.8, 6.6 (-2.9%)
This clearly shows how you get the illumination almost for free in pre-pass, and that it costs a bit more with the deferred shader. This is not surprising though, given that it requires an extra pass, but hints that further effects can be more efficiently implemented when using pre-pass lighting.


Conclusions
The tests clearly show that my previous assumption that light rendering in pre-pass lighting would be much faster was incorrect. It is a bit faster, but only noticeable so when really stretching the limit and then only by a small fraction. This makes me conclude that one should not use pre-pass lighting to have faster light rendering. However, as can be seen on the test with the Geforce 320M, the pre-pass lighting technique matters a lot more on older hardware, and it might actually be of greater use there.

There is not any vast differences in the techniques though and instead the choice should be based on other merits. Given that pre-pass lighting allows for so much more variety in materials, I will keep it for HPL3, but I will not be expecting any rises in framerate anymore.

I hope this post will prove useful for those who are thinking of using either rendering method, and for the rest it might be an interesting insight on how testing is done (at least how I do it). Again, sorry for the lack of pretty picture, which I promise to make up for!


Thursday, 21 October 2010

Tech Feature: Pre-pass lighting

Progress on the new engine, HPL3, is coming along nicely and recently I changed the core rendering system into something called Pre-pass lighting. This switch has been made for a number of reasons, but before I got into that and what pre-pass lighting exactly is, I need to explain how we did it back in the "old days".

Forward Rendering
The engine powering Penumbra (HPL1) uses something called forward rendering. This type of rendering works by rendering the entire scene on an object basis. So when rendering a chair, wall, or any geometry in the world, this was done by drawing it one time for every light that touches it. So an object that is lit by three lights has to be drawn three times, and so on. This technique can be quite limiting when setting up scenes as you need to be very careful when adding lights. It might not actually be clear exactly how much impact on performance a single light will have and levels usually require quite some tweaking to get right. The complexity of a scene can be expressed as:

Draw calls = Objects * Lights


This means that the number of draw calls can easily get very large and only adding a single light, even if it has little effect on the scene visually, can have very negative effects on performance.

Deferred Shading
When starting work on HPL2 (which was used for Amnesia) I wanted to get away from this annoying light limitations. Since HPL1 had been created a new technique called "Deferred shading" had emerged and when work on HPl2 was started, the average PC system was up for the the task.

What makes deferred shading special is that it separates rendering objects and rendering the lighting. This works by first rendering to a special G-buffer that contains information such as normals, depth and color of all on screen objects. The final output looks like this:


From left to right: Color, normals and depth. Note that these texture have 4 channels each and not visible are also saved specular intensity and power. These three texture then represent the properties of all visible data. It is then used by the lights to render the final image. This makes the complexity of the rendering:

Draw calls = Objects + Lights

This is a lot nicer and as lights and objects are separated, it is a lot easier to add lights to a scene without worrying about performance hits. It is also much simpler to intuitively understand how performance will be affected. By using this technique we where able to use a lot more light sources in Amnesia and considering all of the dynamic lights needed for the mechanics, the game would have been a lot harder to make using forward rendering.

Deferred rendering is not without problems though. First of all, rendering the G-buffer means rendering to three textures at one time which is quite performance heavy, meaning a scene with few lights runs faster on a forward renderer. Secondly, there is no support for fullscreen anti-aliasing either, and one has to do some hackish tricks to remove jagged edges (the "edge smooth" feature in Amnesia). Finally, there is much less material variety possible as every property needed to generate the final image needs to be in the G-buffer. Since we could mange without fancy skin shaders in Amnesia, it was turned out to not be too much of a problem though.

Scenes like the test of Agrippa above would not be possible in our old Renderer. In this test shot around 30 lights help light Agrippa in a nice fashion, and since the geometry and lighting is decoupled it is possible to run this with a high framerate.


Pre-pass lighting
I heard about this technique (first saw it here) during the development of Amnesia and was a bit interested in trying it out. I was interested in the tech back then since it made light rendering go faster, something that had proved a bit of a bottle neck in Amnesia. However, I did not have time back then and decided against it.

As I started to update the engine to HPL3 I again looked at this technology. This time more had been written on the subject and it had actually been tested. For example a similar algorithm was used in Insomniac's Reistance 2 and Crytech goes over it in a paper about CryEngine 2. This also meant that the method was practical, and was well worth trying (I usually try and use tech I have been able to try in other games, as tech dead-ends can prove quite expensive).

Pre-pass lighting (or deferred lighting as it is called sometimes) is very similar to deferred shading and I could use much of the code from HPl2 when implementing it. Only a few changes in materials and light rendering was really needed. The rendering works by also first rendering to a G-buffer, but one only containing normal, depth and specular power. After that lights are rendered, but they render only part of the light equation; basically color and specular intensity. Then in a final pass all objects are rendered again and the light data from the previous pass is used to render the final image. The sequence is like this:

Render Normals+Depth -> Render Lights -> Render final image


The first good thing is that this technique is able to render lights faster, since each lights has to do less equations and access less textures. The algorithm also includes an extra step at the end, but this does not matter that much, as the added the final render takes is regained by the one less buffer needed to be rendered to in the first g-buffer pass (only 2 textures needed instead of the 3 deferred shading uses).

This speed up was not the main reason why I used it though. Since each object rendered again during the final pass, it is possible to have a much larger variety of material types. Instead of being confined to using what can be fitted into a g-buffer, a material can do specific calculations the final image pass. This allows for specialized skin shaders and other tricks. For example, it is now possible to have more features packed into the decal materials:

Above is a decal with both color, normalmap and height map, something not possible in the previous engine. (Note that color and normal have separate alpha and that the height map make the tiles seem carved out of the ground).


End notes
Now I have given a little rundown of how the new renderer works and how it differs from the old one. I have skipped a lot of the details and more technical stuff, to make the post a bit shorter. So if you have any questions, comment and I might have some kind of answer!

Also, sorry for the lack of new and exciting images in this post. Next tech feature should be more fun on that part, as I am now moving on to Terrain...

EDIT:
I eventually did some tests on the algorithm and compared it to the old renderer. Results are:
http://frictionalgames.blogspot.com/2010/10/pre-pass-lighting-redux.html


Wednesday, 13 October 2010

Tech feature: Sunlight with Shadows

Now I am pretty much done with the first major feature for HPl3, the engine that will power our upcoming (and so far super secret) game. This feature is global sunlight along with shadows, a feature not implemented previously. And since it is now implemented you can bet it will an important feature for the secret game ;)

First up was making the Ambient light nicer. Below is screenshot of how it looks in HPL2:

This is just a uniform color blend that has been added and does not look very. It is quite hard to see any details when there is a uniform texture such as in the screenshot above. To get nice ambient lighting what you need to do is to somehow simulate global illumination. Light does not just hit an object and then stop, but bounces around and scatters color from nearby on each other. There is a lot of research into this and most of it require one to pre-calculate the result in one way or another.

I settled with something really simple called hemispherical lighting, which basically mean to have separate up and down (sky and ground) colors and then blend them depending on the normal of the surface. My first idea was to use cubemaps to do something similar, but since the cubemap needs to be very blurred, using hemispherical lighting gave pretty much the same result and is a lot faster. Here is the result:
Now it is a lot easier to see details and looks a lot better.

Next up was the direct lighting from the sun and this was quite simple. I could just add some tweak to the existing shaders and make it work. Basically sunlight is just like normal light but without any position or attenuation. This means every pixel is lit from the same direction and strength is independent of distance to the source (as it does not have a position).

Here is how the scene looks when we add that too:

Much nicer, even though the texture is a bit boring. Note that there now is specular because of the direct light from the sun.

Finally, lets move on to the shadows! The engine already feature the basic shadow rendering though a technique called shadow maps. What is the big problem now is to make it look good over long distances. Since shadow mapping works by rendering from the light's point of view, doing shadows from an omnipresent light source gets a bit complicated. The simple solution is to use a really large shadow map, but then you would get very bad resolution near the camera.

What you do instead is to use the fact that objects take up smaller space of the screen the farther away they are. So the idea is to use the shadow map in such a way that you give more room for pixel nearby and less to those far away. This is something that has been researched quite a bit and there are various techniques to achieve this. One solution is to warp the projection when rendering to the shadow map. This comes with a lot of issues though and is very rarely used in games.

Instead most games use a technique called Cascaded Shadow Maps or Parallel Split Shadow Maps. The way this works is that the view frustum (a geometrical shape encompassing all the camera sees) is split into several parts and a shadow map is given to each split. This means that the first split, which is right in front of the camera, gets a much larger shadow-per-pixel ratio than the last split, which is much larger, but further away (and hence has small on-screen pixels).

The algorithm itself is pretty easy, especially since there is quite a few papers on it. The big problem is that you can get some flickering results because the shadow map can change a lot. There are published techniques that solve this fairly easy though. Most of my time was instead spent on making it work with the rest of the engine. This is something that might not be that known to non-programmers, but most of the work is often not in the algorithm itself but fitting it into the engine design. I think I spend a 3-4 days on getting it inside the engine, before I had everything set up as it should. The actually algorithm took 2 days or so.

Here is how it looks:
Note how shadows are detailed up front, yet they are cast all the way into the distance.

Here is how the shadow maps look (I combine all into a single one):
Probably want to enlarge this by clicking!

These are the four shadow maps that all render shadows for different slices of the frustum. The white stuff is the objects, the red lines outlines the frustum slice and the sphere is part of an anti-flicker algorithm (that determines size of shadow maps).

Now lets add this to the image I started with:
And lets add some nicer texture while at it:
There! Sunlight with shadows is in!

My next job will be to update the very core of the renderer with something called a Light Pre-Pass Renderer. More on that in a later post!