New features in RTSS beta 2: PresentMon latency analyzer, overlay layouts merging support

Uploaded By: Myvideo

This video demonstrates some new features of the upcoming RTSS beta 2. The overlay based on NVIDIA Reflex latency markers is the most precise way of tracking rendering pipeline timings and latencies, but it is limited to NVIDIA GPUs and Direct3D 11/12 applications. RTSS beta 2 will introduce a similar-looking latency analyzer based purely on PresentMon V2 timings, so it works on any graphics card and in all 3D APIs. It is not as detailed as Reflex, but it still provides useful info, especially if you cannot use the Reflex latency analyzer overlay on your system. I demonstrate the following things in this video:

- I show the original NVIDIA Reflex latency markers based layout in Cyberpunk 2077, then I open the new overlay layout called […]. PresentMon and Reflex layouts start working as soon as you switch to a 3D application, so I switch back to CP 2077 to start displaying data for it. As you can see, it is not as detailed as Reflex, but unlike Reflex it additionally allows you to peek behind the “GPU Render” stage and see the additional DWM composition related latency (the distance between the end of the “GPU Render” stage and the “Display” marker). Our D3D12 application is running in the optimal hardware composed independent flip mode, so there are no major delays there in this specific case. If you do see a major delay there, the presentation mode is not optimal on your system.
- To compare the reflex and presentmon_latency_analyzer layouts, I wanted to create a new overlay combining both. This is easily done with one more brand new OverlayEditor feature that will be introduced in beta 2: overlay merging mode. I just use the new “Merge” command, remove the duplicate frametime graph which I don’t need, select the background layer and stretch it to cover the new layers. Done, our new overlay can now be saved under a new name.
- Now we can see both latency analyzers simultaneously and compare what they display in real time.
- Both latency analyzers give you the same info about the GPU rendering stage and allow you to see “bubbles” there. PresentMon’s “GPU wait” is nothing but the difference between “GPU Render” and “GPU Active” / GPUBusy.
- The first thing you may notice is that the more precise NVIDIA Reflex sim-to-render latency is actually much higher than PresentMon’s Display latency. The reason is the multithreaded CP 2077 engine, which performs simulation and rendering on different threads. Reflex knows the real distance between simulation and the end of rendering, while PresentMon can only see timings related to the rendering thread. So the whole “CPU busy” stage on PresentMon’s side is just the render submission stage in this specific application. Keep this in mind when comparing latencies detected by PresentMon and NVIDIA Reflex.
- The second thing you may notice is the major difference in “Present” stage latency. PresentMon’s “Present” stage is roughly […], while in the Reflex analyzer “Present” fluctuates much more and peaks at 2-3 ms. The reason is that PresentMon monitors the length of the presentation stage at OS level only, while Reflex monitors Present calls in the game context. So the values traced by Reflex let you see all present hook related overhead. All third-party overlays normally render at this stage. In my case I run 3 overlays simultaneously to show you the difference: the RTSS overlay itself, the Steam FPS overlay in the top right corner and the green FPSMon overlay next to it.
- So let’s start killing overlays one by one to see how it affects the present latency displayed by Reflex.
- First, we kill FPSMon. It was the main performance offender in this specific case: killing it decreased present latency from […] to approximately 1 ms.
- Second, we kill the Steam overlay. That shaved a bit more off the present latency, roughly 200-300 microseconds.
- The remaining difference is obviously the RTSS overlay itself. We can enable the RTSS performance profiler to see how much time it adds to the presentation stage. That’s what I do next. The “CPU total” counter inside the performance profiler is our target: it shows that the RTSS overlay renderer adds roughly 350 microseconds to presentation latency.
- Finally, there is one trick that allows us to verify it. I move DesktopOverlayHost from my mini-display to my primary monitor, then disable the overlay for the global profile and CP 2077. Now we can see that the present latencies traced by PresentMon and Reflex almost match; the remaining 100-200 microseconds is API overhead.
- Also, please note that third-party framerate limiters normally perform framepacing and wait inside the presentation stage too. That wait will be visible to the Reflex latency analyzer, but PresentMon won’t be able to trace it correctly! I set the FPS limit to 60 FPS to demonstrate the difference. So keep this in mind too when using the PresentMon latency analyzer together with external framerate limiters!
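The arithmetic behind these comparisons can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and every number below are hypothetical (the baseline and the PresentMon-side reading in particular are made up), and nothing here is an RTSS, PresentMon or Reflex API. It shows “GPU wait” as the difference between “GPU Render” and “GPU Active”, and how much each disabled overlay takes off the Reflex-side “Present” latency.

```python
# All values are in microseconds and are illustrative, not measured.

def gpu_wait_us(gpu_render_us: int, gpu_active_us: int) -> int:
    """Part of the GPU Render stage during which the GPU was not busy."""
    return gpu_render_us - gpu_active_us


def overhead_per_step(present_us_by_step: list[tuple[str, int]]) -> dict[str, int]:
    """Return how much game-side Present latency dropped at each step.

    The first entry is the baseline with every overlay running; each later
    entry is the reading taken after one more overlay was disabled.
    """
    drops = {}
    pairs = zip(present_us_by_step, present_us_by_step[1:])
    for (_, before), (label, after) in pairs:
        drops[label] = before - after
    return drops


# Hypothetical Reflex-side "Present" readings, loosely following the video.
reflex_present = [
    ("all overlays running", 2000),   # assumed baseline
    ("FPSMon killed", 1000),          # "approximately 1 ms"
    ("Steam overlay killed", 750),    # 200-300 us less
    ("RTSS overlay disabled", 400),   # roughly 350 us less
]
presentmon_present_us = 250  # assumed OS-level "Present" reading

drops = overhead_per_step(reflex_present)
# Whatever is left after all overlays are gone is API overhead.
api_overhead_us = reflex_present[-1][1] - presentmon_present_us

print(gpu_wait_us(9000, 8200))
print(drops)
print(api_overhead_us)
```

With these made-up inputs the per-overlay drops land in the same ranges the video reports, and the remainder between the two analyzers falls inside the 100-200 microsecond API overhead band.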
