How to fight tearing
Tearing is a display artifact that occurs when images are presented to the screen without regard for the current status of the output circuitry. It occurs because pixels are sent to the screen gradually in book order rather than instantaneously. What happens is that the output DAC reads across the same area of the screen that is being written to, so the monitor ends up showing a half-updated image. The result is a momentary frame that has half of an old frame and half of a new one, with a clean horizontal split between them (the tear). Because the location of the tear varies according to timing, it usually jumps all over the place, which can be distracting.
I don't advertise VirtualDub as a player, but I've had several requests to add an option to fix this problem lately, so I decided to look into it. Tearing can be quite noticeable if you're previewing nearly full-screen 24 fps video. The most straightforward way to solve this problem is by synchronizing updates to the output scan, a technique known as vertical sync (vsync). This forces the tear to always occur off-screen so that it is not noticeable. Well, after doing some preliminary investigation, it's doable, but not as easy as I had hoped. As usual, DirectX is part of the reason why.
Background
Display output circuitry is generally geared toward CRTs, which use an electron scanning beam to draw an image. On a VGA display, this beam scans the screen in book order at 31.5 kHz horizontally and 60 or 70 Hz vertically. The period in which the beam returns from the right to the left is the horizontal retrace or horizontal blank, and the return from bottom to top is the vertical retrace or vertical blank. During both these periods, no video is displayed. The vertical blank period is much longer than the horizontal blank, and is definitely the region of choice for manipulating display memory unseen.
LCDs don't use an electron scanning beam, but still receive information in the same order at approximately the same rates, so they can show tearing effects as well.
The simplest method of avoiding tearing is to avoid updating the screen when any visible area of it is being displayed. One way to do this is to synchronize rendering with the output scan so that it always happens during the vertical blank. This is called vertical sync or vsync. During vertical blank, the screen is changed — either by direct rendering, updating the entire screen from an off-screen buffer (blit/copy), or by swapping the active screen by switching display pointers (flip). None of this ever happens while the screen is being displayed, and thus the monitor always shows only full frames.
A downside to vsync is the wait between when a frame is completed and when it can become the actively displayed frame. In a single-buffered or double-buffered system, this is dead time for the renderer, and it creates cliffs in performance: if the system just barely misses 60 fps update speed with a 60 Hz refresh, it has to drop to a consistent 30 fps. Switching to a three-buffer system, triple-buffering, resolves this problem by allowing the renderer to start drawing a third frame while a second one is pending. This produces an uneven frame rate and consumes more memory, but allows arbitrary frame rates regardless of refresh rate. It also requires a way to do an asynchronous flip, but most video hardware allows this, either through a vertical blank interrupt or a double-buffered display pointer register.
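To put numbers on that performance cliff, here is a rough sketch of the arithmetic. It assumes a simplified model that is mine, not a description of any particular driver: the renderer takes a constant time per frame and stalls until the next vertical blank before it can start the next one. The function name vsyncedFrameRate is made up for illustration.

```cpp
#include <cmath>

// Effective frame rate under vsync without triple buffering, assuming a
// constant render rate and a renderer that stalls until the next vertical
// blank (a simplified model, not measured behavior).
// refreshHz: display refresh rate; renderFps: what the renderer could
// sustain if it never had to wait.
double vsyncedFrameRate(double refreshHz, double renderFps) {
    // Each frame occupies a whole number of refresh intervals, rounded up.
    double refreshesPerFrame = std::ceil(refreshHz / renderFps);
    return refreshHz / refreshesPerFrame;
}
```

Under this model, a renderer capable of 59 fps on a 60 Hz display lands on 30 fps, and one capable of 25 fps lands on 20 fps, since every frame has to wait for a whole number of refreshes.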
If the entire screen doesn't need to be updated, such as when displaying video in windowed mode, another solution is beam following. Many video display chips can report the current vertical location of the output scan, which then allows the screen to be updated during the visible portion as well. Assuming the blit is faster than the output scan, it is OK to blit whenever the beam is not scanning the portion being updated. If the beam is before the update region, the blit will outrace the beam, and if the beam is after it, the blit will stop before it hits the scanning point. This greatly increases the amount of "safe blit time" available, and reduces waiting time.
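The beam-following rule boils down to a very small test. The sketch below is illustrative only: ScanRegion, isSafeToBlit, and safeFraction are names I made up, positions are in scanlines, and it relies on the same assumption as above, namely that the blit runs top to bottom faster than the output scan.

```cpp
// Vertical extent of the area being updated, in scanlines: [top, bottom).
// Hypothetical type for illustration.
struct ScanRegion {
    unsigned top;
    unsigned bottom;
};

// Safe to start a blit only when the beam is outside the update region.
// Beam above the region: the (faster) blit stays ahead of the beam.
// Beam at or below the bottom: the blit finishes before reaching the beam.
bool isSafeToBlit(unsigned beamLine, ScanRegion r) {
    return beamLine < r.top || beamLine >= r.bottom;
}

// Fraction of the full scan (visible plus blanked lines) during which a
// blit may start.
double safeFraction(ScanRegion r, unsigned totalLines) {
    return 1.0 - static_cast<double>(r.bottom - r.top) / totalLines;
}
```

For example, a 240-line window on a 525-line scan (31.5 kHz at 60 Hz) leaves a bit over half of each refresh available for starting a blit, compared with under a tenth if only the 45 blanked lines of a 480-line mode were used.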
Avoiding tearing with GDI
Neither the vertical blank nor the beam position can be accessed with GDI. Therefore, unless you're running on Windows Vista with the desktop composition engine enabled, the answer is: you don't.
Avoiding tearing in DirectX
Flipping is the nicest and fastest way to avoid tearing because it requires very little effort; all that is required is to change one display pointer. Unfortunately, to do this the entire screen has to be flipped, and VirtualDub isn't a full-screen application. It's possible to flip an overlay surface, but no modern hardware supports those for RGB pixel formats. So blit scheduling is the way to go.
Calling WaitForVerticalBlank() is a solution, if you don't mind burning a ton of CPU time. It's also not ideal, since you're often starting the blit later than necessary.
DirectDraw 3 has a nice flag that can be set on a blit operation called DDBLTFX_NOTEARING (0x00000008). According to the documentation, it schedules blits to avoid tearing on-screen, which is exactly what I want. And when I tried it, it did squat. Checking around with Google, I found that others had hit this issue, and then found the following quote in the Windows Device Driver Kit (DDK) documentation:
Note that the DDBLTFX_NOTEARING, DDBLTFX_MIRRORLEFTRIGHT, and DDBLTFX_MIRRORUPDOWN flags are unsupported on Windows 2000 and later and are never passed to the driver.
Some testing revealed that MIRRORLEFTRIGHT and MIRRORUPDOWN still work, so it wasn't out of the question for NOTEARING to be handled in the runtime. To test that conclusion, I started stepping through the disassembly of IDirectDrawSurface2::Blt() with a data read breakpoint on the blit FX flag word. Testing flag 0x01, OK... 0x02, OK... 0x04, OK... 0x70, OK... do the blit... hey, wait a minute....
Seems that the Windows XP DirectDraw runtime simply drops DDBLTFX_NOTEARING on the floor, and the documentation hasn't been updated to note that fact. Grreaat.
DirectShow does it, though. Current versions normally use the Video Mixing Renderer (VMR), which is Direct3D-based, but older versions relied on the DirectDraw-based Video Renderer, which also had no-tearing logic. After attaching the debugger to GraphEdit and forcing the old Video Renderer, I discovered that it doesn't use DDBLTFX_NOTEARING at all. It simply beam-avoids by polling the beam location in a tight loop and delaying the blit until the danger zone has passed. Interestingly enough, this doesn't take as much CPU as might be expected.
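A polling loop of that general shape might look like the sketch below. This is not the Video Renderer's actual code, just a minimal illustration of the technique. The scanline source is passed in as a callback so the logic stays independent of the API; in DirectDraw it could plausibly be backed by IDirectDraw::GetScanLine(), and in Direct3D 9 by IDirect3DDevice9::GetRasterStatus(). BlitWindow, waitForBeamToClear, and the maxPolls cap are my own inventions.

```cpp
#include <functional>

// Scanline range covered by the destination rectangle: [top, bottom).
// Hypothetical type for illustration.
struct BlitWindow {
    unsigned top;
    unsigned bottom;
};

// Polls the beam position until it is outside the blit window, then
// returns true so the caller can blit immediately. Gives up and returns
// false after maxPolls reads, so a stuck or unsupported scanline query
// cannot hang the caller.
bool waitForBeamToClear(const std::function<unsigned()>& readScanLine,
                        BlitWindow w, unsigned maxPolls) {
    for (unsigned i = 0; i < maxPolls; ++i) {
        unsigned line = readScanLine();
        if (line < w.top || line >= w.bottom)
            return true;   // beam is above or below the window
    }
    return false;          // beam never left the window within the budget
}
```

The cap on polls is a design choice rather than something the technique requires; when it trips, the caller can fall back to blitting anyway and accepting a possible tear.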
Basically, avoiding tearing in DirectDraw on Windows 2000/XP requires some manual work.
Direct3D is a little more intelligent about how it does polling: it uses Sleep(1) to release CPU time while avoiding the beam. Depending on whether D3DPRESENT_INTERVAL_DEFAULT or D3DPRESENT_INTERVAL_ONE is used, it can also optionally increase the scheduling timer rate to make this more accurate (10 ms is a long time for a 16 ms frame!). I was hoping that the hardware could queue up a vsynced or beam-avoided blit, but nope, the runtime polls. Unfortunately, there doesn't seem to be a way to tell the runtime to present as quickly as possible while avoiding the beam, so if I change the present mode and update two panes the best I can get is 30 fps with a wait in between, because the runtime does a Sleep() to ensure that the present rate doesn't exceed the refresh rate. Bleah. Part of the problem is that I'm only using one swap chain to service all panes, since I see no reason to allocate extra backbuffers, but I suspect that multiple swap chains might still stall unnecessarily.
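To see why the timer rate matters so much here, it helps to convert a sleep quantum into beam travel. The sketch below is plain arithmetic using the VGA figures from earlier (31.5 kHz horizontal, 60 Hz vertical); the function names are made up, and it assumes a sleep lasts exactly one timer quantum, which real schedulers don't guarantee.

```cpp
// Scanlines the beam covers while the thread sleeps for one timer quantum.
// hScanKHz is the horizontal scan rate in kHz, i.e. scanlines per millisecond.
double linesScannedDuringSleep(double sleepMs, double hScanKHz) {
    return sleepMs * hScanKHz;
}

// Fraction of one refresh period consumed by a single sleep quantum.
double fractionOfFrame(double sleepMs, double refreshHz) {
    return sleepMs * refreshHz / 1000.0;
}
```

With a 10 ms quantum the beam moves 315 lines per sleep, or about 60% of a 60 Hz frame, so a sleeping poll can easily wake up on the wrong side of the window. At 1 ms it moves about 32 lines, which is fine-grained enough to be useful.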
In the end, I'll probably end up beam-avoiding in the D3D9 minidriver the same way as in the DirectDraw minidriver. Another way to do it would be to allocate a second swap chain — which really only needs one buffer since we're running windowed — and asynchronously schedule the blit using a 1ms multimedia timer callback that polls the beam. Problem is, there isn't a good way to schedule the blit from another thread, since Direct3D isn't a terribly multithreading-friendly API.