What is VirtualDub?
VirtualDub is a video capture/processing utility for 32-bit Windows platforms (95/98/ME/NT4/2000/XP), licensed under the GNU General Public License (GPL). It lacks the editing power of a general-purpose editor such as Adobe Premiere, but is streamlined for fast linear operations over video. It has batch-processing capabilities for processing large numbers of files and can be extended with third-party video filters. VirtualDub is mainly geared toward processing AVI files, although it can read (not write) MPEG-1 and also handle sets of BMP images.
I basically started VirtualDub in college to do some quick capture-and-encoding that I wanted done; from there it's basically grown into a more general utility that can trim and clean up video before exporting to tape or processing with another program. I released it on the web and others found it useful, so I've been tinkering around with its code ever since. If you have the time, please download and enjoy.
§ ¶0.5 != 128/255
I cringe whenever I see people implement YCbCr to RGB conversion like this:
y = 1.164 * (y - 16.0 / 256.0);
r = y + 1.596 * (cr - 0.5);
g = y - 0.813 * (cr - 0.5) - 0.391 * (cb - 0.5);
b = y + 2.018 * (cb - 0.5);
What's wrong with this, you say? Too slow? No, that's not the problem. The problem is that the bias constants are wrong. The minor error is the 16/256 luma bias, which should be 16/255. That's a 0.02% error over the full range, so we can grudgingly let that slide. What isn't as excusable are the 0.5 chroma bias constants. If you're working with 8-bit channels, the chroma center is placed at 128, which when converted to float is 128/255 rather than exactly one-half. This is an error of 0.5/255, which would also be barely excusable, except for one problem. The coefficients for converting chroma red and chroma blue are greater than 1 in magnitude, so they'll actually amplify the error. If you're converting from 8-bit YCbCr to 8-bit RGB, this basically guarantees that you'll frequently be off by one in one or more components. Even more fun is that the green channel errors won't coincide with the red and blue errors and will be in the opposite direction, which then leads to ugliness like blacks that aren't black and have color to them.
In other words, please use 16/255 and 128/255 when those are actually where your luma and chroma values are based.
(I have to confess that the reason this came to mind is that I found this issue in some prototype code of mine when I started actually hacking it into shape. Needless to say, the production code doesn't have this problem.)
You might be thinking that it's a bit weird to be doing a conversion on 8-bit components with floating point, and you'd be correct. The place where this frequently comes up is in 3D pixel shaders. A pixel shader is a natural place to do color conversion, and is also where you'd frequently encounter 8-bit components converted to floating point. Unlike a CPU-based converter, however, there is basically no extra cost to using the correct constants. The only time you'd be better off using 0.5 in a shader instead of 128/255 is if you're on a GeForce 3, in which case (a) you're already skating on thin ice precision-wise and (b) the hardware is 9-bit signed fixed point so you're going to get 128/255 anyway. Otherwise, it's kind of sloppy to be displaying 720p video on-screen with a shader that doesn't get colors right just because someone was too lazy to type in the right constants.
(Read more....)
§ ¶Weighted averaging in SSE (part 2)
Last time I talked about a faster way to do parallel averaging between 8-bit components in SSE with unequal weights, specifically the [1 7]/8 case. This time, I'll improve on the [1 7] case and also show how to do the [3 5] case.
To recap, the idea centers around using the SSE pavgb instruction to stay within bytes by rounding off LSBs correctly each time we introduce a new intermediate result with an extra significant bit. Previously, I did this by converting a subtraction to an addition and using a correction factor. It turns out that there's another way to do this that doesn't require a correction factor and requires fewer constants, which means more temp registers available.
(Read more....)