Comparison of VidSync with Hughes & Kelly's (1996) method

The methodological predecessor to VidSync was a set of mathematical methods published by Hughes & Kelly (1996), who originated the use of the front and back planes of a calibration frame (which they called a quadrat) to obtain intersecting 3-D lines and find 3-D positions. The largest mathematical difference between our methods is in how screen coordinates, in pixels, are converted to "world" coordinates, in meters or millimeters, on the surface of each face of the calibration frame.

The Hughes & Kelly method fits a third-order polynomial to describe and predict the relationship between screen and world coordinates. The equations are shown below in an excerpt from their paper (note that although equations (1) and (2) are written in the same form, they are fitted separately to the x and y data, respectively, and therefore each one refers to different values of the parameters p1 to p10).
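For orientation, a full third-order polynomial in two variables has exactly ten coefficients, which matches the parameters p1 to p10 mentioned above. One common way to write such a polynomial is sketched here, assuming u and v denote the screen coordinates and x denotes one world coordinate. These symbol names and the ordering of the terms are assumptions and may differ from the notation in Hughes & Kelly's paper.

```latex
x = p_1 + p_2 u + p_3 v + p_4 u^2 + p_5 u v + p_6 v^2
  + p_7 u^3 + p_8 u^2 v + p_9 u v^2 + p_{10} v^3
```

The equation for y would have the same form, with its own separately fitted set of ten coefficients.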

Polynomial models (higher than first order, anyway) are useful for interpolation but run into problems when extrapolating outside the range of the calibration data. See Wikipedia for a nice illustration of this general phenomenon. Hughes & Kelly encountered this issue during a behavioral study in the early 2000s and realized it could be solved by using a matrix method called "Direct Linear Transformation" (DLT), which fits a single projective mapping over the whole screen, so that a straight grid stays straight, instead of using polynomials. The math in VidSync evolved from this idea and related concepts in the computer vision literature.
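To make the DLT idea concrete, here is a minimal sketch of a plane-to-plane DLT fit in plain Python. It is a generic textbook formulation, not VidSync's actual code: the function names (`solve`, `fit_plane_dlt`, `screen_to_world`), the eight-parameter form with the last matrix entry fixed at 1, and the synthetic data are all assumptions made for illustration. Screen coordinates are scaled to roughly the range -1 to 1 to keep the linear system well conditioned.

```python
def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_plane_dlt(screen_pts, world_pts):
    """Least-squares fit of the 8 parameters of a plane-to-plane projective map."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(screen_pts, world_pts):
        # x * (h6*u + h7*v + 1) = h0*u + h1*v + h2, rearranged to be linear in h
        rows.append([u, v, 1.0, 0.0, 0.0, 0.0, -u * x, -v * x])
        rhs.append(x)
        # y * (h6*u + h7*v + 1) = h3*u + h4*v + h5
        rows.append([0.0, 0.0, 0.0, u, v, 1.0, -u * y, -v * y])
        rhs.append(y)
    # Normal equations (A^T A) h = A^T b give the least-squares solution.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    return solve(ata, atb)


def screen_to_world(h, u, v):
    """Apply the fitted projective map to one screen point."""
    w = h[6] * u + h[7] * v + 1.0
    return ((h[0] * u + h[1] * v + h[2]) / w,
            (h[3] * u + h[4] * v + h[5]) / w)


# Synthetic check with made-up parameters (not data from the pool test):
# calibrate on a 3x3 grid of nodes near the screen center, then extrapolate.
true_h = [0.40, 0.05, 0.10, -0.03, 0.45, 0.20, 0.10, -0.05]
nodes = [(u, v) for u in (-0.5, 0.0, 0.5) for v in (-0.5, 0.0, 0.5)]
frame = [screen_to_world(true_h, u, v) for u, v in nodes]
h = fit_plane_dlt(nodes, frame)
far = screen_to_world(h, 1.0, 1.0)  # well outside the calibrated region
```

Because a projective map sends straight lines to straight lines, a grid extrapolated this way stays straight no matter how far it extends beyond the calibration nodes. In practice a linear algebra library would replace the hand-written `solve`; it is included only to keep the sketch self-contained.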

At the suggestion of a manuscript reviewer, I checked to see how much difference there really is between these two models. I took distortion-corrected calibration data from a pool-based VidSync test and calculated the world coordinate grids across the screen using both methods. This was the camera's view of the calibration frame:

[Image: left camera view of the calibration frame, 2012 pool test, CalibrationB]

The diagram below shows the fits of each model to the front face of the calibration frame. The bold grid shows the actual position of the frame face, within which both methods interpolate coordinates well. Extrapolation beyond that is shown in green lines (DLT method) and black lines (polynomial method). Near the edges of the image, you can begin to see substantial warping in the polynomial method that departs from what should be a straight grid.

The warping becomes much more exaggerated for the back face of the calibration frame, which requires more extrapolation:

You can see that the extrapolation problems of the polynomial method are not too bad for very short distances (about 1/4 to 1/2 the frame size) outside the calibration frame, but they pose a major problem for accurate 3-D measurement when extrapolating farther out. These problems could largely be avoided in the field by using a calibration frame that nearly fills the screen in both cameras, when it is practical to do so. However, the DLT method used in VidSync (at least in combination with VidSync's removal of non-linear distortion) avoids these problems altogether and has no disadvantages relative to the polynomial method.
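The way the polynomial error grows with distance from the frame can be illustrated with a simplified one-dimensional analog. The sketch below is not the pool-test analysis: it assumes the true screen-to-world relation along a line is projective (as for an ideal pinhole camera viewing a flat surface), uses a made-up perspective strength `K`, and fits a cubic through four calibration nodes spanning a "frame" from -0.5 to 0.5 in screen units. By construction a projective (DLT-style) model can represent this relation exactly, so only the polynomial's error is computed.

```python
def lagrange(nodes, values, u):
    """Evaluate the unique polynomial through (nodes, values) at u."""
    total = 0.0
    for i, (ui, wi) in enumerate(zip(nodes, values)):
        basis = 1.0
        for j, uj in enumerate(nodes):
            if j != i:
                basis *= (u - uj) / (ui - uj)
        total += wi * basis
    return total


K = 0.5  # assumed perspective strength; not taken from VidSync or the pool test


def true_world(u):
    """Assumed 'true' projective relation between screen and world position."""
    return u / (1.0 + K * u)


# Four calibration nodes across the frame determine a cubic exactly.
nodes = [-0.5, -1.0 / 6.0, 1.0 / 6.0, 0.5]
values = [true_world(u) for u in nodes]


def cubic_error(u):
    """Absolute error of the cubic fit at screen position u, in world units."""
    return abs(lagrange(nodes, values, u) - true_world(u))


for label, u in [("inside the frame", 0.0),
                 ("1/4 frame outside", 0.75),
                 ("1 frame outside", 1.5)]:
    print(f"{label:>18}: error = {cubic_error(u):.4f}")
```

In this toy setup the error is small inside the frame, still modest a quarter of a frame width outside it, and much larger a full frame width out, which is the same qualitative pattern described above.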

References

Hughes, N. F., and Kelly, L. H. 1996. New techniques for 3-D video tracking of fish swimming movements in still or flowing water. Can. J. Fish. Aquat. Sci. 53(11): 2473-2483.