Thursday, January 1, 2015

Scaling up panotools some more

To recap, some things that I currently do to help stitch large IC photos with pr0ntools:
  • pr0nstitch.py: creates baseline .pto.  Instead of doing n to n image feature match, only matches features to adjacent images.  This reduces feature finding from O(n**2) to O(n)
  • pr0nts.py: creates image tiles from .pto file. This allows me to stitch large panoramas with limited RAM
  • Some misc wrapper scripts such as pr0npto.py that helped work around some peculiarities of tools like PToptimizer
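To make the O(n**2) vs O(n) difference concrete, here is a minimal sketch that counts the image pairs a feature matcher would have to visit in each case. It is not the pr0nstitch.py code: the function names are made up for illustration, and it assumes "adjacent" means the direct left/right and up/down neighbors in the capture grid (no diagonals).

```python
from itertools import combinations


def adjacent_pairs(cols, rows):
    """Yield pairs of grid cells that touch horizontally or vertically.

    Assumption: only 4-neighbors count as adjacent (no diagonals).
    """
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                yield (c, r), (c + 1, r)   # right-hand neighbor
            if r + 1 < rows:
                yield (c, r), (c, r + 1)   # neighbor below


def all_pairs(cols, rows):
    """Every possible pair of cells, as an n-to-n matcher would try."""
    cells = [(c, r) for r in range(rows) for c in range(cols)]
    return combinations(cells, 2)


if __name__ == "__main__":
    for cols, rows in [(4, 3), (20, 20), (40, 30)]:
        n_adj = sum(1 for _ in adjacent_pairs(cols, rows))
        n_all = sum(1 for _ in all_pairs(cols, rows))
        print(f"{cols}x{rows}: adjacent={n_adj} all={n_all}")
```

The adjacent count grows roughly as 2n while the all-pairs count grows as n(n-1)/2, which is why the difference only really starts to hurt on large grids.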
For large panoramas, there were still a few problematic parts:
  • Big problem: PToptimizer is O(n**2).  Very slow for large image sets.  IIRC 1200 images took a week to optimize
  • Cropping and rotating .pto in Hugin is unnecessarily slow since I only use the outer images for alignment.  With n images only about O(n**0.5) are used
  • pr0nstitch.py was single threaded.  I recently picked up a beefy machine at a flea market and now had an incentive to take advantage of the extra cores
Other issues:
  • pr0nstitch.py: I couldn't get cpfind to produce good results so I used closed-source autopano at its core to do feature matching.  This has a lot of hacks to work through wine, which is complicated and hard to set up
Let's see what we can do about these.

Regarding optimization, panotools has a tool called autooptimiser with a nifty feature: "-p Pairwise optimisation of yaw, pitch and roll, starting from first image."  This sounds a lot better than the O(n**2) optimization that PToptimizer does.  Unfortunately, as stated, and as verified through my trials, it only handles yaw, pitch and roll and does not work for position.

So, I decided to see what I could do myself.  Most of the optimizer's time is spent getting the panorama near its final position.  However, since the images are in an xy grid, we have a pretty good idea of where they'll be.  Better than that, since the images are simple xy translations of each other, we should be able to estimate pretty close to the final position simply by lining up control points from one image to another.  By taking the average control point distance from one image to another, I was able to very quickly construct a reasonably accurate optimized estimate.  The upper left image is taken as coordinate 0, 0 and everything is relative to it.  I then iterate the algorithm to fill in all remaining positions.  Presumably I could iterate the algorithm additional steps to optimize it further, but at this time I simply hand off the pre-optimized project to PToptimizer.
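The idea can be sketched in a few lines. This is an illustration under stated assumptions, not the actual pr0npto code: the data layout (a dict of control point lists keyed by image pair), the function names, and the breadth-first fill order are all hypothetical, and positions are plain pixel offsets rather than real .pto variables. If the same feature sits at (xa, ya) in image A and (xb, yb) in image B, and the images differ only by a translation, then B's origin is A's origin plus (xa - xb, ya - yb).

```python
from collections import deque


def mean_offset(points):
    """Average (dx, dy) such that pos_b = pos_a + (dx, dy).

    points: list of ((xa, ya), (xb, yb)) control point pairs.
    """
    n = len(points)
    dx = sum(xa - xb for (xa, _ya), (xb, _yb) in points) / n
    dy = sum(ya - yb for (_xa, ya), (_xb, yb) in points) / n
    return dx, dy


def pre_optimize(control_points, anchor):
    """Estimate an (x, y) position for every image reachable from anchor.

    control_points: {(img_a, img_b): [((xa, ya), (xb, yb)), ...]}
    anchor: the image pinned at (0, 0), e.g. the upper left one.
    """
    # Store each pairwise offset in both directions.
    offsets = {}
    for (a, b), pts in control_points.items():
        dx, dy = mean_offset(pts)
        offsets.setdefault(a, {})[b] = (dx, dy)
        offsets.setdefault(b, {})[a] = (-dx, -dy)

    # Walk outward from the anchor, placing each image once.
    positions = {anchor: (0.0, 0.0)}
    queue = deque([anchor])
    while queue:
        a = queue.popleft()
        ax, ay = positions[a]
        for b, (dx, dy) in offsets.get(a, {}).items():
            if b not in positions:
                positions[b] = (ax + dx, ay + dy)
                queue.append(b)
    return positions
```

Each image is placed from a single already-placed neighbor here, so errors accumulate along the path from the anchor. Averaging the estimates from every placed neighbor, or repeating the pass, would be the obvious refinement.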

This has a dramatic performance impact.  Here's some results from a project with 805 images:
  • pr0npto --optimize: real    488m47.457s
    • Normal PToptimizer
  • pr0npto --pre-opt: real    0m58.843s
    • Pre-optimizer only with no PToptimizer final pass
    • rms error: 0.293938367201598 units
  • pr0npto --pre-opt-pt: real    29m39.604s
    • Pre-optimizer followed by PToptimizer
    • rms error:  0.215644922039943 units
    • Took 3 iterations
--optimize and --pre-opt-pt should produce about the same final result.  I forgot to check what the final optimization score of --optimize was.  Anyway, the results above show that the pre-optimizer alone was able to position images to better than 1/3 pixel on average in about a minute.  A final PToptimizer pass slightly improved the result (rms 0.29 to 0.22) and the combined run was still about 16x faster than plain --optimize.

The next problem was crop/rotation.  There are two problems:
  • It's unnecessarily slow to create thumbnails for many images, most of which will just be thrown away
  • I'm not sure what the order of the preview algorithm is, but it's much worse than O(n) on my laptop:
    • 74 images: 10.6 img / sec
    • 368 images: 2.9 img / sec
My solution: create a wrapper script (pr0nhugin) that eliminates the unused images.  This won't fix hugin's O(>n) issue but mitigates it by keeping n low.  Basically it deletes the unused images and opens a sub-project with the reduced image set.  After hugin closes the new crop/rotate parameters are merged back into the original project.
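Picking the images to keep is simple if the project is a full rectangular grid. The sketch below is only an illustration (the function name and the (col, row) indexing are assumptions, and pr0nhugin may select images differently). The 23 x 16 example is chosen because it happens to give 368 images total with 74 on the border, matching the counts in this post; the real grid dimensions aren't recorded here.

```python
def border_cells(cols, rows):
    """Return the (col, row) cells on the outer edge of a cols x rows grid."""
    return [
        (c, r)
        for r in range(rows)
        for c in range(cols)
        if c in (0, cols - 1) or r in (0, rows - 1)
    ]


if __name__ == "__main__":
    cols, rows = 23, 16  # hypothetical grid size, see note above
    keep = border_cells(cols, rows)
    print(f"{cols * rows} images total, {len(keep)} kept for crop/rotate")
```

For a roughly square grid of n images the border holds about 4 * n**0.5 images, which is where the O(n**0.5) figure comes from.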

Sample results:
  • Original project: 368 images in 128 seconds
  • Reduced project: 74 images in 7 seconds
Original project in fast pano preview:



Reduced project in fast pano preview:



128 seconds isn't so bad, but this is a smaller project and the wait gets much worse for larger ones.  I could also probably eliminate every other image or something like that to speed it up more if needed.

Finally, I figured out what I was doing wrong with cpfind.  It was quite simple really: I just needed to use cpclean to get rid of the control point false positives.  Comparison of optimization results:
  • pr0nstitch w/ autopano: rms 1.09086625512561 units
  • cpfind/cpclean optimize pitch/roll/yaw xyz: rms 6.25849993203006 units
    • Only xyz should change, it's optimizing things it shouldn't
  • cpfind/cpclean optimize xy: 0.510798407629321 units
    • IIRC this was through new pr0nstitch but can't recall for sure
Some quick tests show that cpfind/cpclean meets or exceeds autopano performance.  Given the pains of coordinating through WINE (autopano has a Linux version but it doesn't work as well as the Windows one), I switched pr0nstitch over to cpfind/cpclean.
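For reference, the basic find-then-clean sequence looks roughly like the sketch below. Both tools take an input project and write the result to the file given with -o. The exact options pr0nstitch passes aren't listed in this post, so this shows only the bare invocation; the helper names and file names are hypothetical.

```python
import subprocess


def cpfind_cpclean_commands(project, matched, cleaned):
    """Build the two command lines: find control points, then prune outliers."""
    return [
        ["cpfind", "-o", matched, project],   # adds control points to the project
        ["cpclean", "-o", cleaned, matched],  # drops statistically bad control points
    ]


def run_pipeline(project, matched, cleaned):
    """Run both steps in order, stopping on the first failure.

    Requires cpfind and cpclean (shipped with Hugin) to be on PATH.
    """
    for cmd in cpfind_cpclean_commands(project, matched, cleaned):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in cpfind_cpclean_commands("in.pto", "matched.pto", "clean.pto"):
        print(" ".join(cmd))
```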

Now with the backend tool working more smoothly, I was able to parallelize feature finding more easily.  Basically pr0nstitch now spawns a bunch of worker threads that each calculate features between two images.  A main thread merges these into a main project as workers complete matching sub-images.
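The worker/merge structure can be sketched with the standard library as below. This is not the pr0nstitch implementation: match_pair is a stand-in for whatever runs the matcher on one image pair, and the merged "project" is just a dict. Threads are enough for this pattern if the heavy lifting happens in external processes, which is an assumption about how the matcher is invoked.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def match_pair(pair):
    """Hypothetical placeholder: pretend to match features for one image pair."""
    a, b = pair
    return f"control points for {a} <-> {b}"


def match_all(pairs, match=match_pair, workers=4):
    """Match every pair on a pool of worker threads.

    The calling (main) thread merges each result as soon as its worker
    finishes, so no locking is needed around the merged project.
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(match, pair): pair for pair in pairs}
        for future in as_completed(futures):
            merged[futures[future]] = future.result()
    return merged


if __name__ == "__main__":
    pairs = [("c0_r0.jpg", "c1_r0.jpg"), ("c0_r0.jpg", "c0_r1.jpg")]
    for pair, result in match_all(pairs).items():
        print(pair, "->", result)
```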

There is also a stitch wrapper script that pulls these tools into a recommended workflow.

Summary of new round of improvement:
  • pr0nstitch.py
    • Parallelized
    • Switched to panotools feature finding/cleaning.  Now fully open source and much easier to set up
  • pr0npto.py: new efficient --pre-opt-pt optimizer
  • pr0nhugin.py: hugin wrapper to edit reduced .pto files for fast editing
  • Added "stitch" workflow script
