MLSGPU User Manual

Bruce Merry


Table of Contents

1. Introduction
2. Installation
2.1. Dependencies
2.2. Compiling
2.3. Installing
3. Running MLSGPU
3.1. Input files
3.2. Output files
3.3. Command-line options
3.3.1. Temporary files
3.3.2. Response files
3.3.3. Splitting the output
3.3.4. Selecting OpenCL devices
3.3.5. Smoothing
3.3.6. Component pruning
3.3.7. Boundary handling
3.4. Limitations
4. Using MLSGPU on a GPU cluster
5. Troubleshooting
6. Support
7. License
7.1. Third-party components

Chapter 1. Introduction

MLSGPU is a tool for reconstructing triangle meshes from point clouds obtained via laser range scanning (or potentially other methods). It is able to take advantage of GPUs for high performance, and can handle hundreds of gigabytes of input and output.

MLSGPU is only one step in a scanning pipeline. Acquisition, cleaning, registration and feature size estimation all need to occur before MLSGPU is used. Refer to Section 3.1, “Input files” and Section 3.2, “Output files” for further details on the input and output data formats.

Chapter 2. Installation

2.1. Dependencies

MLSGPU requires either Microsoft Windows, or a POSIX operating system such as a GNU/Linux system. At present only Ubuntu 12.04 is tested, but other variants are expected to work. It is also highly recommended that you use a 64-bit operating system. It should still be possible to use a 32-bit OS, but it is untested and there may be problems with large data sets.

MLSGPU depends on the following software to compile and run. Versions listed are the ones that have been tested; older or newer versions will often work too.

  • A C++ compiler. GCC 4.6 and 4.7, Clang 3.0 and MSVC 2010 have been tested. Note that at the time of writing, Clang does not support OpenMP and so performance will be reduced.

  • Boost 1.48, including the following runtime libraries:

    • boost_program_options

    • boost_iostreams

    • boost_thread

    • boost_filesystem

    • boost_system

    • boost_math_c99

    • boost_math_c99f

    • boost_serialization

  • clogs 1.1

  • Doxygen 1.7.4

  • Python 2.7

  • xsltproc 1.1

  • DocBook 4.3 stylesheets

  • An implementation of OpenCL 1.1. GPU device drivers will normally include this. It has been tested with NVIDIA GPU drivers and with the AMD APP SDK 2.7 on a CPU. The device must support images.

The following tools and libraries are needed to build optional parts, but are not otherwise required:

  • CppUnit 1.12 is needed to build the test suite.

The following list of packages should suffice on Ubuntu 12.04 (although it has not been tested against a clean installation), with the exception of clogs which has not been packaged for Ubuntu. When configuring clogs you can pass --cl-headers=MLSGPU_ROOT/khronos_headers to get the OpenCL header files.

xsltproc
docbook-xml
docbook-xsl
libboost-dev
libboost-iostreams-dev
libboost-filesystem-dev
libboost-system-dev
libboost-math-dev
libboost-program-options-dev
libboost-thread-dev
libcppunit-dev
g++
libgl1-mesa-dev

2.2. Compiling

Before actually compiling, the build must be configured. This can be done by running python waf configure. This will check that the required libraries are present. If configuration fails, you can find more detailed error information in build/config.log. The build system will attempt to auto-detect the compiler, but if you wish to override it you can set the CXX environment variable before doing the configuration.

The installation directories are chosen at configure time. The default is to install files into subdirectories of /usr/local, but this can be overridden with --prefix=PREFIX.

There are also other command-line options that can be given to affect the configured build. They are intended mainly for developer rather than end-user use, so they are not documented here. Running python waf configure --help will show a full list.

Once configuration is complete, running python waf will perform the compilation.

2.3. Installing

Once compilation is complete, run python waf install to install MLSGPU. If you used the default installation paths on a POSIX system, you may need to be root to do this.

Chapter 3. Running MLSGPU

3.1. Input files

The input format for MLSGPU is the PLY file format. Additionally, it is restricted to a subset of valid PLY files:

  • Only binary files are supported, and only in the endianness used by the host CPU (typically little-endian for an x86 or x86-64 CPU).

  • The first type of element in the file must be vertex. Other elements may be present but they must occur later in the file, and will be ignored.

  • The vertex element must contain the fields x, y, z, nx, ny, nz and radius (explained below), and they must all have type float32. Other fields may be present as long as they are not lists, and they will be ignored.

The positions are given by x, y and z. The units are arbitrary, but they must of course match across all input files. The oriented normals are given by nx, ny and nz, and they must have unit length. The final required field is radius, which is an estimate of the spacing between the sample and its neighbors. This must be positive and use the same units as the position.
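As an illustration of the restrictions above, the following Python sketch writes a PLY file in the accepted subset. It is not part of MLSGPU; the function name write_splats and the tuple layout are assumptions made for this example.

```python
import struct
import sys

# Field order used for this sketch; the manual requires all seven, as float32.
FIELDS = ("x", "y", "z", "nx", "ny", "nz", "radius")


def write_splats(path, splats):
    """Write a list of (x, y, z, nx, ny, nz, radius) tuples as a binary PLY file."""
    little = sys.byteorder == "little"
    fmt = "binary_little_endian" if little else "binary_big_endian"
    # vertex is the first (and here the only) element in the file.
    header = ["ply", "format %s 1.0" % fmt, "element vertex %d" % len(splats)]
    header += ["property float32 %s" % name for name in FIELDS]
    header.append("end_header")
    # One record is seven float32 values in the byte order of the host CPU.
    record = struct.Struct(("<" if little else ">") + "7f")
    with open(path, "wb") as out:
        out.write(("\n".join(header) + "\n").encode("ascii"))
        for splat in splats:
            out.write(record.pack(*splat))
```

The caller remains responsible for supplying unit-length normals and a positive radius in the same units as the positions; the sketch does not check either.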

For best performance, the order of input samples in a file should correlate well with position. Simply outputting the points as they are encountered in a regular sampling grid will give good results. In particular, do not sort the points along a single axis, as this will reduce coherence.

MLSGPU accepts multiple input files. The files must already have been registered and transformed into a common coordinate system.

3.2. Output files

The output format for MLSGPU is again the PLY file format. The output file will contain just vertex positions and triangles; all other metadata from the input is discarded. MLSGPU can either write the entire output mesh to a single PLY file, or break the volume up into a regular grid and output a separate PLY file for each non-empty grid cell. In the latter case, the vertices at the boundaries between files will be duplicated in both files, so that neighboring files can be loaded together to give a seamless join.

3.3. Command-line options

The minimal command line for running MLSGPU is

mlsgpu --fit-grid=spacing -o output.ply input.ply...

The spacing specifies the spacing between sample points in a regular grid that will be used in the Marching Tetrahedra algorithm. All vertices in the output file will be on edges of this grid. This value should be of a similar order of magnitude to the finest scanning density. Using too large a value will not only cause the reconstruction to look blocky, but will also lead to unexpected holes. Using too small a value will lead to an excessively large output file, and will also significantly increase the running time.

Multiple input files may be listed on the command line. You may also list a directory on the command line, in which case all .ply files in that directory will be loaded (but without recursing into subdirectories).

The following subsections document the options that are intended for general use. There are additional options that are only intended for use by the developers of MLSGPU, and which are not documented. You can see a full list of options by running mlsgpu --help, which also shows the default values used.

3.3.1. Temporary files

To handle the large datasets, the output mesh is first written to temporary files before being reorganised for the final output files. The temporary files will take roughly the same amount of space (sometimes around 20% more) as the final output files, so you will need to ensure you have sufficient free space. Use --tmp-dir path to store the temporary files in path. If this option is not specified, the default path for the operating system is used.
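If you have an estimate of the final output size, a quick check of the free space in the temporary directory can avoid a failed run. The Python sketch below is only an illustration: the helper name is hypothetical, and the default factor of 1.2 simply encodes the "around 20% more" figure given above.

```python
import shutil


def has_room_for_temporaries(tmp_dir, expected_output_bytes, overhead=1.2):
    """Return True if tmp_dir has enough free space for the temporary files.

    expected_output_bytes is your own estimate of the final output size;
    MLSGPU cannot know it in advance, and neither can this helper.
    """
    free = shutil.disk_usage(tmp_dir).free
    return free >= expected_output_bytes * overhead
```

If the temporary directory and the output directory are on the same filesystem, remember that both the temporary files and the final output need to fit at the same time.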

The temporary files are deleted at the end of a successful run, but if the program crashes or is killed, the temporary files will remain on disk and need to be manually removed to recover the space.

3.3.2. Response files

Operating systems sometimes place a limit on the length of a command-line, which can be difficult if there are a very large number of input files (although the option to specify a directory instead of a file is usually sufficient). To work around this, a response file can be used to place the command-line arguments in a file. First create a file with the command-line arguments. The arguments can be separated by whitespace or placed on separate lines. Then pass --response-file filename when running MLSGPU. It is possible to place some arguments in the response file and others on the command line, but only one response file is supported. The response-file processor is also extremely basic: spaces in filenames will cause problems, and shell wildcards will not work.
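A response file can also be generated by a script. The following Python sketch (the function name is an assumption for this example) writes one argument per line and refuses arguments containing whitespace, since the response-file processor cannot handle them.

```python
def write_response_file(path, arguments):
    """Write command-line arguments to a response file, one per line."""
    for arg in arguments:
        # Whitespace would split the argument in two when MLSGPU reads it back.
        if not arg or any(ch.isspace() for ch in arg):
            raise ValueError("cannot be used in a response file: %r" % (arg,))
    with open(path, "w") as out:
        out.write("\n".join(arguments) + "\n")
```

The resulting file would then be passed with --response-file filename, with any remaining options given on the command line as usual.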

3.3.3. Splitting the output

Rather than producing a single giant output file, it is possible to split the output into chunks by passing --split on the command line. The chunks form a regular grid and each chunk is named basename_XXXX_YYYY_ZZZZ.ply, where XXXX, YYYY and ZZZZ are the positions within this grid and basename is the argument to -o. Note that for this usage, the argument to -o should be just a prefix and not a full filename.

Only output files that contain at least one triangle are written. If you are experimenting with different parameters, it is strongly recommended that you delete all the outputs from previous runs with the same basename before starting. Otherwise, any chunk that is not written in the current run will leave behind a stale file from the earlier run, mixed in with the newly written files.
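Finding the chunks left over from an earlier run can be scripted. This Python sketch lists the files that match the basename_XXXX_YYYY_ZZZZ.ply naming scheme; it assumes that the three grid positions are written as decimal digits, and the function name is hypothetical.

```python
import os
import re


def stale_chunks(basename):
    """Return existing chunk files for basename (the argument given to -o)."""
    directory, prefix = os.path.split(basename)
    directory = directory or "."
    # Assumption: XXXX, YYYY and ZZZZ are runs of decimal digits.
    pattern = re.compile(re.escape(prefix) + r"_\d+_\d+_\d+\.ply")
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if pattern.fullmatch(name)
    )
```

Review the returned list before deleting anything, for example with os.remove on each entry.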

The spatial size of the chunks is chosen automatically using heuristics that attempt to keep the size of each file manageable, but since it is impossible to determine the sizes of the output files in advance, the heuristic may need to be adjusted if the output files are too big or too small. This can be done by passing --split-size=size, where size is a target size. Use --help to see what the default value is and then adjust accordingly. You can use a suffix of K, M or G to specify kibibytes, mebibytes or gibibytes respectively.
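The suffixes are binary multiples, which the following Python sketch makes concrete. It is an illustration of the documented suffixes rather than MLSGPU's own parser; in particular, treating a value without a suffix as a plain byte count and accepting only upper-case suffixes are assumptions.

```python
# Multipliers for the suffixes described in the manual.
SUFFIXES = {"K": 2 ** 10, "M": 2 ** 20, "G": 2 ** 30}


def parse_size(text):
    """Convert a size such as '512M' into a number of bytes."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    # Assumption: no suffix means the value is already in bytes.
    return int(text)
```

For example, parse_size("512M") gives 536870912, so --split-size=512M asks for chunks of roughly that many bytes.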

3.3.4. Selecting OpenCL devices

By default, MLSGPU will run on all GPU devices it finds in the system. This is often the desired result, but in some cases it may be desirable to use extra devices or restrict the set of devices used. In particular, when there are no OpenCL-capable GPUs in the system, it will usually be necessary to pass --cl-cpu.

There are three command-line options that control device selection: --cl-cpu, --cl-gpu and --cl-device. The effects are additive, i.e., any device that matches any of the command-line selectors will be used. The --cl-cpu and --cl-gpu options take no arguments, and simply enable all CPU or GPU devices.

The --cl-device option can be used in two ways: firstly, --cl-device=prefix will enable all devices whose device name begins with prefix. The device name is determined by the OpenCL API; a tool like clinfo from the AMD APP SDK is useful to discover the names of the devices in the system. Secondly, --cl-device=prefix:n will enable just the nth device (zero-based) whose name starts with prefix. This is mainly useful if there are several identical devices in the system.

As an example, passing --cl-cpu --cl-device=Intel --cl-device=GeForce:0 will enable all CPU devices, all devices whose name begins with Intel and the first device whose name begins with GeForce.
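The matching rules can be modelled in a few lines of Python. This is a sketch of the documented behaviour, not the code MLSGPU uses: the device list is supplied by hand rather than queried from OpenCL, the function name is hypothetical, and the default of using all GPUs when no selector is given is not modelled.

```python
def select_devices(devices, cl_cpu=False, cl_gpu=False, cl_device=()):
    """Return the indices of the devices enabled by the given selectors.

    devices is a list of (name, kind) pairs, where kind is 'cpu' or 'gpu'.
    """
    chosen = set()
    for index, (_name, kind) in enumerate(devices):
        if (cl_cpu and kind == "cpu") or (cl_gpu and kind == "gpu"):
            chosen.add(index)
    for selector in cl_device:
        prefix, sep, nth = selector.rpartition(":")
        if sep and nth.isdigit():
            # prefix:n selects only the nth (zero-based) matching device.
            matches = [i for i, (name, _) in enumerate(devices)
                       if name.startswith(prefix)]
            chosen.update(matches[int(nth):int(nth) + 1])
        else:
            # A bare prefix selects every device whose name starts with it.
            chosen.update(i for i, (name, _) in enumerate(devices)
                          if name.startswith(selector))
    return sorted(chosen)
```

With one Intel CPU device followed by two identical GeForce devices, select_devices(devices, cl_cpu=True, cl_device=["Intel", "GeForce:0"]) selects the CPU and only the first GeForce.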

Warning

When mixing devices that are not identical, differences in floating-point computation can cause variations at the join between blocks. This can lead to cracks in the reconstructed mesh, and in extreme cases the mesh may even become non-manifold. For final production always use only identical devices.

When MLSGPU starts, it will report which devices it is using.

3.3.5. Smoothing

The MLS reconstruction is essentially a process to smooth the noisy sampling process. The degree of smoothing can be controlled with --fit-smooth. Increasing the smoothing value will reduce noise, but may also smooth out detail. As a side effect of the implementation, increasing smoothing will also allow small holes to be filled in that would not have been filled at lower smoothing levels. The running time scales roughly with the square of the smoothing factor, so using too much smoothing can also make MLSGPU very slow.

3.3.6. Component pruning

The underlying reconstruction algorithm tends to create spurious pieces of geometry that are disconnected from the rest of the model, so as a final step any small connected components are discarded. Usually this will just do the right thing, but if the scans actually capture some small feature that is disconnected from the rest of the scanned data, it may accidentally be discarded. In this case, the threshold for discarding a component (as a fraction of the total number of output vertices) may be specified with --fit-prune.
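To see how a fractional threshold behaves, consider the following Python sketch. It only models the rule as described here; how MLSGPU treats a component whose size is exactly at the threshold is an assumption, and the function name is hypothetical.

```python
def prune_components(component_sizes, threshold):
    """Keep the components holding at least threshold of all output vertices.

    component_sizes lists the vertex count of each connected component.
    """
    total = sum(component_sizes)
    # Assumption: a component exactly at the threshold is kept.
    return [size for size in component_sizes if size >= threshold * total]
```

With components of 9000, 950 and 50 vertices and a threshold of 0.02, the cutoff is 200 vertices, so only the 50-vertex component is discarded. Lowering the threshold to 0.001 would keep all three.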

3.3.7. Boundary handling

MLSGPU explicitly detects boundaries in the provided point cloud. It tries to avoid extrapolating beyond these boundaries, as these extrapolations tend to have very poor quality. However, the heuristic is not perfect, and tends to both cause unwanted small holes in the reconstruction and to extrapolate in some areas it should not. The default tries to balance the two, but the user can override the threshold using --fit-boundary-limit. Increasing the value will cause more extrapolation, while decreasing it will reduce extrapolation but potentially open more holes. However, increasing the value beyond about 1.7 will have no further effect.

3.4. Limitations

There are a number of limitations to the amount and type of input and output that MLSGPU can handle:

  • Only certain types of input files can be used. See Section 3.1, “Input files” for details.

  • Up to 2^23 (about 8 million) input files. Note that when using large numbers of input files, you will probably need to either pass a directory on the command line, or use response files to work around limits on the length of the command line.

  • Up to 2^40 (about 1.1 trillion) points per input file.

  • Up to 2^32-1 (about four billion) vertices per output file (this is a limitation of the PLY file format).

  • The total size of the model can be at most 2^20 (about one million) times the grid spacing. For example, a model with a side length of 1 kilometre cannot be reconstructed at finer than 1 mm.
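The last limit, a model size of at most 2^20 times the grid spacing, is easy to check before starting a long run. The Python sketch below works through the arithmetic; the function names are hypothetical, and extent means the longest side of the model in the same units as --fit-grid.

```python
# The model may span at most 2^20 grid cells along any axis.
MAX_CELLS_PER_AXIS = 2 ** 20


def finest_spacing(extent):
    """Smallest --fit-grid value that still fits a model of the given extent."""
    return extent / MAX_CELLS_PER_AXIS


def spacing_is_allowed(extent, spacing):
    """Check whether a model of the given extent fits at this grid spacing."""
    return extent <= spacing * MAX_CELLS_PER_AXIS
```

For a model 1000 m across, finest_spacing(1000.0) is a little over 0.95 mm, which matches the rule of thumb that a 1 kilometre model cannot be reconstructed at much finer than 1 mm.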

Two runs of MLSGPU will generally not produce exactly the same stream of bytes, even with identical arguments. However, the only difference should be the order in which the vertices and triangles appear in the files, and the geometry should be identical.

Chapter 4. Using MLSGPU on a GPU cluster

MLSGPU can be used on a cluster to distribute processing to more GPUs than will fit in a single box. It scales reasonably well to 8 GPUs, but beyond this point it is likely that the master node will become a bottleneck as some operations are not parallelized.

To use MLSGPU on a cluster, you will need an MPI implementation which supports MPI-IO. We have only tested with OpenMPI 1.6 on Linux, and in fact older versions of OpenMPI have known bugs. MPI is automatically detected when running python waf configure. The resulting binary is called mlsgpu-mpi, and the interface is essentially the same as for mlsgpu.

Most data movement is handled through the filesystem. It is thus beneficial to have a high performance parallel filesystem that integrates with MPI-IO. We have had good results with GPFS, but other filesystems will probably work fine too. NFS does not work very well, because it requires a lot of locking to guarantee the necessary semantics for safe parallel access. Note that the temporary directory must be on a filesystem that is shared between the nodes, not a local scratch area.

MLSGPU is designed to run with one process per node and to use multiple threads, rather than running one per CPU core. If you are using OpenMPI, then you should pass -pernode to mpirun. MLSGPU will fire up a number of threads for managing I/O and GPUs, and more under the control of OpenMP (the number can be overridden by passing --omp-threads to mlsgpu-mpi). If you are using a scheduling system on the cluster it is best to ask to reserve entire nodes, but if not it is up to you to ensure that MLSGPU does not consume more CPU cores than you have reserved.
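Job scripts often assemble the launch command programmatically. The following Python sketch builds an OpenMPI command line from the options mentioned in this chapter. The function name and the example paths are assumptions, and any options your scheduler or MPI installation needs for choosing nodes are deliberately left out.

```python
def mpirun_command(spacing, output, inputs, tmp_dir, omp_threads=None):
    """Build an argument list that runs mlsgpu-mpi with one process per node."""
    command = ["mpirun", "-pernode", "mlsgpu-mpi",
               "--fit-grid=%s" % spacing,
               # tmp_dir must be on a filesystem shared between the nodes.
               "--tmp-dir", tmp_dir]
    if omp_threads is not None:
        command += ["--omp-threads", str(omp_threads)]
    command += ["-o", output]
    command += list(inputs)
    return command
```

The list can be passed to subprocess.call, or joined with spaces and pasted into a job script. For example, mpirun_command(0.01, "out.ply", ["scans"], "/gpfs/tmp") starts with mpirun -pernode mlsgpu-mpi --fit-grid=0.01.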

Chapter 5. Troubleshooting

5.1. The configuration said that a header file was not found, but I know it exists.
5.2. Meshlab crashes when I try to open one of the output files.
5.3. Every time I run the program I get different output files, even though I use the same options.
5.4. MLSGPU is using too much CPU memory.
5.5. I am getting errors about too much memory being used for an OpenCL device.
5.6. I get the error Too many splats covering one cell.
5.7. I get almost no output, or I get the message Warning: no output files written!
5.8. The output model contains lots of tiny holes in a regular pattern.
5.9. There are some small holes in the output that I would like to fill.
5.10. My scans consisted of several unconnected pieces and one of them does not appear in the output.

5.1.

The configuration said that a header file was not found, but I know it exists.

The error indicates that compilation using that header file failed, but this can happen for other reasons than the header file being absent. Look through build/config.log to find the error message.

5.2.

Meshlab crashes when I try to open one of the output files.

Meshlab is unable to process long comments. Try deleting the comments from the output file. On a UNIX system you can do this by running

sed -i '/^comment mlsgpu/d' filename.ply
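On systems without sed, the same clean-up can be done with a short Python script. The sketch below is an assumption-laden equivalent rather than a supported tool: it writes a new file instead of editing in place, and it only removes lines from the PLY header, copying the binary data after end_header unchanged.

```python
import shutil


def strip_mlsgpu_comments(src_path, dst_path):
    """Copy a PLY file, dropping header lines that start with 'comment mlsgpu'."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in iter(src.readline, b""):
            if not line.startswith(b"comment mlsgpu"):
                dst.write(line)
            if line.rstrip(b"\r\n") == b"end_header":
                break
        # Everything after the header is binary data and is copied verbatim.
        shutil.copyfileobj(src, dst)
```

Because the file is streamed rather than loaded into memory, this also works for large output files, provided there is space for the copy.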

5.3.

Every time I run the program I get different output files, even though I use the same options.

This is normal behavior. The geometry is (or should be) the same every time. Only the order of the vertices and triangles changes.

5.4.

MLSGPU is using too much CPU memory.

Run mlsgpu --help to get a list of options affecting memory usage with their default values, and try decreasing them. If you are only using one GPU it is possible to reduce the --mem-order value very substantially without having much effect on performance. If that isn't sufficient, try decreasing --mem-host-splats and --mem-load-splats proportionally.

Check whether --fit-grid was specified using the right units. If the input data is in millimetres but --fit-grid was specified in metres, the reconstruction will be 1000 times more detailed than expected, and this will require excessive memory to compute.

5.5.

I am getting errors about too much memory being used for an OpenCL device.

First, try reducing the value of --mem-bucket-splats. If this affects performance too badly, try increasing --subsampling by 1, or reducing --levels by 1.

5.6.

I get the error Too many splats covering one cell.

This usually indicates that the value of --fit-grid is far too high. This can happen if it is specified in millimetres when the input data is specified in metres, for example. It can also occur if trying to perform too coarse a reconstruction. If it is only slightly too large, it might be resolved by increasing --mem-bucket-splats.

5.7.

I get almost no output, or I get the message Warning: no output files written!

This usually indicates that the value of --fit-grid is too large, possibly as a result of a mismatch between the units specified for the option and the units used in the data files.

5.8.

The output model contains lots of tiny holes in a regular pattern.

This is usually caused by the value of --fit-grid being too large to accurately sample the surface. Try decreasing it slightly. Increasing --fit-smooth can also help and will avoid large increases to the output file size.

5.9.

There are some small holes in the output that I would like to fill.

Increasing --fit-smooth will provide some hole-filling, at the expense of being slower and potentially smoothing away important detail. Adjusting --fit-boundary-limit can also cause extrapolation to fill small holes, but will also cause extrapolation beyond genuine boundaries, sometimes with poor results.

5.10.

My scans consisted of several unconnected pieces and one of them does not appear in the output.

See Section 3.3.6, “Component pruning” for an explanation of the --fit-prune option.

Chapter 6. Support

MLSGPU is no longer being actively developed. If you find a bug or need a new feature, your best option is to fix or implement it yourself and send me a GitHub pull request.

Chapter 7. License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

7.1. Third-party components

The files in the khronos_headers directory are copyright The Khronos Group Inc. Refer to the individual files for their license terms.

The waf build tool is copyright Thomas Nagy. Refer to the file for its license terms.