How to overcome large datasets in point cloud processing
Point clouds store tremendously detailed information about physical space. This produces large files that can become unwieldy, particularly as projects grow in size. Large projects can include hundreds (or even thousands) of scans, sometimes creating datasets in excess of a terabyte. This puts a strain on hardware capabilities and massively extends already lengthy processing periods. Dealing with datasets of this size is one of the largest problems that creators of point clouds face. This is a crash course in how to overcome large datasets in point cloud processing.
Step 1: Provision hardware that can handle large point cloud datasets
Storage has become less of an issue than even a few years ago. It is not hard to find laptops with more than a terabyte of capacity. You can scale up to a 10 terabyte external hard drive for a few hundred pounds. Point cloud processing is not a graphics-heavy operation. Investing in a quality graphics card is only important if you plan on creating 3D models based on the processed point cloud data.
But even if you only plan to process data, you cannot buy the cheapest laptop on the market and expect it to run smoothly. To start with, choose an SSD over an HDD because it will be faster. Second, get something with a quality processor and a lot of RAM, because point cloud processing is CPU and memory intensive.
Some point cloud processing software claims minimum system requirements of 4GB of RAM. But, this is just the requirement to run the program. 16GB of RAM is your baseline, and ideally invest in 32GB. Power users might want to invest in even more. It is easier and more affordable to scale up RAM in a desktop. It is, however, unlikely that you will need anything above 128GB under current circumstances.
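To see why 4GB does not go far, it can help to estimate the raw size of the point data itself. The sketch below is a back-of-the-envelope calculation only. The bytes-per-point figure (three 4-byte coordinates, one byte of intensity and three bytes of colour) and the example project are assumptions for illustration, not figures from any particular scanner or software package.

```python
# Rough estimate of the raw memory footprint of a point cloud project.
# All figures are illustrative assumptions, not vendor specifications.

BYTES_PER_POINT = 3 * 4 + 1 + 3  # x, y, z as 4-byte floats + intensity + RGB


def project_size_gb(num_scans: int, points_per_scan: int,
                    bytes_per_point: int = BYTES_PER_POINT) -> float:
    """Return the uncompressed size of all scans in gigabytes (10**9 bytes)."""
    return num_scans * points_per_scan * bytes_per_point / 1e9


if __name__ == "__main__":
    # Hypothetical project: 200 scans of 40 million points each.
    size = project_size_gb(num_scans=200, points_per_scan=40_000_000)
    print(f"Raw point data: {size:.0f} GB")
```

Under these assumptions, a single 40-million-point scan already occupies about 0.64GB before the software makes any working copies, so the number of scans you can hold in memory at once is tied directly to how much RAM you have.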
You will only be able to unlock the potential of all of that RAM with a good processor. Basically, you want the best processor on the commercial market that you can afford. Right now, that is probably the Intel Core i9-7980XE, closely followed by the Intel Core i9-7940X and AMD Ryzen Threadripper 1950X. High-end Apple products favour the Intel Xeon W.
A more standard (and more affordable) Intel Core i7-7820X or AMD Ryzen 7 2700X is also a good choice. But moving down into the budget end of the spectrum will not serve you well. If you pair a budget processor with a RAM-heavy computer, you will simply waste your money.
Why Processor Specifics Really Matter
Some point cloud processing software allows you to register multiple scans simultaneously during segments of the registration procedure, each taking up one thread within a CPU core. A few programs allow you to take advantage of this multi-tasking throughout the entire procedure. Depending on the software you use, it becomes increasingly difficult to have an ‘overpowered’ CPU.
You should also look out for ‘hyperthreading’, or what AMD calls ‘simultaneous multi-threading (SMT)’. This technology allows each ‘core’ to power two ‘threads’ at once. Most high-end Intel and AMD processors use this technology, although there have been rumours that the newest i7s won't. Two threads that share a core each run more slowly than a thread with a physical core to itself, so hyperthreading should not be seen as a substitute for more physical cores. However, you should be aware of your CPU's capabilities.
Regardless, reading the fine print can make a big difference when it comes to processors. For example, a standard i7 processor has four cores. A high-end i7, however, can pack 8-10 cores (such as the i7-7820X or i7-6950X). The i9-7980XE has 18 cores. Each step up will cost you, but those differences have a huge practical impact on processing speeds.
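To make the link between threads and scan pairs concrete, here is a minimal sketch of handing independent scan pairs to a pool of workers sized to the machine. The function register_pair is a hypothetical placeholder, not a call from any real registration package. A thread pool is used purely to keep the example short; real registration code would do its heavy lifting in native code or separate processes.

```python
# Sketch: register independent scan pairs in parallel, one worker per logical CPU.
# register_pair is a hypothetical placeholder, not a real library call.
import os
from concurrent.futures import ThreadPoolExecutor


def register_pair(pair):
    """Hypothetical stand-in for aligning two scans; returns a dummy status."""
    scan_a, scan_b = pair
    return (scan_a, scan_b, "aligned")


def register_all(pairs, max_workers=None):
    """Run register_pair over every scan pair using a pool of worker threads."""
    # os.cpu_count() reports logical CPUs, so hyperthreaded cores count twice.
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input pairs.
        return list(pool.map(register_pair, pairs))


if __name__ == "__main__":
    scan_pairs = [("scan_001", "scan_002"), ("scan_002", "scan_003")]
    for result in register_all(scan_pairs):
        print(result)
```

With more logical CPUs, more pairs are in flight at once, which is why a higher core count pays off only if the software actually distributes work this way.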
Step 2: Consider the Cloud as a solution to supercharge point cloud processing
The cloud is being looked at as a solution to processing the largest point cloud datasets at speed. The cloud offers effectively infinite processing power. For processing software that can take advantage of multiple threads throughout processing, this theoretically creates the ability to simultaneously undertake the ‘coarse registration’ of a project of any size. Putting the scans in order (setting scan pairs) will always take longer with more scans. But, in the cloud, you can just keep adding threads.
To effectively approach the cloud this way you need to make sure that your software can undertake multi-thread, simultaneous registration and supports cloud processing. You also need to make sure that you have a high-quality connection to the internet and are not limited by a data transfer cap. You still need a good computer, but your ability to process at speed will hinge more on your bandwidth than CPU.
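A quick calculation shows how much bandwidth matters. The sketch below estimates how long it takes to upload a dataset at a given connection speed. It assumes decimal units (1GB = 8,000 megabits) and ignores protocol overhead and contention, so treat the result as a best case. The function name and example figures are illustrative only.

```python
# Best-case upload time for a dataset; ignores protocol overhead and contention.


def upload_hours(dataset_gb: float, upload_mbps: float) -> float:
    """Hours needed to upload dataset_gb gigabytes at upload_mbps megabits/second."""
    if upload_mbps <= 0:
        raise ValueError("upload speed must be positive")
    megabits = dataset_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / upload_mbps / 3600


if __name__ == "__main__":
    # Hypothetical example: a 1 TB dataset over two different connections.
    for speed in (100, 1000):
        print(f"{speed} Mbps: {upload_hours(1000, speed):.1f} hours")
```

On these assumptions, a terabyte takes roughly 22 hours to upload at 100 Mbps and a little over 2 hours at 1 Gbps, before any processing has started.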
Step 3: Make the right point cloud processing software choices
All of your investments in hardware and quality connections will go to waste if you use bad software. First, look for software that can process multiple scans simultaneously on multiple threads. Ideally, look for software that can take advantage of multiple threads throughout the entire processing procedure. Also, check for cloud support.
The big brand producers of point cloud processing software are Autodesk, Leica, Bentley, Faro and Trimble. These are all quality options. However, all of these software systems come as packages with 3D modelling. If that is not something you are interested in, it will drive up costs. It also means that processing has never been the sole focus of the design teams that built these products.
New entrants into the market often focus on a single aspect of point cloud creation, delivering improvements to either the processing or modelling. This type of focus allows for faster innovation and often better, yet singular, products. For example, several new processing programs are taking advantage of novel rotational vector techniques to accelerate the processing registration procedure by as much as 40%-80%.
Vector analysis allows coarse registration to be split into three stages. Point clouds are compressed into unique ‘vector spheres’, enabling rotational alignment. This allows for rapid 2D point density alignment on the vertical and horizontal axes. By speeding up the pair alignment of each adjacent scan, this approach allows much larger datasets to be processed faster and more simply.
Automation, frontloading and review
Look for software that automates manual processes and frontloads what remains. Cross-checking scans, changing parameters for every pairing, and generally having to be involved throughout the processing period exponentially increases the time costs of handling large datasets. The ability to build entire scan trees, walk away and deal with verification after the dataset has been processed greatly improves efficiency.
Look into the review processes available within programs. Ideally, you want options. You can either check the accuracy of data using visualisation features or statistical data. Visualisations give you confidence in the results being correct while statistics tell you ‘how’ correct that data is. It is important to have both options available if looking to quickly review data and check in detail whether or not the results comply with the job specifications. The quality of such tools to review large quantities of data directly impacts your ability to trust the final output.
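As an illustration of what the statistical side of a review can look like, the sketch below summarises a list of alignment residuals against a job tolerance. The choice of statistics (RMS, maximum and the share of residuals within tolerance) and the function name are assumptions made for this example. Individual programs report their own measures.

```python
# Sketch: summarise alignment residuals (in mm) against a job tolerance.
import math


def review_stats(residuals_mm, tolerance_mm):
    """Return RMS, maximum and share-within-tolerance for a list of residuals."""
    if not residuals_mm:
        raise ValueError("no residuals supplied")
    abs_res = [abs(r) for r in residuals_mm]
    rms = math.sqrt(sum(r * r for r in abs_res) / len(abs_res))
    within = sum(1 for r in abs_res if r <= tolerance_mm) / len(abs_res)
    return {"rms_mm": rms, "max_mm": max(abs_res), "share_within": within}


if __name__ == "__main__":
    # Hypothetical residuals from one scan pair, checked against a 3 mm tolerance.
    print(review_stats([1.2, -0.8, 2.5, -3.4, 0.3], tolerance_mm=3.0))
```

A summary like this answers the ‘how correct’ question at a glance, while a visual check remains the quickest way to spot a scan that has been paired in the wrong place.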
Pay particular attention to the interface for scan pairing (building a scan tree). For large projects, this can be a challenging aspect of the processing procedure. A system with a smooth interface that allows for the rapid alignment of scans will significantly reduce the effort and time it takes to handle a large dataset.
Step 4: Learn software tricks to speed up point cloud processing and decrease file sizes
In addition to the critical ability to use multiple threads to process several scan pairs simultaneously, there are three main features that you should look out for. The first is the ability to normalise point distributions throughout a scan field. Laser scanners produce a greater density of point data in the area directly surrounding them. This is an inescapable byproduct of making line-of-sight measurements based on incremental angular changes. This excess data, however, is effectively useless. Most programs allow you to set a ‘thinning’ metric that culls all data points within set parameters. Depending on the nature of the project, you will want to set this number somewhere between 5mm and 25mm.
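One simple way to picture this kind of thinning is a grid of small cubes laid over the scene, with only one point kept per cube. The sketch below uses that voxel-grid approach as a stand-in. It is not the method of any particular program, and the function name and the 10mm default spacing are assumptions for illustration. Points are plain (x, y, z) tuples in metres.

```python
# Sketch: normalise point density by keeping one point per cubic cell.
# A voxel-grid stand-in for a 'thinning' setting, not any vendor's algorithm.
import math


def thin_by_spacing(points, spacing_m=0.010):
    """Keep the first point that falls in each cubic cell of side spacing_m."""
    seen = set()
    kept = []
    for x, y, z in points:
        cell = (math.floor(x / spacing_m),
                math.floor(y / spacing_m),
                math.floor(z / spacing_m))
        if cell not in seen:
            seen.add(cell)
            kept.append((x, y, z))
    return kept


if __name__ == "__main__":
    # Two points 1-2 mm from the origin share a cell; the distant point survives.
    cloud = [(0.001, 0.001, 0.001), (0.002, 0.002, 0.002), (0.5, 0.5, 0.5)]
    print(thin_by_spacing(cloud))
```

Dense areas near the scanner lose most of their points, while sparse areas further away are left almost untouched, which is the effect a normalising filter is after.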
The second feature is ‘decimation’. This is the thinning of point data throughout a scan field regardless of its proximity to the scanner. Unlike normalising, this impacts all data throughout a scene and can easily deteriorate your data files. Decimation should be done carefully. But, if you understand the specific precision requirements of the end result, you can safely reduce file sizes.
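For contrast with normalising, the sketch below shows the bluntest form of decimation: keeping every nth point regardless of where it sits. The function name and the default factor are assumptions for illustration. Real software may use random or smarter sampling.

```python
# Sketch: uniform decimation, keeping every n-th point across the whole scan.


def decimate(points, keep_every=4):
    """Return every keep_every-th point, regardless of its position in the scene."""
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    return points[::keep_every]


if __name__ == "__main__":
    # Hypothetical scan of ten points, reduced to roughly a quarter.
    cloud = [(float(i), 0.0, 0.0) for i in range(10)]
    print(decimate(cloud, keep_every=4))
```

Because sparse, distant areas are thinned just as hard as dense ones, it is worth checking the result against the precision the job requires before discarding the original.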
The third feature is the maximum distance settings. Laser scanners have an enormous range. If scanning outdoors, or even in a large indoor space, scans can collect point data that is very far away from the scanner. If, in order to get angle coverage, that area is part of another scan anyway, that data is not useful. By setting a maximum distance, you can remove data from your files and decrease the amount of data that has to be processed for each cloud-to-cloud alignment. This is also important to creating a clear end product.
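A maximum distance setting amounts to a simple range check against the scanner position. The sketch below shows the idea with points as (x, y, z) tuples in metres. The function name and the example cut-off are assumptions for illustration.

```python
# Sketch: discard points beyond a maximum distance from the scanner.
import math


def clip_to_range(points, max_distance_m, scanner=(0.0, 0.0, 0.0)):
    """Keep only points within max_distance_m of the scanner position."""
    return [p for p in points if math.dist(p, scanner) <= max_distance_m]


if __name__ == "__main__":
    # Hypothetical scan with one point well beyond a 50 m cut-off.
    cloud = [(1.0, 0.0, 0.0), (30.0, 40.0, 0.0), (0.0, 0.0, 60.0)]
    print(clip_to_range(cloud, max_distance_m=50.0))
```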
Step 5: Make sure that you can appropriately account for ‘propagation of error’
For datasets with a large number of scans, it is critical to be able to account for ‘propagation of error’. This is the compounding of inaccuracies and deficiencies in precision as each scan is paired with another. Each pair of scans is effectively built on the previous one, all rooting back to a single ‘home’ scan that ‘fixes’ the entire composite point cloud. The more links between any given scan and its ‘home’ scan, the greater the propagated error it will suffer.
For example, imagine aligning a dataset in which each scan had an error of +/- 2mm. Your home scan will have an error of +/- 2mm. But each step you take away from that scan will suffer an ever-increasing level of error equal to the sum of the errors squared. Therefore, although each adjacent scan will have an inherent baseline error of +/- 2mm, its error within the composite scan will be +/- 16mm. This will only increase as you move further away. [$(n_1 \times 2)^2 + (n_2 \times 2)^2 + \dots = \text{total error}$, where $n$ is the error rate for each scan]
This is why it is important, even in small datasets, to place the home scan in the middle of the scan tree. For large scan sets, you need to subdivide the entire scan file with multiple home scans dispersed at intervals that will keep the whole dataset within the maximum tolerable error rate.
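Choosing a home scan ‘in the middle’ can be expressed as picking the scan whose farthest neighbour is the fewest links away. The sketch below does this with a breadth-first search over a scan tree stored as a dictionary of neighbours. The data layout and function names are assumptions for illustration, and the sketch assumes every scan is connected to the rest.

```python
# Sketch: pick the home scan that minimises the longest chain of links.
# Assumes a connected scan tree given as {scan: [neighbouring scans]}.
from collections import deque


def links_from(home, neighbours):
    """Breadth-first search: number of links from home to every reachable scan."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        scan = queue.popleft()
        for other in neighbours[scan]:
            if other not in depth:
                depth[other] = depth[scan] + 1
                queue.append(other)
    return depth


def best_home_scan(neighbours):
    """Return (scan, max_links) for the scan whose farthest scan is closest."""
    worst = {scan: max(links_from(scan, neighbours).values())
             for scan in neighbours}
    home = min(worst, key=worst.get)
    return home, worst[home]


if __name__ == "__main__":
    # Hypothetical chain of five scans: A - B - C - D - E
    chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E"], "E": ["D"]}
    print(best_home_scan(chain))
```

In the five-scan chain, the middle scan C is at most two links from any other scan, whereas an end scan would be four links from the far end. If even the best home scan leaves chains that are too long for your tolerance, that is the signal to subdivide the project.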
To do this, you need software that can take two or more point clouds that have gone through coarse registration and combine them prior to doing a ‘global’ fine registration. However, the bedrock of this capability is rooted in the surveying techniques used in the field. For large projects, surveyors need to build a rigorous network of targets using a total station that can act as a control frame for the global placement of scans. This does not mean that each scan pair has to be aligned using targeted registration. Surveyors simply need to think about the end requirements of the project and build a site grid that will allow for cloud-to-cloud registration in containable chunks that will not be overly corrupted by propagation of error.
Summary: Large Datasets Require Planning
Large datasets are a challenge, but they can be overcome with planning. That starts with thinking about the level of precision required in the finished point cloud, along with the number of scans your survey will produce. It is possible that to meet the job specifications, you will have to construct a site grid using a total station to account for the propagation of error. Next, make sure that your hardware is up to standards and that you pick quality software.
Program capabilities will differ, and it is important to make sure that you have the options to reduce the size of datasets and then efficiently process them. Specifically, look out for software that can take advantage of multi-thread, simultaneous processing throughout the entire processing procedure. Look at cloud options for processing the largest datasets. Then, all you really need to worry about is having enough storage to keep the output. Given the state of modern computing, that is realistically the least of your worries. Large datasets aren’t fundamentally different from smaller ones. Just plan ahead and give yourself enough time to process and handle the large files.