Training ensembles of randomized decision trees on large datasets using streaming processors such as GPUs or multi-core CPUs has classically been done on large computing clusters, or has been mapped to the GPU in an overly specialized or inefficient manner. None of the current machine learning methods is general enough to run on both GPU and CPU concurrently, and none takes advantage of heterogeneous hardware, i.e., mixed device types and capabilities.
The novelty of the method lies in its efficient mapping of randomized decision tree construction onto modern streaming multiprocessor architectures, maintaining high construction speed while retaining the capacity to handle very large datasets.
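To make the idea of such a mapping concrete, the sketch below shows one common pattern for fitting randomized tree construction onto data-parallel hardware: scoring many random (feature, threshold) split candidates for all samples at once with vectorized operations. This is an illustrative sketch only, not the patented method; NumPy stands in for a GPU kernel, and the function name `best_random_split` and Gini-impurity scoring are assumptions for the example.

```python
import numpy as np

def best_random_split(X, y, n_candidates=8, rng=None):
    """Score several random (feature, threshold) split candidates at once.

    Illustrative sketch: evaluating all candidates over all samples with
    vectorized, data-parallel operations is the kind of workload that maps
    well onto streaming processors. Gini impurity is used as the score.
    """
    rng = np.random.default_rng(rng)
    n_samples, n_features = X.shape
    feats = rng.integers(0, n_features, size=n_candidates)
    # Draw one threshold uniformly within each chosen feature's value range.
    lo = X[:, feats].min(axis=0)
    hi = X[:, feats].max(axis=0)
    thresholds = rng.uniform(lo, hi)
    # Boolean matrix (n_samples, n_candidates): sample goes left or right.
    masks = X[:, feats] <= thresholds
    classes = np.unique(y)
    onehot = (y[:, None] == classes[None, :]).astype(float)

    def gini(counts):
        total = counts.sum(axis=1)
        safe = np.maximum(total, 1.0)       # avoid division by zero
        p = counts / safe[:, None]
        return 1.0 - (p ** 2).sum(axis=1), total

    left_counts = masks.T @ onehot          # (n_candidates, n_classes)
    right_counts = (~masks).T @ onehot
    g_left, n_left = gini(left_counts)
    g_right, n_right = gini(right_counts)
    score = (n_left * g_left + n_right * g_right) / n_samples
    best = int(np.argmin(score))
    return int(feats[best]), float(thresholds[best]), float(score[best])

# Tiny demo on a linearly separable toy set.
feat, thr, score = best_random_split(
    np.array([[0.0], [1.0], [2.0], [3.0]]),
    np.array([0, 0, 1, 1]),
    n_candidates=32, rng=0)
```

In this style, each tree node's split search becomes a batch of independent, uniform arithmetic over the sample set, which is exactly the shape of work streaming multiprocessors execute efficiently.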
- The method significantly reduces operating cost, operating complexity, and processing time compared to cluster-based solutions for the same problem
- The method reduces the need for in-core datasets, which single-node workstations previously required for building randomized decision trees
- The method takes full advantage of the parallelism of modern GPUs and CPUs, and its speed scales linearly with the number of processors and GPU devices
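The linear scaling claimed above follows from the structure of the problem: the trees of a randomized ensemble are built independently of one another, so each can be assigned to a separate worker, core, or device with no cross-tree communication. The sketch below illustrates that structure with a thread pool; the helper names (`train_tree`, `train_forest`) are assumptions for the example, and in practice each worker would drive a separate CPU core or GPU device rather than a Python thread.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def train_tree(seed, data):
    # Stand-in for building one randomized tree. Each tree depends only on
    # its own random seed and a shared, read-only view of the data, so the
    # trees of an ensemble can be constructed fully independently.
    rng = random.Random(seed)
    return {"seed": seed, "root_feature": rng.randrange(len(data[0]))}

def train_forest(data, n_trees=8, n_workers=4):
    # Because no tree depends on another, the ensemble partitions cleanly
    # across workers, and throughput grows with the worker/device count.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda s: train_tree(s, data), range(n_trees)))

forest = train_forest([[0.1, 0.2, 0.3]] * 10, n_trees=8)
```

Within each tree, the node-level split search itself is data-parallel, so the two levels of parallelism (across trees, and across samples within a node) compose naturally on heterogeneous hardware.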
Applications and Opportunities
- The method described can be applied to very large training datasets on commodity hardware