thrust

Classes

struct thrust::execution_policy< DerivedPolicy >
struct thrust::host_execution_policy< DerivedPolicy >
struct thrust::device_execution_policy< DerivedPolicy >
struct thrust::system::cpp::tag

Variables

static const detail::host_t thrust::host
static const __device__ detail::device_t thrust::device
static const detail::seq_t thrust::seq
static const unspecified thrust::system::cpp::par
static const unspecified thrust::system::cuda::par
static const unspecified thrust::system::omp::par
static const unspecified thrust::system::tbb::par

static const __device__ detail::device_t thrust::device

thrust::device is the default parallel execution policy associated with Thrust's device backend system, which is configured by the THRUST_DEVICE_SYSTEM macro.
Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target Thrust's device backend system by passing thrust::device as an algorithm parameter.
Explicit dispatch is useful for avoiding copies into containers such as thrust::device_vector, and for avoiding the need to wrap raw pointers allocated by the CUDA API in types such as thrust::device_ptr.
The user must take care to guarantee that the iterators provided to an algorithm are compatible with the device backend system. For example, raw pointers allocated by std::malloc typically cannot be dereferenced by a GPU. For this reason, raw pointers allocated by host APIs should not be mixed with a thrust::device algorithm invocation when the device backend is CUDA.
The type of thrust::device is implementation-defined.
The following code snippet demonstrates how to use thrust::device to explicitly dispatch an invocation of thrust::for_each to the device backend system:

static const detail::host_t thrust::host

thrust::host is the default parallel execution policy associated with Thrust's host backend system, which is configured by the THRUST_HOST_SYSTEM macro.
Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target Thrust's host backend system by passing thrust::host as an algorithm parameter.
Explicit dispatch is useful for avoiding copies into containers such as thrust::host_vector.
Note that even though thrust::host targets the host CPU, it is a parallel execution policy: the order in which an algorithm invokes functors or dereferences iterators is unspecified.
The type of thrust::host is implementation-defined.
The following code snippet demonstrates how to use thrust::host to explicitly dispatch an invocation of thrust::for_each to the host backend system:

static const unspecified thrust::system::tbb::par

thrust::tbb::par is the parallel execution policy associated with Thrust's TBB backend system.
Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target Thrust's TBB backend system by passing thrust::tbb::par as an algorithm parameter.
Explicit dispatch is useful for avoiding copies into containers such as thrust::tbb::vector.
The type of thrust::tbb::par is implementation-defined.
The following code snippet demonstrates how to use thrust::tbb::par to explicitly dispatch an invocation of thrust::for_each to the TBB backend system:

static const unspecified thrust::system::omp::par

thrust::omp::par is the parallel execution policy associated with Thrust's OpenMP backend system.
Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target Thrust's OpenMP backend system by passing thrust::omp::par as an algorithm parameter.
Explicit dispatch is useful for avoiding copies into containers such as thrust::omp::vector.
The type of thrust::omp::par is implementation-defined.
The following code snippet demonstrates how to use thrust::omp::par to explicitly dispatch an invocation of thrust::for_each to the OpenMP backend system:

static const unspecified thrust::system::cpp::par

thrust::cpp::par is the parallel execution policy associated with Thrust's standard C++ backend system.
Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target Thrust's C++ backend system by passing thrust::cpp::par as an algorithm parameter.
Explicit dispatch is useful for avoiding copies into containers such as thrust::cpp::vector.
The type of thrust::cpp::par is implementation-defined.
The following code snippet demonstrates how to use thrust::cpp::par to explicitly dispatch an invocation of thrust::for_each to the standard C++ backend system:

static const unspecified thrust::system::cuda::par

thrust::cuda::par is the parallel execution policy associated with Thrust's CUDA backend system.
Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target Thrust's CUDA backend system by passing thrust::cuda::par as an algorithm parameter.
Explicit dispatch is useful for avoiding copies into containers such as thrust::cuda::vector.
The type of thrust::cuda::par is implementation-defined.
The following code snippet demonstrates how to use thrust::cuda::par to explicitly dispatch an invocation of thrust::for_each to the CUDA backend system:
Explicit dispatch may also be used to direct Thrust's CUDA backend to launch the CUDA kernels implementing an algorithm invocation on a particular CUDA stream. In some cases, this may achieve concurrency with the caller and with other algorithms and CUDA kernels executing on separate CUDA streams. The following code snippet demonstrates how to use the thrust::cuda::par execution policy to explicitly dispatch invocations of thrust::for_each on separate CUDA streams:
Even when using CUDA streams with thrust::cuda::par, there is no guarantee of concurrency. Algorithms that return a data-dependent result, or whose implementations require temporary memory allocation, may introduce blocking synchronization events. Moreover, it may be necessary to synchronize explicitly through cudaStreamSynchronize or similar before any effects induced by an algorithm's execution are visible to the rest of the system. Finally, the caller is responsible for managing the lifetime of any CUDA streams involved.

static const detail::seq_t thrust::seq

thrust::seq is an execution policy that requires an algorithm invocation to execute sequentially in the current thread. It cannot be configured by a compile-time macro.
The type of thrust::seq is implementation-defined.
The following code snippet demonstrates how to use thrust::seq to explicitly execute an invocation of thrust::for_each sequentially: