Parallel Execution Policies

Classes

struct  thrust::execution_policy< DerivedPolicy >
 
struct  thrust::host_execution_policy< DerivedPolicy >
 
struct  thrust::device_execution_policy< DerivedPolicy >
 
struct  thrust::system::cpp::tag
 

Variables

static const detail::host_t thrust::host
 
static const __device__ detail::device_t thrust::device
 
static const detail::seq_t thrust::seq
 
static const unspecified thrust::system::cpp::par
 
static const unspecified thrust::system::cuda::par
 
static const unspecified thrust::system::omp::par
 
static const unspecified thrust::system::tbb::par
 

Detailed Description

Variable Documentation

static const detail::device_t thrust::device

thrust::device is the default parallel execution policy associated with Thrust's device backend system configured by the THRUST_DEVICE_SYSTEM macro.

Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target algorithm dispatch at Thrust's device system by providing thrust::device as an algorithm parameter.

Explicit dispatch can be useful for avoiding copies of data into containers such as thrust::device_vector, or for avoiding the need to wrap raw pointers allocated by the CUDA API in types such as thrust::device_ptr.

The user must take care to guarantee that the iterators provided to an algorithm are compatible with the device backend system. For example, raw pointers allocated by std::malloc typically cannot be dereferenced by a GPU. For this reason, raw pointers allocated by host APIs should not be mixed with a thrust::device algorithm invocation when the device backend is CUDA.
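The two dispatch styles can be contrasted as follows. This is a minimal sketch assuming the CUDA device backend; it is not part of the documented interface:

```cuda
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>

int main()
{
  int* raw;
  cudaMalloc(&raw, 3 * sizeof(int));

  // Option 1: wrap the raw pointer so iterator system tags
  // dispatch the algorithm to the device backend implicitly
  thrust::device_ptr<int> wrapped = thrust::device_pointer_cast(raw);
  thrust::fill(wrapped, wrapped + 3, 7);

  // Option 2: skip the wrapper and dispatch explicitly with thrust::device
  thrust::fill(thrust::device, raw, raw + 3, 7);

  cudaFree(raw);
  return 0;
}
```

Both calls fill the same allocation; the second relies on thrust::device to assert that the raw pointer is device-accessible.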

The type of thrust::device is implementation-defined.

The following code snippet demonstrates how to use thrust::device to explicitly dispatch an invocation of thrust::for_each to the device backend system:

#include <thrust/for_each.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <cstdio>

struct printf_functor
{
  __host__ __device__
  void operator()(int x)
  {
    printf("%d\n", x);
  }
};
...
thrust::device_vector<int> vec(3);
vec[0] = 0; vec[1] = 1; vec[2] = 2;

thrust::for_each(thrust::device, vec.begin(), vec.end(), printf_functor());

// 0 1 2 is printed to standard output in some unspecified order
See Also
device_execution_policy
thrust::host
static const detail::host_t thrust::host

thrust::host is the default parallel execution policy associated with Thrust's host backend system configured by the THRUST_HOST_SYSTEM macro.

Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target algorithm dispatch at Thrust's host system by providing thrust::host as an algorithm parameter.

Explicit dispatch can be useful for avoiding copies of data into containers such as thrust::host_vector.

Note that even though thrust::host targets the host CPU, it is a parallel execution policy. That is, the order that an algorithm invokes functors or dereferences iterators is not defined.

The type of thrust::host is implementation-defined.

The following code snippet demonstrates how to use thrust::host to explicitly dispatch an invocation of thrust::for_each to the host backend system:

#include <thrust/for_each.h>
#include <thrust/execution_policy.h>
#include <cstdio>

struct printf_functor
{
  __host__ __device__
  void operator()(int x)
  {
    printf("%d\n", x);
  }
};
...
int vec[3];
vec[0] = 0; vec[1] = 1; vec[2] = 2;

thrust::for_each(thrust::host, vec, vec + 3, printf_functor());

// 0 1 2 is printed to standard output in some unspecified order
See Also
host_execution_policy
thrust::device
static const unspecified thrust::system::tbb::par

thrust::tbb::par is the parallel execution policy associated with Thrust's TBB backend system.

Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target Thrust's TBB backend system by providing thrust::tbb::par as an algorithm parameter.

Explicit dispatch can be useful for avoiding copies of data into containers such as thrust::tbb::vector.

The type of thrust::tbb::par is implementation-defined.

The following code snippet demonstrates how to use thrust::tbb::par to explicitly dispatch an invocation of thrust::for_each to the TBB backend system:

#include <thrust/for_each.h>
#include <thrust/system/tbb/execution_policy.h>
#include <cstdio>

struct printf_functor
{
  __host__ __device__
  void operator()(int x)
  {
    printf("%d\n", x);
  }
};
...
int vec[3];
vec[0] = 0; vec[1] = 1; vec[2] = 2;

thrust::for_each(thrust::tbb::par, vec, vec + 3, printf_functor());

// 0 1 2 is printed to standard output in some unspecified order
static const unspecified thrust::system::omp::par

thrust::omp::par is the parallel execution policy associated with Thrust's OpenMP backend system.

Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target Thrust's OpenMP backend system by providing thrust::omp::par as an algorithm parameter.

Explicit dispatch can be useful for avoiding copies of data into containers such as thrust::omp::vector.

The type of thrust::omp::par is implementation-defined.

The following code snippet demonstrates how to use thrust::omp::par to explicitly dispatch an invocation of thrust::for_each to the OpenMP backend system:

#include <thrust/for_each.h>
#include <thrust/system/omp/execution_policy.h>
#include <cstdio>

struct printf_functor
{
  __host__ __device__
  void operator()(int x)
  {
    printf("%d\n", x);
  }
};
...
int vec[3];
vec[0] = 0; vec[1] = 1; vec[2] = 2;

thrust::for_each(thrust::omp::par, vec, vec + 3, printf_functor());

// 0 1 2 is printed to standard output in some unspecified order
static const unspecified thrust::system::cpp::par

thrust::system::cpp::par is the parallel execution policy associated with Thrust's standard C++ backend system.

Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target Thrust's C++ backend system by providing thrust::cpp::par as an algorithm parameter.

Explicit dispatch can be useful for avoiding copies of data into containers such as thrust::cpp::vector.

The type of thrust::cpp::par is implementation-defined.

The following code snippet demonstrates how to use thrust::cpp::par to explicitly dispatch an invocation of thrust::for_each to the standard C++ backend system:

#include <thrust/for_each.h>
#include <thrust/system/cpp/execution_policy.h>
#include <cstdio>

struct printf_functor
{
  __host__ __device__
  void operator()(int x)
  {
    printf("%d\n", x);
  }
};
...
int vec[3];
vec[0] = 0; vec[1] = 1; vec[2] = 2;

thrust::for_each(thrust::cpp::par, vec, vec + 3, printf_functor());

// 0 1 2 is printed to standard output in some unspecified order
static const unspecified thrust::system::cuda::par

thrust::cuda::par is the parallel execution policy associated with Thrust's CUDA backend system.

Instead of relying on implicit algorithm dispatch through iterator system tags, users may directly target Thrust's CUDA backend system by providing thrust::cuda::par as an algorithm parameter.

Explicit dispatch can be useful for avoiding copies of data into containers such as thrust::cuda::vector.

The type of thrust::cuda::par is implementation-defined.

The following code snippet demonstrates how to use thrust::cuda::par to explicitly dispatch an invocation of thrust::for_each to the CUDA backend system:

#include <thrust/for_each.h>
#include <thrust/device_vector.h>
#include <thrust/system/cuda/execution_policy.h>
#include <cstdio>

struct printf_functor
{
  __host__ __device__
  void operator()(int x)
  {
    printf("%d\n", x);
  }
};
...
thrust::device_vector<int> vec(3);
vec[0] = 0; vec[1] = 1; vec[2] = 2;

thrust::for_each(thrust::cuda::par, vec.begin(), vec.end(), printf_functor());

// 0 1 2 is printed to standard output in some unspecified order

Explicit dispatch may also be used to direct Thrust's CUDA backend to launch CUDA kernels implementing an algorithm invocation on a particular CUDA stream. In some cases, this may achieve concurrency with the caller and other algorithms and CUDA kernels executing on a separate CUDA stream. The following code snippet demonstrates how to use the thrust::cuda::par execution policy to explicitly dispatch invocations of thrust::for_each on separate CUDA streams:

#include <thrust/for_each.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/system/cuda/execution_policy.h>
#include <cstdio>

struct printf_functor
{
  cudaStream_t s;

  printf_functor(cudaStream_t s) : s(s) {}

  __host__ __device__
  void operator()(int)
  {
    printf("Hello, world from stream %p\n", static_cast<void*>(s));
  }
};

int main()
{
  // create two CUDA streams
  cudaStream_t s1, s2;
  cudaStreamCreate(&s1);
  cudaStreamCreate(&s2);

  thrust::counting_iterator<int> iter(0);

  // execute for_each on two different streams
  thrust::for_each(thrust::cuda::par.on(s1), iter, iter + 1, printf_functor(s1));
  thrust::for_each(thrust::cuda::par.on(s2), iter, iter + 1, printf_functor(s2));

  // synchronize with both streams
  cudaStreamSynchronize(s1);
  cudaStreamSynchronize(s2);

  // destroy streams
  cudaStreamDestroy(s1);
  cudaStreamDestroy(s2);

  return 0;
}

Even when using CUDA streams with thrust::cuda::par, there is no guarantee of concurrency. Algorithms which return a data-dependent result, or whose implementations require temporary memory allocation, may cause blocking synchronization events. Moreover, it may be necessary to synchronize explicitly through cudaStreamSynchronize or a similar mechanism before the effects of an algorithm's execution are visible to the rest of the system. Finally, the caller is responsible for managing the lifetime of any CUDA streams involved.
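The synchronization caveat can be sketched as follows, assuming the CUDA backend and an allocation obtained with cudaMalloc (the names s, d, and h are illustrative only):

```cuda
#include <thrust/fill.h>
#include <thrust/system/cuda/execution_policy.h>
#include <cuda_runtime.h>

int main()
{
  cudaStream_t s;
  cudaStreamCreate(&s);

  int* d;
  cudaMalloc(&d, 3 * sizeof(int));

  // launched on stream s; the call may return before the work completes
  thrust::fill(thrust::cuda::par.on(s), d, d + 3, 42);

  // synchronize before the effects are observed elsewhere
  cudaStreamSynchronize(s);

  int h[3];
  cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);

  cudaFree(d);
  cudaStreamDestroy(s);
  return 0;
}
```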

static const detail::seq_t thrust::seq

thrust::seq is an execution policy which requires an algorithm invocation to execute sequentially in the current thread. It cannot be configured by a compile-time macro.

The type of thrust::seq is implementation-defined.

The following code snippet demonstrates how to use thrust::seq to explicitly execute an invocation of thrust::for_each sequentially:

#include <thrust/for_each.h>
#include <thrust/execution_policy.h>
#include <vector>
#include <cstdio>

struct printf_functor
{
  __host__ __device__
  void operator()(int x)
  {
    printf("%d\n", x);
  }
};
...
std::vector<int> vec(3);
vec[0] = 0; vec[1] = 1; vec[2] = 2;

thrust::for_each(thrust::seq, vec.begin(), vec.end(), printf_functor());
// 0 1 2 is printed to standard output in sequential order
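Beyond top-level sequential execution, thrust::seq is also useful for nesting a sequential algorithm inside a functor that itself executes in parallel. The functor below is a hypothetical sketch; row_sum and its row-major layout are illustrative assumptions, not part of Thrust:

```cuda
#include <thrust/reduce.h>
#include <thrust/execution_policy.h>

// sums row i of an n-by-3 row-major matrix; one row per calling thread
struct row_sum
{
  const int* rows;  // device-accessible pointer to n * 3 ints
  int* sums;        // device-accessible pointer to n ints

  __host__ __device__
  void operator()(int i)
  {
    // thrust::seq keeps the nested reduction in the current thread,
    // which is required when invoking an algorithm from device code
    sums[i] = thrust::reduce(thrust::seq, rows + 3 * i, rows + 3 * i + 3);
  }
};
```

Such a functor might then be passed to thrust::for_each together with thrust::device and a thrust::counting_iterator over the row indices.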
See Also
thrust::host
thrust::device