Hi Peter,
What happens is that when the “this” pointer is created on the device, space for the data members is allocated but not initialized. You can do a “pcopy” instead of a “create”, in which case the values of the scalar data members are copied over. For pointer members, however, the value copied is that of the host pointer.
Next, when you create (or copy) a data member that’s an array, space is created for the array on the device and then “attached” to the this pointer (i.e. the data member’s pointer value in the device copy of the object is set to the array’s device address). Now when the data member is accessed via the implicit dereference, i.e. “this->data_”, the address is correct. Hence, order matters: the object has to be on the device before its array members are created.
Try the following:
void ProcessData(double *restrict dataXs, double *restrict dataYs,
                 double *restrict modelYs, int **restrict dataMap,
                 int nToAssign) {
#pragma acc enter data copy(this)
#pragma acc data copy(modelXs[0:N_MODELS], modelYs[0:N_MODELS], dataMap[0:N_INSTANCES][0:N_MODELS]) \
                 copyin(dataXs[0:N_INSTANCES], dataYs[0:N_INSTANCES])
  {
    ...
        // Assign the instance to the closest cluster
        dataMap[i][j] = minID;
      }
    }
  }
#pragma acc exit data delete(this)
  ...
Note that in this example, you’re creating the data on the device every time this routine is called. I would suggest instead moving the create/copy of the “this” pointer into the constructor and the corresponding delete into the destructor. I would also use an unstructured data region to create/delete the data member arrays at the same time they are allocated on the host. Then finally, change the structured data region’s copy clauses in this code to use the update directive instead.
This way, as you expand the use of compute regions, you can leave the data on the device and not have to copy it back and forth. You would then replace the “updates” with “present” clauses, and only use the “updates” when you need to synchronize with the host. I will typically create “acc_update_host” and “acc_update_device” methods in my classes where I encapsulate the update directives.
Here’s a simple example to help illustrate this:
% cat simple1.cpp
#include <iostream>
class foo {
protected:
  int * data;
  int size;
  int factor;
public:
  foo() {
    size = 32;
    factor=1;
    data = new int[size];
    #pragma acc enter data create(this)
    #pragma acc update device(this)
    // In 14.7 or later the above two directives can be replaced by
    // #pragma acc enter data copy(this)
    #pragma acc enter data create(data[0:size])
  }
  ~foo() {
    #pragma acc exit data delete(data[0:size])
    #pragma acc exit data delete(this)
  }
  void setfactor(int fac) {
    factor=fac;
    #pragma acc update device(factor)
  }
  int getfactor() {
    return factor;
  }
  void setdata() {
    #pragma acc data present(data)
    {
      #pragma acc kernels loop independent
      for (int i=0; i < size; ++i) {
        data[i] = i+getfactor();
      }
    }
  }
  void printdata() {
    for (int i=0; i < size; ++i) {
      std::cout << data[i] << " ";
    }
    std::cout << std::endl;
  }
#ifdef _OPENACC
  void acc_update_host() {
    #pragma acc update host(data[0:size])
  }
  void acc_update_device() {
    #pragma acc update device(data[0:size])
  }
#endif
};
int main () {
  foo A;
  A.setfactor(2);
  A.setdata();
#ifdef _OPENACC
  A.acc_update_host();
#endif
  A.printdata();
  return 0;
}
% pgc++ simple1.cpp -V14.4 -o cpu.out; cpu.out
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
% pgc++ simple1.cpp -acc -Minfo=accel -V14.4 -o acc.out ; acc.out
foo::foo():
17, Generating enter data create(this[:1])
Generating update device(this[:1])
Generating enter data create(data[:size])
foo::setfactor(int):
26, Generating update device(factor)
foo::getfactor():
27, Generating implicit acc routine seq
Generating Tesla code
foo::setdata(int):
33, Generating present(data[:])
Generating present_or_copy(this[:])
Generating Tesla code
35, Conditional loop will be executed in scalar mode
Accelerator scalar kernel generated
foo::acc_update_host():
49, Generating update host(data[:size])
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
Note that 14.4 was the first release that included support for the “this” pointer and unstructured data regions. There were a few bugs in this initial effort, including the one I note in the comments, but later releases have hardened support.
If you get a chance to attend the NVIDIA GTC conference next March (http://www.gputechconf.com/), I have a session on OpenACC and C++ which will discuss these issues.
Also, if you scroll down to the tutorials section on our OpenACC page (PGI Compilers with OpenACC | PGI), you can find an hour-long presentation by Michael Wolfe on using C++ and OpenACC.
Hope this helps,
Mat