shared task vars vs. reduction

The public comment period closed January 31, 2008. This forum is now locked (read only).

Postby jakub » Thu Mar 06, 2008 3:29 am

2.7 says "Note – When storage is shared by an explicit task region, it is the programmer's responsibility to ensure, by adding proper synchronization, that the storage does not reach the end of its lifetime before the explicit task region completes its execution."
When exactly are all explicit tasks tied to the current thread supposed to be waited for?
Say, is:
Code:
#include <omp.h>
#include <stdlib.h>

int
main (void)
{
  int l = 0;
#pragma omp parallel reduction (+:l) num_threads (1)
  {
  #pragma omp task shared (l)
    l++;
  /* #pragma omp taskwait */
  }
  if (l != 1)
    abort ();
  return 0;
}


valid? I.e., is the point where all explicit tasks tied to the current thread are waited for right after the last instruction in the implicit task, or later (e.g. after all reduction variable merging, running destructors, etc.)?
Mandating that it happens before might penalize even OpenMP 2.5 code or code that never uses tasks. Tasks can be created in other functions called from the implicit task, not necessarily in the implicit task itself,
so the compiler would probably need to add a function call that waits for all pending tasks before the reduction merging code, lastprivate copying and destructor calls. With
OpenMP 2.5 that wasn't necessary. Or is the above undefined, so that the note quoted above applies in this case (and the commented-out taskwait is needed)?
jakub
 
Posts: 74
Joined: Fri Oct 26, 2007 3:19 am

Re: shared task vars vs. reduction

Postby aduran » Sun Apr 27, 2008 12:09 pm

Jakub,

Tasks are guaranteed to be completed in this case after the implicit barrier of the parallel region (as happens with all implicit barriers) so no taskwait is necessary.

The impact of that check on applications without tasks (e.g. 2.5 apps) is negligible if properly implemented.
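
A rough sketch of this guarantee, not taken from the spec: the function name tasks_done_at_region_end and the counter hits are made up for the illustration. Since hits is declared outside the parallel region, its lifetime is not in question, and every task has completed once the region's closing implicit barrier has been passed.

```c
/* Hypothetical example: four tasks bump a counter that outlives the region. */
int tasks_done_at_region_end(void)
{
    int hits = 0;               /* declared outside the region */
#pragma omp parallel
    {
#pragma omp single
        {
            int k;
            for (k = 0; k < 4; k++) {
#pragma omp task shared(hits)
                {
#pragma omp atomic
                    hits++;
                }
            }
        }
        /* No taskwait: the implicit barrier at the end of the
           parallel region completes all outstanding tasks. */
    }
    return hits;
}
```

Built with an OpenMP-enabled compiler (for example gcc -fopenmp), the function should return 4 without any explicit taskwait.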
aduran
 
Posts: 12
Joined: Wed Oct 24, 2007 8:33 am
Location: Barcelona, Spain

Re: shared task vars vs. reduction

Postby lfm » Sun Apr 27, 2008 8:36 pm

There is one case where you might need a barrier (related to Jakub's previous question). It is unspecified whether the implicit barrier at the end of the parallel region is executed before the block goes out of scope. So with:

Code:
#pragma omp parallel
{
   int i;
#pragma omp task shared(i)
   ...
}


i is allowed to go out of scope before the tasks, possibly executed in the barrier at the end of the region, are completed. So if you want to ensure that i remains in scope you would need an explicit barrier before the end of the parallel region. Again, this is a programmer requirement, not an implementation requirement.
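
A minimal sketch of the explicit-barrier approach described above. The names region_with_barrier and add_ten are hypothetical and only stand in for real work; nothing here comes from the draft itself. The barrier forces the tasks generated so far to finish while each thread's i is still in scope.

```c
/* Hypothetical helper standing in for whatever the task does with i. */
static void add_ten(int *p)
{
    *p += 10;
}

/* Returns 1 if every thread saw its own i updated by its task. */
int region_with_barrier(void)
{
    int ok = 1;
#pragma omp parallel num_threads(2)
    {
        int i = 5;              /* block-local, one per implicit task */
#pragma omp task shared(i)
        add_ten(&i);
        /* All tasks generated so far finish here, while i is still alive. */
#pragma omp barrier
        if (i != 15) {
#pragma omp critical
            ok = 0;
        }
    }
    return ok;
}
```

A taskwait in the same place would also work here, since each task is a child of the implicit task that owns the i it touches.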

-- Larry
lfm
 
Posts: 135
Joined: Sun Oct 21, 2007 4:58 pm
Location: OpenMP ARB

Re: shared task vars vs. reduction

Postby jakub » Mon Apr 28, 2008 3:44 am

I am still not sure I understand exactly when variables go out of scope. For local variables declared inside the parallel block, it is expected that they go out of scope before the implicit barrier. So
Code:
#pragma omp parallel
  {
    int i = 5;
    #pragma omp task shared (i)
    {
      do_something_with_i (&i);
    }
  }

is invalid, because by the time the task is executed i might already be out of scope in the implicit task. Now, do variables in private/firstprivate clauses on the parallel construct go out of scope
before or after the implicit barrier?
Code:
int i = 5;
#pragma omp parallel firstprivate (i)
  {
  #pragma omp task shared (i)
    {
      do_something_with_i (&i);
    }
  }

If they go out of scope before the implicit barrier, then what makes reduction special? If they are still in scope at the implicit barrier, what if their destructors contain
#pragma omp task? Then tasks would need to be scheduled after the implicit barrier (of course they could then be forced to be if (0) tasks).

Another issue is task firstprivate variables with constructors. While for POD firstprivate vars the implementation can copy the variables into some buffer and defer creation of the task's stack
till it is actually scheduled to be run, I doubt the implementation would be allowed to construct vars with constructors in a temporary buffer - that would mean one extra pair of user-visible copy-ctor/dtor calls per such firstprivate variable. But if the task is not untied and the implementation creates the task stack right away, switches to the new task context temporarily to run all the copy-ctors, and then switches back and lets the new task wait until it is actually scheduled to be run, the user could observe a tied task executed by two different threads (e.g. if the copy-ctors call omp_get_thread_num () and so does the body of the task). Is that ok?
jakub
 
Posts: 74
Joined: Fri Oct 26, 2007 3:19 am

Re: shared task vars vs. reduction

Postby lfm » Mon Apr 28, 2008 9:05 am

Jakub, you raise some interesting points. I missed the fact that you were talking about reduction in the original post. Let me think about this a little bit.

-- Larry
lfm
 
Posts: 135
Joined: Sun Oct 21, 2007 4:58 pm
Location: OpenMP ARB

Re: shared task vars vs. reduction

Postby lfm » Mon Apr 28, 2008 4:13 pm

Ok, with the help of some folks on the OpenMP language committee who actually read the specification instead of just editing it, here are the answers:

First, it is unspecified when the variables in a reduction clause go out of scope. This is part of the reason for the user-supplied synchronization requirement.

Second, the reduction example is non-conforming. See Section 2.9.3.6 (p. 95, line 26 in the public comment draft), in the Restrictions section for reduction. You can't access the reduction variable in an explicit task. So that's what makes reductions special.
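
One possible way to restructure the original example so that it stays within this restriction, offered as a sketch rather than as the committee's recommendation. The function name count_with_tasks and the variable local are invented for the illustration. The task updates an ordinary block-local variable, and only the implicit task touches the reduction variable l.

```c
/* Hypothetical rewrite of the example from the first post. */
int count_with_tasks(void)
{
    int l = 0;
#pragma omp parallel reduction(+:l) num_threads(1)
    {
        int local = 0;          /* ordinary variable, not in the reduction clause */
#pragma omp task shared(local)
        local++;
        /* Wait for the child task while local is still in scope. */
#pragma omp taskwait
        l += local;             /* reduction variable used only by the implicit task */
    }
    return l;
}
```

With a single thread, as in the original post, the function should return 1.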

For the third item, the scope of the private and firstprivate variables is the task, so any destructors are called when the task is finished. You are correct that the implementation cannot introduce any extra ctor/dtor calls for privatization. However, the specification does not constrain the points at which construction occurs (except of course they must occur before the variables are used). I would expect most implementations to construct the private variables when the task is encountered and place them in a data area that implements the data environment for the task. So it would be perfectly reasonable for the thread that executes the constructor to be different from the thread that executes the task.
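
C has no constructors, so the ctor/dtor question itself cannot be shown in C. What can be sketched is the related point that a task's firstprivate copy takes the value the original has when the task construct is encountered, no matter when the task body later runs. The names snapshot_demo, x and seen are hypothetical.

```c
/* Hypothetical example: the task sees the value x had at task creation. */
int snapshot_demo(void)
{
    int seen = -1;
#pragma omp parallel num_threads(1)
    {
        int x = 1;
#pragma omp task firstprivate(x) shared(seen)
        seen = x;               /* reads the task's own copy of x */
        x = 2;                  /* changes only the original x */
#pragma omp taskwait
    }
    return seen;
}
```

The function should return 1 even if the task body is deferred until the taskwait, because the later assignment affects only the original x.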

Hope this resolves all your questions.

-- Larry
lfm
 
Posts: 135
Joined: Sun Oct 21, 2007 4:58 pm
Location: OpenMP ARB

Re: shared task vars vs. reduction

Postby lfm » Mon Apr 28, 2008 4:18 pm

"First, it is unspecified when the variables in a reduction clause go out of scope"

Geez, I must be getting tired. I meant:

It is unspecified when the variables in a parallel region go out of scope; that is, before or after the implicit barrier at the end of the region.

Note that this is true in general for implicit barriers at the end of regions.
lfm
 
Posts: 135
Joined: Sun Oct 21, 2007 4:58 pm
Location: OpenMP ARB

