[Omp] The test of the barrier
Shengyan Hong
shhong at cse.psu.edu
Mon Mar 26 07:13:33 PDT 2007
Dear OMP members,
I am testing the barrier with code like this:
!$omp parallel default(shared) private(i,j,k,jj,y1,y2)
!$omp& shared(is,logd1,d1)
      CALL MAGIC_BRK_SIM_START()
!$omp do
      do jj = 0, d2 - fftblock, fftblock
         do j = 1, fftblock
            do i = 1, d1
               y1(j,i) = x(i,j+jj,k)
            enddo
         enddo
         call cfftz (is, logd1, d1, y1, y2)
         do j = 1, fftblock
            do i = 1, d1
               xout(i,j+jj,k) = y1(j,i)
            enddo
         enddo
      enddo
!$omp end do nowait
      CALL MAGIC_BRK_SIM_MIDDLE()
!$omp BARRIER
      CALL MAGIC_BRK_SIM_STOP()
!$omp end parallel
      enddo
I measure the execution time and the idle time of each thread as follows:
exe_time = middle_time - start_time, idle_time = stop_time - middle_time
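The same measurement can be sketched outside the simulator. The program below is only a minimal illustration, not the benchmark code: it assumes omp_get_wtime() as a stand-in for the MAGIC_BRK_SIM_* calls (so it reports wall-clock seconds, not simulated cycles), and the loop body, the loop size n and all variable names are hypothetical placeholders.

```fortran
program barrier_timing
  use omp_lib
  implicit none
  integer, parameter :: n = 200000     ! hypothetical amount of work
  integer :: i, tid
  double precision :: t_start, t_middle, t_stop, s
  double precision :: exe_time, idle_time

  !$omp parallel default(shared) &
  !$omp private(i, tid, s, t_start, t_middle, t_stop, exe_time, idle_time)
  tid = omp_get_thread_num()
  s = 0.0d0
  t_start = omp_get_wtime()            ! stands in for MAGIC_BRK_SIM_START
  !$omp do schedule(static)
  do i = 1, n
     s = s + sqrt(dble(i))             ! placeholder for the FFT work
  end do
  !$omp end do nowait
  t_middle = omp_get_wtime()           ! stands in for MAGIC_BRK_SIM_MIDDLE
  !$omp barrier
  t_stop = omp_get_wtime()             ! stands in for MAGIC_BRK_SIM_STOP

  exe_time  = t_middle - t_start       ! time spent in this thread's share
  idle_time = t_stop - t_middle        ! time spent waiting at the barrier

  !$omp critical
  print *, 'thread', tid, 'exe', exe_time, 'idle', idle_time, 'sum', s
  !$omp end critical
  !$omp end parallel
end program barrier_timing
```

Because of the nowait clause, each thread takes its middle timestamp as soon as its own share of the loop is finished, so the explicit barrier is the only place where waiting time can accumulate.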
I run the program on Simics with 8 processors that have different
frequencies and L1 cache latencies. The parallel code is divided into
8 threads. In the first experiment, each time I record the execution
time and the idle time for one iteration, I reallocate the 8 threads to
the 8 processors. In the second experiment, I do not reallocate the
threads. In practice, I implement the reallocation just by changing the
frequencies and the latencies of the 8 processors.
Now there is one problem. When I compare the execution times of the 2
experiments, I find that for the same iteration, the execution time of
the same thread is identical in both, even though the thread runs on a
different processor in one of them. For example, in the reallocation
experiment, the execution time of thread 2 in the 32nd iteration is
22664; in the non-reallocation experiment, the execution time of
thread 2 in the same iteration is also 22664, although the processors
running the threads are different.
Another problem is that when I add the execution time and the idle
time to get the total time, the total time of thread 0 is always around
400 cycles smaller than that of the other threads.
Thank you.
Shengyan Hong