[Omp] Overhead of #pragma omp for static nowait
Greg Bronevetsky
greg at bronevetsky.com
Sun Dec 10 02:19:20 PST 2006
I'm getting my numbers from the EPCC microbenchmarks and the overhead
numbers provided by the scheduling microbenchmark are the following:
let seq = time of the following loop:
for(i=0; i<n; i++) { delay(); }
let par = time of the following loop:
int num_threads=omp_get_num_threads();
#pragma omp for
for(i=0; i<n*num_threads; i++) { delay(); }
overhead = par-seq
One interesting phenomenon that I've noticed with this test is that on the
IA32 machine (I haven't tried the IA64 yet) and an IBM machine the loop
overhead is constant up to a certain number of iterations per thread
(this number is different for different chunk sizes). For larger numbers
of iterations per thread the overhead then rises linearly with the number
of iterations. From this it is possible to compute the base overhead of
"#pragma omp for schedule(static) nowait" as well as the per-iteration
overhead. From these (admittedly crude) calculations, it appears that on
both machines the base overhead is on the order of 1-2us while the
per-iteration overhead is a few ns. Thus, the mystery overhead seems to be
incurred at loop start-up or termination.
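
As a rough illustration of that calculation, the sketch below fits
overhead = base + per_iter * iterations by ordinary least squares to points
taken from the linearly rising part of the curve. The function name
fit_overhead and its parameters are made up for this example; they are not
part of the EPCC suite, and no measured data is included.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary least-squares fit of overhead = base + per_iter * iters.
 * Only points from the linearly rising regime should be passed in.
 * Returns 0 on success, -1 if the fit is degenerate.
 * (Hypothetical helper, not part of the EPCC microbenchmarks.) */
int fit_overhead(const double *iters, const double *overhead,
                 size_t count, double *base, double *per_iter)
{
    assert(iters && overhead && base && per_iter);
    if (count < 2)
        return -1;

    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t k = 0; k < count; k++) {
        sx  += iters[k];
        sy  += overhead[k];
        sxx += iters[k] * iters[k];
        sxy += iters[k] * overhead[k];
    }

    double denom = (double)count * sxx - sx * sx;
    if (denom == 0.0)   /* all iteration counts identical */
        return -1;

    *per_iter = ((double)count * sxy - sx * sy) / denom;  /* slope */
    *base     = (sy - *per_iter * sx) / (double)count;    /* intercept */
    return 0;
}
```

The slope would correspond to the per-iteration cost and the intercept to
the start-up/termination cost. Points from the flat regime should be left
out, since they would pull the fitted slope toward zero.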
I've looked at the assembly for your example code from Intel 9.1 for IA32
and IA64 and while I don't really understand what it is doing, one thing
I've noticed is that while both versions call __kmpc_global_thread_num
before entering the parallel loop, neither version calls a function to get
the total number of threads. Since the thread count seems to be necessary
to compute the iteration schedule in the generated code, this implies that
the generated code is doing something else. However, when you look at the
code generated by the IA32 compiler with optimizations set to -O0, the code
is quite small (<100 assembly instructions), implying that whatever the
compiler is doing is probably rather simple. I'm attaching the assembly
files for both -O0 and -O3 on both IA32 and IA64. Please tell me if you can
derive more from them than I can.
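
One possible reading of the IA32 listings attached below, paraphrased in C.
The generated code appears to call __kmpc_global_thread_num, then
__kmpc_for_static_init_4 with the constant 34 and pointers to stack slots
holding 0, n-1 and 1, then clamps the returned upper bound to n-1, runs the
loop, and calls __kmpc_for_static_fini. Everything named sketch_* here is a
hypothetical stand-in written for illustration, not the Intel runtime's
interface, and the balanced block partition is an assumption about what the
library call computes. If that reading is right, the thread count would be
looked up inside the library rather than by the generated code.

```c
#include <assert.h>

/* Hypothetical stand-in for the team state the runtime library would own. */
typedef struct {
    int thread_id;    /* what __kmpc_global_thread_num appears to return */
    int num_threads;  /* never visible to the generated code */
} sketch_team;

/* Hypothetical stand-in for __kmpc_for_static_init_4: narrows the inclusive
 * range [*lower, *upper] to this thread's block. The real call also receives
 * pointers to a flag and a stride slot, omitted here. The balanced block
 * partition is an assumption, not taken from the runtime's source. */
static void sketch_static_init(const sketch_team *team, int *lower, int *upper)
{
    assert(team->num_threads > 0 && *upper >= *lower);
    int total = *upper - *lower + 1;
    int base  = total / team->num_threads;
    int extra = total % team->num_threads;
    int id    = team->thread_id;
    int first = *lower + id * base + (id < extra ? id : extra);
    int count = base + (id < extra ? 1 : 0);
    *lower = first;
    *upper = first + count - 1;   /* upper < lower means no iterations */
}

/* Mirrors the control flow of the compiled fubar(): returns the number of
 * body calls and records the first and last index visited. */
int sketch_fubar(const sketch_team *team, int n, int *first_i, int *last_i)
{
    int calls = 0;
    if (n <= 0)                   /* zero-trip test before any setup */
        return 0;

    int lower = 0, upper = n - 1;
    sketch_static_init(team, &lower, &upper);

    if (lower <= n - 1) {
        if (upper > n - 1)        /* clamp, as the generated code does */
            upper = n - 1;
        for (int i = lower; i <= upper; i++) {
            if (calls == 0)
                *first_i = i;
            *last_i = i;
            calls++;              /* stands in for sub(i) */
        }
    }
    /* the generated code calls __kmpc_for_static_fini at this point */
    return calls;
}
```

Under this reading the per-loop cost would be dominated by the two library
calls, not by the bound arithmetic itself.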
Greg Bronevetsky
> Intel's compiler does something close to that except that the
> loop bounds for each chunk are computed by library calls
> rather than inline. I'm sure you can determine the code gen
> strategy for other platforms by inspecting the assembly code.
> Normally something like "icc -S -openmp foo.c" will give you
> what you are looking for. Try this simple code with an Intel64
> compiler and look at the assembly:
>
> extern void sub(int i);
> void fubar(int n)
> {
>     int i;
> #pragma omp for schedule(static)
>     for (i = 0; i < n; ++i)
>         sub(i);
> }
>
> I find it hard to believe that the overhead of calling a
> couple of functions really accounts for two orders of magnitude
> performance difference. Perhaps there is a problem with your
> analysis?
>
> -----Original Message-----
> From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
> Of Greg Bronevetsky
> Sent: Friday, December 08, 2006 1:59 PM
> To: omp at openmp.org
> Subject: RE: [Omp] Overhead of #pragma omp for static nowait
>
> I mean the following compiler transformation:
> #pragma omp for schedule(static,1) nowait
> for(int i=0; i<n; i++){}
> should become:
> for(int i=omp_get_thread_num(); i<n; i+=omp_get_num_threads())
> {}
>
> and
> #pragma omp for schedule(static) nowait
> for(int i=0; i<n; i++){}
> should become:
> // number of threads that get 1 more iteration than the others
> int midPoint=n%omp_get_num_threads();
> // number of iterations assigned to threads with smaller ids
> int itersBeforeMe;
> if(omp_get_thread_num()<midPoint)
>   itersBeforeMe = omp_get_thread_num()*(n/omp_get_num_threads()+1);
> else
>   itersBeforeMe = midPoint*(n/omp_get_num_threads()+1)+
>     (omp_get_thread_num()-midPoint)*(n/omp_get_num_threads());
> // number of iterations assigned to this thread
> // (strict <: only threads 0..midPoint-1 take the extra iteration)
> int numIter;
> if(omp_get_thread_num()<midPoint)
>   numIter = n/omp_get_num_threads()+1;
> else
>   numIter = n/omp_get_num_threads();
>
> for(int i=itersBeforeMe; i<itersBeforeMe+numIter; i++)
> {}
>
> Other chunk sizes or loop bounds would involve more complex arithmetic
> to set up loop bounds but the basic idea is pretty much the same. The
> overall cost of the above implementation of
> "#pragma omp for schedule(static,1) nowait" should be several ns per
> iteration. However, I am seeing much higher overheads in my experiments.
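
The two hand-written rewrites quoted above can be checked without an OpenMP
runtime by simulating each thread id in turn and counting how often every
iteration is executed. This is only a sketch: mark_cyclic, mark_blocked and
covers_exactly_once are names invented for this check, and the blocked
variant uses a strict "<" comparison against n % nthreads, since a "<="
would hand one extra iteration to the team.

```c
#include <assert.h>
#include <stdlib.h>

/* Iterations thread `tid` of `nthreads` runs under the cyclic rewrite of
 * schedule(static,1): tid, tid + nthreads, tid + 2*nthreads, ... */
static void mark_cyclic(int tid, int nthreads, int n, int *hits)
{
    for (int i = tid; i < n; i += nthreads)
        hits[i]++;
}

/* Iterations thread `tid` runs under the blocked rewrite of schedule(static):
 * the first n % nthreads threads take one extra iteration each. */
static void mark_blocked(int tid, int nthreads, int n, int *hits)
{
    int extra = n % nthreads;   /* threads with one extra iteration */
    int small = n / nthreads;   /* iterations for the remaining threads */
    int start = tid < extra ? tid * (small + 1)
                            : extra * (small + 1) + (tid - extra) * small;
    int count = tid < extra ? small + 1 : small;
    for (int i = start; i < start + count; i++)
        hits[i]++;
}

/* Returns 1 if every iteration 0..n-1 is executed exactly once across the
 * simulated team and nothing past n-1 is touched, 0 otherwise.
 * blocked != 0 selects the blocked rewrite, 0 the cyclic one. */
int covers_exactly_once(int nthreads, int n, int blocked)
{
    assert(nthreads > 0 && n >= 0);
    int *hits = calloc((size_t)n + 1, sizeof *hits);  /* one guard slot */
    if (!hits)
        return 0;
    for (int tid = 0; tid < nthreads; tid++) {
        if (blocked)
            mark_blocked(tid, nthreads, n, hits);
        else
            mark_cyclic(tid, nthreads, n, hits);
    }
    int ok = (hits[n] == 0);
    for (int i = 0; i < n; i++)
        if (hits[i] != 1)
            ok = 0;
    free(hits);
    return ok;
}
```

Neither rewrite needs any communication between threads; each thread only
needs its own id and the team size.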
>
> Greg Bronevetsky
>
> On Fri, 8 Dec 2006, Meadows, Lawrence F wrote:
>
> > What do you mean by "converting to a set of serial loops"?
> >
> > -----Original Message-----
> > From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
> > Of Greg Bronevetsky
> > Sent: Friday, December 08, 2006 12:48 PM
> > To: omp at openmp.org
> > Subject: [Omp] Overhead of #pragma omp for static nowait
> >
> > I have recently executed the EPCC microbenchmarks on several machines
> > and noticed that there is a consistent overhead of ~1us (~several
> > thousand cycles) for #pragma omp for static nowait and its variants on
> > the platforms I've tried. Given the simplicity of this scheduling
> > policy, it seems to me that it should be possible to convert the
> > parallel loop into a set of serial loops at compile-time. This would
> > result in a loop that requires no inter-thread communication and costs
> > only a few tens of cycles.
> >
> > What is the reason for this much-higher-than-expected overhead? Is it
> > just that the above compiler analysis is not typically performed or is
> > there a more fundamental reason? Here at LLNL, we have applications
> > that would like to use OpenMP to parallelize loops with ~50 iterations
> > and ~.25us of work per iteration. ~1us overheads for the #pragma omp
> > for static nowait make OpenMP too expensive for this task.
> >
> > Greg Bronevetsky
> >
> > _______________________________________________
> > Omp mailing list
> > Omp at openmp.org
> > http://openmp.org/mailman/listinfo/omp
> >
> >
>
-------------- next part --------------
// mark_description "Intel(R) C++ Compiler for Itanium(R)-based applications";
// mark_description "Version 9.1 Build 20060523 %s";
// mark_description "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -O3 -openmp -lm -lm -S -o basic_loop.";
// mark_description "O3.s";
//.radix C
.file "basic_loop.c"
.section .text, "xa", "progbits"
.align 64
// -- Begin fubar
.proc fubar#
// Block 0: entry Pred: Succ: 21 -GO
// Freq 1.0e+00
.global fubar#
fubar:
fubar??unw:
{ .mmi
alloc r33=ar.pfs,1,10,8,0 //0: {6:1:basic_loop.c} 234
add sp=-32,sp //0: {6:1} 235
mov r34=b0 //0: {6:1} 2
}
{ .mii
mov r36=gp //0: {6:1} 218
add r3=@ltoff($2$1_2_kmpc_loc_struct_pack$0#),gp ;;//0: {6:1} 5
mov r35=ar.lc //1: {6:1} 3
}
{ .mmi
ld8 r2=[r3] //1: {6:1} 6
nop.m 0
nop.i 0 ;;
}
{ .mii
alloc r31=ar.pfs,1,4,1,0 //2: {6:1} 232
mov r37=r2 //2: {6:1} 7
nop.i 0 ;;
// Block 21: Pred: 0 Succ: 1 -GO
// Freq 1.0e+00
}
{ .mib
nop.m 0
nop.i 0
br.call.sptk b0=__kmpc_global_thread_num# ;;//3: {6:1} 8
// Block 1: Pred: 21 Succ: 2 3 -GO
// Freq 1.0e+00
}
{ .mmi
alloc r31=ar.pfs,1,10,8,0 //5: {6:1} 233
mov gp=r36 //5: {6:1} 211
add r42=40,sp //5: {6:1} 10
}
{ .mmi
add r38=-1,r32 //5: {8:2} 20
add r28=16,sp //5: {8:2} 31
cmp4.le.unc p7,p6=r32,r0 ;; //5: {9:2} 12
}
{ .mmi
st4 [r42]=r8 //6: {6:1} 11
(p6) add r18=24,sp //6: {8:2} 26
(p6) add r37=32,sp //6: {8:2} 21
}
{ .mmi
(p6) add r32=36,sp //6: {8:2} 18
add r30=1,r0 //6: {8:2} 25
(p6) add r17=28,sp ;; //6: {8:2} 23
}
{ .mib
nop.m 0
nop.i 0
// Branch taken probability 0.50
(p7) br.cond.dptk .b1_2 ;; //7: {9:2} 13
// Block 3: Pred: 1 Succ: 4 -GO
// Freq 5.0e-01
}
{ .mmi
st4 [r18]=r30 //0: {8:2} 27
st4 [r37]=r38 //0: {8:2} 22
add r14=@ltoff($2$1_2_kmpc_loc_struct_pack$1#),gp//0: {8:2} 28
}
{ .mmi
mov r44=r8 //0: {8:2} 34
mov r46=r17 //0: {8:2} 36
add r45=34,r0 ;; //0: {8:2} 30
}
{ .mmi
ld8 r41=[r14] //1: {8:2} 29
st4 [r17]=r0 //1: {8:2} 24
mov r47=r32 //1: {8:2} 37
}
{ .mmi
st4 [r32]=r0 //1: {8:2} 19
mov r49=r18 //1: {8:2} 39
mov r48=r37 ;; //1: {8:2} 38
}
{ .mii
st8 [r28]=r30 //2: {8:2} 32
mov r43=r41 //2: {8:2} 33
mov r50=r30 //2: {8:2} 40
}
{ .mmb
nop.m 0
nop.m 0
br.call.sptk b0=__kmpc_for_static_init_4# ;;//2: {8:2} 41
// Block 4: Pred: 3 Succ: 5 6 -GO
// Freq 5.0e-01
}
{ .mmi
ld4 r8=[r37] //4: {8:2} 43
ld4 r40=[r32] //4: {8:2} 42
mov gp=r36 ;; //4: {8:2} 212
}
{ .mii
cmp4.gt.unc p15,p0=r40,r38 //5: {8:2} 44
cmp4.le.unc p14,p0=r8,r38 //5: {8:2} 50
nop.i 0
}
{ .mbb
nop.m 0
// Branch taken probability 0.50
(p15)br.cond.dptk .b1_5 //5: {8:2} 45
// Block 6: Pred: 4 Succ: 7 8 -G
// Freq 2.5e-01
// Branch taken probability 0.50
(p14)br.cond.dptk .b1_7 ;; //5: {8:2} 51
// Block 8: Pred: 6 Succ: 7 -G
// Freq 1.2e-01
}
{ .mii
mov r8=r38 //0: {8:2} 58
nop.i 0
nop.i 0 ;;
// Block 7: Pred: 6 8 Succ: 9 10 -G
// Freq 0.0e+00
}
.b1_7:
{ .mmi
add r30=1,r0 //0: {8:2} 59
add r37=1,r40 //0: {8:2} 68
sxt4 r17=r40 //0: {8:2} 52
}
{ .mmi
mov r38=r40 //0: {8:2} 67
add r10=1,r0 //0: {8:2} 60
sxt4 r16=r8 ;; //0: {8:2} 53
}
{ .mii
sub r15=r16,r17 //1: {8:2} 54
mov r32=r17 //1: {8:2} 71
nop.i 0 ;;
}
{ .mmi
add r39=1,r15 //2: {8:2} 55
nop.m 0
nop.i 0 ;;
}
{ .mii
add r2=-1,r39 //3: {8:2} 74
cmp.lt.unc p6,p0=16,r39 //3: {8:2} 56
cmp.ge.unc p9,p8=r39,r0 //3: {8:2} 132
}
{ .mbb
cmp.gt.unc p12,p0=1,r39 //3: {8:2} 72
// Branch taken probability 0.50
(p6) br.cond.dptk .b1_9 //3: {8:2} 57
// Block 10: Pred: 7 Succ: 5 11 -G
// Freq 0.0e+00
// Branch taken probability 0.50
(p12)br.cond.dptk .b1_5 ;; //0: {8:2} 73
// Block 11: prolog Pred: 10 Succ: 12 -G
// Freq 2.5e+00
}
{ .mii
nop.m 0
mov ar.lc=r2 //0: {0:0} 75
nop.i 0 ;;
// Block 12: lentry Pred: 11 13 Succ: 13 -G
// Freq 5.0e+00
}
.b1_12:
{ .mib
mov r43=r32 //0: {10:3} 76
nop.i 0
br.call.sptk b0=sub# ;; //0: {10:3} 78
// Block 13: lexit ltail Pred: 12 Succ: 12 55 -GO
// Freq 2.5e+00
}
{ .mib
mov gp=r36 //0: {10:3} 213
add r32=1,r32 //0: {8:2} 80
// Branch taken probability 0.99
br.cloop.sptk .b1_12 ;; //0: {8:2} 81
// Block 55: Pred: 13 Succ: 5 -O
// Freq 2.5e-02
}
{ .mib
nop.m 0
nop.i 0
br.cond.sptk .b1_5 ;; //0: {8:2} 236
// Block 9: collapsed Pred: 7 Succ: 14 15 -G
// Freq 0.0e+00
}
.b1_9:
{ .mii
(p9) add r20=0,r0 //0: {8:2} 63
(p8) add r20=1,r0 //0: {8:2} 64
nop.i 0 ;;
}
{ .mmi
add r19=r39,r20 ;; //1: {8:2} 65
nop.m 0
shr r18=r19,1 ;; //2: {8:2} 66
}
{ .mib
cmp.gt.unc p7,p0=1,r18 //3: {8:2} 69
add r25=-1,r18 //3: {8:2} 85
// Branch taken probability 0.50
(p7) br.cond.dptk .b1_14 ;; //3: {8:2} 70
// Block 15: prolog Pred: 9 Succ: 16 -GO
// Freq 1.3e-02
}
{ .mii
mov r32=r10 //0: {8:2} 228
sxt4 r24=r25 ;; //0: {8:2} 86
mov ar.lc=r24 ;; //1: {8:2} 87
// Block 16: lentry Pred: 15 18 Succ: 17 -GO
// Freq 2.5e+00
}
.b1_16:
{ .mii
nop.m 0
nop.i 0
mov r43=r38 //0: {10:3} 88
}
{ .mib
add r38=2,r38 //0: {8:2} 95
add r32=1,r32 //0: {8:2} 96
br.call.sptk b0=sub# ;; //0: {10:3} 89
// Block 17: Pred: 16 Succ: 18 -GO
// Freq 2.5e+00
}
{ .mii
mov gp=r36 //2: {10:3} 214
mov r43=r37 //2: {10:3} 91
add r37=2,r37 //2: {8:2} 94
}
{ .mmb
nop.m 0
nop.m 0
br.call.sptk b0=sub# ;; //2: {10:3} 92
// Block 18: lexit ltail Pred: 17 Succ: 16 19 -GO
// Freq 2.5e+00
}
{ .mib
mov gp=r36 //4: {10:3} 215
nop.i 0
// Branch taken probability 0.99
br.cloop.sptk .b1_16 ;; //4: {8:2} 97
// Block 19: epilog Pred: 18 Succ: 14 -GO
// Freq 5.0e-01
}
{ .mmi
mov r10=r32 ;; //0: {8:2} 231
shladd r26=r10,1,r0 //1: {8:2} 98
nop.i 0 ;;
}
{ .mii
add r30=-1,r26 //2: {8:2} 99
nop.i 0
nop.i 0 ;;
// Block 14: Pred: 9 19 Succ: 5 20 -GO
// Freq 1.0e+00
}
.b1_14:
{ .mii
add r29=r40,r30 //0: {10:3} 100
sxt4 r28=r30 ;; //0: {8:2} 82
add r3=-1,r29 //1: {10:3} 101
}
{ .mib
cmp.lt.unc p10,p13=r39,r28 //1: {8:2} 83
nop.i 0
// Branch taken probability 0.50
(p10)br.cond.dptk .b1_5 ;; //1: {8:2} 84
// Block 20: Pred: 14 Succ: 26 -GO
// Freq 5.0e+00
}
{ .mib
(p13)mov r43=r3 //0: {10:3} 102
nop.i 0
(p13)br.call.dptk b0=sub# ;; //0: {10:3} 103
// Block 26: Pred: 20 Succ: 5 -GO
// Freq 5.0e+00
}
{ .mii
mov gp=r36 //2: {10:3} 216
nop.i 0
nop.i 0 ;;
// Block 5: epilog Pred: 4 55 14 10 26 Succ: 22 -GO
// Freq 0.0e+00
}
.b1_5:
{ .mib
ld4 r44=[r42] //0: {8:2} 46
mov r43=r41 //0: {8:2} 47
br.call.sptk b0=__kmpc_for_static_fini# ;; //0: {8:2} 49
// Block 22: Pred: 5 Succ: 2 -GO
// Freq 0.0e+00
// Block 2: exit Pred: 1 22 Succ: -GO
// Freq 1.0e+00
}
.b1_2:
{ .mii
add sp=32,sp //0: {11:1} 237
mov ar.pfs=r33 ;; //0: {11:1} 15
mov ar.lc=r35 ;; //1: {11:1} 16
}
{ .mib
mov gp=r36 //2: {8:2} 217
mov b0=r34 //2: {11:1} 14
br.ret.sptk.many b0 ;; //2: {11:1} 17
}
.section .IA_64.unwind_info, "a", "progbits"
.align 8
__udt_fubar??unw:
data8 0x1000000000003 // length: 24 bytes
// flags: 0x00
// version: 1
string "\x60\x0c" //R3: prologue size 12
string "\xe0\x01\x02" //P7: mem_stack_f t/off 0x1 size 32
string "\xe6\x00" //P7: pfs_when t/off 0x0
string "\xb1\x21" //P3: pfs_gr r33
string "\xe4\x02" //P7: rp_when t/off 0x2
string "\xb0\xa2" //P3: rp_gr r34
string "\xea\x05" //P7: lc_when t/off 0x5
string "\xb2\xa3" //P3: lc_gr r35
string "\x61\x84\x01" //R3: body size 132
string "\x81" //B1: label_state 1
string "\xc0\x05" //B2: epilog time 5 ecount 0
string "\x00"
.section .IA_64.unwind, "ao", "unwind"
data8 @segrel(fubar??unw#)
data8 @segrel(fubar??unw#+0x300)
data8 @segrel(__udt_fubar??unw)
.section .data, "wa", "progbits"
.align 16
$2$1_2_kmpc_loc_struct_pack$0:
data4.ua 0 // s32
data4.ua 2 // s32
data4.ua 0 // s32
data4.ua 0 // s32
data8.ua $2$1_2__kmpc_loc_pack$0# // p64
.skip 8 // pad
$2$1_2_kmpc_loc_struct_pack$1:
data4.ua 0 // s32
data4.ua 2 // s32
data4.ua 0 // s32
data4.ua 0 // s32
data8.ua $2$1_2__kmpc_loc_pack$1# // p64
$2$1_2__kmpc_loc_pack$0:
data1 59 // s8
data1 47 // s8
data1 103 // s8
data1 47 // s8
data1 103 // s8
data1 49 // s8
data1 53 // s8
data1 47 // s8
data1 98 // s8
data1 114 // s8
data1 111 // s8
data1 110 // s8
data1 101 // s8
data1 118 // s8
data1 101 // s8
data1 116 // s8
data1 47 // s8
data1 111 // s8
data1 112 // s8
data1 101 // s8
data1 110 // s8
data1 109 // s8
data1 112 // s8
data1 98 // s8
data1 101 // s8
data1 110 // s8
data1 99 // s8
data1 104 // s8
data1 95 // s8
data1 67 // s8
data1 95 // s8
data1 118 // s8
data1 50 // s8
data1 47 // s8
data1 116 // s8
data1 104 // s8
data1 117 // s8
data1 110 // s8
data1 100 // s8
data1 101 // s8
data1 114 // s8
data1 95 // s8
data1 114 // s8
data1 117 // s8
data1 110 // s8
data1 115 // s8
data1 47 // s8
data1 98 // s8
data1 97 // s8
data1 115 // s8
data1 105 // s8
data1 99 // s8
data1 95 // s8
data1 108 // s8
data1 111 // s8
data1 111 // s8
data1 112 // s8
data1 46 // s8
data1 99 // s8
data1 59 // s8
data1 102 // s8
data1 117 // s8
data1 98 // s8
data1 97 // s8
data1 114 // s8
data1 59 // s8
data1 54 // s8
data1 59 // s8
data1 54 // s8
data1 59 // s8
data1 59 // s8
.skip 1 // pad
$2$1_2__kmpc_loc_pack$1:
data1 59 // s8
data1 47 // s8
data1 103 // s8
data1 47 // s8
data1 103 // s8
data1 49 // s8
data1 53 // s8
data1 47 // s8
data1 98 // s8
data1 114 // s8
data1 111 // s8
data1 110 // s8
data1 101 // s8
data1 118 // s8
data1 101 // s8
data1 116 // s8
data1 47 // s8
data1 111 // s8
data1 112 // s8
data1 101 // s8
data1 110 // s8
data1 109 // s8
data1 112 // s8
data1 98 // s8
data1 101 // s8
data1 110 // s8
data1 99 // s8
data1 104 // s8
data1 95 // s8
data1 67 // s8
data1 95 // s8
data1 118 // s8
data1 50 // s8
data1 47 // s8
data1 116 // s8
data1 104 // s8
data1 117 // s8
data1 110 // s8
data1 100 // s8
data1 101 // s8
data1 114 // s8
data1 95 // s8
data1 114 // s8
data1 117 // s8
data1 110 // s8
data1 115 // s8
data1 47 // s8
data1 98 // s8
data1 97 // s8
data1 115 // s8
data1 105 // s8
data1 99 // s8
data1 95 // s8
data1 108 // s8
data1 111 // s8
data1 111 // s8
data1 112 // s8
data1 46 // s8
data1 99 // s8
data1 59 // s8
data1 102 // s8
data1 117 // s8
data1 98 // s8
data1 97 // s8
data1 114 // s8
data1 59 // s8
data1 56 // s8
data1 59 // s8
data1 49 // s8
data1 49 // s8
data1 59 // s8
data1 59 // s8
.section .text, "xa", "progbits"
// -- End fubar
.endp fubar#
.type __kmpc_for_static_fini#,@function
.global __kmpc_for_static_fini#
.type __kmpc_for_static_init_4#,@function
.global __kmpc_for_static_init_4#
.type __kmpc_global_thread_num#,@function
.global __kmpc_global_thread_num#
.type sub#,@function
.global sub#
// End
-------------- next part --------------
// mark_description "Intel(R) C++ Compiler for Itanium(R)-based applications";
// mark_description "Version 9.1 Build 20060523 %s";
// mark_description "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -O0 -openmp -lm -lm -S -o basic_loop.";
// mark_description "O0.s";
//.radix C
.file "basic_loop.c"
.section .text, "xa", "progbits"
.align 64
// -- Begin fubar
.proc fubar#
// Block 0: entry Pred: Succ: 13 -
// Freq 0.0e+00
.global fubar#
fubar:
fubar??unw:
{ .mmi
alloc r33=ar.pfs,1,26,8,0 //0: {6:1:basic_loop.c} 164
add sp=-192,sp //0: {6:1} 165
mov r35=gp ;; //0: {6:1} 113
}
{ .mii
nop.m 0
mov r34=b0 //1: {6:1} 2
mov r36=r32 //1: {6:1} 3
}
{ .mmi
add r37=144,sp ;; //1: {6:1} 166
st8 [r37]=r36 //2: {6:1} 6
add r38=@ltoff($2$1_2_kmpc_loc_struct_pack$0#),gp ;;//2: {6:1} 7
}
{ .mii
ld8 r39=[r38] //3: {6:1} 8
add r20=64,sp //3: {6:1} 116
nop.i 0 ;;
}
{ .mii
st8 [r20]=r35 //4: {6:1} 117
add r20=72,sp //4: {6:1} 118
nop.i 0 ;;
}
{ .mii
st8 [r20]=r39 //5: {6:1} 119
add r20=72,sp //5: {6:1} 120
nop.i 0 ;;
}
{ .mmi
ld8 r35=[r20] ;; //6: {6:1} 121
mov r59=r35 //7: {6:1} 9
nop.i 0
// Block 13: Pred: 0 Succ: 1 -
// Freq 0.0e+00
}
{ .mib
nop.m 0
nop.i 0
br.call.sptk b0=__kmpc_global_thread_num# ;;//7: {6:1} 10
// Block 1: Pred: 13 Succ: 2 3 -
// Freq 0.0e+00
}
{ .mmi
add r20=64,sp ;; //9: {6:1} 122
ld8 r35=[r20] //10: {6:1} 123
nop.i 0 ;;
}
{ .mii
mov gp=r35 //11: {6:1} 108
mov r36=r8 //11: {6:1} 107
nop.i 0
}
{ .mmi
add r37=24,sp ;; //11: {6:1} 12
st4 [r37]=r36 //12: {6:1} 13
add r38=28,sp ;; //12: {9:7} 14
}
{ .mii
st4 [r38]=r0 //13: {9:7} 15
add r39=28,sp //13: {9:2} 16
nop.i 0 ;;
}
{ .mii
ld4 r40=[r39] //14: {9:2} 17
add r41=144,sp //14: {9:2} 167
nop.i 0 ;;
}
{ .mmi
ld4 r42=[r41] ;; //15: {9:2} 19
cmp4.ge p8,p0=r40,r42 //16: {9:2} 20
add r19=0,r0 ;; //16: {9:2} 124
}
{ .mii
(p8) add r19=1,r0 //17: {9:2} 125
add r20=80,sp //17: {9:2} 126
nop.i 0 ;;
}
{ .mib
st8 [r20]=r19 //18: {9:2} 127
nop.i 0
// Branch taken probability 0.50
(p8) br.cond.dptk .b1_2 ;; //18: {9:2} 21
// Block 3: Pred: 1 Succ: 4 -
// Freq 0.0e+00
}
{ .mmi
add r35=32,sp ;; //0: {8:2} 25
st4 [r35]=r0 //1: {8:2} 26
add r36=144,sp ;; //1: {8:2} 168
}
{ .mmi
ld4 r37=[r36] ;; //2: {8:2} 28
add r38=-1,r37 //3: {8:2} 29
add r39=36,sp ;; //3: {8:2} 30
}
{ .mii
st4 [r39]=r38 //4: {8:2} 31
add r40=144,sp //4: {8:2} 169
nop.i 0 ;;
}
{ .mmi
ld4 r41=[r40] ;; //5: {8:2} 33
add r42=-1,r41 //6: {8:2} 34
add r43=40,sp ;; //6: {8:2} 35
}
{ .mii
st4 [r43]=r42 //7: {8:2} 36
add r44=44,sp //7: {8:2} 37
nop.i 0 ;;
}
{ .mii
st4 [r44]=r0 //8: {8:2} 38
add r45=48,sp //8: {8:2} 39
add r46=1,r0 ;; //8: {8:2} 40
}
{ .mii
st4 [r45]=r46 //9: {8:2} 41
add r47=@ltoff($2$1_2_kmpc_loc_struct_pack$1#),gp//9: {8:2} 42
nop.i 0 ;;
}
{ .mii
ld8 r48=[r47] //10: {8:2} 43
add r49=24,sp //10: {8:2} 44
nop.i 0 ;;
}
{ .mii
ld4 r50=[r49] //11: {8:2} 45
add r51=34,r0 //11: {8:2} 46
add r52=44,sp //11: {8:2} 47
}
{ .mmb
add r53=32,sp //11: {8:2} 48
add r54=36,sp //11: {8:2} 49
nop.b 0 ;;
}
{ .mii
add r55=48,sp //12: {8:2} 50
add r56=1,r0 //12: {8:2} 51
nop.i 0
}
{ .mmb
add r57=16,sp //12: {8:2} 52
add r58=1,r0 //12: {8:2} 53
nop.b 0 ;;
}
{ .mii
st8 [r57]=r58 //13: {8:2} 54
mov r59=r48 //13: {8:2} 55
mov r60=r50 //13: {8:2} 56
}
{ .mmb
mov r61=r51 //13: {8:2} 57
mov r62=r52 //13: {8:2} 58
nop.b 0 ;;
}
{ .mii
mov r63=r53 //14: {8:2} 59
mov r64=r54 //14: {8:2} 60
nop.i 0
}
{ .mmb
mov r65=r55 //14: {8:2} 61
mov r66=r56 //14: {8:2} 62
br.call.sptk b0=__kmpc_for_static_init_4# ;;//14: {8:2} 63
// Block 4: Pred: 3 Succ: 5 6 -
// Freq 0.0e+00
}
{ .mmi
add r20=64,sp ;; //16: {8:2} 128
ld8 r35=[r20] //17: {8:2} 129
nop.i 0 ;;
}
{ .mii
mov gp=r35 //18: {8:2} 109
add r36=32,sp //18: {8:2} 64
nop.i 0 ;;
}
{ .mii
ld4 r37=[r36] //19: {8:2} 65
add r38=36,sp //19: {8:2} 66
nop.i 0 ;;
}
{ .mii
ld4 r39=[r38] //20: {8:2} 67
add r40=48,sp //20: {8:2} 68
nop.i 0 ;;
}
{ .mii
ld4 r41=[r40] //21: {8:2} 69
add r42=40,sp //21: {8:2} 70
nop.i 0 ;;
}
{ .mmi
ld4 r43=[r42] ;; //22: {8:2} 71
cmp4.gt p8,p0=r37,r43 //23: {8:2} 72
add r20=88,sp ;; //23: {8:2} 130
}
{ .mii
st8 [r20]=r37 //24: {8:2} 131
add r20=96,sp //24: {8:2} 132
nop.i 0 ;;
}
{ .mii
st8 [r20]=r39 //25: {8:2} 133
add r19=0,r0 //25: {8:2} 134
nop.i 0 ;;
}
{ .mii
(p8) add r19=1,r0 //26: {8:2} 135
add r20=104,sp //26: {8:2} 136
nop.i 0 ;;
}
{ .mib
st8 [r20]=r19 //27: {8:2} 137
nop.i 0
// Branch taken probability 0.50
(p8) br.cond.dptk .b1_5 ;; //27: {8:2} 73
// Block 6: Pred: 4 Succ: 7 8 -
// Freq 0.0e+00
}
{ .mmi
add r35=40,sp ;; //0: {8:2} 81
ld4 r36=[r35] //1: {8:2} 82
add r20=96,sp ;; //1: {8:2} 138
}
{ .mmi
ld8 r37=[r20] ;; //2: {8:2} 139
cmp4.le p8,p0=r37,r36 //3: {8:2} 83
add r19=0,r0 ;; //3: {8:2} 140
}
{ .mii
(p8) add r19=1,r0 //4: {8:2} 141
add r20=112,sp //4: {8:2} 142
nop.i 0 ;;
}
{ .mib
st8 [r20]=r19 //5: {8:2} 143
nop.i 0
// Branch taken probability 0.50
(p8) br.cond.dptk.many .b1_7 ;; //5: {8:2} 84
// Block 8: Pred: 6 Succ: 7 -
// Freq 0.0e+00
}
{ .mmi
add r35=40,sp ;; //0: {8:2} 91
ld4 r36=[r35] //1: {8:2} 92
add r20=96,sp ;; //1: {8:2} 144
}
{ .mii
st8 [r20]=r36 //2: {8:2} 145
nop.i 0
nop.i 0 ;;
// Block 7: Pred: 6 8 Succ: 9 5 -
// Freq 0.0e+00
}
.b1_7:
{ .mii
add r35=28,sp //0: {8:2} 85
add r20=88,sp //0: {8:2} 146
nop.i 0 ;;
}
{ .mmi
ld8 r36=[r20] ;; //1: {8:2} 147
st4 [r35]=r36 //2: {8:2} 86
add r37=28,sp ;; //2: {8:2} 87
}
{ .mii
ld4 r38=[r37] //3: {8:2} 88
add r20=96,sp //3: {8:2} 148
nop.i 0 ;;
}
{ .mmi
ld8 r39=[r20] ;; //4: {8:2} 149
cmp4.le p8,p0=r38,r39 //5: {8:2} 89
add r19=0,r0 ;; //5: {8:2} 150
}
{ .mii
(p8) add r19=1,r0 //6: {8:2} 151
add r20=120,sp //6: {8:2} 152
nop.i 0 ;;
}
{ .mib
st8 [r20]=r19 //7: {8:2} 153
nop.i 0
// Branch taken probability 0.50
(p8) br.cond.dptk .b1_9 ;; //7: {8:2} 90
// Block 5: Pred: 10 7 4 Succ: 14 -
// Freq 0.0e+00
}
.b1_5:
{ .mmi
add r35=@ltoff($2$1_2_kmpc_loc_struct_pack$1#),gp ;;//0: {8:2} 74
ld8 r36=[r35] //1: {8:2} 75
add r37=24,sp ;; //1: {8:2} 76
}
{ .mii
ld4 r38=[r37] //2: {8:2} 77
mov r59=r36 //2: {8:2} 78
nop.i 0 ;;
}
{ .mib
mov r60=r38 //3: {8:2} 79
nop.i 0
br.call.sptk b0=__kmpc_for_static_fini# ;; //3: {8:2} 80
// Block 14: Pred: 5 Succ: 2 -
// Freq 0.0e+00
}
{ .mmi
add r20=64,sp ;; //5: {8:2} 154
ld8 r35=[r20] //6: {8:2} 155
nop.i 0 ;;
}
{ .mib
mov gp=r35 //7: {8:2} 110
nop.i 0
br.cond.sptk .b1_2 ;; //7: {8:2} 114
// Block 9: Pred: 15 7 Succ: 10 -
// Freq 0.0e+00
}
.b1_9:
{ .mmi
add r35=28,sp ;; //0: {10:3} 93
ld4 r36=[r35] //1: {10:3} 94
nop.i 0 ;;
}
{ .mib
mov r59=r36 //2: {10:3} 95
nop.i 0
br.call.sptk b0=sub# ;; //2: {10:3} 96
// Block 10: Pred: 9 Succ: 5 15 -
// Freq 0.0e+00
}
{ .mmi
add r20=64,sp ;; //4: {10:3} 156
ld8 r35=[r20] //5: {10:3} 157
nop.i 0 ;;
}
{ .mii
mov gp=r35 //6: {10:3} 112
add r36=28,sp //6: {9:23} 98
nop.i 0 ;;
}
{ .mmi
ld4 r37=[r36] ;; //7: {9:23} 99
add r38=1,r37 //8: {9:23} 100
add r39=28,sp ;; //8: {9:23} 101
}
{ .mii
st4 [r39]=r38 //9: {9:23} 102
add r40=28,sp //9: {9:2} 103
nop.i 0 ;;
}
{ .mii
ld4 r41=[r40] //10: {9:2} 104
add r20=96,sp //10: {9:2} 158
nop.i 0 ;;
}
{ .mmi
ld8 r42=[r20] ;; //11: {9:2} 159
cmp4.gt p8,p0=r41,r42 //12: {9:2} 105
add r19=0,r0 ;; //12: {9:2} 160
}
{ .mii
(p8) add r19=1,r0 //13: {9:2} 161
add r20=128,sp //13: {9:2} 162
nop.i 0 ;;
}
{ .mbb
st8 [r20]=r19 //14: {9:2} 163
// Branch taken probability 0.50
(p8) br.cond.dptk .b1_5 //14: {9:2} 106
// Block 15: Pred: 10 Succ: 9 -
// Freq 0.0e+00
br.cond.sptk .b1_9 ;; //14: {9:2} 170
// Block 2: exit Pred: 1 14 Succ: -
// Freq 0.0e+00
}
.b1_2:
{ .mii
nop.m 0
mov b0=r34 ;; //0: {11:1} 22
mov ar.pfs=r33 //1: {11:1} 23
}
{ .mib
add sp=192,sp //1: {11:1} 171
nop.i 0
br.ret.sptk.many b0 ;; //1: {11:1} 24
}
.section .IA_64.unwind_info, "a", "progbits"
.align 8
__udt_fubar??unw:
data8 0x1000000000003 // length: 24 bytes
// flags: 0x00
// version: 1
string "\x60\x15" //R3: prologue size 21
string "\xe0\x01\x0c" //P7: mem_stack_f t/off 0x1 size 192
string "\xe6\x00" //P7: pfs_when t/off 0x0
string "\xb1\x21" //P3: pfs_gr r33
string "\xe4\x04" //P7: rp_when t/off 0x4
string "\xb0\xa2" //P3: rp_gr r34
string "\x61\xc0\x01" //R3: body size 192
string "\x81" //B1: label_state 1
string "\xc0\x02" //B2: epilog time 2 ecount 0
string "\x00\x00\x00\x00\x00"
.section .IA_64.unwind, "ao", "unwind"
data8 @segrel(fubar??unw#)
data8 @segrel(fubar??unw#+0x470)
data8 @segrel(__udt_fubar??unw)
.section .data, "wa", "progbits"
.align 16
$2$1_2_kmpc_loc_struct_pack$0:
data4.ua 0 // s32
data4.ua 2 // s32
data4.ua 0 // s32
data4.ua 0 // s32
data8.ua $2$1_2__kmpc_loc_pack$0# // p64
.skip 8 // pad
$2$1_2_kmpc_loc_struct_pack$1:
data4.ua 0 // s32
data4.ua 2 // s32
data4.ua 0 // s32
data4.ua 0 // s32
data8.ua $2$1_2__kmpc_loc_pack$1# // p64
$2$1_2__kmpc_loc_pack$0:
data1 59 // s8
data1 47 // s8
data1 103 // s8
data1 47 // s8
data1 103 // s8
data1 49 // s8
data1 53 // s8
data1 47 // s8
data1 98 // s8
data1 114 // s8
data1 111 // s8
data1 110 // s8
data1 101 // s8
data1 118 // s8
data1 101 // s8
data1 116 // s8
data1 47 // s8
data1 111 // s8
data1 112 // s8
data1 101 // s8
data1 110 // s8
data1 109 // s8
data1 112 // s8
data1 98 // s8
data1 101 // s8
data1 110 // s8
data1 99 // s8
data1 104 // s8
data1 95 // s8
data1 67 // s8
data1 95 // s8
data1 118 // s8
data1 50 // s8
data1 47 // s8
data1 116 // s8
data1 104 // s8
data1 117 // s8
data1 110 // s8
data1 100 // s8
data1 101 // s8
data1 114 // s8
data1 95 // s8
data1 114 // s8
data1 117 // s8
data1 110 // s8
data1 115 // s8
data1 47 // s8
data1 98 // s8
data1 97 // s8
data1 115 // s8
data1 105 // s8
data1 99 // s8
data1 95 // s8
data1 108 // s8
data1 111 // s8
data1 111 // s8
data1 112 // s8
data1 46 // s8
data1 99 // s8
data1 59 // s8
data1 102 // s8
data1 117 // s8
data1 98 // s8
data1 97 // s8
data1 114 // s8
data1 59 // s8
data1 54 // s8
data1 59 // s8
data1 54 // s8
data1 59 // s8
data1 59 // s8
.skip 1 // pad
$2$1_2__kmpc_loc_pack$1:
data1 59 // s8
data1 47 // s8
data1 103 // s8
data1 47 // s8
data1 103 // s8
data1 49 // s8
data1 53 // s8
data1 47 // s8
data1 98 // s8
data1 114 // s8
data1 111 // s8
data1 110 // s8
data1 101 // s8
data1 118 // s8
data1 101 // s8
data1 116 // s8
data1 47 // s8
data1 111 // s8
data1 112 // s8
data1 101 // s8
data1 110 // s8
data1 109 // s8
data1 112 // s8
data1 98 // s8
data1 101 // s8
data1 110 // s8
data1 99 // s8
data1 104 // s8
data1 95 // s8
data1 67 // s8
data1 95 // s8
data1 118 // s8
data1 50 // s8
data1 47 // s8
data1 116 // s8
data1 104 // s8
data1 117 // s8
data1 110 // s8
data1 100 // s8
data1 101 // s8
data1 114 // s8
data1 95 // s8
data1 114 // s8
data1 117 // s8
data1 110 // s8
data1 115 // s8
data1 47 // s8
data1 98 // s8
data1 97 // s8
data1 115 // s8
data1 105 // s8
data1 99 // s8
data1 95 // s8
data1 108 // s8
data1 111 // s8
data1 111 // s8
data1 112 // s8
data1 46 // s8
data1 99 // s8
data1 59 // s8
data1 102 // s8
data1 117 // s8
data1 98 // s8
data1 97 // s8
data1 114 // s8
data1 59 // s8
data1 56 // s8
data1 59 // s8
data1 49 // s8
data1 49 // s8
data1 59 // s8
data1 59 // s8
.section .text, "xa", "progbits"
// -- End fubar
.endp fubar#
.type __kmpc_for_static_fini#,@function
.global __kmpc_for_static_fini#
.type __kmpc_for_static_init_4#,@function
.global __kmpc_for_static_init_4#
.type __kmpc_global_thread_num#,@function
.global __kmpc_global_thread_num#
.type sub#,@function
.global sub#
// End
-------------- next part --------------
# -- Machine type IA32
# mark_description "Intel(R) C++ Compiler for 32-bit applications, Version 9.1 Build 20060519Z %s";
# mark_description "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -no-ipo -S -O3 -openmp -lm -lm";
.ident "Intel(R) C++ Compiler for 32-bit applications, Version 9.1 Build 20060519Z %s"
.ident "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -no-ipo -S -O3 -openmp -lm -lm"
.file "basic_loop.c"
.text
# -- Begin fubar
# mark_begin;
.align 2,0x90
.globl fubar
fubar:
# parameter 1: 32 + %esp
..B1.1: # Preds ..B1.0
pushl %edi #6.1
pushl %esi #6.1
pushl %ebx #6.1
subl $16, %esp #6.1
movl 32(%esp), %edi #5.1
pushl $.2.1_2_kmpc_loc_struct_pack.0 #6.1
call __kmpc_global_thread_num #6.1
# LOE eax ebp esi edi
..B1.16: # Preds ..B1.1
popl %ecx #6.1
movl %eax, %ebx #6.1
# LOE ebx ebp esi edi
..B1.2: # Preds ..B1.16
testl %edi, %edi #9.2
jle ..B1.13 # Prob 10% #9.2
# LOE ebx ebp esi edi
..B1.3: # Preds ..B1.2
lea 8(%esp), %eax #8.2
xorl %edx, %edx #8.2
movl %edx, 4(%esp) #8.2
movl %edx, (%esp) #8.2
lea 12(%esp), %edx #8.2
lea -1(%edi), %edi #8.2
movl %edi, 8(%esp) #8.2
movl $1, %ecx #8.2
movl %ecx, 12(%esp) #8.2
pushl %ecx #8.2
pushl %ecx #8.2
pushl %edx #8.2
pushl %eax #8.2
lea 20(%esp), %edx #8.2
pushl %edx #8.2
lea 20(%esp), %edx #8.2
pushl %edx #8.2
pushl $34 #8.2
pushl %ebx #8.2
pushl $.2.1_2_kmpc_loc_struct_pack.1 #8.2
call __kmpc_for_static_init_4 #8.2
# LOE ebx ebp esi edi
..B1.17: # Preds ..B1.3
addl $36, %esp #8.2
# LOE ebx ebp esi edi
..B1.4: # Preds ..B1.17
movl 4(%esp), %edx #8.2
movl 8(%esp), %eax #8.2
cmpl %edi, %edx #8.2
jg ..B1.12 # Prob 50% #8.2
# LOE eax edx ebx ebp esi edi
..B1.5: # Preds ..B1.4
cmpl %edi, %eax #8.2
jle ..B1.7 # Prob 50% #8.2
# LOE eax edx ebx ebp esi edi
..B1.6: # Preds ..B1.5
movl %edi, %eax #8.2
# LOE eax edx ebx ebp esi
..B1.7: # Preds ..B1.6 ..B1.5
cmpl %eax, %edx #8.2
jg ..B1.12 # Prob 50% #8.2
# LOE eax edx ebx ebp esi
..B1.8: # Preds ..B1.7
movl %eax, %esi #
movl %edx, %edi #
# LOE ebx ebp esi edi
..B1.9: # Preds ..B1.10 ..B1.8
pushl %edi #10.3
call sub #10.3
# LOE ebx ebp esi edi
..B1.18: # Preds ..B1.9
popl %ecx #10.3
# LOE ebx ebp esi edi
..B1.10: # Preds ..B1.18
addl $1, %edi #9.23
cmpl %esi, %edi #9.2
jle ..B1.9 # Prob 82% #9.2
# LOE ebx ebp esi edi
..B1.12: # Preds ..B1.10 ..B1.7 ..B1.4
pushl %ebx #6.1
pushl $.2.1_2_kmpc_loc_struct_pack.1 #6.1
call __kmpc_for_static_fini #8.2
# LOE ebp esi
..B1.19: # Preds ..B1.12
addl $8, %esp #8.2
# LOE ebp esi
..B1.13: # Preds ..B1.19 ..B1.2
addl $16, %esp #11.1
popl %ebx #11.1
popl %esi #11.1
popl %edi #11.1
ret #11.1
.align 2,0x90
# LOE
# mark_end;
.type fubar,@function
.size fubar,.-fubar
.data
.align 4
.align 4
.2.1_2_kmpc_loc_struct_pack.0:
.long 0
.long 2
.long 0
.long 0
.long .2.1_2__kmpc_loc_pack.0
.2.1_2__kmpc_loc_pack.0:
.byte 59
.byte 47
.byte 103
.byte 47
.byte 103
.byte 49
.byte 53
.byte 47
.byte 98
.byte 114
.byte 111
.byte 110
.byte 101
.byte 118
.byte 101
.byte 116
.byte 47
.byte 111
.byte 112
.byte 101
.byte 110
.byte 109
.byte 112
.byte 98
.byte 101
.byte 110
.byte 99
.byte 104
.byte 95
.byte 67
.byte 95
.byte 118
.byte 50
.byte 47
.byte 109
.byte 99
.byte 114
.byte 95
.byte 114
.byte 117
.byte 110
.byte 115
.byte 47
.byte 98
.byte 97
.byte 115
.byte 105
.byte 99
.byte 95
.byte 108
.byte 111
.byte 111
.byte 112
.byte 46
.byte 99
.byte 59
.byte 102
.byte 117
.byte 98
.byte 97
.byte 114
.byte 59
.byte 54
.byte 59
.byte 54
.byte 59
.byte 59
.space 1 # pad
.2.1_2_kmpc_loc_struct_pack.1:
.long 0
.long 2
.long 0
.long 0
.long .2.1_2__kmpc_loc_pack.1
.2.1_2__kmpc_loc_pack.1:
.byte 59
.byte 47
.byte 103
.byte 47
.byte 103
.byte 49
.byte 53
.byte 47
.byte 98
.byte 114
.byte 111
.byte 110
.byte 101
.byte 118
.byte 101
.byte 116
.byte 47
.byte 111
.byte 112
.byte 101
.byte 110
.byte 109
.byte 112
.byte 98
.byte 101
.byte 110
.byte 99
.byte 104
.byte 95
.byte 67
.byte 95
.byte 118
.byte 50
.byte 47
.byte 109
.byte 99
.byte 114
.byte 95
.byte 114
.byte 117
.byte 110
.byte 115
.byte 47
.byte 98
.byte 97
.byte 115
.byte 105
.byte 99
.byte 95
.byte 108
.byte 111
.byte 111
.byte 112
.byte 46
.byte 99
.byte 59
.byte 102
.byte 117
.byte 98
.byte 97
.byte 114
.byte 59
.byte 56
.byte 59
.byte 49
.byte 49
.byte 59
.byte 59
.data
# -- End fubar
.data
.section .note.GNU-stack, ""
# End
-------------- next part --------------
# -- Machine type IA32
# mark_description "Intel(R) C++ Compiler for 32-bit applications, Version 9.1 Build 20060519Z %s";
# mark_description "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -no-ipo -S -O0 -openmp -lm -lm";
.ident "Intel(R) C++ Compiler for 32-bit applications, Version 9.1 Build 20060519Z %s"
.ident "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -no-ipo -S -O0 -openmp -lm -lm"
.file "basic_loop.c"
.data
.text
# -- Begin fubar
# mark_begin;
.align 2,0x90
.globl fubar
fubar:
# parameter 1: 8 + %ebp
..B1.1: # Preds ..B1.0
pushl %ebp #6.1
movl %esp, %ebp #6.1
subl $40, %esp #6.1
pushl %edi #6.1
movl $.2.1_2_kmpc_loc_struct_pack.0, (%esp) #6.1
call __kmpc_global_thread_num #6.1
# LOE eax
..B1.15: # Preds ..B1.1
popl %ecx #6.1
movl %eax, -8(%ebp) #6.1
# LOE
..B1.2: # Preds ..B1.15
movl -8(%ebp), %eax #6.1
movl %eax, -36(%ebp) #6.1
movl $0, -16(%ebp) #9.7
movl -16(%ebp), %eax #9.14
movl 8(%ebp), %edx #9.18
cmpl %edx, %eax #9.2
jge ..B1.12 # Prob 50% #9.2
# LOE
..B1.3: # Preds ..B1.2
xorl %eax, %eax #8.2
movl %eax, -28(%ebp) #8.2
movl 8(%ebp), %edx #9.18
decl %edx #8.2
movl %edx, -24(%ebp) #8.2
movl 8(%ebp), %edx #9.18
decl %edx #8.2
movl %edx, -12(%ebp) #8.2
movl %eax, -32(%ebp) #8.2
movl $1, %eax #8.2
movl %eax, -20(%ebp) #8.2
addl $-36, %esp #8.2
movl $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #8.2
movl -36(%ebp), %edx #8.2
movl %edx, 4(%esp) #8.2
movl $34, 8(%esp) #8.2
lea -32(%ebp), %edx #8.2
movl %edx, 12(%esp) #8.2
lea -28(%ebp), %edx #8.2
movl %edx, 16(%esp) #8.2
lea -24(%ebp), %edx #8.2
movl %edx, 20(%esp) #8.2
lea -20(%ebp), %edx #8.2
movl %edx, 24(%esp) #8.2
movl %eax, 28(%esp) #9.23
movl %eax, 32(%esp) #8.2
call __kmpc_for_static_init_4 #8.2
# LOE
..B1.16: # Preds ..B1.3
addl $36, %esp #8.2
# LOE
..B1.4: # Preds ..B1.16
movl -28(%ebp), %eax #8.2
movl %eax, -40(%ebp) #8.2
movl -24(%ebp), %edx #8.2
movl %edx, -4(%ebp) #8.2
movl -12(%ebp), %edx #8.2
cmpl %edx, %eax #8.2
jg ..B1.8 # Prob 50% #8.2
# LOE
..B1.5: # Preds ..B1.4
movl -12(%ebp), %eax #8.2
movl -4(%ebp), %edx #8.2
cmpl %eax, %edx #8.2
jle ..B1.7 # Prob 50% #8.2
# LOE
..B1.6: # Preds ..B1.5
movl -12(%ebp), %eax #8.2
movl %eax, -4(%ebp) #8.2
# LOE
..B1.7: # Preds ..B1.6 ..B1.5
movl -4(%ebp), %eax #8.2
movl -40(%ebp), %edx #8.2
movl %edx, -16(%ebp) #8.2
movl -16(%ebp), %edx #9.14
cmpl %eax, %edx #8.2
jle ..B1.10 # Prob 50% #8.2
# LOE
..B1.8: # Preds ..B1.11 ..B1.7 ..B1.4
addl $-8, %esp #8.2
movl $.2.1_2_kmpc_loc_struct_pack.1, (%esp) #8.2
movl -36(%ebp), %eax #8.2
movl %eax, 4(%esp) #8.2
call __kmpc_for_static_fini #8.2
# LOE
..B1.17: # Preds ..B1.8
addl $8, %esp #8.2
jmp ..B1.12 # Prob 100% #8.2
# LOE
..B1.10: # Preds ..B1.7 ..B1.11
pushl %edi #10.3
movl -16(%ebp), %eax #10.7
movl %eax, (%esp) #10.7
call sub #10.3
# LOE
..B1.18: # Preds ..B1.10
popl %ecx #10.3
# LOE
..B1.11: # Preds ..B1.18
movl -4(%ebp), %eax #9.23
incl -16(%ebp) #9.23
movl -16(%ebp), %edx #9.14
cmpl %eax, %edx #9.2
jle ..B1.10 # Prob 50% #9.2
jmp ..B1.8 # Prob 100% #9.2
# LOE
..B1.12: # Preds ..B1.17 ..B1.2
leave #11.1
ret #11.1
.align 2,0x90
# LOE
# mark_end;
.type fubar,@function
.size fubar,.-fubar
.data
.align 4
.align 4
.2.1_2_kmpc_loc_struct_pack.0:
.long 0
.long 2
.long 0
.long 0
.long .2.1_2__kmpc_loc_pack.0
.2.1_2__kmpc_loc_pack.0:
.byte 59
.byte 47
.byte 103
.byte 47
.byte 103
.byte 49
.byte 53
.byte 47
.byte 98
.byte 114
.byte 111
.byte 110
.byte 101
.byte 118
.byte 101
.byte 116
.byte 47
.byte 111
.byte 112
.byte 101
.byte 110
.byte 109
.byte 112
.byte 98
.byte 101
.byte 110
.byte 99
.byte 104
.byte 95
.byte 67
.byte 95
.byte 118
.byte 50
.byte 47
.byte 109
.byte 99
.byte 114
.byte 95
.byte 114
.byte 117
.byte 110
.byte 115
.byte 47
.byte 98
.byte 97
.byte 115
.byte 105
.byte 99
.byte 95
.byte 108
.byte 111
.byte 111
.byte 112
.byte 46
.byte 99
.byte 59
.byte 102
.byte 117
.byte 98
.byte 97
.byte 114
.byte 59
.byte 54
.byte 59
.byte 54
.byte 59
.byte 59
.space 1 # pad
.2.1_2_kmpc_loc_struct_pack.1:
.long 0
.long 2
.long 0
.long 0
.long .2.1_2__kmpc_loc_pack.1
.2.1_2__kmpc_loc_pack.1:
.byte 59
.byte 47
.byte 103
.byte 47
.byte 103
.byte 49
.byte 53
.byte 47
.byte 98
.byte 114
.byte 111
.byte 110
.byte 101
.byte 118
.byte 101
.byte 116
.byte 47
.byte 111
.byte 112
.byte 101
.byte 110
.byte 109
.byte 112
.byte 98
.byte 101
.byte 110
.byte 99
.byte 104
.byte 95
.byte 67
.byte 95
.byte 118
.byte 50
.byte 47
.byte 109
.byte 99
.byte 114
.byte 95
.byte 114
.byte 117
.byte 110
.byte 115
.byte 47
.byte 98
.byte 97
.byte 115
.byte 105
.byte 99
.byte 95
.byte 108
.byte 111
.byte 111
.byte 112
.byte 46
.byte 99
.byte 59
.byte 102
.byte 117
.byte 98
.byte 97
.byte 114
.byte 59
.byte 56
.byte 59
.byte 49
.byte 49
.byte 59
.byte 59
.data
# -- End fubar
.data
.section .note.GNU-stack, ""
# End
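For anyone who would rather not trace the -O0 listing by hand, here is a rough C rendering of its control flow. The generated code never asks for the thread count. It passes the thread id from __kmpc_global_thread_num, the constant 34, pointers to four stack slots and two trailing 1s to __kmpc_for_static_init_4, then loops over whatever bounds come back. I read the four slots as last-iteration flag, lower bound, upper bound and stride, and the two 1s as increment and chunk. That matches the later open-source version of this runtime, but it is an assumption for the 9.1 build. static_init_sketch is a made-up stand-in for the runtime call, and its block partition is only a guess at what a static schedule without a chunk size does. record and covers_once exist only to exercise the sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_ITERS 64

/* Hypothetical stand-in for __kmpc_for_static_init_4: split the
   inclusive range [*plower, *pupper] into contiguous blocks, one per
   thread. The real runtime's partition may differ. */
void static_init_sketch(int tid, int nthreads, int *plast,
                        int *plower, int *pupper, int *pstride)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    int lo = *plower, hi = *pupper;
    int total = hi - lo + 1;
    int base = total / nthreads;
    int extra = total % nthreads;   /* first 'extra' threads get one more */
    int start = lo + tid * base + (tid < extra ? tid : extra);
    int count = base + (tid < extra ? 1 : 0);
    *plower = start;
    *pupper = start + count - 1;    /* below *plower when count == 0 */
    *pstride = total;
    *plast = (count > 0 && *pupper == hi);
}

/* Control flow of the -O0 listing, with block labels for reference. */
void fubar_lowered(int n, int tid, int nthreads, void (*body)(int))
{
    if (0 >= n)                              /* ..B1.2: zero-trip test */
        return;
    int last = 0, lower = 0, upper = n - 1, stride = 1;    /* ..B1.3 */
    int global_upper = n - 1;
    static_init_sketch(tid, nthreads, &last, &lower, &upper, &stride);
    if (lower <= global_upper) {             /* ..B1.4 */
        if (upper > global_upper)            /* ..B1.5 and ..B1.6 */
            upper = global_upper;
        for (int i = lower; i <= upper; i++) /* ..B1.7, ..B1.10, ..B1.11 */
            body(i);
    }
    /* ..B1.8: the listing calls __kmpc_for_static_fini here. */
}

/* Test scaffolding: count how often each iteration runs. */
int hits[MAX_ITERS];

void record(int i)
{
    assert(i >= 0 && i < MAX_ITERS);
    hits[i]++;
}

/* Returns 1 if every iteration in [0, n) ran exactly once across all
   simulated threads, 0 otherwise. */
int covers_once(int n, int nthreads)
{
    assert(n >= 0 && n <= MAX_ITERS && nthreads > 0);
    memset(hits, 0, sizeof hits);
    for (int tid = 0; tid < nthreads; tid++)
        fubar_lowered(n, tid, nthreads, record);
    for (int i = 0; i < n; i++)
        if (hits[i] != 1)
            return 0;
    return 1;
}
```

If this reading is right, the schedule is computed at run time rather than at compile time, so the cost of the __kmpc_for_static_init_4 and __kmpc_for_static_fini calls would be paid once per loop regardless of trip count. That would be consistent with a fixed start-up or termination overhead, though the listing alone does not say how expensive those calls are.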