[Omp] Overhead of #pragma omp for static nowait

Greg Bronevetsky greg at bronevetsky.com
Sun Dec 10 02:19:20 PST 2006


I'm getting my numbers from the EPCC microbenchmarks. The scheduling
microbenchmark computes its overhead numbers as follows:
let seq = time of the following loop:
  for(i=0; i<n; i++) { delay(); }
let par = time of the following loop:
  int num_threads=omp_get_num_threads();
  #pragma omp for
  for(i=0; i<n*num_threads; i++) { delay(); }
overhead = par-seq
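
For concreteness, here is a minimal C sketch of that measurement. It is not the EPCC source: delay() below is a hypothetical busy loop, reps is an added repetition count that spreads the cost of opening the parallel region over many loop executions, and the _OPENMP guards are only there so the same file also builds as plain serial code.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for the benchmark's delay(): a short busy loop. */
static void delay(void) {
    volatile double a = 0.0;
    for (int k = 0; k < 100; k++) a += k;
}

/* Wall-clock seconds under OpenMP, CPU seconds in the serial fallback. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* seq: n iterations of delay() on one thread, averaged over reps runs. */
static double time_seq(int n, int reps) {
    double t0 = now_seconds();
    for (int r = 0; r < reps; r++)
        for (int i = 0; i < n; i++) delay();
    return (now_seconds() - t0) / reps;
}

/* par: n iterations per thread, shared out by the worksharing loop. */
static double time_par(int n, int reps) {
    double t0 = now_seconds();
    #pragma omp parallel
    {
        int num_threads = 1;
#ifdef _OPENMP
        num_threads = omp_get_num_threads();
#endif
        for (int r = 0; r < reps; r++) {
            #pragma omp for schedule(static) nowait
            for (int i = 0; i < n * num_threads; i++) delay();
        }
    }
    return (now_seconds() - t0) / reps;
}

/* overhead = par - seq, in seconds per execution of the loop. */
static double loop_overhead(int n, int reps) {
    return time_par(n, reps) - time_seq(n, reps);
}
```

Calling loop_overhead(n, reps) for a range of n gives the overhead curve discussed in this message. With small reps the result still includes part of the cost of starting the parallel region, so it should be read as an upper bound.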

One interesting phenomenon that I've noticed with this test is that on the
IA32 machine (I haven't tried the IA64 yet) and an IBM machine the loop
overhead is constant up to a certain number of iterations per thread
(this number is different for different chunk sizes). For larger numbers
of iterations per thread the overhead then rises linearly with the number
of iterations. From this it is possible to compute the base overhead of
"#pragma omp for schedule(static) nowait" as well as the per-iteration
overhead. From these (admittedly crude) calculations, it appears that on
both machines the base overhead is on the order of 1-2us while the
per-iteration overhead is a few ns. Thus, the mystery overhead seems to be
incurred at loop start-up or termination.
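
The crude calculation can be sketched as follows. This is only an illustration of the piecewise model described above (flat, then linear), not the code I actually used: the function name, its parameters, and the hand-picked knee index are all assumptions.

```c
#include <assert.h>

/*
 * Piecewise model: the overhead is flat for the first `knee` samples and
 * grows linearly after that. x[i] = iterations per thread, y[i] = measured
 * overhead. knee is the (hand-picked) index of the first sample that lies
 * in the linear region. Assumes the x values in that region are distinct.
 */
static void fit_overhead(const double *x, const double *y, int count,
                         int knee, double *base, double *per_iter) {
    assert(knee >= 1 && count - knee >= 2);

    /* Base overhead: mean of the flat region. */
    double sum = 0.0;
    for (int i = 0; i < knee; i++) sum += y[i];
    *base = sum / knee;

    /* Per-iteration overhead: least-squares slope over the linear region. */
    int m = count - knee;
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (int i = knee; i < count; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    *per_iter = (m * sxy - sx * sy) / (m * sxx - sx * sx);
}
```

If y is in microseconds, base comes out in microseconds and per_iter in microseconds per iteration, so a per-iteration cost of a few ns shows up as a slope of a few thousandths.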

I've looked at the assembly for your example code from Intel 9.1 for IA32
and IA64 and while I don't really understand what it is doing, one thing
I've noticed is that while both versions call __kmpc_global_thread_num
before entering the parallel loop, neither version calls a function to get
the total number of threads. Since this seems to be necessary to compute
the iteration schedule at compile-time, this implies that the generated
code is doing something else. However, when you look at the code generated
by the IA32 compiler with optimizations set to -O0, the code is quite
small (<100 assembly instructions), implying that whatever the compiler is
doing, it is probably rather simple. I'm attaching the assembly files for
both -O0 and -O3 on both IA32 and IA64. Please tell me if you can derive
more from it than I can.
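
My current reading of the call sequence in the listings (thread id, then __kmpc_for_static_init_4, then the loop, then __kmpc_for_static_fini) is sketched below in C. static_init_sketch is a hypothetical stand-in, not the library routine: I am assuming the real call looks up the thread count internally, which would explain why the generated code never asks for it, and the block partition shown is just one plausible choice. The thread count is passed explicitly here so the sketch runs without OpenMP.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the runtime's static-init call. On entry
 * *lower and *upper hold the inclusive bounds of the whole loop; on
 * return they hold this thread's block.
 */
static void static_init_sketch(int tid, int nthreads, int *lower, int *upper) {
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    int trip = *upper - *lower + 1;   /* total iterations */
    int small = trip / nthreads;      /* size of the smaller blocks */
    int extra = trip % nthreads;      /* first `extra` threads get one more */
    int start = *lower + tid * small + (tid < extra ? tid : extra);
    int count = small + (tid < extra ? 1 : 0);
    *lower = start;
    *upper = start + count - 1;       /* upper < lower means no work */
}

/*
 * Shape of the compiled fubar() for one thread: set up the bounds, call
 * the runtime, then run an ordinary serial loop over the returned block.
 * hits[i]++ stands in for sub(i).
 */
static void fubar_sketch(int n, int tid, int nthreads, int *hits) {
    if (n <= 0) return;               /* the listings branch to the exit here */
    int lower = 0, upper = n - 1;
    static_init_sketch(tid, nthreads, &lower, &upper);
    if (upper > n - 1) upper = n - 1; /* clamp, as the generated code does */
    for (int i = lower; i <= upper; i++) hits[i]++;
    /* the generated code calls __kmpc_for_static_fini at this point */
}
```

If this reading is right, the per-loop cost is two library calls plus a handful of loads and stores, with no inter-thread communication in the generated code itself.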

                      Greg Bronevetsky

> Intel's compiler does something close to that, except that the
> loop bounds for each chunk are computed by library calls
> rather than inline. I'm sure you can determine the code gen
> strategy for other platforms by inspecting the assembly code.
> Normally something like "icc -S -openmp foo.c" will give you
> what you are looking for. Try this simple code with an Intel64
> compiler and look at the assembly:
> 
> fubar(int n)
> {
>     int i;
> #pragma omp for schedule(static)
>     for (i = 0; i < n; ++i)
>         sub(i);
> }
> 
> I find it hard to believe that the overhead of calling a
> couple of functions really accounts for two orders of magnitude
> performance difference. Perhaps there is a problem with your
> analysis?
> 
> -----Original Message-----
> From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
> Of Greg Bronevetsky
> Sent: Friday, December 08, 2006 1:59 PM
> To: omp at openmp.org
> Subject: RE: [Omp] Overhead of #pragma omp for static nowait
> 
> I mean the following compiler transformation:
>   #pragma omp for schedule(static,1) nowait
>   for(int i=0; i<n; i++){}
> should become:
>   for(int i=omp_get_thread_num(); i<n; i+=omp_get_num_threads())
>   {}
> 
> and
>   #pragma omp for schedule(static) nowait
>   for(int i=0; i<n; i++){}
> should become:
>   // threads with ids below midPoint get 1 more iteration than the others
>   int midPoint=n%omp_get_num_threads();
>   // number of iterations assigned to threads with smaller ids
>   int itersBeforeMe;
>   if(omp_get_thread_num()<midPoint)
>      itersBeforeMe = omp_get_thread_num()*(n/omp_get_num_threads()+1);
>   else
>      itersBeforeMe = midPoint*(n/omp_get_num_threads()+1)+
>         (omp_get_thread_num()-midPoint)*(n/omp_get_num_threads());
>   // number of iterations assigned to this thread
>   int numIter;
>   if(omp_get_thread_num()<midPoint)
>      numIter = n/omp_get_num_threads()+1;
>   else
>      numIter = n/omp_get_num_threads();
> 
>   for(int i=itersBeforeMe; i<itersBeforeMe+numIter; i++)
>   {}
> 
> Other chunk sizes or loop bounds would involve more complex arithmetic
> to set up the loop bounds, but the basic idea is much the same. The
> overall cost of the above implementation of
> "#pragma omp for schedule(static,1) nowait" should be several ns per
> iteration. However, I am seeing much higher overheads in my experiments.
> 
>                              Greg Bronevetsky
> 
> On Fri, 8 Dec 2006, Meadows, Lawrence F wrote:
> 
> > What do you mean by "converting to a set of serial loops"?
> > 
> > -----Original Message-----
> > From: omp-bounces at openmp.org [mailto:omp-bounces at openmp.org] On Behalf
> > Of Greg Bronevetsky
> > Sent: Friday, December 08, 2006 12:48 PM
> > To: omp at openmp.org
> > Subject: [Omp] Overhead of #pragma omp for static nowait
> > 
> > I have recently executed the EPCC microbenchmarks on several machines
> > and noticed that there is a consistent overhead of ~1us (~several
> > thousand cycles) for #pragma omp for static nowait and its variants on
> > the platforms I've tried. Given the simplicity of this scheduling
> > policy, it seems to me that it should be possible to convert the
> > parallel loop into a set of serial loops at compile-time. This would
> > result in a loop that requires no inter-thread communication and costs
> > only a few tens of cycles.
> > 
> > What is the reason for this much-higher-than-expected overhead? Is it
> > just that the above compiler analysis is not typically performed, or is
> > there a more fundamental reason? Here at LLNL, we have applications
> > that would like to use OpenMP to parallelize loops with ~50 iterations
> > and ~.25us of work per iteration. ~1us overheads for #pragma omp for
> > static nowait make OpenMP too expensive for this task.
> > 
> >                              Greg Bronevetsky
> > 
> > _______________________________________________
> > Omp mailing list
> > Omp at openmp.org
> > http://openmp.org/mailman/listinfo/omp
> > 
> > 

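The remark in the quoted thread that other chunk sizes need more arithmetic but follow the same idea can be made concrete with a small C sketch. It is a hand expansion under stated assumptions, not what any compiler is known to emit: the function name is made up, the thread id and thread count are passed in explicitly so it runs without OpenMP, and hits[i]++ stands in for the loop body.

```c
#include <assert.h>

/*
 * Run thread tid's share of for(i=0; i<n; i++) under a hand-expanded
 * schedule(static,chunk): chunks are dealt out round-robin, so thread tid
 * owns chunks tid, tid+nthreads, tid+2*nthreads, and so on.
 * Returns the number of iterations this thread executed.
 */
static int run_static_chunked(int n, int chunk, int tid, int nthreads,
                              int *hits) {
    assert(chunk > 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
    int done = 0;
    for (int start = tid * chunk; start < n; start += nthreads * chunk) {
        /* the last chunk may be shorter than the rest */
        int end = start + chunk < n ? start + chunk : n;
        for (int i = start; i < end; i++) {
            hits[i]++;
            done++;
        }
    }
    return done;
}
```

With chunk set to 1 the outer loop reduces to the strided loop from the quoted message, starting at the thread id and stepping by the thread count. Nothing in it needs inter-thread communication, which is why a cost of a few ns per iteration looked like a reasonable expectation.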




-------------- next part --------------
// mark_description "Intel(R) C++ Compiler for Itanium(R)-based applications";
// mark_description "Version 9.1    Build 20060523 %s";
// mark_description "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -O3 -openmp -lm -lm -S -o basic_loop.";
// mark_description "O3.s";
	//.radix C
	.file "basic_loop.c"
	.section .text, "xa", "progbits"
	.align 64
// -- Begin fubar
	.proc fubar#
// Block 0: entry  Pred:     Succ: 21  -GO
// Freq 1.0e+00
	.global fubar#

fubar:
fubar??unw:
 {   .mmi
	alloc	r33=ar.pfs,1,10,8,0			//0: {6:1:basic_loop.c} 234
	add	sp=-32,sp				//0: {6:1} 235
	mov	r34=b0					//0: {6:1} 2
 }
 {   .mii
	mov	r36=gp					//0: {6:1} 218
	add	r3=@ltoff($2$1_2_kmpc_loc_struct_pack$0#),gp ;;//0: {6:1} 5
	mov	r35=ar.lc				//1: {6:1} 3
 }
 {   .mmi
	ld8	r2=[r3]					//1: {6:1} 6
	nop.m	0
	nop.i	0 ;;
 }
 {   .mii
	alloc	r31=ar.pfs,1,4,1,0			//2: {6:1} 232
	mov	r37=r2					//2: {6:1} 7
	nop.i	0 ;;
// Block 21:  Pred: 0     Succ: 1  -GO
// Freq 1.0e+00
 }
 {   .mib
	nop.m	0
	nop.i	0
	br.call.sptk	b0=__kmpc_global_thread_num# ;;//3: {6:1} 8
// Block 1:  Pred: 21     Succ: 2 3  -GO
// Freq 1.0e+00
 }
 {   .mmi
	alloc	r31=ar.pfs,1,10,8,0			//5: {6:1} 233
	mov	gp=r36					//5: {6:1} 211
	add	r42=40,sp				//5: {6:1} 10
 }
 {   .mmi
	add	r38=-1,r32				//5: {8:2} 20
	add	r28=16,sp				//5: {8:2} 31
	cmp4.le.unc	p7,p6=r32,r0 ;;			//5: {9:2} 12
 }
 {   .mmi
	st4	[r42]=r8				//6: {6:1} 11
  (p6)	add	r18=24,sp				//6: {8:2} 26
  (p6)	add	r37=32,sp				//6: {8:2} 21
 }
 {   .mmi
  (p6)	add	r32=36,sp				//6: {8:2} 18
	add	r30=1,r0				//6: {8:2} 25
  (p6)	add	r17=28,sp ;;				//6: {8:2} 23
 }
 {   .mib
	nop.m	0
	nop.i	0
// Branch taken probability 0.50
  (p7)	br.cond.dptk	.b1_2 ;;			//7: {9:2} 13
// Block 3:  Pred: 1     Succ: 4  -GO
// Freq 5.0e-01
 }
 {   .mmi
	st4	[r18]=r30				//0: {8:2} 27
	st4	[r37]=r38				//0: {8:2} 22
	add	r14=@ltoff($2$1_2_kmpc_loc_struct_pack$1#),gp//0: {8:2} 28
 }
 {   .mmi
	mov	r44=r8					//0: {8:2} 34
	mov	r46=r17					//0: {8:2} 36
	add	r45=34,r0 ;;				//0: {8:2} 30
 }
 {   .mmi
	ld8	r41=[r14]				//1: {8:2} 29
	st4	[r17]=r0				//1: {8:2} 24
	mov	r47=r32					//1: {8:2} 37
 }
 {   .mmi
	st4	[r32]=r0				//1: {8:2} 19
	mov	r49=r18					//1: {8:2} 39
	mov	r48=r37 ;;				//1: {8:2} 38
 }
 {   .mii
	st8	[r28]=r30				//2: {8:2} 32
	mov	r43=r41					//2: {8:2} 33
	mov	r50=r30					//2: {8:2} 40
 }
 {   .mmb
	nop.m	0
	nop.m	0
	br.call.sptk	b0=__kmpc_for_static_init_4# ;;//2: {8:2} 41
// Block 4:  Pred: 3     Succ: 5 6  -GO
// Freq 5.0e-01
 }
 {   .mmi
	ld4	r8=[r37]				//4: {8:2} 43
	ld4	r40=[r32]				//4: {8:2} 42
	mov	gp=r36 ;;				//4: {8:2} 212
 }
 {   .mii
	cmp4.gt.unc	p15,p0=r40,r38			//5: {8:2} 44
	cmp4.le.unc	p14,p0=r8,r38			//5: {8:2} 50
	nop.i	0
 }
 {   .mbb
	nop.m	0
// Branch taken probability 0.50
  (p15)br.cond.dptk	.b1_5				//5: {8:2} 45
// Block 6:  Pred: 4     Succ: 7 8  -G
// Freq 2.5e-01
// Branch taken probability 0.50
  (p14)br.cond.dptk	.b1_7 ;;			//5: {8:2} 51
// Block 8:  Pred: 6     Succ: 7  -G
// Freq 1.2e-01
 }
 {   .mii
	mov	r8=r38					//0: {8:2} 58
	nop.i	0
	nop.i	0 ;;
// Block 7:  Pred: 6 8     Succ: 9 10  -G
// Freq 0.0e+00
 }
.b1_7: 
 {   .mmi
	add	r30=1,r0				//0: {8:2} 59
	add	r37=1,r40				//0: {8:2} 68
	sxt4	r17=r40					//0: {8:2} 52
 }
 {   .mmi
	mov	r38=r40					//0: {8:2} 67
	add	r10=1,r0				//0: {8:2} 60
	sxt4	r16=r8 ;;				//0: {8:2} 53
 }
 {   .mii
	sub	r15=r16,r17				//1: {8:2} 54
	mov	r32=r17					//1: {8:2} 71
	nop.i	0 ;;
 }
 {   .mmi
	add	r39=1,r15				//2: {8:2} 55
	nop.m	0
	nop.i	0 ;;
 }
 {   .mii
	add	r2=-1,r39				//3: {8:2} 74
	cmp.lt.unc	p6,p0=16,r39			//3: {8:2} 56
	cmp.ge.unc	p9,p8=r39,r0			//3: {8:2} 132
 }
 {   .mbb
	cmp.gt.unc	p12,p0=1,r39			//3: {8:2} 72
// Branch taken probability 0.50
  (p6)	br.cond.dptk	.b1_9				//3: {8:2} 57
// Block 10:  Pred: 7     Succ: 5 11  -G
// Freq 0.0e+00
// Branch taken probability 0.50
  (p12)br.cond.dptk	.b1_5 ;;			//0: {8:2} 73
// Block 11: prolog  Pred: 10     Succ: 12  -G
// Freq 2.5e+00
 }
 {   .mii
	nop.m	0
	mov	ar.lc=r2				//0: {0:0} 75
	nop.i	0 ;;
// Block 12: lentry  Pred: 11 13     Succ: 13  -G
// Freq 5.0e+00
 }
.b1_12: 
 {   .mib
	mov	r43=r32					//0: {10:3} 76
	nop.i	0
	br.call.sptk	b0=sub# ;;			//0: {10:3} 78
// Block 13: lexit ltail  Pred: 12     Succ: 12 55  -GO
// Freq 2.5e+00
 }
 {   .mib
	mov	gp=r36					//0: {10:3} 213
	add	r32=1,r32				//0: {8:2} 80
// Branch taken probability 0.99
	br.cloop.sptk	.b1_12 ;;			//0: {8:2} 81
// Block 55:  Pred: 13     Succ: 5  -O
// Freq 2.5e-02
 }
 {   .mib
	nop.m	0
	nop.i	0
	br.cond.sptk	.b1_5 ;;			//0: {8:2} 236
// Block 9: collapsed  Pred: 7     Succ: 14 15  -G
// Freq 0.0e+00
 }
.b1_9: 
 {   .mii
  (p9)	add	r20=0,r0				//0: {8:2} 63
  (p8)	add	r20=1,r0				//0: {8:2} 64
	nop.i	0 ;;
 }
 {   .mmi
	add	r19=r39,r20 ;;				//1: {8:2} 65
	nop.m	0
	shr	r18=r19,1 ;;				//2: {8:2} 66
 }
 {   .mib
	cmp.gt.unc	p7,p0=1,r18			//3: {8:2} 69
	add	r25=-1,r18				//3: {8:2} 85
// Branch taken probability 0.50
  (p7)	br.cond.dptk	.b1_14 ;;			//3: {8:2} 70
// Block 15: prolog  Pred: 9     Succ: 16  -GO
// Freq 1.3e-02
 }
 {   .mii
	mov	r32=r10					//0: {8:2} 228
	sxt4	r24=r25 ;;				//0: {8:2} 86
	mov	ar.lc=r24 ;;				//1: {8:2} 87
// Block 16: lentry  Pred: 15 18     Succ: 17  -GO
// Freq 2.5e+00
 }
.b1_16: 
 {   .mii
	nop.m	0
	nop.i	0
	mov	r43=r38					//0: {10:3} 88
 }
 {   .mib
	add	r38=2,r38				//0: {8:2} 95
	add	r32=1,r32				//0: {8:2} 96
	br.call.sptk	b0=sub# ;;			//0: {10:3} 89
// Block 17:  Pred: 16     Succ: 18  -GO
// Freq 2.5e+00
 }
 {   .mii
	mov	gp=r36					//2: {10:3} 214
	mov	r43=r37					//2: {10:3} 91
	add	r37=2,r37				//2: {8:2} 94
 }
 {   .mmb
	nop.m	0
	nop.m	0
	br.call.sptk	b0=sub# ;;			//2: {10:3} 92
// Block 18: lexit ltail  Pred: 17     Succ: 16 19  -GO
// Freq 2.5e+00
 }
 {   .mib
	mov	gp=r36					//4: {10:3} 215
	nop.i	0
// Branch taken probability 0.99
	br.cloop.sptk	.b1_16 ;;			//4: {8:2} 97
// Block 19: epilog  Pred: 18     Succ: 14  -GO
// Freq 5.0e-01
 }
 {   .mmi
	mov	r10=r32 ;;				//0: {8:2} 231
	shladd	r26=r10,1,r0				//1: {8:2} 98
	nop.i	0 ;;
 }
 {   .mii
	add	r30=-1,r26				//2: {8:2} 99
	nop.i	0
	nop.i	0 ;;
// Block 14:  Pred: 9 19     Succ: 5 20  -GO
// Freq 1.0e+00
 }
.b1_14: 
 {   .mii
	add	r29=r40,r30				//0: {10:3} 100
	sxt4	r28=r30 ;;				//0: {8:2} 82
	add	r3=-1,r29				//1: {10:3} 101
 }
 {   .mib
	cmp.lt.unc	p10,p13=r39,r28			//1: {8:2} 83
	nop.i	0
// Branch taken probability 0.50
  (p10)br.cond.dptk	.b1_5 ;;			//1: {8:2} 84
// Block 20:  Pred: 14     Succ: 26  -GO
// Freq 5.0e+00
 }
 {   .mib
  (p13)mov	r43=r3					//0: {10:3} 102
	nop.i	0
  (p13)br.call.dptk	b0=sub# ;;			//0: {10:3} 103
// Block 26:  Pred: 20     Succ: 5  -GO
// Freq 5.0e+00
 }
 {   .mii
	mov	gp=r36					//2: {10:3} 216
	nop.i	0
	nop.i	0 ;;
// Block 5: epilog  Pred: 4 55 14 10 26     Succ: 22  -GO
// Freq 0.0e+00
 }
.b1_5: 
 {   .mib
	ld4	r44=[r42]				//0: {8:2} 46
	mov	r43=r41					//0: {8:2} 47
	br.call.sptk	b0=__kmpc_for_static_fini# ;;	//0: {8:2} 49
// Block 22:  Pred: 5     Succ: 2  -GO
// Freq 0.0e+00
// Block 2: exit  Pred: 1 22     Succ:  -GO
// Freq 1.0e+00
 }
.b1_2: 
 {   .mii
	add	sp=32,sp				//0: {11:1} 237
	mov	ar.pfs=r33 ;;				//0: {11:1} 15
	mov	ar.lc=r35 ;;				//1: {11:1} 16
 }
 {   .mib
	mov	gp=r36					//2: {8:2} 217
	mov	b0=r34					//2: {11:1} 14
	br.ret.sptk.many	b0 ;;			//2: {11:1} 17
 }
	.section	.IA_64.unwind_info,	"a", "progbits"
	.align 8
__udt_fubar??unw:
	data8 0x1000000000003				// length: 24 bytes
							// flags: 0x00
							// version: 1
	string "\x60\x0c"				//R3: prologue size 12
	string "\xe0\x01\x02"				//P7: mem_stack_f t/off 0x1 size 32
	string "\xe6\x00"				//P7: pfs_when t/off 0x0
	string "\xb1\x21"				//P3: pfs_gr r33
	string "\xe4\x02"				//P7: rp_when t/off 0x2
	string "\xb0\xa2"				//P3: rp_gr r34
	string "\xea\x05"				//P7: lc_when t/off 0x5
	string "\xb2\xa3"				//P3: lc_gr r35
	string "\x61\x84\x01"				//R3: body size 132
	string "\x81"					//B1: label_state 1
	string "\xc0\x05"				//B2: epilog time 5 ecount 0
	string "\x00"
	.section .IA_64.unwind, "ao", "unwind"
	data8 @segrel(fubar??unw#)
	data8 @segrel(fubar??unw#+0x300)
	data8 @segrel(__udt_fubar??unw)
	.section .data, "wa", "progbits"
	.align 16
$2$1_2_kmpc_loc_struct_pack$0:
	data4.ua 0	// s32
	data4.ua 2	// s32
	data4.ua 0	// s32
	data4.ua 0	// s32
	data8.ua $2$1_2__kmpc_loc_pack$0#	// p64
	.skip 8	// pad
$2$1_2_kmpc_loc_struct_pack$1:
	data4.ua 0	// s32
	data4.ua 2	// s32
	data4.ua 0	// s32
	data4.ua 0	// s32
	data8.ua $2$1_2__kmpc_loc_pack$1#	// p64
$2$1_2__kmpc_loc_pack$0:
	data1 59	// s8
	data1 47	// s8
	data1 103	// s8
	data1 47	// s8
	data1 103	// s8
	data1 49	// s8
	data1 53	// s8
	data1 47	// s8
	data1 98	// s8
	data1 114	// s8
	data1 111	// s8
	data1 110	// s8
	data1 101	// s8
	data1 118	// s8
	data1 101	// s8
	data1 116	// s8
	data1 47	// s8
	data1 111	// s8
	data1 112	// s8
	data1 101	// s8
	data1 110	// s8
	data1 109	// s8
	data1 112	// s8
	data1 98	// s8
	data1 101	// s8
	data1 110	// s8
	data1 99	// s8
	data1 104	// s8
	data1 95	// s8
	data1 67	// s8
	data1 95	// s8
	data1 118	// s8
	data1 50	// s8
	data1 47	// s8
	data1 116	// s8
	data1 104	// s8
	data1 117	// s8
	data1 110	// s8
	data1 100	// s8
	data1 101	// s8
	data1 114	// s8
	data1 95	// s8
	data1 114	// s8
	data1 117	// s8
	data1 110	// s8
	data1 115	// s8
	data1 47	// s8
	data1 98	// s8
	data1 97	// s8
	data1 115	// s8
	data1 105	// s8
	data1 99	// s8
	data1 95	// s8
	data1 108	// s8
	data1 111	// s8
	data1 111	// s8
	data1 112	// s8
	data1 46	// s8
	data1 99	// s8
	data1 59	// s8
	data1 102	// s8
	data1 117	// s8
	data1 98	// s8
	data1 97	// s8
	data1 114	// s8
	data1 59	// s8
	data1 54	// s8
	data1 59	// s8
	data1 54	// s8
	data1 59	// s8
	data1 59	// s8
	.skip 1	// pad
$2$1_2__kmpc_loc_pack$1:
	data1 59	// s8
	data1 47	// s8
	data1 103	// s8
	data1 47	// s8
	data1 103	// s8
	data1 49	// s8
	data1 53	// s8
	data1 47	// s8
	data1 98	// s8
	data1 114	// s8
	data1 111	// s8
	data1 110	// s8
	data1 101	// s8
	data1 118	// s8
	data1 101	// s8
	data1 116	// s8
	data1 47	// s8
	data1 111	// s8
	data1 112	// s8
	data1 101	// s8
	data1 110	// s8
	data1 109	// s8
	data1 112	// s8
	data1 98	// s8
	data1 101	// s8
	data1 110	// s8
	data1 99	// s8
	data1 104	// s8
	data1 95	// s8
	data1 67	// s8
	data1 95	// s8
	data1 118	// s8
	data1 50	// s8
	data1 47	// s8
	data1 116	// s8
	data1 104	// s8
	data1 117	// s8
	data1 110	// s8
	data1 100	// s8
	data1 101	// s8
	data1 114	// s8
	data1 95	// s8
	data1 114	// s8
	data1 117	// s8
	data1 110	// s8
	data1 115	// s8
	data1 47	// s8
	data1 98	// s8
	data1 97	// s8
	data1 115	// s8
	data1 105	// s8
	data1 99	// s8
	data1 95	// s8
	data1 108	// s8
	data1 111	// s8
	data1 111	// s8
	data1 112	// s8
	data1 46	// s8
	data1 99	// s8
	data1 59	// s8
	data1 102	// s8
	data1 117	// s8
	data1 98	// s8
	data1 97	// s8
	data1 114	// s8
	data1 59	// s8
	data1 56	// s8
	data1 59	// s8
	data1 49	// s8
	data1 49	// s8
	data1 59	// s8
	data1 59	// s8
	.section .text, "xa", "progbits"
// -- End fubar
	.endp fubar#
	.type	__kmpc_for_static_fini#,@function
	.global __kmpc_for_static_fini#
	.type	__kmpc_for_static_init_4#,@function
	.global __kmpc_for_static_init_4#
	.type	__kmpc_global_thread_num#,@function
	.global __kmpc_global_thread_num#
	.type	sub#,@function
	.global sub#
// End
-------------- next part --------------
// mark_description "Intel(R) C++ Compiler for Itanium(R)-based applications";
// mark_description "Version 9.1    Build 20060523 %s";
// mark_description "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -O0 -openmp -lm -lm -S -o basic_loop.";
// mark_description "O0.s";
	//.radix C
	.file "basic_loop.c"
	.section .text, "xa", "progbits"
	.align 64
// -- Begin fubar
	.proc fubar#
// Block 0: entry  Pred:     Succ: 13  -
// Freq 0.0e+00
	.global fubar#

fubar:
fubar??unw:
 {   .mmi
	alloc	r33=ar.pfs,1,26,8,0			//0: {6:1:basic_loop.c} 164
	add	sp=-192,sp				//0: {6:1} 165
	mov	r35=gp ;;				//0: {6:1} 113
 }
 {   .mii
	nop.m	0
	mov	r34=b0					//1: {6:1} 2
	mov	r36=r32					//1: {6:1} 3
 }
 {   .mmi
	add	r37=144,sp ;;				//1: {6:1} 166
	st8	[r37]=r36				//2: {6:1} 6
	add	r38=@ltoff($2$1_2_kmpc_loc_struct_pack$0#),gp ;;//2: {6:1} 7
 }
 {   .mii
	ld8	r39=[r38]				//3: {6:1} 8
	add	r20=64,sp				//3: {6:1} 116
	nop.i	0 ;;
 }
 {   .mii
	st8	[r20]=r35				//4: {6:1} 117
	add	r20=72,sp				//4: {6:1} 118
	nop.i	0 ;;
 }
 {   .mii
	st8	[r20]=r39				//5: {6:1} 119
	add	r20=72,sp				//5: {6:1} 120
	nop.i	0 ;;
 }
 {   .mmi
	ld8	r35=[r20] ;;				//6: {6:1} 121
	mov	r59=r35					//7: {6:1} 9
	nop.i	0
// Block 13:  Pred: 0     Succ: 1  -
// Freq 0.0e+00
 }
 {   .mib
	nop.m	0
	nop.i	0
	br.call.sptk	b0=__kmpc_global_thread_num# ;;//7: {6:1} 10
// Block 1:  Pred: 13     Succ: 2 3  -
// Freq 0.0e+00
 }
 {   .mmi
	add	r20=64,sp ;;				//9: {6:1} 122
	ld8	r35=[r20]				//10: {6:1} 123
	nop.i	0 ;;
 }
 {   .mii
	mov	gp=r35					//11: {6:1} 108
	mov	r36=r8					//11: {6:1} 107
	nop.i	0
 }
 {   .mmi
	add	r37=24,sp ;;				//11: {6:1} 12
	st4	[r37]=r36				//12: {6:1} 13
	add	r38=28,sp ;;				//12: {9:7} 14
 }
 {   .mii
	st4	[r38]=r0				//13: {9:7} 15
	add	r39=28,sp				//13: {9:2} 16
	nop.i	0 ;;
 }
 {   .mii
	ld4	r40=[r39]				//14: {9:2} 17
	add	r41=144,sp				//14: {9:2} 167
	nop.i	0 ;;
 }
 {   .mmi
	ld4	r42=[r41] ;;				//15: {9:2} 19
	cmp4.ge	p8,p0=r40,r42				//16: {9:2} 20
	add	r19=0,r0 ;;				//16: {9:2} 124
 }
 {   .mii
  (p8)	add	r19=1,r0				//17: {9:2} 125
	add	r20=80,sp				//17: {9:2} 126
	nop.i	0 ;;
 }
 {   .mib
	st8	[r20]=r19				//18: {9:2} 127
	nop.i	0
// Branch taken probability 0.50
  (p8)	br.cond.dptk	.b1_2 ;;			//18: {9:2} 21
// Block 3:  Pred: 1     Succ: 4  -
// Freq 0.0e+00
 }
 {   .mmi
	add	r35=32,sp ;;				//0: {8:2} 25
	st4	[r35]=r0				//1: {8:2} 26
	add	r36=144,sp ;;				//1: {8:2} 168
 }
 {   .mmi
	ld4	r37=[r36] ;;				//2: {8:2} 28
	add	r38=-1,r37				//3: {8:2} 29
	add	r39=36,sp ;;				//3: {8:2} 30
 }
 {   .mii
	st4	[r39]=r38				//4: {8:2} 31
	add	r40=144,sp				//4: {8:2} 169
	nop.i	0 ;;
 }
 {   .mmi
	ld4	r41=[r40] ;;				//5: {8:2} 33
	add	r42=-1,r41				//6: {8:2} 34
	add	r43=40,sp ;;				//6: {8:2} 35
 }
 {   .mii
	st4	[r43]=r42				//7: {8:2} 36
	add	r44=44,sp				//7: {8:2} 37
	nop.i	0 ;;
 }
 {   .mii
	st4	[r44]=r0				//8: {8:2} 38
	add	r45=48,sp				//8: {8:2} 39
	add	r46=1,r0 ;;				//8: {8:2} 40
 }
 {   .mii
	st4	[r45]=r46				//9: {8:2} 41
	add	r47=@ltoff($2$1_2_kmpc_loc_struct_pack$1#),gp//9: {8:2} 42
	nop.i	0 ;;
 }
 {   .mii
	ld8	r48=[r47]				//10: {8:2} 43
	add	r49=24,sp				//10: {8:2} 44
	nop.i	0 ;;
 }
 {   .mii
	ld4	r50=[r49]				//11: {8:2} 45
	add	r51=34,r0				//11: {8:2} 46
	add	r52=44,sp				//11: {8:2} 47
 }
 {   .mmb
	add	r53=32,sp				//11: {8:2} 48
	add	r54=36,sp				//11: {8:2} 49
	nop.b	0 ;;
 }
 {   .mii
	add	r55=48,sp				//12: {8:2} 50
	add	r56=1,r0				//12: {8:2} 51
	nop.i	0
 }
 {   .mmb
	add	r57=16,sp				//12: {8:2} 52
	add	r58=1,r0				//12: {8:2} 53
	nop.b	0 ;;
 }
 {   .mii
	st8	[r57]=r58				//13: {8:2} 54
	mov	r59=r48					//13: {8:2} 55
	mov	r60=r50					//13: {8:2} 56
 }
 {   .mmb
	mov	r61=r51					//13: {8:2} 57
	mov	r62=r52					//13: {8:2} 58
	nop.b	0 ;;
 }
 {   .mii
	mov	r63=r53					//14: {8:2} 59
	mov	r64=r54					//14: {8:2} 60
	nop.i	0
 }
 {   .mmb
	mov	r65=r55					//14: {8:2} 61
	mov	r66=r56					//14: {8:2} 62
	br.call.sptk	b0=__kmpc_for_static_init_4# ;;//14: {8:2} 63
// Block 4:  Pred: 3     Succ: 5 6  -
// Freq 0.0e+00
 }
 {   .mmi
	add	r20=64,sp ;;				//16: {8:2} 128
	ld8	r35=[r20]				//17: {8:2} 129
	nop.i	0 ;;
 }
 {   .mii
	mov	gp=r35					//18: {8:2} 109
	add	r36=32,sp				//18: {8:2} 64
	nop.i	0 ;;
 }
 {   .mii
	ld4	r37=[r36]				//19: {8:2} 65
	add	r38=36,sp				//19: {8:2} 66
	nop.i	0 ;;
 }
 {   .mii
	ld4	r39=[r38]				//20: {8:2} 67
	add	r40=48,sp				//20: {8:2} 68
	nop.i	0 ;;
 }
 {   .mii
	ld4	r41=[r40]				//21: {8:2} 69
	add	r42=40,sp				//21: {8:2} 70
	nop.i	0 ;;
 }
 {   .mmi
	ld4	r43=[r42] ;;				//22: {8:2} 71
	cmp4.gt	p8,p0=r37,r43				//23: {8:2} 72
	add	r20=88,sp ;;				//23: {8:2} 130
 }
 {   .mii
	st8	[r20]=r37				//24: {8:2} 131
	add	r20=96,sp				//24: {8:2} 132
	nop.i	0 ;;
 }
 {   .mii
	st8	[r20]=r39				//25: {8:2} 133
	add	r19=0,r0				//25: {8:2} 134
	nop.i	0 ;;
 }
 {   .mii
  (p8)	add	r19=1,r0				//26: {8:2} 135
	add	r20=104,sp				//26: {8:2} 136
	nop.i	0 ;;
 }
 {   .mib
	st8	[r20]=r19				//27: {8:2} 137
	nop.i	0
// Branch taken probability 0.50
  (p8)	br.cond.dptk	.b1_5 ;;			//27: {8:2} 73
// Block 6:  Pred: 4     Succ: 7 8  -
// Freq 0.0e+00
 }
 {   .mmi
	add	r35=40,sp ;;				//0: {8:2} 81
	ld4	r36=[r35]				//1: {8:2} 82
	add	r20=96,sp ;;				//1: {8:2} 138
 }
 {   .mmi
	ld8	r37=[r20] ;;				//2: {8:2} 139
	cmp4.le	p8,p0=r37,r36				//3: {8:2} 83
	add	r19=0,r0 ;;				//3: {8:2} 140
 }
 {   .mii
  (p8)	add	r19=1,r0				//4: {8:2} 141
	add	r20=112,sp				//4: {8:2} 142
	nop.i	0 ;;
 }
 {   .mib
	st8	[r20]=r19				//5: {8:2} 143
	nop.i	0
// Branch taken probability 0.50
  (p8)	br.cond.dptk.many	.b1_7 ;;		//5: {8:2} 84
// Block 8:  Pred: 6     Succ: 7  -
// Freq 0.0e+00
 }
 {   .mmi
	add	r35=40,sp ;;				//0: {8:2} 91
	ld4	r36=[r35]				//1: {8:2} 92
	add	r20=96,sp ;;				//1: {8:2} 144
 }
 {   .mii
	st8	[r20]=r36				//2: {8:2} 145
	nop.i	0
	nop.i	0 ;;
// Block 7:  Pred: 6 8     Succ: 9 5  -
// Freq 0.0e+00
 }
.b1_7: 
 {   .mii
	add	r35=28,sp				//0: {8:2} 85
	add	r20=88,sp				//0: {8:2} 146
	nop.i	0 ;;
 }
 {   .mmi
	ld8	r36=[r20] ;;				//1: {8:2} 147
	st4	[r35]=r36				//2: {8:2} 86
	add	r37=28,sp ;;				//2: {8:2} 87
 }
 {   .mii
	ld4	r38=[r37]				//3: {8:2} 88
	add	r20=96,sp				//3: {8:2} 148
	nop.i	0 ;;
 }
 {   .mmi
	ld8	r39=[r20] ;;				//4: {8:2} 149
	cmp4.le	p8,p0=r38,r39				//5: {8:2} 89
	add	r19=0,r0 ;;				//5: {8:2} 150
 }
 {   .mii
  (p8)	add	r19=1,r0				//6: {8:2} 151
	add	r20=120,sp				//6: {8:2} 152
	nop.i	0 ;;
 }
 {   .mib
	st8	[r20]=r19				//7: {8:2} 153
	nop.i	0
// Branch taken probability 0.50
  (p8)	br.cond.dptk	.b1_9 ;;			//7: {8:2} 90
// Block 5:  Pred: 10 7 4     Succ: 14  -
// Freq 0.0e+00
 }
.b1_5: 
 {   .mmi
	add	r35=@ltoff($2$1_2_kmpc_loc_struct_pack$1#),gp ;;//0: {8:2} 74
	ld8	r36=[r35]				//1: {8:2} 75
	add	r37=24,sp ;;				//1: {8:2} 76
 }
 {   .mii
	ld4	r38=[r37]				//2: {8:2} 77
	mov	r59=r36					//2: {8:2} 78
	nop.i	0 ;;
 }
 {   .mib
	mov	r60=r38					//3: {8:2} 79
	nop.i	0
	br.call.sptk	b0=__kmpc_for_static_fini# ;;	//3: {8:2} 80
// Block 14:  Pred: 5     Succ: 2  -
// Freq 0.0e+00
 }
 {   .mmi
	add	r20=64,sp ;;				//5: {8:2} 154
	ld8	r35=[r20]				//6: {8:2} 155
	nop.i	0 ;;
 }
 {   .mib
	mov	gp=r35					//7: {8:2} 110
	nop.i	0
	br.cond.sptk	.b1_2 ;;			//7: {8:2} 114
// Block 9:  Pred: 15 7     Succ: 10  -
// Freq 0.0e+00
 }
.b1_9: 
 {   .mmi
	add	r35=28,sp ;;				//0: {10:3} 93
	ld4	r36=[r35]				//1: {10:3} 94
	nop.i	0 ;;
 }
 {   .mib
	mov	r59=r36					//2: {10:3} 95
	nop.i	0
	br.call.sptk	b0=sub# ;;			//2: {10:3} 96
// Block 10:  Pred: 9     Succ: 5 15  -
// Freq 0.0e+00
 }
 {   .mmi
	add	r20=64,sp ;;				//4: {10:3} 156
	ld8	r35=[r20]				//5: {10:3} 157
	nop.i	0 ;;
 }
 {   .mii
	mov	gp=r35					//6: {10:3} 112
	add	r36=28,sp				//6: {9:23} 98
	nop.i	0 ;;
 }
 {   .mmi
	ld4	r37=[r36] ;;				//7: {9:23} 99
	add	r38=1,r37				//8: {9:23} 100
	add	r39=28,sp ;;				//8: {9:23} 101
 }
 {   .mii
	st4	[r39]=r38				//9: {9:23} 102
	add	r40=28,sp				//9: {9:2} 103
	nop.i	0 ;;
 }
 {   .mii
	ld4	r41=[r40]				//10: {9:2} 104
	add	r20=96,sp				//10: {9:2} 158
	nop.i	0 ;;
 }
 {   .mmi
	ld8	r42=[r20] ;;				//11: {9:2} 159
	cmp4.gt	p8,p0=r41,r42				//12: {9:2} 105
	add	r19=0,r0 ;;				//12: {9:2} 160
 }
 {   .mii
  (p8)	add	r19=1,r0				//13: {9:2} 161
	add	r20=128,sp				//13: {9:2} 162
	nop.i	0 ;;
 }
 {   .mbb
	st8	[r20]=r19				//14: {9:2} 163
// Branch taken probability 0.50
  (p8)	br.cond.dptk	.b1_5				//14: {9:2} 106
// Block 15:  Pred: 10     Succ: 9  -
// Freq 0.0e+00
	br.cond.sptk	.b1_9 ;;			//14: {9:2} 170
// Block 2: exit  Pred: 1 14     Succ:  -
// Freq 0.0e+00
 }
.b1_2: 
 {   .mii
	nop.m	0
	mov	b0=r34 ;;				//0: {11:1} 22
	mov	ar.pfs=r33				//1: {11:1} 23
 }
 {   .mib
	add	sp=192,sp				//1: {11:1} 171
	nop.i	0
	br.ret.sptk.many	b0 ;;			//1: {11:1} 24
 }
	.section	.IA_64.unwind_info,	"a", "progbits"
	.align 8
__udt_fubar??unw:
	data8 0x1000000000003				// length: 24 bytes
							// flags: 0x00
							// version: 1
	string "\x60\x15"				//R3: prologue size 21
	string "\xe0\x01\x0c"				//P7: mem_stack_f t/off 0x1 size 192
	string "\xe6\x00"				//P7: pfs_when t/off 0x0
	string "\xb1\x21"				//P3: pfs_gr r33
	string "\xe4\x04"				//P7: rp_when t/off 0x4
	string "\xb0\xa2"				//P3: rp_gr r34
	string "\x61\xc0\x01"				//R3: body size 192
	string "\x81"					//B1: label_state 1
	string "\xc0\x02"				//B2: epilog time 2 ecount 0
	string "\x00\x00\x00\x00\x00"
	.section .IA_64.unwind, "ao", "unwind"
	data8 @segrel(fubar??unw#)
	data8 @segrel(fubar??unw#+0x470)
	data8 @segrel(__udt_fubar??unw)
	.section .data, "wa", "progbits"
	.align 16
$2$1_2_kmpc_loc_struct_pack$0:
	data4.ua 0	// s32
	data4.ua 2	// s32
	data4.ua 0	// s32
	data4.ua 0	// s32
	data8.ua $2$1_2__kmpc_loc_pack$0#	// p64
	.skip 8	// pad
$2$1_2_kmpc_loc_struct_pack$1:
	data4.ua 0	// s32
	data4.ua 2	// s32
	data4.ua 0	// s32
	data4.ua 0	// s32
	data8.ua $2$1_2__kmpc_loc_pack$1#	// p64
$2$1_2__kmpc_loc_pack$0:
	data1 59	// s8
	data1 47	// s8
	data1 103	// s8
	data1 47	// s8
	data1 103	// s8
	data1 49	// s8
	data1 53	// s8
	data1 47	// s8
	data1 98	// s8
	data1 114	// s8
	data1 111	// s8
	data1 110	// s8
	data1 101	// s8
	data1 118	// s8
	data1 101	// s8
	data1 116	// s8
	data1 47	// s8
	data1 111	// s8
	data1 112	// s8
	data1 101	// s8
	data1 110	// s8
	data1 109	// s8
	data1 112	// s8
	data1 98	// s8
	data1 101	// s8
	data1 110	// s8
	data1 99	// s8
	data1 104	// s8
	data1 95	// s8
	data1 67	// s8
	data1 95	// s8
	data1 118	// s8
	data1 50	// s8
	data1 47	// s8
	data1 116	// s8
	data1 104	// s8
	data1 117	// s8
	data1 110	// s8
	data1 100	// s8
	data1 101	// s8
	data1 114	// s8
	data1 95	// s8
	data1 114	// s8
	data1 117	// s8
	data1 110	// s8
	data1 115	// s8
	data1 47	// s8
	data1 98	// s8
	data1 97	// s8
	data1 115	// s8
	data1 105	// s8
	data1 99	// s8
	data1 95	// s8
	data1 108	// s8
	data1 111	// s8
	data1 111	// s8
	data1 112	// s8
	data1 46	// s8
	data1 99	// s8
	data1 59	// s8
	data1 102	// s8
	data1 117	// s8
	data1 98	// s8
	data1 97	// s8
	data1 114	// s8
	data1 59	// s8
	data1 54	// s8
	data1 59	// s8
	data1 54	// s8
	data1 59	// s8
	data1 59	// s8
	.skip 1	// pad
$2$1_2__kmpc_loc_pack$1:
	data1 59	// s8
	data1 47	// s8
	data1 103	// s8
	data1 47	// s8
	data1 103	// s8
	data1 49	// s8
	data1 53	// s8
	data1 47	// s8
	data1 98	// s8
	data1 114	// s8
	data1 111	// s8
	data1 110	// s8
	data1 101	// s8
	data1 118	// s8
	data1 101	// s8
	data1 116	// s8
	data1 47	// s8
	data1 111	// s8
	data1 112	// s8
	data1 101	// s8
	data1 110	// s8
	data1 109	// s8
	data1 112	// s8
	data1 98	// s8
	data1 101	// s8
	data1 110	// s8
	data1 99	// s8
	data1 104	// s8
	data1 95	// s8
	data1 67	// s8
	data1 95	// s8
	data1 118	// s8
	data1 50	// s8
	data1 47	// s8
	data1 116	// s8
	data1 104	// s8
	data1 117	// s8
	data1 110	// s8
	data1 100	// s8
	data1 101	// s8
	data1 114	// s8
	data1 95	// s8
	data1 114	// s8
	data1 117	// s8
	data1 110	// s8
	data1 115	// s8
	data1 47	// s8
	data1 98	// s8
	data1 97	// s8
	data1 115	// s8
	data1 105	// s8
	data1 99	// s8
	data1 95	// s8
	data1 108	// s8
	data1 111	// s8
	data1 111	// s8
	data1 112	// s8
	data1 46	// s8
	data1 99	// s8
	data1 59	// s8
	data1 102	// s8
	data1 117	// s8
	data1 98	// s8
	data1 97	// s8
	data1 114	// s8
	data1 59	// s8
	data1 56	// s8
	data1 59	// s8
	data1 49	// s8
	data1 49	// s8
	data1 59	// s8
	data1 59	// s8
	.section .text, "xa", "progbits"
// -- End fubar
	.endp fubar#
	.type	__kmpc_for_static_fini#,@function
	.global __kmpc_for_static_fini#
	.type	__kmpc_for_static_init_4#,@function
	.global __kmpc_for_static_init_4#
	.type	__kmpc_global_thread_num#,@function
	.global __kmpc_global_thread_num#
	.type	sub#,@function
	.global sub#
// End
-------------- next part --------------
# -- Machine type IA32
# mark_description "Intel(R) C++ Compiler for 32-bit applications, Version 9.1    Build 20060519Z %s";
# mark_description "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -no-ipo -S -O3 -openmp -lm -lm";
	.ident "Intel(R) C++ Compiler for 32-bit applications, Version 9.1    Build 20060519Z %s"
	.ident "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -no-ipo -S -O3 -openmp -lm -lm"
	.file "basic_loop.c"
	.text
# -- Begin  fubar
# mark_begin;
       .align    2,0x90
	.globl fubar
fubar:
# parameter 1: 32 + %esp
..B1.1:                         # Preds ..B1.0
        pushl     %edi                                          #6.1
        pushl     %esi                                          #6.1
        pushl     %ebx                                          #6.1
        subl      $16, %esp                                     #6.1
        movl      32(%esp), %edi                                #5.1
        pushl     $.2.1_2_kmpc_loc_struct_pack.0                #6.1
        call      __kmpc_global_thread_num                      #6.1
                                # LOE eax ebp esi edi
..B1.16:                        # Preds ..B1.1
        popl      %ecx                                          #6.1
        movl      %eax, %ebx                                    #6.1
                                # LOE ebx ebp esi edi
..B1.2:                         # Preds ..B1.16
        testl     %edi, %edi                                    #9.2
        jle       ..B1.13       # Prob 10%                      #9.2
                                # LOE ebx ebp esi edi
..B1.3:                         # Preds ..B1.2
        lea       8(%esp), %eax                                 #8.2
        xorl      %edx, %edx                                    #8.2
        movl      %edx, 4(%esp)                                 #8.2
        movl      %edx, (%esp)                                  #8.2
        lea       12(%esp), %edx                                #8.2
        lea       -1(%edi), %edi                                #8.2
        movl      %edi, 8(%esp)                                 #8.2
        movl      $1, %ecx                                      #8.2
        movl      %ecx, 12(%esp)                                #8.2
        pushl     %ecx                                          #8.2
        pushl     %ecx                                          #8.2
        pushl     %edx                                          #8.2
        pushl     %eax                                          #8.2
        lea       20(%esp), %edx                                #8.2
        pushl     %edx                                          #8.2
        lea       20(%esp), %edx                                #8.2
        pushl     %edx                                          #8.2
        pushl     $34                                           #8.2
        pushl     %ebx                                          #8.2
        pushl     $.2.1_2_kmpc_loc_struct_pack.1                #8.2
        call      __kmpc_for_static_init_4                      #8.2
                                # LOE ebx ebp esi edi
..B1.17:                        # Preds ..B1.3
        addl      $36, %esp                                     #8.2
                                # LOE ebx ebp esi edi
..B1.4:                         # Preds ..B1.17
        movl      4(%esp), %edx                                 #8.2
        movl      8(%esp), %eax                                 #8.2
        cmpl      %edi, %edx                                    #8.2
        jg        ..B1.12       # Prob 50%                      #8.2
                                # LOE eax edx ebx ebp esi edi
..B1.5:                         # Preds ..B1.4
        cmpl      %edi, %eax                                    #8.2
        jle       ..B1.7        # Prob 50%                      #8.2
                                # LOE eax edx ebx ebp esi edi
..B1.6:                         # Preds ..B1.5
        movl      %edi, %eax                                    #8.2
                                # LOE eax edx ebx ebp esi
..B1.7:                         # Preds ..B1.6 ..B1.5
        cmpl      %eax, %edx                                    #8.2
        jg        ..B1.12       # Prob 50%                      #8.2
                                # LOE eax edx ebx ebp esi
..B1.8:                         # Preds ..B1.7
        movl      %eax, %esi                                    #
        movl      %edx, %edi                                    #
                                # LOE ebx ebp esi edi
..B1.9:                         # Preds ..B1.10 ..B1.8
        pushl     %edi                                          #10.3
        call      sub                                           #10.3
                                # LOE ebx ebp esi edi
..B1.18:                        # Preds ..B1.9
        popl      %ecx                                          #10.3
                                # LOE ebx ebp esi edi
..B1.10:                        # Preds ..B1.18
        addl      $1, %edi                                      #9.23
        cmpl      %esi, %edi                                    #9.2
        jle       ..B1.9        # Prob 82%                      #9.2
                                # LOE ebx ebp esi edi
..B1.12:                        # Preds ..B1.10 ..B1.7 ..B1.4
        pushl     %ebx                                          #6.1
        pushl     $.2.1_2_kmpc_loc_struct_pack.1                #6.1
        call      __kmpc_for_static_fini                        #8.2
                                # LOE ebp esi
..B1.19:                        # Preds ..B1.12
        addl      $8, %esp                                      #8.2
                                # LOE ebp esi
..B1.13:                        # Preds ..B1.19 ..B1.2
        addl      $16, %esp                                     #11.1
        popl      %ebx                                          #11.1
        popl      %esi                                          #11.1
        popl      %edi                                          #11.1
        ret                                                     #11.1
        .align    2,0x90
                                # LOE
# mark_end;
	.type	fubar,@function
	.size	fubar,.-fubar
	.data
	.align 4
	.align 4
.2.1_2_kmpc_loc_struct_pack.0:
	.long	0
	.long	2
	.long	0
	.long	0
	.long	.2.1_2__kmpc_loc_pack.0
.2.1_2__kmpc_loc_pack.0:
	.byte	59
	.byte	47
	.byte	103
	.byte	47
	.byte	103
	.byte	49
	.byte	53
	.byte	47
	.byte	98
	.byte	114
	.byte	111
	.byte	110
	.byte	101
	.byte	118
	.byte	101
	.byte	116
	.byte	47
	.byte	111
	.byte	112
	.byte	101
	.byte	110
	.byte	109
	.byte	112
	.byte	98
	.byte	101
	.byte	110
	.byte	99
	.byte	104
	.byte	95
	.byte	67
	.byte	95
	.byte	118
	.byte	50
	.byte	47
	.byte	109
	.byte	99
	.byte	114
	.byte	95
	.byte	114
	.byte	117
	.byte	110
	.byte	115
	.byte	47
	.byte	98
	.byte	97
	.byte	115
	.byte	105
	.byte	99
	.byte	95
	.byte	108
	.byte	111
	.byte	111
	.byte	112
	.byte	46
	.byte	99
	.byte	59
	.byte	102
	.byte	117
	.byte	98
	.byte	97
	.byte	114
	.byte	59
	.byte	54
	.byte	59
	.byte	54
	.byte	59
	.byte	59
	.space 1	# pad
.2.1_2_kmpc_loc_struct_pack.1:
	.long	0
	.long	2
	.long	0
	.long	0
	.long	.2.1_2__kmpc_loc_pack.1
.2.1_2__kmpc_loc_pack.1:
	.byte	59
	.byte	47
	.byte	103
	.byte	47
	.byte	103
	.byte	49
	.byte	53
	.byte	47
	.byte	98
	.byte	114
	.byte	111
	.byte	110
	.byte	101
	.byte	118
	.byte	101
	.byte	116
	.byte	47
	.byte	111
	.byte	112
	.byte	101
	.byte	110
	.byte	109
	.byte	112
	.byte	98
	.byte	101
	.byte	110
	.byte	99
	.byte	104
	.byte	95
	.byte	67
	.byte	95
	.byte	118
	.byte	50
	.byte	47
	.byte	109
	.byte	99
	.byte	114
	.byte	95
	.byte	114
	.byte	117
	.byte	110
	.byte	115
	.byte	47
	.byte	98
	.byte	97
	.byte	115
	.byte	105
	.byte	99
	.byte	95
	.byte	108
	.byte	111
	.byte	111
	.byte	112
	.byte	46
	.byte	99
	.byte	59
	.byte	102
	.byte	117
	.byte	98
	.byte	97
	.byte	114
	.byte	59
	.byte	56
	.byte	59
	.byte	49
	.byte	49
	.byte	59
	.byte	59
	.data
# -- End  fubar
	.data
	.section .note.GNU-stack, ""
# End
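
For readers who do not want to trace the IA32 -O3 listing by hand, the control flow can be approximated in C. This is only a sketch: static_init_sketch is a hypothetical stand-in for __kmpc_for_static_init_4, and the even block partition it computes is an assumption about what a static schedule does, not something read out of the Intel runtime. What the listing does show is that the generated code passes the thread id, the constant 34, and pointers to last/lower/upper/stride, then reads the bounds back from the stack. The team size is therefore presumably looked up inside the runtime call, which would explain why no separate call to get the thread count appears. The tid and nthreads parameters below exist only so the sketch can be run without a runtime.

```c
#include <assert.h>

/* Hypothetical stand-in for __kmpc_for_static_init_4. NOT the Intel runtime:
   it assumes an even block partition of the inclusive range [*lower, *upper]. */
static void static_init_sketch(int tid, int nthreads, int *last,
                               int *lower, int *upper, int *stride, int incr)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads && incr == 1);
    int orig_upper = *upper;
    int trip   = orig_upper - *lower + 1;  /* total iterations               */
    int small  = trip / nthreads;          /* base share per thread          */
    int extras = trip % nthreads;          /* first `extras` threads get +1  */
    int lo = *lower + tid * small + (tid < extras ? tid : extras);
    int hi = lo + small - (tid < extras ? 0 : 1);
    *last   = (lo <= hi && hi == orig_upper); /* owns the final iteration?   */
    *lower  = lo;
    *upper  = hi;                          /* hi < lo means no work          */
    *stride = trip;
}

/* Stand-in for the external sub() called from the loop body. */
static void sub_sketch(int i) { (void)i; }

/* Mirrors the shape of fubar in the listing; returns the number of
   iterations this thread executed so the result can be checked. */
static int fubar_sketch(int n, int tid, int nthreads)
{
    int executed = 0;
    if (n <= 0)                            /* testl %edi,%edi ; jle ..B1.13  */
        return 0;
    int last = 0, lower = 0, upper = n - 1, stride = 1;
    static_init_sketch(tid, nthreads, &last, &lower, &upper, &stride, 1);
    if (lower <= n - 1) {                  /* first cmpl/jg after the call   */
        if (upper > n - 1)                 /* clamp upper to n-1             */
            upper = n - 1;
        for (int i = lower; i <= upper; i++) {
            sub_sketch(i);
            executed++;
        }
    }
    /* __kmpc_for_static_fini(loc, gtid) would be called here. */
    return executed;
}
```

Under these assumptions, calling fubar_sketch(10, tid, 4) for tid 0 through 3 gives each thread a contiguous block, and the blocks together cover all ten iterations. The only per-loop work outside the body is the one init call, two or three compares, and the fini call.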
-------------- next part --------------
# -- Machine type IA32
# mark_description "Intel(R) C++ Compiler for 32-bit applications, Version 9.1    Build 20060519Z %s";
# mark_description "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -no-ipo -S -O0 -openmp -lm -lm";
	.ident "Intel(R) C++ Compiler for 32-bit applications, Version 9.1    Build 20060519Z %s"
	.ident "-long_double -Xlinker -rpath -Xlinker /usr/local/intel/compiler91/lib -no-ipo -S -O0 -openmp -lm -lm"
	.file "basic_loop.c"
	.data
	.text
# -- Begin  fubar
# mark_begin;
       .align    2,0x90
	.globl fubar
fubar:
# parameter 1: 8 + %ebp
..B1.1:                         # Preds ..B1.0
        pushl     %ebp                                          #6.1
        movl      %esp, %ebp                                    #6.1
        subl      $40, %esp                                     #6.1
        pushl     %edi                                          #6.1
        movl      $.2.1_2_kmpc_loc_struct_pack.0, (%esp)        #6.1
        call      __kmpc_global_thread_num                      #6.1
                                # LOE eax
..B1.15:                        # Preds ..B1.1
        popl      %ecx                                          #6.1
        movl      %eax, -8(%ebp)                                #6.1
                                # LOE
..B1.2:                         # Preds ..B1.15
        movl      -8(%ebp), %eax                                #6.1
        movl      %eax, -36(%ebp)                               #6.1
        movl      $0, -16(%ebp)                                 #9.7
        movl      -16(%ebp), %eax                               #9.14
        movl      8(%ebp), %edx                                 #9.18
        cmpl      %edx, %eax                                    #9.2
        jge       ..B1.12       # Prob 50%                      #9.2
                                # LOE
..B1.3:                         # Preds ..B1.2
        xorl      %eax, %eax                                    #8.2
        movl      %eax, -28(%ebp)                               #8.2
        movl      8(%ebp), %edx                                 #9.18
        decl      %edx                                          #8.2
        movl      %edx, -24(%ebp)                               #8.2
        movl      8(%ebp), %edx                                 #9.18
        decl      %edx                                          #8.2
        movl      %edx, -12(%ebp)                               #8.2
        movl      %eax, -32(%ebp)                               #8.2
        movl      $1, %eax                                      #8.2
        movl      %eax, -20(%ebp)                               #8.2
        addl      $-36, %esp                                    #8.2
        movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp)        #8.2
        movl      -36(%ebp), %edx                               #8.2
        movl      %edx, 4(%esp)                                 #8.2
        movl      $34, 8(%esp)                                  #8.2
        lea       -32(%ebp), %edx                               #8.2
        movl      %edx, 12(%esp)                                #8.2
        lea       -28(%ebp), %edx                               #8.2
        movl      %edx, 16(%esp)                                #8.2
        lea       -24(%ebp), %edx                               #8.2
        movl      %edx, 20(%esp)                                #8.2
        lea       -20(%ebp), %edx                               #8.2
        movl      %edx, 24(%esp)                                #8.2
        movl      %eax, 28(%esp)                                #9.23
        movl      %eax, 32(%esp)                                #8.2
        call      __kmpc_for_static_init_4                      #8.2
                                # LOE
..B1.16:                        # Preds ..B1.3
        addl      $36, %esp                                     #8.2
                                # LOE
..B1.4:                         # Preds ..B1.16
        movl      -28(%ebp), %eax                               #8.2
        movl      %eax, -40(%ebp)                               #8.2
        movl      -24(%ebp), %edx                               #8.2
        movl      %edx, -4(%ebp)                                #8.2
        movl      -12(%ebp), %edx                               #8.2
        cmpl      %edx, %eax                                    #8.2
        jg        ..B1.8        # Prob 50%                      #8.2
                                # LOE
..B1.5:                         # Preds ..B1.4
        movl      -12(%ebp), %eax                               #8.2
        movl      -4(%ebp), %edx                                #8.2
        cmpl      %eax, %edx                                    #8.2
        jle       ..B1.7        # Prob 50%                      #8.2
                                # LOE
..B1.6:                         # Preds ..B1.5
        movl      -12(%ebp), %eax                               #8.2
        movl      %eax, -4(%ebp)                                #8.2
                                # LOE
..B1.7:                         # Preds ..B1.6 ..B1.5
        movl      -4(%ebp), %eax                                #8.2
        movl      -40(%ebp), %edx                               #8.2
        movl      %edx, -16(%ebp)                               #8.2
        movl      -16(%ebp), %edx                               #9.14
        cmpl      %eax, %edx                                    #8.2
        jle       ..B1.10       # Prob 50%                      #8.2
                                # LOE
..B1.8:                         # Preds ..B1.11 ..B1.7 ..B1.4
        addl      $-8, %esp                                     #8.2
        movl      $.2.1_2_kmpc_loc_struct_pack.1, (%esp)        #8.2
        movl      -36(%ebp), %eax                               #8.2
        movl      %eax, 4(%esp)                                 #8.2
        call      __kmpc_for_static_fini                        #8.2
                                # LOE
..B1.17:                        # Preds ..B1.8
        addl      $8, %esp                                      #8.2
        jmp       ..B1.12       # Prob 100%                     #8.2
                                # LOE
..B1.10:                        # Preds ..B1.7 ..B1.11
        pushl     %edi                                          #10.3
        movl      -16(%ebp), %eax                               #10.7
        movl      %eax, (%esp)                                  #10.7
        call      sub                                           #10.3
                                # LOE
..B1.18:                        # Preds ..B1.10
        popl      %ecx                                          #10.3
                                # LOE
..B1.11:                        # Preds ..B1.18
        movl      -4(%ebp), %eax                                #9.23
        incl      -16(%ebp)                                     #9.23
        movl      -16(%ebp), %edx                               #9.14
        cmpl      %eax, %edx                                    #9.2
        jle       ..B1.10       # Prob 50%                      #9.2
        jmp       ..B1.8        # Prob 100%                     #9.2
                                # LOE
..B1.12:                        # Preds ..B1.17 ..B1.2
        leave                                                   #11.1
        ret                                                     #11.1
        .align    2,0x90
                                # LOE
# mark_end;
	.type	fubar,@function
	.size	fubar,.-fubar
	.data
	.align 4
	.align 4
.2.1_2_kmpc_loc_struct_pack.0:
	.long	0
	.long	2
	.long	0
	.long	0
	.long	.2.1_2__kmpc_loc_pack.0
.2.1_2__kmpc_loc_pack.0:
	.byte	59
	.byte	47
	.byte	103
	.byte	47
	.byte	103
	.byte	49
	.byte	53
	.byte	47
	.byte	98
	.byte	114
	.byte	111
	.byte	110
	.byte	101
	.byte	118
	.byte	101
	.byte	116
	.byte	47
	.byte	111
	.byte	112
	.byte	101
	.byte	110
	.byte	109
	.byte	112
	.byte	98
	.byte	101
	.byte	110
	.byte	99
	.byte	104
	.byte	95
	.byte	67
	.byte	95
	.byte	118
	.byte	50
	.byte	47
	.byte	109
	.byte	99
	.byte	114
	.byte	95
	.byte	114
	.byte	117
	.byte	110
	.byte	115
	.byte	47
	.byte	98
	.byte	97
	.byte	115
	.byte	105
	.byte	99
	.byte	95
	.byte	108
	.byte	111
	.byte	111
	.byte	112
	.byte	46
	.byte	99
	.byte	59
	.byte	102
	.byte	117
	.byte	98
	.byte	97
	.byte	114
	.byte	59
	.byte	54
	.byte	59
	.byte	54
	.byte	59
	.byte	59
	.space 1	# pad
.2.1_2_kmpc_loc_struct_pack.1:
	.long	0
	.long	2
	.long	0
	.long	0
	.long	.2.1_2__kmpc_loc_pack.1
.2.1_2__kmpc_loc_pack.1:
	.byte	59
	.byte	47
	.byte	103
	.byte	47
	.byte	103
	.byte	49
	.byte	53
	.byte	47
	.byte	98
	.byte	114
	.byte	111
	.byte	110
	.byte	101
	.byte	118
	.byte	101
	.byte	116
	.byte	47
	.byte	111
	.byte	112
	.byte	101
	.byte	110
	.byte	109
	.byte	112
	.byte	98
	.byte	101
	.byte	110
	.byte	99
	.byte	104
	.byte	95
	.byte	67
	.byte	95
	.byte	118
	.byte	50
	.byte	47
	.byte	109
	.byte	99
	.byte	114
	.byte	95
	.byte	114
	.byte	117
	.byte	110
	.byte	115
	.byte	47
	.byte	98
	.byte	97
	.byte	115
	.byte	105
	.byte	99
	.byte	95
	.byte	108
	.byte	111
	.byte	111
	.byte	112
	.byte	46
	.byte	99
	.byte	59
	.byte	102
	.byte	117
	.byte	98
	.byte	97
	.byte	114
	.byte	59
	.byte	56
	.byte	59
	.byte	49
	.byte	49
	.byte	59
	.byte	59
	.data
# -- End  fubar
	.data
	.section .note.GNU-stack, ""
# End
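
The long runs of .byte directives under .2.1_2__kmpc_loc_pack.0 and .2.1_2__kmpc_loc_pack.1 are plain ASCII. A small C sketch for decoding them follows. The byte values are copied from .2.1_2__kmpc_loc_pack.1 in the listings above, and decode_loc is a hypothetical helper name. Reading the decoded text as semicolon-separated file, function, and line-number fields is an interpretation of its shape, not something the listing states.

```c
#include <stddef.h>
#include <string.h>

/* Bytes copied from .2.1_2__kmpc_loc_pack.1 in the listing. */
static const unsigned char loc_pack_1[] = {
    59, 47, 103, 47, 103, 49, 53, 47, 98, 114, 111, 110, 101, 118, 101, 116,
    47, 111, 112, 101, 110, 109, 112, 98, 101, 110, 99, 104, 95, 67, 95, 118,
    50, 47, 109, 99, 114, 95, 114, 117, 110, 115, 47, 98, 97, 115, 105, 99,
    95, 108, 111, 111, 112, 46, 99, 59, 102, 117, 98, 97, 114, 59, 56, 59,
    49, 49, 59, 59
};

/* Copy n raw bytes into out and NUL-terminate them.
   Returns out, or NULL if the buffer is too small. */
static const char *decode_loc(const unsigned char *bytes, size_t n,
                              char *out, size_t cap)
{
    if (n >= cap)
        return NULL;
    memcpy(out, bytes, n);
    out[n] = '\0';
    return out;
}
```

Decoding loc_pack_1 this way yields ";/g/g15/bronevet/openmpbench_C_v2/mcr_runs/basic_loop.c;fubar;8;11;;", which lines up with the #8.2 through #11.1 source-position comments on the loop instructions. The .0 variant differs only in ending with ";6;6;;".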

