[Pkg-gridengine-devel] Bug#543649: gridengine-exec: sge_execd very CPU hungry for no apparent reason

Mario Lang mlang at tugraz.at
Wed Aug 26 10:50:40 UTC 2009


Package: gridengine-exec
Version: 6.2-4
Severity: minor

sge_execd appears to be very CPU hungry for no apparent reason.
The following are the topmost lines of "top" on my execution host,
which is currently running an MPICH2 job with 16 processes.  The
execution host has 32 cores:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25089 lang      20   0 3029m 2.9g 6772 R  100  2.3   4:56.87 hpcc_acml_mpich
25093 lang      20   0 3030m 2.9g 6892 R  100  2.3   4:56.82 hpcc_acml_mpich
25079 lang      20   0 3030m 2.9g 6984 R  100  2.3   4:56.90 hpcc_acml_mpich
25080 lang      20   0 3030m 2.9g 6460 R  100  2.3   4:56.96 hpcc_acml_mpich
25081 lang      20   0 3030m 2.9g 6468 R  100  2.3   4:57.00 hpcc_acml_mpich
25082 lang      20   0 3029m 2.9g 7108 R  100  2.3   4:56.99 hpcc_acml_mpich
25083 lang      20   0 3028m 2.9g 5780 R  100  2.3   4:56.93 hpcc_acml_mpich
25084 lang      20   0 3029m 2.9g 6856 R  100  2.3   4:56.91 hpcc_acml_mpich
25085 lang      20   0 3030m 2.9g 6528 R  100  2.3   4:56.90 hpcc_acml_mpich
25086 lang      20   0 3030m 2.9g 6520 R  100  2.3   4:56.83 hpcc_acml_mpich
25087 lang      20   0 3030m 2.9g 7696 R  100  2.3   4:56.96 hpcc_acml_mpich
25088 lang      20   0 3030m 2.9g 6404 R  100  2.3   4:56.23 hpcc_acml_mpich
25090 lang      20   0 3030m 2.9g 6884 R  100  2.3   4:56.78 hpcc_acml_mpich
25091 lang      20   0 3030m 2.9g 6848 R  100  2.3   4:56.83 hpcc_acml_mpich
25092 lang      20   0 3030m 2.9g 7928 R  100  2.3   4:56.91 hpcc_acml_mpich
25094 lang      20   0 3029m 2.9g 6028 R  100  2.3   4:56.81 hpcc_acml_mpich
30263 sgeadmin  20   0  118m 2232 1728 S   38  0.0  11:54.75 sge_execd

$ ps auxw|grep 30263
sgeadmin 30263  0.4  0.0 121440  2244 ?        Sl   Aug24  12:44 /usr/lib/gridengine/sge_execd
$ date
Wed Aug 26 12:27:47 CEST 2009

sge_execd has been running for two days now, and has already consumed 12
minutes of CPU time (on a 2.3 GHz Opteron).  Given that I have only run
a few test jobs, this looks very wasteful.  sge_execd seems to bounce
around between 5% and sometimes even 30% CPU use during execution of
this parallel job.

Attached is the output of the following command:

# time strace -r -osge_execd.strace -p30263
Process 30263 attached - interrupt to quit
^CProcess 30263 detached

real    2m53.132s
user    0m2.104s
sys     0m14.473s
# wc -l sge_execd.strace
508030 sge_execd.strace

500k syscalls in just under 3 minutes, roughly 2,900 per second (the
trace is 31MB uncompressed).
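
For anyone who wants to see which calls dominate a trace of this size
without reading it by hand, here is a minimal sketch.  It assumes the
file was written by "strace -r -o FILE" on a single process, so each
line starts with a relative timestamp followed by the syscall name.
The function name summarize_strace is made up for illustration and is
not part of strace or gridengine.

```shell
# Hypothetical helper: count syscalls by name in a trace written by
# "strace -r -o FILE" (one process, relative timestamp in column 1).
summarize_strace() {
    awk '{
             name = $2
             sub(/\(.*/, "", name)          # keep only the text before "("
             if (name ~ /^[a-z_0-9]+$/)     # skip signal lines such as "--- SIGCHLD ..."
                 count[name]++
         }
         END { for (n in count) print count[n], n }' "$1" | sort -rn
}
```

Running "summarize_strace sge_execd.strace | head" should then list the
most frequent calls first.  Alternatively, "strace -c -p PID" lets strace
collect per-syscall counts itself instead of logging every call.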

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sge_execd.strace.bz2
Type: application/octet-stream
Size: 709250 bytes
Desc: strace of sge_execd
URL: <http://lists.alioth.debian.org/pipermail/pkg-gridengine-devel/attachments/20090826/574606e8/attachment-0001.obj>
-------------- next part --------------

-- 
Regards,
      Mario Lang

Graz University of Technology        mailto:mlang at TUGraz.at
Department Computing               http://www.ZID.TUGraz.at/lang/
Phone: +43 (0) 316 / 873 - 6897
    /____________________________________________________________/
  /_Apparently a teacher has been arrested in the UK in possession_/
 /of a compass, protractor, and straight edge. It is claimed he is a/
/member of the Al Gebra movement, bearing weapons of math instruction/

