Thursday, September 21, 2006

Woodcrest: Asked ICT to reserve a set of UIDs/GIDs to avoid clashes between HEP and ICT.

Biomed Challenge: Asked Yannick to check whether we can find a solution other than inbound connectivity to the nodes for the flex licence.

QMUL: Tried to understand what causes such a low number of running jobs while a lot of jobs are queued (see attached plots).
  • DNS problem: very high load on the DNS server. The process turned into a zombie when we tried to kill it, and all Grid services were stuck. We managed to restart the DNS server and the situation seems to have stabilized.
  • Maui conf: The reservation did not work when ops jobs were submitted to the long queue, because the reservation period was not set to infinity.
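A minimal Python model of why the reservation period matters. It assumes (our reading, not quoted from the Maui documentation) that a job can only use a reservation if its requested walltime ends before the current reservation instance ends, and that a non-infinite standing reservation is a daily window starting at midnight. The walltimes used below are illustrative, not the actual QMUL queue limits.

```python
from datetime import datetime, timedelta
from typing import Optional

DAY = timedelta(days=1)


def daily_window_end(now: datetime) -> datetime:
    """End of a hypothetical daily reservation window that starts at midnight."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + DAY


def job_fits(now: datetime, walltime: timedelta, res_end: Optional[datetime]) -> bool:
    """Return True if a job started now would finish inside the reservation.

    res_end=None models a reservation whose period is INFINITY (it never ends).
    """
    if res_end is None:
        return True
    return now + walltime <= res_end


now = datetime(2006, 9, 21, 10, 0)
short_job = timedelta(hours=1)   # illustrative short-queue walltime
long_job = timedelta(hours=72)   # illustrative long-queue walltime

print(job_fits(now, short_job, daily_window_end(now)))  # short job fits the daily window
print(job_fits(now, long_job, daily_window_end(now)))   # long job overruns midnight
print(job_fits(now, long_job, None))                    # infinite period accepts it
```

Under these assumptions the short ops job fits in the daily window, but a long-queue ops job overruns the window's end and cannot use the reservation; with an infinite period the same job fits.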
Dzero:
  • The dzero station sandbox is full; all dzero sites are affected. Frederic was waiting for QMUL to come back before sending test jobs there.
  • I investigated lesc to try to understand what causes such a low number of running jobs while a lot of jobs are queued (see attached plots: scheduled jobs on the left, running jobs on the right).



Other Stuff:
  • SAM Monitoring: Asked all sites to check their status in SAM.
  • Asked to be mapped to ops to test the UCL SRM. No answer yet to ticket #12970.
  • Installed the new APEL accounting RPMs at UCL-HEP. Will see tomorrow whether this has solved the problem.
