Biomed Challenge: Asked Yannick to see whether we can find a solution other than inbound connectivity to the nodes for the flex licence.
QMUL: trying to understand what causes such a low number of running jobs while a lot of jobs are queued. See attached plots.
- DNS problem: very high load on the DNS server. The process became a zombie when we tried to kill it, and all Grid services were stuck. We were able to restart the DNS server and the situation seems to have stabilized.
- Maui conf: the reservation did not work if ops jobs were submitted to the long queue. This is because the reservation period was not set to infinity.
- dzero station sandbox full. All dzero sites affected. Frederic was waiting for QMUL to be back before sending jobs there for testing.
- I have investigated lesc to try to understand what causes such a low number of running jobs while a lot of jobs are queued. See attached plots: the number of scheduled jobs is on the left and the number of running jobs on the right.
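For reference, the Maui reservation fix mentioned in the list above could look roughly like the fragment below. This is only a sketch: the reservation name, group name and task count are assumptions for illustration and not the actual QMUL settings. The relevant part is PERIOD=INFINITY. Maui's default period for a standing reservation is DAY, so presumably a job with a long-queue walltime could not fit inside the daily window.

```
# maui.cfg fragment (illustrative; names and counts are assumed, not the site's real values)
SRCFG[ops] PERIOD=INFINITY      # reservation never rolls over, so long walltimes fit
SRCFG[ops] GROUPLIST=ops        # hypothetical group allowed to use the reservation
SRCFG[ops] TASKCOUNT=1          # hypothetical: keep one slot free for ops jobs
SRCFG[ops] RESOURCES=PROCS:1    # each task is one processor
```

Maui needs to be restarted or reconfigured for the change to take effect, and the resulting reservation can be checked with showres.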
Other Stuff:
- SAM Monitoring: Asked all sites to check their status in SAM
- Asked to be mapped to ops to test the UCL SRM. No answer yet to ticket #12970.
- Installed new APEL accounting RPMs at UCL-HEP. Will see tomorrow whether this has solved the problem.