Message boards : Theory Application : input/output error - No output found

Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,380,724
RAC: 102,161
Message 45079 - Posted: 23 Jun 2021, 10:05:40 UTC

I've just had several cases where the CPU stopped being utilized a few minutes after the task started, but the task did not terminate and kept running.
Unfortunately, a few of these tasks ran for about 16 hours before I noticed that something was wrong. Here is an example:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=318193899

Excerpt from the stderr:

2021-06-22 19:13:19 (18096): Guest Log: 19:14:26 CEST +02:00 2021-06-22: cranky: [INFO] Running Container 'runc'.
2021-06-22 19:15:05 (18096): Guest Log: standard_init_linux.go:203: exec user process caused "input/output error"
2021-06-22 19:15:05 (18096): Guest Log: 19:16:12 CEST +02:00 2021-06-22: cranky: [INFO] Container 'runc' finished with status code 1.
2021-06-22 19:15:05 (18096): Guest Log: 19:16:12 CEST +02:00 2021-06-22: cranky: [INFO] Preparing output.
2021-06-22 19:15:05 (18096): Guest Log: 19:16:12 CEST +02:00 2021-06-22: cranky: [ERROR] No output found

Does anyone have any idea what the reason for this problem may be?

It's unfortunate that such erroneous tasks don't terminate by themselves but keep running until someone notices the problem.
With numerous Theory tasks running on several PCs and notebooks, I cannot check every 10 minutes whether each single task is running okay :-(
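
Since the failure leaves a recognizable pair of lines in the stderr, a small check can flag affected tasks without reading every log by hand. The following is only a minimal sketch: the function name `has_failed_container` is hypothetical, the two patterns are copied from the excerpt above, and how the log text is obtained for a running task is left open.

```python
import re

# Patterns copied from the stderr excerpt in this post (assumed to be stable).
IO_ERROR = re.compile(r'exec user process caused "input/output error"')
NO_OUTPUT = re.compile(r"cranky: \[ERROR\] No output found")


def has_failed_container(stderr_text: str) -> bool:
    """Return True if the log contains both the runc I/O error
    and cranky's 'No output found' error."""
    return bool(IO_ERROR.search(stderr_text)) and bool(NO_OUTPUT.search(stderr_text))
```

A task whose log matches both patterns could then be reported or aborted manually instead of being left to run.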
ID: 45079
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,128,280
RAC: 105,358
Message 45080 - Posted: 23 Jun 2021, 12:11:01 UTC - in response to Message 45079.  

This thread is also about your question:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5581#44080
Some Theory tasks are not starting properly, whatever the reason is.
ID: 45080
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,380,724
RAC: 102,161
Message 45081 - Posted: 23 Jun 2021, 18:25:56 UTC - in response to Message 45080.  

The error described in the cited thread was a different one (too bad that the task cited in that thread is no longer on the server). But the result was basically the same, i.e. the tasks do not get processed properly, yet they keep running forever.
It's really too bad that there is no automatic mechanism by which a task terminates itself once it stops functioning the way it's supposed to.
ID: 45081
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,380,724
RAC: 102,161
Message 45179 - Posted: 11 Aug 2021, 18:17:50 UTC

Today, I had quite a number of tasks with this error:

2021-08-11 07:56:04 (6480): Guest Log: standard_init_linux.go:203: exec user process caused "input/output error"
2021-08-11 07:56:04 (6480): Guest Log: 07:56:04 CEST +02:00 2021-08-11: cranky: [INFO] Container 'runc' finished with status code 1.
2021-08-11 07:56:04 (6480): Guest Log: 07:56:04 CEST +02:00 2021-08-11: cranky: [INFO] Preparing output.
2021-08-11 07:56:04 (6480): Guest Log: 07:56:04 CEST +02:00 2021-08-11: cranky: [ERROR] No output found.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=323544464

This is really annoying, because the WU keeps running and running and running ... probably for days or weeks, until someone notices.

Why can the developers not put in some mechanism which makes the WU terminate once this problem occurs?
Obviously, this problem comes up quite often.
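
One way such a mechanism could work is to compare a task's accumulated CPU time against wall-clock time and flag the task once the CPU time stops advancing. This is only a sketch under stated assumptions, not how the Theory application or BOINC actually behaves: the function `is_stalled`, its thresholds, and the sample format are hypothetical, and collecting the samples from a running task is left open.

```python
from typing import Sequence, Tuple


def is_stalled(samples: Sequence[Tuple[float, float]],
               min_wall_seconds: float = 1800.0,
               min_cpu_seconds: float = 1.0) -> bool:
    """Decide whether a task looks stalled.

    samples: (wall_clock_seconds, cpu_seconds) pairs in chronological order.
    Returns True if CPU time grew by less than min_cpu_seconds over a
    window of at least min_wall_seconds ending at the newest sample.
    """
    if len(samples) < 2:
        return False
    last_wall, last_cpu = samples[-1]
    # Walk backwards to the newest sample that is old enough to span the window.
    for wall, cpu in reversed(samples[:-1]):
        if last_wall - wall >= min_wall_seconds:
            return (last_cpu - cpu) < min_cpu_seconds
    return False  # not enough history yet to decide
```

The default window of 30 minutes is an arbitrary illustrative value; a shorter window detects a stall sooner but is more likely to flag a task that is merely waiting.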
ID: 45179



©2024 CERN