Message boards : ATLAS application : ATLAS tasks are failing, server status also shows 0 tasks ready to send

Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,150,303
RAC: 15,977
Message 32001 - Posted: 20 Aug 2017, 8:05:02 UTC

My two latest tasks failed and show this error:

WARNING Transform now exiting early with exit code 15 (No events to process: 4400 (skipEvents) >= 4400 (inputEvents of EVNT)

The server status page shows that the ready-to-send queue is empty. The same goes for SixTrack. So it's time to select another subproject.
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32020 - Posted: 21 Aug 2017, 7:33:41 UTC - in response to Message 32001.  

Indeed, we ran out of tasks over the weekend; I have asked for more to be submitted.
rbpeake

Joined: 17 Sep 04
Posts: 99
Credit: 30,618,118
RAC: 3,938
Message 32025 - Posted: 21 Aug 2017, 21:07:29 UTC - in response to Message 32020.  

Seems like a lot of validate errors.
Regards,
Bob P.
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,150,303
RAC: 15,977
Message 32037 - Posted: 22 Aug 2017, 6:49:02 UTC

The server status page shows that new ATLAS tasks are available.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,905,671
RAC: 138,000
Message 32044 - Posted: 22 Aug 2017, 10:26:59 UTC

Rather short for a WU with 198 MB initial download, isn't it?
Usual walltimes on this host are >5 h.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=153867426
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,087,008
RAC: 104,209
Message 32048 - Posted: 22 Aug 2017, 18:57:30 UTC

The server shows ZERO tasks ready to send.
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32080 - Posted: 24 Aug 2017, 14:07:16 UTC
Last modified: 24 Aug 2017, 14:08:14 UTC

After a failed attempt to raise the virtual machine memory via the VirtualBox Manager, which resulted in an extremely long computing time and a final failure, I have crunched three ATLAS tasks on my Linux laptop with an AMD E-450 CPU. They all finished in time and are validated. They run alongside a climateprediction.net task with a very extended deadline (one year); they stop momentarily while waiting for memory, then resume crunching. I am glad to be able to run at least some LHC@home tasks, while my SUN WS on Linux and my Windows 10 PC are crunching Einstein@home and SETI@home tasks, both CPU and GPU.
Tullio
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,905,671
RAC: 138,000
Message 32082 - Posted: 24 Aug 2017, 15:24:55 UTC - in response to Message 32080.  

Unfortunately your ATLAS WUs don't deliver valid scientific results; they are only rewarded for CPU time, which is why they are marked as "valid" in the task list.

You may look into your error logs, e.g. https://lhcathome.cern.ch/lhcathome/result.php?resultid=153646199.
There you will find:
2017-08-23 23:19:09 (30418): Setting Memory Size for VM. (4200MB)
2017-08-23 23:19:10 (30418): Setting CPU Count for VM. (2)

Setting #CPUs to 2 is not recommended on this host, especially if any other project runs concurrently.
You may run ATLAS with a 1-core setting and suspend all other BOINC tasks while it is executed.

Setting the RAM size to only 4200 MB is also not recommended, as it leads to the following errors:
2017-08-24 00:07:21 (30418): Guest Log: PyJobTransforms.transform.execute 2017-08-24 00:03:19,485 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider"
2017-08-24 00:07:21 (30418): Guest Log: PyJobTransforms.transform.execute 2017-08-24 00:03:24,036 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (65); Logfile error in log.EVNTtoHITS: "AthMpEvtLoopMgr FATAL makePool failed for AthMpEvtLoopMgr.SharedEvtQueueProvider")

Even on a 1-core setting, a RAM size of 4200 MB may not be enough for the current batch.

What is also suspicious:
- the very short runtimes of your WUs
- a result file named HITS.* is missing from the log
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32083 - Posted: 24 Aug 2017, 16:21:43 UTC - in response to Message 32082.  

OK, I've limited the number of cores to 1 and I shall watch what happens. But I have only 8 GB RAM on that PC, the motherboard allows no more, and I cannot starve other projects.
Tullio
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32084 - Posted: 24 Aug 2017, 21:32:46 UTC

A one-core task started; it looks OK, using 3400 MB.
Tullio
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,905,671
RAC: 138,000
Message 32086 - Posted: 25 Aug 2017, 7:08:48 UTC - in response to Message 32084.  

@ tullio

Compared to other hosts and the current ATLAS batch I would expect completion times between 12 h and 15 h for your E-450.
Since you wrote your last post, your host has reported a couple of ATLAS WUs with completion times of less than 1 h.
Although all of them were rewarded, they still don't deliver what you probably expect.

If you'd like to spend the time on a test, you may raise the RAM setting for a 1-core ATLAS VM to 5000 MB and start only this VM.
No other BOINC app should run or even be left in RAM during the test.

The critical phase is shortly after the start, when the VM extracts the EVNT.* file.
You may check the stderr.txt in the slots dir of the running VM for the messages in my previous post.

If this test ends successfully, you may repeat it with another project (not a vbox project) running concurrently and/or with a slightly reduced RAM setting for your VM until the errors occur again.

If the test doesn't end successfully, your E-450 is too weak to run ATLAS (and probably also CMS and LHCb).
Then you may repeat the test with Theory Simulation at its default RAM setting.

I hope you'll succeed.
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32098 - Posted: 26 Aug 2017, 6:55:52 UTC - in response to Message 32086.  

In the error log of the last failed task I found a phrase about an extension pack not being installed. Mea culpa, mea culpa, mea maxima culpa. I installed it and now I shall watch the next task, once the laptop finishes two SETI@home tasks I downloaded to keep it crunching something. Thanks anyway for your suggestions; I get very little help from anyone else.
Tullio
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,698,620
RAC: 234,880
Message 32103 - Posted: 26 Aug 2017, 8:04:00 UTC

Tullio, Yeti's checklist is good if you have issues:

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161#29359


This one failed with too much disk usage; you can adjust the disk setting for BOINC to 8 GB in the settings.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=153909057
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,905,671
RAC: 138,000
Message 32105 - Posted: 26 Aug 2017, 8:18:30 UTC - in response to Message 32103.  

Toby Broom wrote:
Tullio, Yeti's checklist is good if you have issues:

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161#29359

This one failed with too much disk usage; you can adjust the disk setting for BOINC to 8 GB in the settings.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=153909057

But besides that, the processing time and the error log look very good.
It could have been a success.
Nonetheless, I would allow more RAM than the configured 3400 MB.
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32111 - Posted: 26 Aug 2017, 12:41:58 UTC - in response to Message 32103.  

Toby, I have 14.30 GB available to BOINC on the HP laptop with a 1 TB hybrid disk, by Seagate if I remember correctly. But after the LHC "consolidation" all LHC tasks fail on all my PCs, two Linux and one Windows, save SixTrack. I am testing ATLAS@Home on the slowest machine, the HP laptop with an AMD E-450 CPU, while the two other PCs run SETI@home and Einstein@home CPU and GPU tasks with no problem. The Windows 10 PC, updated every month by Microsoft, has 22 GB RAM and a 1 TB disk. The SUN WS, my oldest machine, has a 1 TB disk, 8 GB RAM and a GTX 750 Ti GPU board. The Windows PC has a GTX 1050 Ti GPU board, with a Pascal GPU and 640 GPU cores. Maybe CERN should start thinking about GPUs, now that they have taken a role in SKA array computing, according to CERN Courier.
Tullio
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,698,620
RAC: 234,880
Message 32119 - Posted: 26 Aug 2017, 17:22:38 UTC - in response to Message 32105.  
Last modified: 26 Aug 2017, 17:26:55 UTC

I think the 3400 MB is what is visible in the BOINC UI. When you made the app_config, it overrides the memory used by the VM to 5000 MB(?). The UI, however, isn't updated, so there is the difference.
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,698,620
RAC: 234,880
Message 32120 - Posted: 26 Aug 2017, 17:25:41 UTC - in response to Message 32111.  
Last modified: 26 Aug 2017, 17:27:29 UTC

There was some talk of a SixTrack app for GPU, but I think they will target AVX first, as it's simpler to do.

I feel like it will be a long time before we see a GPU app; it's not so simple to run Fortran on a GPU.

The other projects will never use GPUs, as the VM is there to make things easy for the scientists, not easy for us ;). If it wasn't easy for them, then they wouldn't exist.

I just set mine to 250 GB; it never uses that much, and I'm sure I would notice before it actually used it.
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32149 - Posted: 31 Aug 2017, 6:19:16 UTC - in response to Message 32119.  

I've written the app_config.xml file as suggested, which should bring the memory up to 5000 MB. But ATLAS tasks still start at 3400 MB. This is on the SUN WS, a Linux box.
Tullio
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 32151 - Posted: 31 Aug 2017, 6:56:50 UTC - in response to Message 32149.  
Last modified: 31 Aug 2017, 6:57:35 UTC

tullio wrote:
I've written the app_config.xml file as suggested, which should bring the memory up to 5000 MB. But ATLAS tasks still start at 3400 MB. This is on the SUN WS, a Linux box.
Tullio

That is OK; the app_config.xml just sets the maximum amount of memory that can be used. But setting it to 5000 MB fixed the problem for me, even though the ATLAS tasks still show 3400 MB.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,905,671
RAC: 138,000
Message 32152 - Posted: 31 Aug 2017, 7:01:05 UTC - in response to Message 32149.  

tullio wrote:
I've written the app_config.xml file as suggested, which should bring the memory up to 5000 MB. But ATLAS tasks still start at 3400 MB. This is on the SUN WS, a Linux box.
Tullio

Do you have a message like the following in your BOINC client's logfile?
Do 31 Aug 2017 08:49:44 CEST | LHC@home | Found app_config.xml

If not, reload your config files (e.g. via client menu -> options) or restart the client/computer.
The changed settings don't affect WUs that are already running, i.e. those that already reside in a "slots" dir.

You may also post your app_config.xml here.
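
For reference, a minimal app_config.xml along the lines discussed in this thread could look like the sketch below. It belongs in the LHC@home project directory (e.g. projects/lhcathome.cern.ch_lhcathome/). The app_name, plan_class and the vboxwrapper command line options shown here are assumptions on my side; check them against the application name and plan class your client reports for ATLAS tasks before relying on them.

<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <!-- assumed plan class; copy the one shown in your task properties -->
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <!-- schedule the task as 1-core in the BOINC client -->
    <avg_ncpus>1</avg_ncpus>
    <!-- assumed vboxwrapper options: 1 CPU inside the VM and 5000 MB RAM -->
    <cmdline>--nthreads 1 --memory_size_mb 5000</cmdline>
  </app_version>
</app_config>

After saving the file, reload the config files (or restart the client) and check the event log for the "Found app_config.xml" message quoted above; the new values only apply to tasks started afterwards.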