1) Message boards : Number crunching : VM Job Unmanageable? (Message 30445)
Posted 21 May 2017 by Terrible T
Post:
Had the same issue; seeing the 'F:\tinderbox\.......' path made me kind of paranoid, as I don't have an F drive. (And I don't WannaCry..)
Did a project reset and removed/reinstalled the VM, no luck.

Finally found a mention somewhere that the following file might be missing/corrupt:
C:\Users\USERNAME\.VirtualBox\VirtualBox.xml

File VirtualBox.xml was empty; replaced it with VirtualBox.xml-prev, and
was back up and running.
Looks to me as if a scientist put the wrong VM data file onto the project.
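
The fix described above (swapping an empty VirtualBox.xml for VirtualBox.xml-prev) could be scripted roughly as follows. This is only a minimal sketch: the helper name `restore_if_empty` is made up for illustration, and it assumes the backup sits next to the config file, as in the path quoted in the post. It is presumably safest to run it while VirtualBox and BOINC are not running.

```python
import shutil
from pathlib import Path


def restore_if_empty(config: Path, backup: Path) -> bool:
    """Copy backup over config when config is missing or zero bytes.

    Returns True if a restore happened, False if config was left alone.
    (Hypothetical helper, not part of VirtualBox or BOINC.)
    """
    if config.exists() and config.stat().st_size > 0:
        return False  # config looks populated; do not touch it
    if not backup.exists() or backup.stat().st_size == 0:
        raise FileNotFoundError(f"no usable backup at {backup}")
    shutil.copy2(backup, config)  # copy2 also preserves timestamps
    return True
```

A call for the location mentioned in the post would look like `restore_if_empty(Path.home() / ".VirtualBox" / "VirtualBox.xml", Path.home() / ".VirtualBox" / "VirtualBox.xml-prev")`.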
2) Message boards : ATLAS application : never ending tasks here (Message 30363)
Posted 16 May 2017 by Terrible T
Post:
New record ?

WU 67789777 ran for nearly 24hrs; I was about to abort it when it suddenly decided to finish.
Nice score though: 7,214.07 pts.
No "hits" in the logfile, so I wonder whether this one has any scientific value.
3) Message boards : ATLAS application : Very long tasks in the queue (Message 29733)
Posted 31 Mar 2017 by Terrible T
Post:
Also had a lot of validate errors overnight. Almost all have the same message in the log, all at msg# 11 (e.g. WU 62921130).

Guest Log: PyJobTransforms.trfExe._writeAthenaWrapper 2017-03-31 06:37:41,435 INFO Valgrind not engaged
: PyJobTransforms.trfExe.preExecute 2017-03-31 06:37:41,435 INFO Athena will be executed in a subshell via ['./runwrapper.EVNTtoHITS.sh']
: PyJobTransforms.trfExe.execute 2017-03-31 06:37:41,435 INFO Starting execution of EVNTtoHITS (['./runwrapper.EVNTtoHITS.sh'])
: PyJobTransforms.trfExe.execute 2017-03-31 06:43:42,116 INFO EVNTtoHITS executor returns 33
: PyJobTransforms.trfExe.validate 2017-03-31 06:43:43,039 ERROR Validation of return code failed: Non-zero return code from EVNTtoHITS (33) (Error code 65)
: Guest Log: PyJobTransforms.trfExe.validate 2017-03-31 06:43:43,066 INFO Scanning logfile log.EVNTtoHITS for errors
: PyJobTransforms.trfValidation.scanLogFile 2017-03-31 06:43:43,138 WARNING Found message number 11 at level ERROR - this and further messages will be supressed from the report
: PyJobTransforms.transform.execute 2017-03-31 06:43:43,139 CRITICAL Transform executor raised TransformValidationException: Non-zero return code from EVNTtoHITS (33); Logfile error in log.EVNTtoHITS: "AtlasFieldSvc FATAL Could not book callback for /GLOBAL/BField/Maps"
: PyJobTransforms.transform.execute 2017-03-31 06:43:46,329 WARNING Transform now exiting early with exit code 65 (Non-zero return code from EVNTtoHITS (33); Logfile error in log.EVNTtoHITS: "AtlasFieldSvc FATAL Could not book callback for /GLOBAL/BField/Maps")


Faulty batch of WUs?
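
To check whether many failed WUs share the same cause, the relevant fields can be pulled out of log excerpts like the one above. The sketch below is an assumption-laden illustration: the function name `summarize` and the two regular expressions are mine, written only against the line formats visible in this post, not against any documented PyJobTransforms log format.

```python
import re

# Patterns modelled on the lines quoted in the post (assumed, not official).
RETURN_CODE = re.compile(r"(\w+) executor returns (\d+)")
FATAL_QUOTE = re.compile(r'Logfile error in (\S+): "([^"]+)"')


def summarize(log_text: str) -> dict:
    """Collect executor name, return code, logfile and quoted error message."""
    summary = {"executor": None, "return_code": None, "logfile": None, "fatal": None}
    for line in log_text.splitlines():
        m = RETURN_CODE.search(line)
        if m:
            summary["executor"] = m.group(1)
            summary["return_code"] = int(m.group(2))
        m = FATAL_QUOTE.search(line)
        if m:
            summary["logfile"] = m.group(1)
            summary["fatal"] = m.group(2)
    return summary
```

Comparing the `fatal` field across several task logs would show quickly whether the whole batch fails on the same /GLOBAL/BField/Maps callback.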
4) Message boards : ATLAS application : never ending tasks here (Message 29117)
Posted 9 Mar 2017 by Terrible T
Post:
Yesterday never-ending (looping?) multicore tasks appeared, which just keep running. I have aborted 1 task, stopped 1 through VBox, and updated VBox;
still an endless loop, see VBox log. Anybody have an idea?

00:00:50.246422 VMMDev: Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff880310b2d610), OR(0x0), NOT(0xffffffff), flags(0x0)
00:02:16.236076 VMMDev: Guest Log: Copying input files into RunAtlas.
00:02:18.692928 VMMDev: Guest Log: Copied input files into RunAtlas.
00:02:20.860967 VMMDev: Guest Log: copied the webapp to /var/www
00:02:20.950547 VMMDev: Guest Log: This vm does not need to setup http proxy
00:02:21.031455 VMMDev: Guest Log: ATHENA_PROC_NUMBER=11
00:02:21.101961 VMMDev: Guest Log: Starting ATLAS job. (PandaID=3260989220)
00:54:04.894395 VMMDev: Guest Log: Copying input files into RunAtlas.
00:54:06.537649 VMMDev: Guest Log: Copied input files into RunAtlas.
00:54:06.974127 VMMDev: Guest Log: copied the webapp to /var/www
00:54:07.030660 VMMDev: Guest Log: This vm does not need to setup http proxy
00:54:07.079244 VMMDev: Guest Log: ATHENA_PROC_NUMBER=11
00:54:07.166297 VMMDev: Guest Log: Starting ATLAS job. (PandaID=3260989220)
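
The loop shows up in the log above as the same PandaID being started twice. A rough way to spot this automatically is sketched below; the names `job_starts` and `restarted_jobs` are hypothetical, and the pattern assumes the "Starting ATLAS job. (PandaID=...)" line format exactly as it appears in this excerpt.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 00:02:21.101961 VMMDev: Guest Log: Starting ATLAS job. (PandaID=3260989220)
START = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) .*Starting ATLAS job\. \(PandaID=(\d+)\)"
)


def job_starts(log_text: str) -> dict:
    """Map each PandaID to the list of start times (seconds since VM start)."""
    starts = defaultdict(list)
    for line in log_text.splitlines():
        m = START.match(line.strip())
        if m:
            hours, minutes, seconds, panda_id = m.groups()
            starts[panda_id].append(
                int(hours) * 3600 + int(minutes) * 60 + float(seconds)
            )
    return dict(starts)


def restarted_jobs(log_text: str) -> dict:
    """Keep only the PandaIDs that were started more than once."""
    return {pid: t for pid, t in job_starts(log_text).items() if len(t) > 1}
```

For the excerpt above this flags PandaID 3260989220, whose two starts are roughly 52 minutes apart.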
5) Message boards : ATLAS application : never ending tasks here (Message 29083)
Posted 6 Mar 2017 by Terrible T
Post:
Also had (when using the computer) some tasks running longer than expected.
It happens around 80% completion, with processor load nil.
When either suspending the job or opening VM VirtualBox, the job status in the 'Task' pane changes from 'running' to 'uploading', and the task reports as successful.
(see Task 123141235)

Is the process unable to signal 'file completed' to the VM while the computer is in use, e.g. because of too low a process priority or a similar I/O conflict?
6) Message boards : ATLAS application : LHC@Home consolidation - ATLAS (Message 28743)
Posted 30 Jan 2017 by Terrible T
Post:
Yes, had Max#CPU at 10 cores for the first WU, then increased it to 11.
Re memory: should I understand that the total required memory is
10.5GB for the VMs AND 10.5GB 'direct' memory?
Also noted WU progress slowing down: 70% at 20 mins, 80% at 30 mins, and 90% at 45 mins.
Also noted only 1 core fully used, the other 9 at around 10-15%.
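
The slowdown can be put into numbers by taking the progress gained per minute between consecutive readings. A small sketch, with `progress_rates` as a made-up helper and the three readings taken from the post:

```python
def progress_rates(readings):
    """Return progress in percent per minute between consecutive readings.

    readings is a list of (minutes_elapsed, percent_done) pairs in time order.
    """
    rates = []
    for (t0, p0), (t1, p1) in zip(readings, readings[1:]):
        rates.append((p1 - p0) / (t1 - t0))
    return rates


# Readings quoted in the post: 70% at 20 min, 80% at 30 min, 90% at 45 min.
rates = progress_rates([(20, 70), (30, 80), (45, 90)])
```

This gives 1.0 %/min between 20 and 30 minutes and about 0.67 %/min between 30 and 45 minutes, which fits the picture of most cores sitting nearly idle towards the end.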
7) Message boards : ATLAS application : LHC@Home consolidation - ATLAS (Message 28738)
Posted 30 Jan 2017 by Terrible T
Post:
Got some 10-core WUs and noticed they use a lot of memory, especially near the end of the calculations: ~12GB ('In use' in Resource Monitor). This is on top of the VM memory (16GB 'Standby' in my case).

Also, run time is approx. 50 min, whereas the manager shows 17 min before starting.

Anybody seeing similar numbers?
8) Message boards : News : VM applications broken by the Windows 10 update KB3206632 (Message 28737)
Posted 30 Jan 2017 by Terrible T
Post:
Also had, after the latest Win10 update last week, the "VM Hypervisor failed to enter an online state in a timely fashion" error coming up on both ATLAS@home and LHC@home, with the VirtualBox version included with BOINC.
Uninstalled VirtualBox and installed the latest version from the web; now happily crunching again.



©2024 CERN