Message boards : ATLAS application : Just more of the same failures

greg_be

Joined: 28 Dec 08
Posts: 318
Credit: 4,148,677
RAC: 2,010
Message 41739 - Posted: 27 Feb 2020, 17:08:04 UTC

I don't get it guys.
I am at about a 50% failure rate on these tasks.
I have all the different restrictions, run times and other settings in my app.config file, yet I still get tasks with little or no CPU usage. They run for 9+ hours without completing and stall out, as described in previous threads.

Vbox is up to date. Extensions ok.
Memory for this is 8000MB
CPU and threads is set for 4, preferences on account page set for 4.
Max allowed tasks to download 1 and to run 1

What crashes on my machine runs fine on linux and apple.
I am not sure what else to do.
The out of memory stuff you see was when the memory level was set for 10,000.

Can automatic system cleaners foul up ATLAS?
I honestly don't know what else to do!

It's weird though: there is no pattern I can find, it's just random.
Sometimes after a reboot ATLAS settles back down.
Currently I see ATLAS taking 6600MB of memory and combined with other BOINC projects I am only using 50% of my Memory.

But what is with the 1% or less CPU usage? That is where the task always stalls or cannot complete in 9 hrs.

When I do have a good run, I can finish it in 2-4 hrs.

Any ideas?
I'm out of ideas.

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,888,499
RAC: 138,302
Message 41740 - Posted: 27 Feb 2020, 17:26:18 UTC - in response to Message 41739.  

Your recent logs show that your VMs run with 2241 MB RAM.
2020-02-27 14:49:02 (9324): Setting Memory Size for VM. (2241MB)
2020-02-27 14:49:02 (9324): Setting CPU Count for VM. (4)

Far too low for a 4-core setup.
ATLAS VMs require 3000+ncores*900 MB RAM, in your case 6600 MB.

I suspect that your VMs start but due to the low RAM setting they run into heavy swapping which slows them down.
Hence the low CPU usage.
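
The rule of thumb above is easy to check for any core count. A minimal sketch in Python; the function name is made up for illustration, and the formula is simply the one quoted in this post:

```python
def atlas_vm_ram_mb(ncores: int) -> int:
    """RAM in MB for an ATLAS VM, per the rule 3000 + ncores * 900."""
    if ncores < 1:
        raise ValueError("ncores must be at least 1")
    return 3000 + ncores * 900


# Print the expected VM size for a few common core counts.
for n in (1, 2, 4, 8):
    print(f"{n} core(s): {atlas_vm_ram_mb(n)} MB")
```

For the 4-core setup discussed here this gives 6600 MB, which is almost three times the 2241 MB shown in the log.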

greg_be
Message 41741 - Posted: 27 Feb 2020, 19:28:16 UTC - in response to Message 41740.  
Last modified: 27 Feb 2020, 19:49:48 UTC

So I see that, but how do I change that permanently?
What I am looking at does not allow me to change anything.
I tried something I saw on a web page about changing the settings of the machine in use, but when I did what it told me I lost the task. Then, in a video, I saw that I should be able to use the slider, but on the old task I couldn't.
The web page is https://superuser.com/questions/926339/how-to-change-the-ram-allocated-to-an-os-in-virtualbox and the video is https://www.youtube.com/watch?v=viT8mwSiC4k
I think the first page is not the right one, and unlike in the video, I had no access to the slider while the machine was running.

BUT: how do I increase the memory size permanently? This is what I would really like to know.
Is that something you can write into app.config or what?
I don't have the time or interest to mess around with each new machine for each new task.

And then this question: why is the memory set so low on the VM by default?

computezrmle
Message 41742 - Posted: 27 Feb 2020, 20:17:49 UTC - in response to Message 41741.  

Don't mess around in your VirtualBox UI with any setting of your VM!
The vboxwrapper will automatically configure it based on the values it gets from the project server.

You wrote "app.config", sometimes "app config".
Double-check the hints. Its filename is "app_config.xml".
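
For reference, a minimal sketch of what such a file could contain, written as a small Python script that renders it. Treat the details as assumptions to verify against the project's own hints: the app name ATLAS, the vboxwrapper option --memory_size_mb and the plan class vbox64_mt_mcore_atlas (as it appears in the project's task lists) are not confirmed in this post, and the RAM value uses the 3000+ncores*900 rule quoted earlier in the thread.

```python
import xml.etree.ElementTree as ET

# Assumed names, not confirmed in this thread: verify before use.
APP_NAME = "ATLAS"
PLAN_CLASS = "vbox64_mt_mcore_atlas"

TEMPLATE = """<app_config>
  <app>
    <name>{app}</name>
    <max_concurrent>{max_tasks}</max_concurrent>
  </app>
  <app_version>
    <app_name>{app}</app_name>
    <plan_class>{plan_class}</plan_class>
    <avg_ncpus>{ncores}</avg_ncpus>
    <cmdline>--memory_size_mb {ram_mb}</cmdline>
  </app_version>
</app_config>
"""


def render_app_config(ncores: int, max_tasks: int = 1) -> str:
    """Render an app_config.xml body for the given core count."""
    ram_mb = 3000 + 900 * ncores  # rule of thumb quoted in this thread
    text = TEMPLATE.format(app=APP_NAME, plan_class=PLAN_CLASS,
                           ncores=ncores, max_tasks=max_tasks, ram_mb=ram_mb)
    ET.fromstring(text)  # raises ParseError if the XML is not well-formed
    return text


print(render_app_config(ncores=4))
```

The rendered text would go into app_config.xml in the project's folder inside the BOINC data directory, and the client has to re-read its config files (or be restarted) before it takes effect.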


It would be best to start your configuration from scratch, following all hints step by step.


1. Set your BOINC projects to "no new tasks"
2. Cancel all vbox tasks
3. Do a project update to report all pending tasks
4. Shut down your BOINC client
5. Edit your web preferences and set "Max # jobs=1" and "Max # CPUs=4"
6. Deselect all LHC@home apps except ATLAS
7. Downgrade VirtualBox to version 6.0.18 to ensure 6.1.x doesn't introduce additional problems
8. Install virtual box additions version 6.0.18
9. Remove any existing app_config.xml
10. Restart your computer
11. Start your BOINC client
12. Allow new tasks for LHC@home and request a fresh ATLAS task
13. Once the task is running locate the stderr.txt in the /slots/x/ folder and check for lines that show the #cores and the RAM setting
14. From your BOINC GUI call the task's console window and check ALT-F3 (=top) for the RAM setting inside the VM
15. Check ALT-F2 for the ATLAS event progress monitoring
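
Step 13 can also be done with a few lines of Python instead of reading the file by eye. This is only a sketch: the two patterns are modelled on the vboxwrapper log lines quoted earlier in this thread, and the sample text stands in for the contents of stderr.txt.

```python
import re

# Patterns modelled on the vboxwrapper lines quoted in this thread.
MEM_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")
CPU_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")


def vm_settings(log_text: str):
    """Return (ram_mb, ncores) found in the log text, or None where missing."""
    mem = MEM_RE.search(log_text)
    cpu = CPU_RE.search(log_text)
    return (int(mem.group(1)) if mem else None,
            int(cpu.group(1)) if cpu else None)


# Stand-in for the text of slots/x/stderr.txt.
SAMPLE = """2020-02-27 14:49:02 (9324): Setting Memory Size for VM. (2241MB)
2020-02-27 14:49:02 (9324): Setting CPU Count for VM. (4)"""

ram_mb, ncores = vm_settings(SAMPLE)
print(f"VM configured with {ram_mb} MB RAM and {ncores} cores")
```

To use it on a real task, read the stderr.txt of the running slot into a string and pass that to vm_settings instead of SAMPLE.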


If you set 7000 MB somewhere in a file, this would be written to the logfile.
The fact that your logfile shows a different setting indicates that your configuration is broken somewhere.
The cause is local, since the project server never sets a value of 2241 MB.

BTW:
7000 MB is also not the recommended setting for a 4-core setup.

greg_be
Message 41743 - Posted: 28 Feb 2020, 0:05:59 UTC - in response to Message 41742.  

yeah sorry..app_config is correctly named. Just my inability to recall it correctly
7000 is just a value I selected. I see that in Boinc Tasks monitoring program that it is more like 6600 or sometimes a little higher.
As for the rest..well I will look at that in 8-10 hrs after I recharge my human batteries.
Thanks for all the suggestions.

greg_be
Message 41744 - Posted: 28 Feb 2020, 0:42:20 UTC - in response to Message 41743.  

0130 CET
Downgrade complete
app_config renamed with .old
web settings changed
2020-02-28 01:25:07 (7736): Setting Memory Size for VM. (2241MB)
2020-02-28 01:25:07 (7736): Setting CPU Count for VM. (4)
Running 3 tasks without a problem (let some stay that were already downloaded)
Memory load with 3 ATLAS simultaneously and other projects uses only 65% of physical memory.
CPU usage per ATLAS task is good: 48%, 95% and 65% (3 different tasks running at the same time).
So I guess that made a good change.

I will look again after I get up and see how things look.

greg_be
Message 41745 - Posted: 28 Feb 2020, 7:57:51 UTC
Last modified: 28 Feb 2020, 7:58:07 UTC

Looks like everything is ok now.
After the backlog of tasks gets cleared out, is it safe to add CMS and Theory back into the mix?

computezrmle
Message 41746 - Posted: 28 Feb 2020, 8:09:53 UTC - in response to Message 41744.  

You must be more precise!
Your computer does not forgive a single typo in your configuration.
And in your posts it leads to confusion if you use incorrect terms.
See here:
yeah sorry..app_config is correctly named. Just my inability to recall it correctly
.
.
.
app_config renamed with .old

Again, it must be "app_config.xml" instead of just "app_config".



Running 3 tasks without a problem (let some stay that were already downloaded)

Why did you ignore steps (2.) and (3.)?
They ensure you work on a fresh task instead of a previously downloaded one.

Of the 3 tasks you mentioned as running, only 1 is listed as finished in the project DB.
The good thing: it produced a HITS file.

The bad things (from your logfile) https://lhcathome.cern.ch/lhcathome/result.php?resultid=265168654:
2020-02-28 01:25:07 (7736): Setting Memory Size for VM. (2241MB)

If your setup is not broken, this should be 6600MB.


2020-02-28 01:25:04 (7736): Detected: vboxwrapper 26197
.
.
.
2020-02-28 01:25:54 (7736): Guest Log:  *** Starting ATLAS job. (PandaID=4653804579 taskID=20660732) ***
2020-02-28 01:26:06 (7736): Stopping VM.
2020-02-28 01:26:07 (7736): Error in stop VM for VM: -108
Command:
VBoxManage -q controlvm "boinc_c5af58479869929a" savestate
Output:

2020-02-28 01:26:07 (7736): VM did not stop when requested.
2020-02-28 01:26:07 (7736): VM was successfully terminated.
2020-02-28 01:27:32 (15740): Detected: vboxwrapper 26197

Your VM started twice, 2020-02-28 01:25:04 and 2020-02-28 01:27:32.
Why did it stop?
What did you change between 01:26:07 and 01:27:32?


I see that in Boinc Tasks monitoring program...

Is this just another imprecise term, or do you use other tools besides the original BOINC Manager to control/modify your BOINC setup?


You may insert the following step in the to-do-list:
3a. Do a project reset

Then repeat steps (1.), (2.), (3.), (3a.), (4.), (9.), (11.), (12.), (13.), (14.), (15.).

greg_be
Message 41752 - Posted: 28 Feb 2020, 15:20:54 UTC
Last modified: 28 Feb 2020, 15:37:31 UTC

1. Set your BOINC projects to "no new tasks" - Done
2. Cancel all vbox tasks - Done No projects are using VBOX at this time.
3. Do a project update to report all pending tasks - All aborted and updated. They would fail anyway.
4. Shutdown your BOINC client - Done
5. Edit your web preferences and set "Max # jobs=1" and "Max # CPUs=4" - Done
6. Deselect all LHC@home apps except ATLAS - Done
7. Downgrade VirtualBox to version 6.0.18 to ensure 6.1.x doesn't introduce additional problems - Done
8. Install virtual box additions version 6.0.18 - Done
9. Remove any existing app_config.xml - Done. Running without any app_config.xml.
10. Restart your computer - Done
11. Start your BOINC client - Done
12. Allow new tasks for LHC@home and request a fresh ATLAS task - Done
13. Once the task is running locate the stderr.txt in the /slots/x/ folder and check for lines that show the #cores and the RAM setting - In queue with other projects will update when ATLAS starts.
14. From your BOINC GUI call the task's console window and check ALT-F3 (=top) for the RAM setting inside the VM - Waiting for ATLAS to start in queue
15. Check ALT-F2 for the ATLAS event progress monitoring - waiting.


Boinc Tasks - https://efmer.com/ I use it strictly as a monitoring program and to make it easier to report tasks manually. It also clumps all the same style of tasks together and makes it easier to see what is being run, how much run time/cpu time, percent of cpu used per task, percent done/left, memory usage, etc. For modifying memory, queue size, suspending projects and stuff like that I use BOINC.
-------------------

Your VM started twice, 2020-02-28 01:25:04 and 2020-02-28 01:27:32.
Why did it stop? It autostarted after install and I needed it to restart to install the extensions pack.
What did you change between 01:26:07 and 01:27:32? - see above

greg_be
Message 41755 - Posted: 28 Feb 2020, 20:48:41 UTC

Freaking task bombed. Low memory.
Get real!
Reinstating the memory restriction option.
This is getting stupid!

greg_be
Message 41756 - Posted: 28 Feb 2020, 21:05:19 UTC

app_config.xml reinstated. Download limited to 1 task at a time, 1 task in progress.
ATLAS only, no other LHC projects running.
Memory set for 6600.
This is the start log.
If this fails to run properly, I'm done.
I have followed your steps to the T.


2020-02-28 21:58:38 (11180): Detected: vboxwrapper 26197
2020-02-28 21:58:38 (11180): Detected: BOINC client v7.7
2020-02-28 21:58:39 (11180): Detected: VirtualBox VboxManage Interface (Version: 6.0.18)
2020-02-28 21:58:39 (11180): Successfully copied 'init_data.xml' to the shared directory.
2020-02-28 21:58:42 (11180): Create VM. (boinc_9705fa3b03d0857e, slot#18)
2020-02-28 21:58:42 (11180): Setting Memory Size for VM. (6600MB) <----
2020-02-28 21:58:43 (11180): Setting CPU Count for VM. (4) <---
2020-02-28 21:58:43 (11180): Setting Chipset Options for VM.
2020-02-28 21:58:43 (11180): Setting Boot Options for VM.
2020-02-28 21:58:44 (11180): Setting Network Configuration for NAT.
2020-02-28 21:58:44 (11180): Enabling VM Network Access.
2020-02-28 21:58:44 (11180): Disabling USB Support for VM.
2020-02-28 21:58:44 (11180): Disabling COM Port Support for VM.
2020-02-28 21:58:45 (11180): Disabling LPT Port Support for VM.
2020-02-28 21:58:45 (11180): Disabling Audio Support for VM.
2020-02-28 21:58:46 (11180): Disabling Clipboard Support for VM.
2020-02-28 21:58:46 (11180): Disabling Drag and Drop Support for VM.
2020-02-28 21:58:46 (11180): Adding storage controller(s) to VM.
2020-02-28 21:58:47 (11180): Adding virtual disk drive to VM. (vm_image.vdi)
2020-02-28 21:58:47 (11180): Adding VirtualBox Guest Additions to VM.
2020-02-28 21:58:47 (11180): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2020-02-28 21:58:47 (11180): forwarding host port 50234 to guest port 80
2020-02-28 21:58:48 (11180): Enabling remote desktop for VM.
2020-02-28 21:58:48 (11180): Enabling shared directory for VM.
2020-02-28 21:58:49 (11180): Starting VM using VBoxManage interface. (boinc_9705fa3b03d0857e, slot#18)
2020-02-28 21:58:54 (11180): Successfully started VM. (PID = '3252')
2020-02-28 21:58:54 (11180): Reporting VM Process ID to BOINC.
2020-02-28 21:58:54 (11180): Guest Log: BIOS: VirtualBox 6.0.18

2020-02-28 21:59:28 (11180): Guest Log: *** Starting ATLAS job. (PandaID=4656314433 taskID=20660752) ***

2020-02-28 21:59:30 (11180): Guest Log: 00:00:10.003198 timesync vgsvcTimeSyncWorker: Radical guest time change: -3 600 782 656 000ns (GuestNow=1 582 923 557 997 531 000 ns GuestLast=1 582 927 158 780 187 000 ns fSetTimeLastLoop=true )

computezrmle
Message 41757 - Posted: 28 Feb 2020, 21:35:15 UTC - in response to Message 41756.  

No patience?
There's no need to use an app_config.xml since it overwrites the default settings with exactly the same values.

The logs tell us that the tasks start fine until these lines appear:
2020-02-28 21:36:09 (8896): VM is no longer is a running state. It is in 'lse, errorID=HostMemoryLow message="Unable to allocate and lock memory. The virtual machine will be paused. Please close applications to free up memory or close the VM"
'.
2020-02-28 21:36:09 (8896): VM state change detected. (old = 'Running', new = 'lse, errorID=HostMemoryLow message="Unable to allocate and lock memory. The virtual machine will be paused. Please close applications to free up memory or close the VM"
')



There's an old thread in the VirtualBox message board regarding "errorID=HostMemoryLow":
https://forums.virtualbox.org/viewtopic.php?f=6&t=85816
It states that the error might be caused by an antivirus app or another program that allocates/fragments the available RAM.
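
To find out how often this happens across several task logs, the error marker can be pulled out with a short script. A minimal sketch, assuming the log text has been read into a string; the pattern is modelled only on the two lines quoted above, and the function name is made up for illustration:

```python
import re

# Modelled on the state-change lines quoted above.
ERROR_RE = re.compile(r'errorID=(\w+) message="([^"]*)"')


def vm_errors(log_text: str):
    """Return a list of (errorID, message) pairs found in the log text."""
    return ERROR_RE.findall(log_text)


# One of the quoted log lines, split over several literals for readability.
SAMPLE = (
    "2020-02-28 21:36:09 (8896): VM state change detected. "
    "(old = 'Running', new = 'lse, errorID=HostMemoryLow "
    'message="Unable to allocate and lock memory. '
    'The virtual machine will be paused. '
    'Please close applications to free up memory or close the VM"'
)

for error_id, message in vm_errors(SAMPLE):
    print(f"{error_id}: {message}")
```

A log without such lines simply yields an empty list, so the same call works as a quick yes/no check per task.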

greg_be
Message 41758 - Posted: 28 Feb 2020, 23:45:46 UTC - in response to Message 41757.  
Last modified: 29 Feb 2020, 0:42:19 UTC

Let me put it this way: I have been (bleeeeep) around with problems on ATLAS for I don't know how long. I had about a 48% success rate before this discussion. My errors and aborts for stalls outnumber my successes, hence my frustration.

I don't get why things work fine with app_config.xml, but crash without it.
I will do a test over the weekend and see whether it works with or without it.

As for antivirus, I don't have anything other than what comes with Windows. I am also not running any sort of memory compression program, so I have no idea what would cause that error.

With ATLAS, WCG (rainfall), Universe@home, Amicable Numbers, GPUGRID and FAH running, I am using 80% of my total memory. Rainfall is the highest after ATLAS, but only at 416 MB; FAH is only using 22 MB.

BOINC and all its components are stored on my digital C: drive, so you can't defrag it.

I had a crash for some reason while I was AFK, so I had to reboot, but the ATLAS task resumed where it left off and is running just fine. It advances .2000% every 2 seconds and maxes out at 117% CPU. I really wonder how you can use more than 100%, but whatever. To me that says it's working correctly.

2020-02-29 00:46:21 (11404): Status Report: Elapsed Time: '6000.000000'
2020-02-29 00:46:21 (11404): Status Report: CPU Time: '21490.062500'
53.7%

I'll have a look again when I get up in the morning. Since you're in the same time zone, you will see it when you get up or come back on. As of this post it has 2:38 to go and is 42% complete, which puts it on course to finish in its normal 5-6 hr range if everything goes right.
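
On the "more than 100%" question: with a multi-core task, CPU time is normally added up over all cores, so it can grow faster than wall-clock time. A small sketch with the two numbers from the status report above, assuming that "CPU Time" there is the sum over the 4 cores of the VM (how the monitoring tool arrives at its own 117% figure is not known here):

```python
def cpu_utilisation(cpu_seconds: float, elapsed_seconds: float, ncores: int):
    """Return (average busy cores, fraction of the VM's total CPU capacity)."""
    busy_cores = cpu_seconds / elapsed_seconds
    return busy_cores, busy_cores / ncores


# Values taken from the status report lines quoted above.
busy, share = cpu_utilisation(21490.0625, 6000.0, 4)
print(f"{busy:.2f} cores busy on average, {share:.1%} of the 4-core VM")
# prints "3.58 cores busy on average, 89.5% of the 4-core VM"
```

Measured against a single core that is about 358%, so any figure above 100% just means more than one core was busy.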

computezrmle
Message 41761 - Posted: 29 Feb 2020, 8:31:42 UTC - in response to Message 41752.  

1. Set your BOINC projects to "no new tasks" - Done
2. Cancel all vbox tasks - Done No projects are using VBOX at this time.
3. Do a project update to report all pending tasks - All aborted and updated. They would fail anyway.
???
4. Shutdown your BOINC client - Done

I'm missing step (3a.).
In a previous post I suggested doing a project reset.
It appears that you never did it, did you?

A project reset would have cleaned weird settings from your client_state.xml, and it ensures you get a fresh copy of the ATLAS vdi, just in case the current one got damaged.
You might consider letting your LHC buffer run empty and then doing that project reset.

greg_be
Message 41764 - Posted: 29 Feb 2020, 9:41:46 UTC - in response to Message 41761.  
Last modified: 29 Feb 2020, 9:43:16 UTC

I did that. Just must not have copied it.
2 tasks completed ok overnight with the app_config.xml in place.
Will remove that later today and see what happens and do a project reset so it is fresh without the file.

greg_be
Message 41768 - Posted: 29 Feb 2020, 12:59:03 UTC

So...I had 2 from last night go ok.
A third this morning ran into a VM memory lock error or something like that.
You can see that in the results.

Now...to test your statement that ATLAS should run without the override preferences, I removed app_config.xml and reset the project and downloaded a new master file and a new task.

The BOINC log is this:
2/29/2020 1:39:28 PM | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 10556945; resource share 150
2/29/2020 1:39:28 PM | | General prefs: using separate prefs for home
2/29/2020 1:39:28 PM | | Reading preferences override file
2/29/2020 1:39:28 PM | | Preferences:
2/29/2020 1:39:28 PM | | max memory usage when active: 23300.62 MB
2/29/2020 1:39:28 PM | | max memory usage when idle: 24526.97 MB
2/29/2020 1:39:29 PM | | max disk usage: 147.16 GB
2/29/2020 1:39:29 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
2/29/2020 1:39:29 PM | | Setting up project and slot directories
2/29/2020 1:39:29 PM | | Checking active tasks
2/29/2020 1:39:29 PM | | Setting up GUI RPC socket
2/29/2020 1:39:29 PM | | Checking presence of 1952 project files
2/29/2020 1:39:30 PM | | Suspending computation - user request
2/29/2020 1:40:12 PM | LHC@home | work fetch resumed by user
2/29/2020 1:40:13 PM | LHC@home | update requested by user
2/29/2020 1:40:26 PM | LHC@home | Master file download succeeded
2/29/2020 1:40:31 PM | LHC@home | Sending scheduler request: Requested by user.
2/29/2020 1:40:31 PM | LHC@home | Requesting new tasks for CPU
2/29/2020 1:40:32 PM | LHC@home | Scheduler request completed: got 1 new tasks
2/29/2020 1:40:34 PM | LHC@home | Started download of vboxwrapper_26198ab7_windows_x86_64.exe
2/29/2020 1:40:34 PM | LHC@home | Started download of ATLAS_vbox_2.00_job.xml
2/29/2020 1:40:36 PM | LHC@home | Finished download of vboxwrapper_26198ab7_windows_x86_64.exe
2/29/2020 1:40:36 PM | LHC@home | Finished download of ATLAS_vbox_2.00_job.xml
2/29/2020 1:40:36 PM | LHC@home | Started download of ATLAS_vbox_2.00_image.vdi
2/29/2020 1:40:36 PM | LHC@home | Started download of cDJKDmLaESwn9Rq4apoT9bVoABFKDmABFKDmykzUDmABFKDm8RygYn_EVNT.19609605._002487.pool.root.1
2/29/2020 1:41:22 PM | LHC@home | Finished download of cDJKDmLaESwn9Rq4apoT9bVoABFKDmABFKDmykzUDmABFKDm8RygYn_EVNT.19609605._002487.pool.root.1
2/29/2020 1:41:22 PM | LHC@home | Started download of cDJKDmLaESwn9Rq4apoT9bVoABFKDmABFKDmykzUDmABFKDm8RygYn_input.tar.gz
2/29/2020 1:41:23 PM | LHC@home | Finished download of cDJKDmLaESwn9Rq4apoT9bVoABFKDmABFKDmykzUDmABFKDm8RygYn_input.tar.gz
2/29/2020 1:41:23 PM | LHC@home | Started download of rte_cDJKDmLaESwn9Rq4apoT9bVoABFKDmABFKDmykzUDmABFKDm8RygYn.tar.gz
2/29/2020 1:41:24 PM | LHC@home | Finished download of rte_cDJKDmLaESwn9Rq4apoT9bVoABFKDmABFKDmykzUDmABFKDm8RygYn.tar.gz
2/29/2020 1:41:24 PM | LHC@home | Started download of boinc_job_script.cR4S2y
2/29/2020 1:41:25 PM | LHC@home | Finished download of boinc_job_script.cR4S2y
2/29/2020 1:43:38 PM | LHC@home | Finished download of ATLAS_vbox_2.00_image.vdi

System stats:

2/29/2020 1:39:28 PM | | Memory: 23.95 GB physical, 34.45 GB virtual
2/29/2020 1:39:28 PM | | max memory usage when active: 23300.62 MB
2/29/2020 1:39:28 PM | | max memory usage when idle: 24526.97 MB
2/29/2020 1:39:28 PM | | VirtualBox version: 6.0.18


I run these projects as well:
| Amicable Numbers | resource share 100
| Asteroids@home | resource share 100
| Einstein@Home | resource share 100
| GPUGRID | resource share 170
| Milkyway@Home | resource share 100
| Moo! Wrapper | resource share 100
| PrimeGrid | resource share 100
| Rosetta@home | resource share 100
| Universe@Home | resource share 100
| World Community Grid | resource share 100


Now you should know everything about my computer that is not shown in my profile.

BTW...I went into Wise 365 and found a way to set an exclusion for the entire folder related to BOINC.
So any future cleanup will avoid this folder, no matter if it is manual or automatic.

So only Windows or BOINC can make any changes to that directory and Windows is automatic, but should have no reason to touch it.

greg_be
Message 41769 - Posted: 29 Feb 2020, 13:25:37 UTC

No app_config.xml and it dies immediately with a low memory error and I know there was more than enough memory!

Now...going to put app_config.xml back and reset and try again.

greg_be
Message 41772 - Posted: 29 Feb 2020, 16:33:01 UTC - in response to Message 41769.  

This last task crashed BOINC twice and the specific error is:
'lse, errorID=HostMemoryLow message="Unable to allocate and lock memory. The virtual machine will be paused. Please close applications to free up memory or close the VM

Which I know is crap. My record is now 10 ok, 33 error and 2 invalid.
Now do you see why I get frustrated?
I am going to abort this latest ATLAS task that loaded in, reset the project for the 3rd time, remove app_config.xml, clean my computer with Wise and CCleaner and then try again.
If none of this works, I just give up.
I have never ever had this much trouble with a project!

computezrmle
Message 41773 - Posted: 29 Feb 2020, 16:58:20 UTC - in response to Message 41772.  

I doubt this is caused by LHC/ATLAS.
Hundreds of wingmen get it running even on computers with a similar configuration or much less RAM.
I guess ATLAS is just a victim of a local issue on that specific machine.

Could be caused by hardware, like a faulty RAM module, temperature or power supply issues (2 GPUs).
Could be caused by BIOS errors/settings.
Could be caused by a piece of software or a combination of different programs.

It is a lot of work to test all of that systematically.

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,084,038
RAC: 105,553
Message 41774 - Posted: 29 Feb 2020, 18:25:56 UTC

29 Feb 2020, 9:42:08 UTC Completed and validated 7,839.57 27,598.06 229.54 ATLAS Simulation v2.00 (vbox64_mt_mcore_atlas) windows_x86_64
29 Feb 2020, 6:21:23 UTC Completed and validated 6,705.57 23,104.97 194.11 ATLAS Simulation v2.00 (vbox64_mt_mcore_atlas) windows_x86_64
29 Feb 2020, 2:48:15 UTC Completed and validated 13,220.30 46,301.91 390.31 ATLAS Simulation v2.00 (vbox64_mt_mcore_atlas) windows_x86_64
These three tasks finished successfully today with a HITS file, so something specific must have happened after that time.
Can you see in the Windows log what went wrong after 9:42 UTC?