Message boards : ATLAS application : queue is empty

Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 4
Message 48606 - Posted: 21 Sep 2023, 16:31:43 UTC - in response to Message 48605.  
Last modified: 21 Sep 2023, 16:31:57 UTC

If something changes so the task is pointless to continue, would you rather waste even more time or abort? It's not their fault they realised what they asked you to do is now incorrect.
ID: 48606
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,605,640
RAC: 15,716
Message 48607 - Posted: 21 Sep 2023, 17:57:44 UTC

They could have announced it on the forum and left the decision to abort to the crunchers. They could also have tested them on the dev site first. But it is what it is; there's no point in crying over spilled milk.
ID: 48607
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 4
Message 48608 - Posted: 21 Sep 2023, 18:18:21 UTC - in response to Message 48607.  

They could have announced it on the forum and left the decision to abort to the crunchers. They could also have tested them on the dev site first. But it is what it is; there's no point in crying over spilled milk.
I wasn't aware Boinc had the facility to remotely abort a started task. Perhaps it's just that other projects don't like doing so. I'm not sure why anyone would want to finish a task which is just going to be discarded at the project end.
ID: 48608
rbpeake

Joined: 17 Sep 04
Posts: 99
Credit: 30,689,027
RAC: 5,068
Message 48609 - Posted: 22 Sep 2023, 0:39:17 UTC - in response to Message 48608.  
Last modified: 22 Sep 2023, 0:42:32 UTC

They could have announced it on the forum and left the decision to abort to the crunchers. They could also have tested them on the dev site first. But it is what it is; there's no point in crying over spilled milk.
I wasn't aware Boinc had the facility to remotely abort a started task. Perhaps it's just that other projects don't like doing so. I'm not sure why anyone would want to finish a task which is just going to be discarded at the project end.

Is the way to successfully complete these long units to use more processors, so that the work unit finishes sooner? When does the "kill switch" kick in, after about a day and a half of processing?

I just answered my own question. Both units were cancelled by the server at precisely 112,671.63 seconds.
Regards,
Bob P.
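
For scale, the cut-off quoted above can be converted into days and hours. This is only a rough Python sketch: the function name is made up for illustration, and the single input taken from this thread is the 112,671.63 second figure.

```python
def split_duration(seconds):
    """Split a duration in seconds into (days, hours, minutes, seconds)."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return int(days), int(hours), int(minutes), round(secs, 2)


cutoff = 112671.63  # run time at which both units were cancelled
print(split_duration(cutoff))
print(round(cutoff / 86400, 2), "days")
```

That works out to a little over 1 day 7 hours, so somewhat less than a day and a half.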
ID: 48609
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 4
Message 48610 - Posted: 22 Sep 2023, 0:55:17 UTC - in response to Message 48609.  

Is the way to successfully complete these long units to use more processors, so that the work unit finishes sooner? When does the "kill switch" kick in, after about a day and a half of processing?

I just answered my own question. Both units were cancelled by the server at precisely 112,671.63 seconds.
I don't understand using anything less than the 8 threads the program will use. Why do twice as many at once in double the time?
ID: 48610
maeax

Joined: 2 May 07
Posts: 2096
Credit: 159,533,873
RAC: 140,702
Message 48611 - Posted: 22 Sep 2023, 5:29:48 UTC - in response to Message 48609.  
Last modified: 22 Sep 2023, 5:34:16 UTC

What if this time limit of one and a half days is a test in production, to check something?
ATLAS was running tasks both last Thursday and this Thursday.
ID: 48611
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,492,452
RAC: 118,689
Message 48613 - Posted: 22 Sep 2023, 13:20:47 UTC
Last modified: 22 Sep 2023, 13:21:26 UTC

The server status page suddenly shows about 1,700 unsent tasks.
The question is: will they also be aborted by the server after many hours, in which case it would not make sense to start crunching them?
Also: where can one see how many events these tasks have?
ID: 48613
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 4
Message 48614 - Posted: 22 Sep 2023, 14:07:00 UTC - in response to Message 48613.  

I'm just going to run them. I've grabbed a bunch on my 5 fastest computers. Might as well help out with whatever they're trying to do/test/fix.
ID: 48614
Toggleton

Joined: 4 Mar 17
Posts: 20
Credit: 8,152,814
RAC: 9,663
Message 48615 - Posted: 22 Sep 2023, 15:27:31 UTC - in response to Message 48613.  
Last modified: 22 Sep 2023, 15:48:53 UTC

Also: where can one see how many events these tasks have?

This is line 172 in my current /var/lib/boinc/slots/0/pilotlog.txt (I'm not 100% sure, as I have not had a shorter one from this run yet)

payload execution command:

export ATHENA_CORE_NUMBER=12;export ATHENA_PROC_NUMBER=12;export PANDA_RESOURCE='BOINC_MCORE';export FRONTIER_ID=//...cutted out..// --maxEvents=2000 --..........

2023-09-22 11:37:52,557 | WARNING | container name not defined in CRIC
2023-09-22 11:37:48,914 | INFO | executing command: export ATHENA_CORE_NUMBER=12;export ATHENA_PROC_NUMBER=12;export PANDA_RESOURCE='BOINC_MCORE';export FRONTIER_ID= //did cut quite some stuff out// --inputEVNTFile=EVNT.123456789._000123.pool.root.1 --maxEvents=2000

When you have one running, you can look at /var/lib/boinc/slots/0/PanDA_Pilot-123456789/eventLoopHeartBeat.txt; there you can see how many events of that work unit have already finished.
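
Rather than counting lines in the log, the event count could be pulled out by searching for the --maxEvents option. A minimal Python sketch, assuming the log contains the option in the form shown above; the function name is made up for illustration, and the sample string is an abbreviated copy of the log line quoted in this post.

```python
import re


def find_max_events(log_text):
    """Return every --maxEvents value found in the text, in order, as ints."""
    return [int(value) for value in re.findall(r"--maxEvents=(\d+)", log_text)]


# Abbreviated copy of the log line quoted above.
sample = (
    "2023-09-22 11:37:48,914 | INFO | executing command: "
    "export ATHENA_CORE_NUMBER=12; ... --maxEvents=2000"
)
print(find_max_events(sample))
```

To check a real task, read the pilotlog.txt of the slot it runs in and pass its contents to the function. The slot number differs from task to task, so the /slots/0/ part of the path above is only an example.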
ID: 48615
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 4
Message 48616 - Posted: 22 Sep 2023, 16:31:47 UTC
Last modified: 22 Sep 2023, 17:10:30 UTC

If the percentage complete in Boinc is correct I'm doing 3 tasks at once on my Ryzen 9 3900XT, in 3 hours. They don't look excessively large to me. Atlas has always taken about that long. They're at 50%, so Boinc could of course be insanely wrong.

Edit: one finished in 2h11m (9h37 CPU).

It's been accepted as valid, although I notice the one I did 6 days ago on the same machine was 10 times longer, maybe this is a one off: https://lhcathome.cern.ch/lhcathome/results.php?userid=55945&offset=0&show_names=0&state=4&appid=14
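
As a rough cross-check, the ratio of CPU time to run time shows how many threads the task kept busy on average. A small Python sketch using only the two durations from the edit above; the helper name and the accepted duration format are assumptions made for illustration.

```python
import re


def to_minutes(text):
    """Convert a duration written like '2h11m' or '9h37' into minutes."""
    match = re.fullmatch(r"(\d+)h(\d+)m?", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


wall = to_minutes("2h11m")  # run time of the finished task
cpu = to_minutes("9h37")    # CPU time of the same task
print(round(cpu / wall, 2))
```

That comes to about 4.4 threads' worth of CPU kept busy on average over the run.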
ID: 48616
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 4
Message 48617 - Posted: 22 Sep 2023, 18:37:51 UTC
Last modified: 22 Sep 2023, 18:39:07 UTC

Is it my imagination or has the memory requirement dropped? All running Atlases are showing 4.4-4.8GB in Boinc. And one of them is a 4 core computer, the others are 8 cores per task of the computer's 24 cores. I thought Atlas used to be 2GB + 1GB per core, making 10GB for 8 cores and 6GB for 4 cores?
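
The old rule of thumb recalled above can be written out as follows. This is only the formula as remembered in this post (2GB plus 1GB per core), not anything taken from the project's documentation, and the function name is made up for illustration.

```python
def old_atlas_memory_gb(cores):
    """Rule of thumb recalled in the post: 2 GB base plus 1 GB per core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 2 + cores


for cores in (4, 8):
    print(cores, "cores ->", old_atlas_memory_gb(cores), "GB")
```

Both results are well above the 4.4-4.8GB that Boinc is showing for the running tasks.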
ID: 48617
hadron

Joined: 4 Sep 22
Posts: 57
Credit: 8,425,039
RAC: 18,614
Message 48618 - Posted: 22 Sep 2023, 19:03:56 UTC

What gives? For one Atlas task, running on 4 threads, I get a credit of nearly 13,000. For another task, running on 8 threads, I get credit of only 3200. Both tasks required nearly the same amount of CPU time, over 530,000 secs.
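
To put numbers on the gap, credit can be expressed per CPU hour. A quick Python sketch using the rounded figures from this post (13,000 and 3,200 credits, 530,000 seconds of CPU time each), so the results are approximate; the function name is made up for illustration.

```python
def credit_per_cpu_hour(credit, cpu_seconds):
    """Credit granted per hour of CPU time."""
    return credit / (cpu_seconds / 3600)


cpu_seconds = 530_000  # rounded CPU time reported for both tasks
four_thread = credit_per_cpu_hour(13_000, cpu_seconds)
eight_thread = credit_per_cpu_hour(3_200, cpu_seconds)
print(round(four_thread, 1), round(eight_thread, 1))
print(round(four_thread / eight_thread, 2))
```

On these figures the 4-thread task earned roughly four times as much credit per CPU hour as the 8-thread one.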
ID: 48618
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,492,452
RAC: 118,689
Message 48620 - Posted: 22 Sep 2023, 19:38:48 UTC - in response to Message 48615.  

Also: where can one see how many events these tasks have?

This is line 172 in my current /var/lib/boinc/slots/0/pilotlog.txt (I'm not 100% sure, as I have not had a shorter one from this run yet)

payload execution command:

export ATHENA_CORE_NUMBER=12;export ATHENA_PROC_NUMBER=12;export PANDA_RESOURCE='BOINC_MCORE';export FRONTIER_ID=//...cutted out..// --maxEvents=2000 --..........

2023-09-22 11:37:52,557 | WARNING | container name not defined in CRIC
2023-09-22 11:37:48,914 | INFO | executing command: export ATHENA_CORE_NUMBER=12;export ATHENA_PROC_NUMBER=12;export PANDA_RESOURCE='BOINC_MCORE';export FRONTIER_ID= //did cut quite some stuff out// --inputEVNTFile=EVNT.123456789._000123.pool.root.1 --maxEvents=2000

When you have one running, you can look at /var/lib/boinc/slots/0/PanDA_Pilot-123456789/eventLoopHeartBeat.txt; there you can see how many events of that work unit have already finished.
Thanks for the information. Please tell me where I can find these files.
ID: 48620
Toggleton

Joined: 4 Mar 17
Posts: 20
Credit: 8,152,814
RAC: 9,663
Message 48621 - Posted: 22 Sep 2023, 19:45:57 UTC - in response to Message 48618.  

The credit system that it uses, AFAIK, is a bit weird: https://boinc.berkeley.edu/trac/wiki/CreditNew
If I remember right, the credit per work unit moved around quite a bit when a new version was released. I guess the swing is bigger right now, with more users coming back because of the new ATLAS work and with tasks of different lengths (2000 events, and it sounds like the new ones are shorter). But AFAIK it smoothed out after a few days of ATLAS running with a constant flow of work.
ID: 48621
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 4
Message 48622 - Posted: 22 Sep 2023, 20:04:52 UTC - in response to Message 48618.  

What gives? For one Atlas task, running on 4 threads, I get a credit of nearly 13,000. For another task, running on 8 threads, I get credit of only 3200. Both tasks required nearly the same amount of CPU time, over 530,000 secs.
Credit is never accurate. LHC is in fact banned from Gridcoin because you can cheat the credit here very easily.

On Asteroids at the moment, due to a GPU version which is slower than the CPU version, if my GPU task is compared to a CPU wingman, I get normal credit. But if my wingman is also on GPU, the credit is calculated as time multiplied by GPU speed, and I get 100 times more.

Who cares anyway, credit is just a toy, as long as you get the science done.
ID: 48622
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 4
Message 48623 - Posted: 22 Sep 2023, 20:13:35 UTC - in response to Message 48620.  

Thanks for the information. Please tell me where I can find these files.
The Windows equivalent is presumably C:\ProgramData\BOINC\slots\2\boinc_133f818bb0d17e70\Logs but I've no idea how you find the correct slot out of hundreds to look in.
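
One way to avoid opening every slot by hand would be to search the slots folder for a known file name. A hedged Python sketch: the function name is made up for illustration, and whether pilotlog.txt is actually visible inside the slot folder on a Windows host is not confirmed anywhere in this thread.

```python
from pathlib import Path


def slots_containing(slots_dir, filename):
    """Return names of the slot folders under slots_dir that hold filename."""
    root = Path(slots_dir)
    if not root.is_dir():
        return []
    hits = set()
    for path in root.rglob(filename):
        parts = path.relative_to(root).parts
        # parts[0] is the slot folder; skip matches lying directly in slots_dir
        if path.is_file() and len(parts) > 1:
            hits.add(parts[0])
    return sorted(hits)
```

For example, slots_containing(r"C:\ProgramData\BOINC\slots", "pilotlog.txt") would list the slot folders to look in, or return an empty list if the file is not there.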
ID: 48623
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 4
Message 48624 - Posted: 22 Sep 2023, 20:15:02 UTC - in response to Message 48621.  

The credit system that it uses, AFAIK, is a bit weird: https://boinc.berkeley.edu/trac/wiki/CreditNew
If I remember right, the credit per work unit moved around quite a bit when a new version was released. I guess the swing is bigger right now, with more users coming back because of the new ATLAS work and with tasks of different lengths (2000 events, and it sounds like the new ones are shorter). But AFAIK it smoothed out after a few days of ATLAS running with a constant flow of work.
You'd think they'd just give you 1 credit per 1000 calculations or something. It's not particle physics.
ID: 48624
hadron

Joined: 4 Sep 22
Posts: 57
Credit: 8,425,039
RAC: 18,614
Message 48625 - Posted: 22 Sep 2023, 20:49:58 UTC - in response to Message 48622.  

Who cares anyway, credit is just a toy, as long as you get the science done.

Well, for one thing, if the average credit displayed on the boinc manager statistics screen is chugging away at a constant level, then suddenly drops unexpectedly, then it is time to go looking for signs of a problem -- such as tasks which suddenly start failing in droves.
Furthermore, it helps if one's contribution to the effort is appreciated by the people who run the projects. The only way I can see for such appreciation to be seen on an ongoing basis is through the credit system.
ID: 48625
Mr P Hucker
Joined: 12 Aug 06
Posts: 418
Credit: 5,667,249
RAC: 4
Message 48626 - Posted: 22 Sep 2023, 21:14:50 UTC - in response to Message 48625.  

Furthermore, it helps if one's contribution to the effort is appreciated by the people who run the projects. The only way I can see for such appreciation to be seen on an ongoing basis is through the credit system.
Cash would be a show of appreciation. Then again there are those who ran Collatz for credit and didn't believe there was a point to the maths. "It takes all sorts to make a world" as my gran used to say.
ID: 48626
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,605,640
RAC: 15,716
Message 48627 - Posted: 22 Sep 2023, 21:18:17 UTC - in response to Message 48617.  

Is it my imagination or has the memory requirement dropped? All running Atlases are showing 4.4-4.8GB in Boinc. And one of them is a 4 core computer, the others are 8 cores per task of the computer's 24 cores. I thought Atlas used to be 2GB + 1GB per core, making 10GB for 8 cores and 6GB for 4 cores?

The memory requirements dropped when the current version 3.01 was released in May. I remember it being said then that 4 GB is enough for any number of cores (1 to 8). But I see the same reporting in Boinc as you do.
ID: 48627


©2024 CERN