Message boards : Theory Application : New Version 263.60

Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 35660 - Posted: 28 Jun 2018, 9:23:49 UTC

Updates the CernVM cache and now uses openhtc.io for CVMFS. It also supports multicore VMs.
Erich56
Joined: 18 Dec 15
Posts: 1284
Credit: 23,161,850
RAC: 2,258
Message 35667 - Posted: 28 Jun 2018, 12:31:47 UTC - in response to Message 35660.  

It also supports multicore VMs.
Are there any particular minimum RAM requirements for multicore operation (similar to ATLAS, which has a specific formula for calculating the RAM from the number of CPU cores)?
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1479
Credit: 79,409,718
RAC: 81,052
Message 35668 - Posted: 28 Jun 2018, 12:35:43 UTC - in response to Message 35667.  

any particular minimum RAM requirements

730 MB instead of 630 MB according to the new entry in client_state.xml
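For anyone who wants to check this on their own machine: the limit is the `<rsc_memory_bound>` element (in bytes) of each `<workunit>` in BOINC's client_state.xml. A minimal sketch of reading it; the workunit name in the sample is made up, and 765460480 bytes corresponds to the 730 MB mentioned above:

```python
# Sketch: read per-workunit RAM limits from BOINC's client_state.xml.
# The element names match BOINC's client state file; the sample workunit
# name below is hypothetical.
import xml.etree.ElementTree as ET

def memory_bounds_mb(client_state_xml: str) -> dict:
    """Map workunit name -> rsc_memory_bound in MiB."""
    root = ET.fromstring(client_state_xml)
    return {
        wu.findtext("name"): float(wu.findtext("rsc_memory_bound")) / (1024 * 1024)
        for wu in root.iter("workunit")
    }

sample = """
<client_state>
  <workunit>
    <name>Theory_2363-1234-5_0</name>
    <rsc_memory_bound>765460480.000000</rsc_memory_bound>
  </workunit>
</client_state>
"""
print(memory_bounds_mb(sample))  # -> {'Theory_2363-1234-5_0': 730.0}
```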


Is the project server that busy?
My first host is currently downloading the new vdi file at 13 KBps.
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35678 - Posted: 28 Jun 2018, 16:33:31 UTC - in response to Message 35668.  

Is the project server that busy?
My first host is currently downloading the new vdi file at 13 KBps.


Uh-oh. Is the ATLAS download problem now spreading to Theory? On my host the download speed dropped to 125 KBps, when it's normally > 1000 KBps.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1479
Credit: 79,409,718
RAC: 81,052
Message 35682 - Posted: 28 Jun 2018, 17:41:58 UTC

Meanwhile I'm running 2 WUs on 2 different BOINC instances but on the same host.
One instance is configured to use my local proxy, the other instance goes direct (at least it "thinks" it does).


Some comments

Both stderr.txt files include the following message:
2018-06-28 18:54:49 (82587): VBOX_JOB::parse(): unexpected text enable_screenshots_on_error/



Where is the configuration for openhtc.io?
Based on the tests with CMS a few months ago, I would expect a line like:
Guest Log: 2.4.4.0 3508 0 24896 6527 3 1 183730 10240000 2 65024 0 20 95 13 40 http://s1bnl-cvmfs.openhtc.io/cvmfs/grid.cern.ch http://<ip_of_my_local_proxy>:3128 1

Instead the logs show that CVMFS ignores my local proxy and configures a CERN repository and a CERN proxy:
Guest Log: 2.4.4.0 3508 0 24896 6527 3 1 183730 10240000 2 65024 0 20 95 13 40 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1
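For comparison, on a plain CVMFS client the proxy and server choices come from /etc/cvmfs/default.local; inside the CernVM guest these settings are generated by the job wrapper, so the following is only a sketch of the kind of configuration I would have expected (the proxy address and stratum-1 URL are examples):

```shell
# /etc/cvmfs/default.local on a plain CVMFS client (example values only;
# inside the VM these settings are generated automatically)
CVMFS_HTTP_PROXY="http://192.168.1.2:3128;DIRECT"   # try the local squid first, then go direct
CVMFS_SERVER_URL="http://s1bnl-cvmfs.openhtc.io/cvmfs/@fqrn@"
```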
Erich56
Joined: 18 Dec 15
Posts: 1284
Credit: 23,161,850
RAC: 2,258
Message 35684 - Posted: 28 Jun 2018, 18:29:00 UTC - in response to Message 35668.  

Is the project server that busy?
My first host is currently downloading the new vdi file at 13 KBps.
hm, here the new vdi file got downloaded in about 2 minutes.
Erich56
Joined: 18 Dec 15
Posts: 1284
Credit: 23,161,850
RAC: 2,258
Message 35689 - Posted: 29 Jun 2018, 4:43:36 UTC - in response to Message 35684.  

hm, here the new vdi file got downloaded in about 2 minutes.
edit: the above is true for only one of my computers.
On the others, the downloads got stuck and have been sitting there for several hours so far. What can I do to get them moving again?
maeax
Joined: 2 May 07
Posts: 962
Credit: 34,014,074
RAC: 8,140
Message 35690 - Posted: 29 Jun 2018, 6:18:11 UTC

gyllic
Joined: 9 Dec 14
Posts: 202
Credit: 2,533,390
RAC: 16
Message 35691 - Posted: 29 Jun 2018, 6:28:47 UTC - in response to Message 35682.  

Instead the logs show that CVMFS ignores my local proxy and configures a CERN repository and a CERN proxy:
Guest Log: 2.4.4.0 3508 0 24896 6527 3 1 183730 10240000 2 65024 0 20 95 13 40 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1
same here
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 35692 - Posted: 29 Jun 2018, 7:34:37 UTC - in response to Message 35682.  

Thanks, I will look into this later.
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 35693 - Posted: 29 Jun 2018, 7:35:54 UTC - in response to Message 35692.  

I will investigate the download/upload issues. What are people experiencing?
maeax
Joined: 2 May 07
Posts: 962
Credit: 34,014,074
RAC: 8,140
Message 35694 - Posted: 29 Jun 2018, 7:43:56 UTC

We had these problems with ATLAS during the last few days.
Downloads and uploads stalled and never finished, and the transfer speed kept degrading.
As a temporary workaround it was possible to suspend network activity in BOINC and then reconnect, but that needs manual intervention.
ATLAS has been running fine since last night.
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 35695 - Posted: 29 Jun 2018, 8:10:50 UTC - in response to Message 35693.  

The traffic doesn't look too high. It may be the ceph-fuse mount.
gyllic
Joined: 9 Dec 14
Posts: 202
Credit: 2,533,390
RAC: 16
Message 35699 - Posted: 29 Jun 2018, 10:09:40 UTC - in response to Message 35692.  

The log first shows:

2018-06-29 08:17:46 (5244): Guest Log: [DEBUG] Detected squid proxy http://192.168.1.2:3128

but then:

2018-06-29 08:18:52 (5244): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-06-29 08:18:52 (5244): Guest Log: 2.4.4.0 3582 1 25760 6531 3 1 183730 10240000 2 65024 0 15 100 0 0 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.168.202:3125 1

https://lhcathome.cern.ch/lhcathome/result.php?resultid=199212943
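Those two Guest Log lines are a header row and a value row; pairing them up makes it easier to see which stratum-1 and proxy the client actually settled on. A quick sketch using exactly the lines quoted above:

```python
# Pair the CVMFS status header with its value row, as printed in the
# Theory stderr "Guest Log" output quoted above.
header = ("VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS "
          "CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN "
          "HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE")
values = ("2.4.4.0 3582 1 25760 6531 3 1 183730 10240000 2 65024 0 15 "
          "100 0 0 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch "
          "http://128.142.168.202:3125 1")

stat = dict(zip(header.split(), values.split()))
print(stat["HOST"])   # the stratum-1 the client actually uses
print(stat["PROXY"])  # the proxy it actually uses
```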
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 35700 - Posted: 29 Jun 2018, 12:53:26 UTC - in response to Message 35695.  

I have rebooted the ceph-fuse mount. Please let me know if that helps.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1479
Credit: 79,409,718
RAC: 81,052
Message 35703 - Posted: 29 Jun 2018, 14:18:38 UTC - in response to Message 35700.  

Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 962
Credit: 6,350,817
RAC: 548
Message 35705 - Posted: 29 Jun 2018, 14:25:35 UTC

My first task with this version ran into trouble after two longer (>6 hrs) suspends (VM snapshot written to disk).
===> [runRivet] Thu Jun 28 22:10:40 CEST 2018 [boinc pp jets 7000 10 - pythia8 8.170 default-MBR 100000 238]
After resuming the second time, I noticed that the running log reports
45800 events processed
45900 events processed

but there is no pythia process busy, so there is also no further progress in the running log.
I don't get a new job for the VM, and the VM is not killed either (shutdown in shared folder).
I'll kill the VM myself and try to reproduce the above sequence of events.
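One way to spot this state automatically is to compare the last "events processed" counter in the running log across two snapshots taken a few minutes apart; if it stops advancing while the VM is supposedly running, the job is probably stuck. A sketch (the counter format is taken from the log lines above; how the snapshots are captured is left open):

```python
import re

def last_event_count(log_text: str):
    """Return the last 'N events processed' counter found in a running log."""
    counts = re.findall(r"(\d+) events processed", log_text)
    return int(counts[-1]) if counts else None

# Two snapshots of the running log, taken a few minutes apart:
earlier = "45800 events processed\n45900 events processed\n"
later = "45800 events processed\n45900 events processed\n"

stalled = last_event_count(later) == last_event_count(earlier)
print(stalled)  # True: the counter has not advanced, so the job looks stuck
```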
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 35712 - Posted: 30 Jun 2018, 9:23:45 UTC

This new multi-core app seems like a good idea, but what do we do with the looping Sherpa jobs? With single-core tasks we could simply terminate the task gracefully, but if we terminate a 4-core task to stop one looper, we also kill three other jobs that are likely not looping, so the CPU cycles already spent on those three go to waste.

Hopefully Sherpa jobs are not being sent out under the multi-core plan and are restricted to the single-core plan?
Erich56
Joined: 18 Dec 15
Posts: 1284
Credit: 23,161,850
RAC: 2,258
Message 35713 - Posted: 30 Jun 2018, 12:16:13 UTC

What I found is that tasks from the new version yield considerably fewer credit points than before, regardless of whether they run in single-core or multicore mode.
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 949
Credit: 40,374,622
RAC: 4,955
Message 35720 - Posted: 1 Jul 2018, 1:18:22 UTC
Last modified: 1 Jul 2018, 1:20:45 UTC

We just seem to have this luck with the server whenever we move a project over here, but the tasks will work, the credits will be what you expect once you've run a few, and the valid tasks can have quite different run times.

My longest one so far had a run time of 20 hours 42 min 1 sec and a CPU time of 1 day 10 hours 27 min, and shorter valids here have had 1-hour and 3-hour run times.

But I have run these multi-core tests for a couple of years now and they work fine; I had over 4000 Theory multi-core tasks before they moved over here.

The server is being a problem: on the 29th and 30th we got those typical *[ERROR] Condor exited after 847s without running a job* messages.

But even though the server is slow delivering that new vdi (like the ATLAS project), the tasks seem to be running ok now (my last 6 here are valids).

Of course, even though Theory tasks don't need RAM the way ATLAS or CMS do, you still have to check your settings. My 8-core hosts seem happy running 4 x 2-core tasks, using only 5 GB of RAM and 70% CPU.

I have tested many multi-core setups, from 3-core up to 8-core tasks, and they all produced valids, but I found the hosts I use here run best with 4 x 2-core tasks.
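For reference, one way to pin each Theory task to 2 cores is an app_config.xml in the LHC@home project directory. This is only a sketch: the plan_class name below is an assumption, so copy the exact plan class shown on one of your own Theory tasks before using it.

```xml
<!-- app_config.xml in the LHC@home project directory (sketch only).
     The plan_class below is an assumption; use the plan class
     shown on your own Theory tasks. -->
<app_config>
  <app_version>
    <app_name>Theory</app_name>
    <plan_class>vbox64_mt_mcore</plan_class>
    <avg_ncpus>2</avg_ncpus>
  </app_version>
</app_config>
```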
Volunteer Mad Scientist For Life


©2020 CERN