Message boards :
Theory Application :
New Version 263.60
Author | Message |
---|---|
Joined: 20 Jun 14 Posts: 407 Credit: 238,712 RAC: 0 |
This version updates the CernVM cache and now uses OpenHTC.io for CVMFS. It also supports multicore VMs. |
Joined: 18 Dec 15 Posts: 1908 Credit: 144,946,868 RAC: 82,780 |
"It also supports multicore VMs." Are there any particular minimum RAM requirements for multicore operation (similar to ATLAS, for which there is a specific formula that calculates the RAM from the number of CPU cores)? |
Joined: 15 Jun 08 Posts: 2683 Credit: 286,886,839 RAC: 54,793 |
"any particular minimum RAM requirements" 730 MB instead of 630 MB, according to the new entry in client_state.xml. Is the project server that busy? My first host is currently downloading the new vdi file at 13 KBps. |
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"Is the project server that busy?" Uh-oh. Is the ATLAS download problem now spreading to Theory? On my host the download speed dropped to 125 KBps; normally it is above 1000 KBps. |
Joined: 15 Jun 08 Posts: 2683 Credit: 286,886,839 RAC: 54,793 |
Meanwhile I'm running 2 WUs on 2 different BOINC instances on the same host. One instance is configured to use my local proxy; the other goes direct (at least it "thinks" it does). Some comments: Both stderr.txt files include the following message: 2018-06-28 18:54:49 (82587): VBOX_JOB::parse(): unexpected text enable_screenshots_on_error/ Where is the configuration for openhtc.io? Based on the tests with CMS a few months ago, I would expect a line like: Guest Log: 2.4.4.0 3508 0 24896 6527 3 1 183730 10240000 2 65024 0 20 95 13 40 http://s1bnl-cvmfs.openhtc.io/cvmfs/grid.cern.ch http://<ip_of_my_local_proxy>:3128 1 Instead, the logs show that CVMFS ignores my local proxy and configures a CERN repository and a CERN proxy: Guest Log: 2.4.4.0 3508 0 24896 6527 3 1 183730 10240000 2 65024 0 20 95 13 40 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1 |
Joined: 18 Dec 15 Posts: 1908 Credit: 144,946,868 RAC: 82,780 |
"Is the project server that busy?" Hm, here the new vdi file was downloaded in about 2 minutes. |
Joined: 18 Dec 15 Posts: 1908 Credit: 144,946,868 RAC: 82,780 |
"hm, here the new vdi file got downloaded in about 2 minutes." Edit: this is true for only one of my computers. On the others, the downloads got stuck and have been sitting there for several hours so far. What can I do to get them to continue? |
Joined: 2 May 07 Posts: 2277 Credit: 178,709,076 RAC: 100,489 |
|
Joined: 9 Dec 14 Posts: 202 Credit: 2,539,793 RAC: 175 |
"Instead the logs show that CVMFS ignores my local proxy and configures a CERN repository and a CERN proxy:" Same here. |
Joined: 20 Jun 14 Posts: 407 Credit: 238,712 RAC: 0 |
Thanks, I will look into this later. |
Joined: 20 Jun 14 Posts: 407 Credit: 238,712 RAC: 0 |
I will investigate the download/upload issues. What are people experiencing? |
Joined: 2 May 07 Posts: 2277 Credit: 178,709,076 RAC: 100,489 |
We had these problems in ATLAS over the last few days. Downloads or uploads stalled and did not finish, and the speed kept dropping. As a temporary workaround it was possible to pause network usage in BOINC and then reconnect, but this requires manual action. In ATLAS it has been running OK since last night. |
Joined: 20 Jun 14 Posts: 407 Credit: 238,712 RAC: 0 |
The traffic doesn't look too high. It may be the ceph-fuse mount. |
Joined: 9 Dec 14 Posts: 202 Credit: 2,539,793 RAC: 175 |
In the log it writes: 2018-06-29 08:17:46 (5244): Guest Log: [DEBUG] Detected squid proxy http://192.168.1.2:3128 But then: 2018-06-29 08:18:52 (5244): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE 2018-06-29 08:18:52 (5244): Guest Log: 2.4.4.0 3582 1 25760 6531 3 1 183730 10240000 2 65024 0 15 100 0 0 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.168.202:3125 1 See https://lhcathome.cern.ch/lhcathome/result.php?resultid=199212943 |
Joined: 20 Jun 14 Posts: 407 Credit: 238,712 RAC: 0 |
I have rebooted the ceph-fuse mount. Please let me know if that helps. |
Joined: 15 Jun 08 Posts: 2683 Credit: 286,886,839 RAC: 54,793 |
"Please let me know if that helps." Unfortunately not. See https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4592&postid=35702 and https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4741&postid=35701 |
Joined: 14 Jan 10 Posts: 1461 Credit: 9,859,193 RAC: 2,531 |
My first task with this version ran into trouble after 2 longer (>6 hrs) suspends (VM snapshot written to disk). ===> [runRivet] Thu Jun 28 22:10:40 CEST 2018 [boinc pp jets 7000 10 - pythia8 8.170 default-MBR 100000 238] After resuming the 2nd time, I noticed that the running log reports 45800 events processed 45900 events processed but no pythia process is busy, so there is no further progress in the running log. I don't get a new job for the VM, and the VM is also not killed (shutdown in shared folder). I'll kill the VM myself and try to reproduce the steps above. |
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
This new multi-core app seems like a good idea, but what do we do with those looping Sherpa jobs? With the single-core tasks we simply terminated the task gracefully. But if, for example, we terminate a 4-core task to stop a looper, then we also kill 3 other jobs that are likely not looping, which means the CPU cycles spent on those 3 go to waste. Hopefully Sherpa jobs are not being sent out under the multi-core plan and are restricted to the single-core plan? |
Joined: 18 Dec 15 Posts: 1908 Credit: 144,946,868 RAC: 82,780 |
What I found is that tasks from the new version yield considerably fewer credit points than before, regardless of whether they run in 1-core or multicore mode. |
Joined: 24 Oct 04 Posts: 1234 Credit: 79,792,572 RAC: 76,274 |
We always seem to have this luck with the server when we move a project over here, but the tasks will work here, and the credits will be what you expect after you have run some. Valid tasks can vary in length: my longest so far has a run time of 20 hours 42 min 1 sec and a CPU time of 1 day 10 hours 27 min, while shorter valid ones here have had run times of 1 hour and 3 hours. But I have run these multi-core tests for a couple of years now and they work fine; I had over 4000 Theory multi-core tasks before moving them over here. The server is being a problem, and on the 29th and 30th we got those typical *[ERROR] Condor exited after 847s without running a job* messages. But even though the server is slow delivering the new vdi (as with the ATLAS project), tasks seem to be running OK now (my last 6 here are valid). Of course, even though Theory tasks don't need as much RAM as ATLAS or CMS, you still have to check your settings, but my 8-core hosts seem happy running 4 x 2-core tasks using only 5 GB RAM and 70% CPU. I have tested many different multi-core configurations, from 3-core up to 8-core tasks, and they all ran valid, but I found that the hosts I use here run best with 4 x 2-core tasks. Volunteer Mad Scientist For Life |
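Several posts above compare the CVMFS statistics line in the Guest Log against the server and proxy the poster expected. The following is a minimal sketch of that check in Python, assuming the column order of the header line quoted in the thread. The function names `parse_stat_line` and `check_routing` are hypothetical helpers, not part of CVMFS or BOINC, and the `openhtc.io` suffix test is an assumption about what a "correct" host looks like.

```python
from urllib.parse import urlparse

# Column names copied from the header line quoted in the thread.
HEADER = (
    "VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS "
    "CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN "
    "HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE"
).split()

# Statistics line as quoted in the thread (timestamp and PID prefix included).
SAMPLE = (
    "2018-06-29 08:18:52 (5244): Guest Log: 2.4.4.0 3582 1 25760 6531 3 1 "
    "183730 10240000 2 65024 0 15 100 0 0 "
    "http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch "
    "http://128.142.168.202:3125 1"
)


def parse_stat_line(line):
    """Map one whitespace-separated statistics line onto the header columns."""
    marker = "Guest Log:"
    if marker in line:
        # Drop the timestamp/PID prefix written by the wrapper.
        line = line.split(marker, 1)[1]
    values = line.split()
    if len(values) != len(HEADER):
        raise ValueError(f"expected {len(HEADER)} fields, got {len(values)}")
    return dict(zip(HEADER, values))


def check_routing(stats, expected_proxy, expected_host_suffix="openhtc.io"):
    """Return a list of mismatches; an empty list means host and proxy look as expected."""
    problems = []
    host = urlparse(stats["HOST"]).hostname or ""
    if not host.endswith(expected_host_suffix):
        problems.append(f"host {host} is not under {expected_host_suffix}")
    if stats["PROXY"].rstrip("/") != expected_proxy.rstrip("/"):
        problems.append(f"proxy {stats['PROXY']} differs from {expected_proxy}")
    return problems


if __name__ == "__main__":
    # The squid address is the one reported as detected in the quoted log.
    for problem in check_routing(parse_stat_line(SAMPLE), "http://192.168.1.2:3128"):
        print(problem)
```

Run against the quoted line, the check reports both mismatches described in the thread: the host is the CERN stratum one server and the proxy is a CERN address, not the detected local squid.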
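The pause-and-reconnect workaround for stalled transfers mentioned in the thread needs manual action each time. A rough sketch of how it could be scripted is shown here, assuming the `boinccmd` tool is on the PATH and that the installed version accepts the `--set_network_mode` option (check `boinccmd --help`, since option names have changed between client versions). The helper names and the 10 second pause are illustrative assumptions, and switching back to `auto` assumes the client was not deliberately set to `always` before.

```python
import subprocess
import time


def network_toggle_commands(boinccmd="boinccmd"):
    """Return the two boinccmd invocations: suspend networking, then restore auto mode."""
    return [
        [boinccmd, "--set_network_mode", "never"],
        [boinccmd, "--set_network_mode", "auto"],
    ]


def toggle_network(pause_seconds=10, run=subprocess.run, sleep=time.sleep):
    """Suspend BOINC network activity, wait, then re-enable it.

    `run` and `sleep` are injectable so the sequence can be tried
    without a BOINC client present.
    """
    suspend, resume = network_toggle_commands()
    run(suspend, check=True)   # raises CalledProcessError if boinccmd fails
    sleep(pause_seconds)       # give stalled transfers time to drop
    run(resume, check=True)


if __name__ == "__main__":
    toggle_network()
```

Whether this actually restarts a stuck vdi download depends on the server side; the thread reports it only as a temporary fix.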
©2025 CERN