Thread 'New Version 263.60'

Author	Message
Laurence Project administrator Project developer Send message Joined: 20 Jun 14 Posts: 431 Credit: 256,317 RAC: 27	Message 35660 - Posted: 28 Jun 2018, 9:23:49 UTC Updates the CernVM cache and now uses OpenHTC.io, for CVMFS. It also supports multicore VMs. ID: 35660 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1992 Credit: 163,198,858 RAC: 101,503	Message 35667 - Posted: 28 Jun 2018, 12:31:47 UTC - in response to Message 35660. It also supports multicore VMs. any particular minimum RAM requirements for multicore operation (similar to ATLAS, for which exists a specific formular on basis of which the RAM is calculated, depending on the number of CPU cores) ? ID: 35667 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2760 Credit: 305,500,600 RAC: 124,780	Message 35668 - Posted: 28 Jun 2018, 12:35:43 UTC - in response to Message 35667. any particular minimum RAM requirements 730 MB instead of 630 MB according to the new entry in client_state.xml Is the project server that busy? My first host is currently downloading the new vdi file at 13 KBps. ID: 35668 · Reply Quote

bronco Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0	Message 35678 - Posted: 28 Jun 2018, 16:33:31 UTC - in response to Message 35668. Is the project server that busy? My first host is currently downloading the new vdi file at 13 KBps. Uh-oh. Is the ATLAS download problem spreading now to Theory? On my host the dl speed dropped to 125 KBps when it's normally > 1000 KBps. ID: 35678 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2760 Credit: 305,500,600 RAC: 124,780	Message 35682 - Posted: 28 Jun 2018, 17:41:58 UTC ile I'm running 2 WUs on 2 different BOINC instances but on the same host. One instance is configured to use my local proxy, the other instance goes direct (at least it "thinks" it does). Some comments Both stderr.txt files include the following message: [pre]2018-06-28 18:54:49 (82587): VBOX_JOB::parse(): unexpected text enable_screenshots_on_error/[/pre] Where is the configuration for openhtc.io? According to the tests with CMS a few months ago I would expect a line like: [pre]Guest Log: 2.4.4.0 3508 0 24896 6527 3 1 183730 10240000 2 65024 0 20 95 13 40 http://s1bnl-cvmfs.openhtc.io/cvmfs/grid.cern.ch http://<ip_of_my_local_proxy>:3128 1[/pre] Instead the logs show that CVMFS ignores my local proxy and configures a CERN repository and a CERN proxy: [pre]Guest Log: 2.4.4.0 3508 0 24896 6527 3 1 183730 10240000 2 65024 0 20 95 13 40 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1[/pre] ID: 35682 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1992 Credit: 163,198,858 RAC: 101,503	Message 35684 - Posted: 28 Jun 2018, 18:29:00 UTC - in response to Message 35668. Is the project server that busy? My first host is currently downloading the new vdi file at 13 KBps. hm, here the new vdi file got downloaded in about 2 minutes. ID: 35684 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1992 Credit: 163,198,858 RAC: 101,503	Message 35689 - Posted: 29 Jun 2018, 4:43:36 UTC - in response to Message 35684. hm, here the new vdi file got downloaded in about 2 minutes. edit: the aforesaid is true for only one of my computers. For the others, the downloads got stuck and are sitting there for several hours so far. What can I do to get them continued? ID: 35689 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2305 Credit: 179,727,092 RAC: 5,767	Message 35690 - Posted: 29 Jun 2018, 6:18:11 UTC see this message: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4741&postid=35632#35632 ID: 35690 · Reply Quote

gyllic Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,660,212 RAC: 1	Message 35691 - Posted: 29 Jun 2018, 6:28:47 UTC - in response to Message 35682. ]Instead the logs show that CVMFS ignores my local proxy and configures a CERN repository and a CERN proxy: [pre]Guest Log: 2.4.4.0 3508 0 24896 6527 3 1 183730 10240000 2 65024 0 20 95 13 40 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.33.31:3125 1[/pre][/quote]same here ID: 35691 · Reply Quote

Laurence Project administrator Project developer Send message Joined: 20 Jun 14 Posts: 431 Credit: 256,317 RAC: 27	Message 35692 - Posted: 29 Jun 2018, 7:34:37 UTC - in response to Message 35682. Thanks, I will look into this later. ID: 35692 · Reply Quote

Laurence Project administrator Project developer Send message Joined: 20 Jun 14 Posts: 431 Credit: 256,317 RAC: 27	Message 35693 - Posted: 29 Jun 2018, 7:35:54 UTC - in response to Message 35692. I will investigate the download/upload issues. What are people experiencing? ID: 35693 · Reply Quote

maeax Send message Joined: 2 May 07 Posts: 2305 Credit: 179,727,092 RAC: 5,767	Message 35694 - Posted: 29 Jun 2018, 7:43:56 UTC We had this Problems in Atlas the last days. Down- or upload are stalled and not finished. The Speed was downgrading. Temporary it was possible to paus Network-using in Boinc and then reconnect in boinc. But this need a manual activity. In Atlas it is running ok since last night up to now. ID: 35694 · Reply Quote

Laurence Project administrator Project developer Send message Joined: 20 Jun 14 Posts: 431 Credit: 256,317 RAC: 27	Message 35695 - Posted: 29 Jun 2018, 8:10:50 UTC - in response to Message 35693. The traffic doesn't look too high. It may be the ceph-fuse mount. ID: 35695 · Reply Quote

gyllic Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,660,212 RAC: 1	Message 35699 - Posted: 29 Jun 2018, 10:09:40 UTC - in response to Message 35692. in the log it writes: 2018-06-29 08:17:46 (5244): Guest Log: [DEBUG] Detected squid proxy http://192.168.1.2:3128 but then: 2018-06-29 08:18:52 (5244): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE 2018-06-29 08:18:52 (5244): Guest Log: 2.4.4.0 3582 1 25760 6531 3 1 183730 10240000 2 65024 0 15 100 0 0 http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch http://128.142.168.202:3125 1 https://lhcathome.cern.ch/lhcathome/result.php?resultid=199212943 ID: 35699 · Reply Quote

Laurence Project administrator Project developer Send message Joined: 20 Jun 14 Posts: 431 Credit: 256,317 RAC: 27	Message 35700 - Posted: 29 Jun 2018, 12:53:26 UTC - in response to Message 35695. I have rebooted the ceph-fuse mount. Please let me know if that helps. ID: 35700 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2760 Credit: 305,500,600 RAC: 124,780	Message 35703 - Posted: 29 Jun 2018, 14:18:38 UTC - in response to Message 35700. Please let me know if that helps. Unfortunately not. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4592&postid=35702 https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4741&postid=35701 ID: 35703 · Reply Quote

Crystal Pellet Volunteer moderator Volunteer tester Send message Joined: 14 Jan 10 Posts: 1561 Credit: 10,110,280 RAC: 1,268	Message 35705 - Posted: 29 Jun 2018, 14:25:35 UTC My first task with this version has troubles after 2 longer (>6hrs) suspends (VM snapshot written to disk). ===> [runRivet] Thu Jun 28 22:10:40 CEST 2018 [boinc pp jets 7000 10 - pythia8 8.170 default-MBR 100000 238] After resuming the 2nd time, I noticed that the running log reports 45800 events processed 45900 events processed but there is no pythia process busy, so also no progress in the running log. I dont get a new job for the VM and the VM is also not killed (shutdown in shared folder). I'll kill the VM myself and try to reproduce above handlings. ID: 35705 · Reply Quote

bronco Send message Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0	Message 35712 - Posted: 30 Jun 2018, 9:23:45 UTC This new multi-core app seems like a good idea but what do we do with those looping Sherpa jobs? With the single-core tasks we simply terminated the task gracefully. For example, if we terminate a 4-core task to stop a looper then we also kill 3 other jobs that are likely not looping which means the CPU cycles spent on those 3 go to waste. Hopefullly Sherpa jobs are not being sent out under the multi-core plan and are being restricted to the single-core plan? ID: 35712 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1992 Credit: 163,198,858 RAC: 101,503	Message 35713 - Posted: 30 Jun 2018, 12:16:13 UTC what I found out was that the tasks from the new version yield considerably less credit points compared to before. Regardless whether in 1-core or multicore mode. ID: 35713 · Reply Quote

Magic Quantum Mechanic Send message Joined: 24 Oct 04 Posts: 1319 Credit: 98,794,410 RAC: 114,249	Message 35720 - Posted: 1 Jul 2018, 1:18:22 UTC Last modified: 1 Jul 2018, 1:20:45 UTC We just seem to have this luck with the server when we move a project over here but they will work here and the credits will be what you expect after you run some and the tasks times can be different length Valids. My long one so far is Run time - 20 hours 42 min 1 sec and CPU time 1 days 10 hours 27 min And shorter Valid here have been 1 hour and 3 hour run times. BUT I have ran these multi-core tests for a couple years now and they work fine and I had over 4000 Theory multi-core tasks before moving them over here. The server is being a problem and the 29th and 30th we got those typical [ERROR] Condor exited after 847s without running a job But even though the server is slow getting that new vdi (like the Atlas project) they seem to be running ok now (my last 6 here are Valids) Of course even though Theory tasks don't need the Ram like Atlas or CMS you still have to check your settings but my 8-core hosts seem happy running the 4 X 2-core tasks only using 5GB ram and 70% CPU I have tested many different multi-core from 3's up to 8-core tasks and they all ran Valids but I found for my hosts I use here run best running the 4 X 2-core tasks. Volunteer Mad Scientist For Life unbelievable are you trying to promote linux again? ID: 35720 · Reply Quote