21) Message boards : ATLAS application : Uploads of finished tasks not possible since last night (Message 33379)
Posted 15 Dec 2017 by PHILIPPE
Post:
Why do you delete the partial uploads manually?
I remember that David Cameron wrote a script to fix this issue in the past.
His script just has to be adapted to the new upload server...
22) Message boards : News : BOINC server update Thursday (Message 33250)
Posted 8 Dec 2017 by PHILIPPE
Post:
A further detail:
The word "LHC@home" is not correctly translated in the French, Catalan and Japanese versions, in the first item of the home page, which describes the project.
Only "%1" appears in the item title.

Curiously, the Catalan and Japanese versions are mixed with the French version...
23) Message boards : News : BOINC server update Thursday (Message 33225)
Posted 7 Dec 2017 by PHILIPPE
Post:
The new appearance of the site is nice,
but to improve readability and usability:
1°) can you right-align the numbers on the server status page (or center-align the execution times)?
2°) can you add a direct link to the vlhc-dev site and, symmetrically, a link to this site from the vlhc-dev site (to make it easier to browse between them)?
24) Message boards : Number crunching : Can't install Vbox on Mac High Sierra (Message 33119)
Posted 22 Nov 2017 by PHILIPPE
Post:
Maybe look at this in the VirtualBox forum.
It may help you.
25) Questions and Answers : Windows : No new LHC units for a long time (Message 33094)
Posted 20 Nov 2017 by PHILIPPE
Post:
When you look at your applications, the jobs done are only SixTrack work units.
You have never done any work units for the other subprojects (ATLAS, Theory, CMS, LHCb, ...).

So maybe make sure you have selected them in your project preferences, and choose
Max # jobs 1
Max # CPUs 1
for the first time.

A piece of advice:
The best way to begin a new project is to suspend the others, to avoid compatibility problems, and then resume them once initialization has succeeded.
26) Message boards : ATLAS application : Only getting 1 ATLAS WU. "No tasks are available for ATLAS Simulation" (Message 33031)
Posted 9 Nov 2017 by PHILIPPE
Post:
For those who aren't good at foreign languages, there is a good site for translation: deepl.com.
It is not as good as a human translator, but it is better than the usual automatic translators like Google Translate.
It uses neural networks to take the context of the whole text into account before providing a translation.
The number of languages it can translate is still going to increase.
To be continued...
27) Questions and Answers : Windows : Problem with VM jobs (Message 32904)
Posted 25 Oct 2017 by PHILIPPE
Post:
In your logs, you can see this message:

BOINC will be notified that it needs to clean up the environment.
This is a temporary problem and so this job will be rescheduled for another time.

You have to open VirtualBox Manager and delete the virtual machines that appear with a red mark (which means inaccessible: vboxwrapper can no longer communicate with them through VirtualBox).
2017-10-23 23:25:53 (4840): ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.

This can happen after an error that leaves the VM without enough time to save its state or stop properly.
So, to avoid a series of errors such as
2017-10-23 23:31:53 (3004): Error creating VirtualBox instance! rc = 0x80004002

you sometimes have to delete the VMs manually in VirtualBox Manager by right-clicking on them (but only the bad ones, those in a bad state).
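
For those who prefer the command line to the manager window, here is a minimal Python sketch of how the bad machines could be spotted. It assumes that `VBoxManage list vms` prints one machine per line as `"name" {uuid}` and shows `<inaccessible>` as the name of a machine it cannot reach. The sample text and its UUIDs are invented for illustration, so check the output of your own VirtualBox version first.

```python
import re

# One line of `VBoxManage list vms` output, assumed to look like: "name" {uuid}
VM_LINE = re.compile(r'^"(?P<name>.*)"\s+\{(?P<uuid>[0-9a-fA-F-]+)\}$')

def find_inaccessible(list_vms_output):
    """Return the UUIDs of machines listed with the name <inaccessible>."""
    uuids = []
    for line in list_vms_output.splitlines():
        match = VM_LINE.match(line.strip())
        if match and match.group("name") == "<inaccessible>":
            uuids.append(match.group("uuid"))
    return uuids

# Hypothetical sample output (names and UUIDs are made up).
SAMPLE = '''"boinc_0123456789abcdef" {aaaaaaaa-1111-2222-3333-444455556666}
"<inaccessible>" {0f0e0d0c-1111-2222-3333-444455556666}'''

if __name__ == "__main__":
    for uuid in find_inaccessible(SAMPLE):
        print("candidate for removal:", uuid)
```

Each UUID found this way could then be removed by hand, for example with `VBoxManage unregistervm <uuid>`, but only once you are sure it is one of the broken machines.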

If you have trouble with ATLAS tasks (you seem to be aborting them voluntarily), you can go to your web preferences and uncheck ATLAS in your applications.
28) Message boards : CMS Application : CMS Tasks Failing (Message 32860)
Posted 20 Oct 2017 by PHILIPPE
Post:
Just for information:

I had this error on this task:
2017-10-19 19:41:07 (3932): VM Completion Message: Could not connect to Condor server on port 9618

It occurred just after a reboot following a big Windows update.
I have the Windows Home edition, but perhaps other editions are affected too.
My new Windows build is now:
Microsoft Windows 10

Core x64 Edition, (10.00.16299.00)
29) Message boards : LHCb Application : LHCb application detects wrong Boinc version (Message 32761)
Posted 10 Oct 2017 by PHILIPPE
Post:
Same for me, but for a Theory task.
task Theory_5946_1507460368.603779_0

<core_client_version>7.8.2</core_client_version>
<![CDATA[
<stderr_txt>
.....
2017-10-01 10:00:36 (9552): Detected: vboxwrapper 26197
2017-10-01 10:00:36 (9552): Detected: BOINC client v7.7


What is surprising is that this WU, judging by the stderr_txt file, seems to have lasted many days...
<core_client_version>7.8.2</core_client_version>
<![CDATA[
<stderr_txt>
d: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2017-09-30 09:50:54 (780): Successfully copied 'init_data.xml' to the shared directory.
.....
2017-10-10 11:00:18 (8180): Removing virtual disk drive from VirtualBox.
11:01:10 (8180): called boinc_finish(0)

</stderr_txt>
]]>


I think that this stderr_txt file was not cleared after each WU was processed.

In the log, we can see different kinds of tasks:
2017-09-30 09:54:37 (780): Guest Log: [INFO] Running the fast benchmark.

2017-09-30 09:55:59 (780): Guest Log: [INFO] Machine performance 7.09 HEPSEC06

2017-09-30 09:55:59 (780): Guest Log: [INFO] LHCb application starting. Check log files.
.....
2017-10-03 07:41:44 (6880): Guest Log: [INFO] Running the fast benchmark.

2017-10-03 07:43:30 (6880): Guest Log: [INFO] Machine performance 4.53 HEPSEC06

2017-10-03 07:43:30 (6880): Guest Log: [INFO] LHCb application starting. Check log files.
.....
2017-10-05 07:33:47 (2176): Guest Log: [INFO] Running the fast benchmark.

2017-10-05 07:35:04 (2176): Guest Log: [INFO] Machine performance 6.86 HEPSEC06

2017-10-05 07:35:04 (2176): Guest Log: [INFO] Theory application starting. Check log files
.....
2017-10-07 10:30:57 (10360): Guest Log: [INFO] Running the fast benchmark.

2017-10-07 10:32:15 (10360): Guest Log: [INFO] Machine performance 6.59 HEPSEC06

2017-10-07 10:32:15 (10360): Guest Log: [INFO] LHCb application starting. Check log files.
.....
2017-10-08 09:52:46 (10556): Guest Log: [INFO] Running the fast benchmark.

2017-10-08 09:53:52 (10556): Guest Log: [INFO] Machine performance 7.89 HEPSEC06

2017-10-08 09:53:52 (10556): Guest Log: [INFO] LHCb application starting. Check log files.
.....
2017-10-09 07:31:58 (11500): Guest Log: [INFO] Running the fast benchmark.

2017-10-09 07:33:13 (11500): Guest Log: [INFO] Machine performance 6.83 HEPSEC06

2017-10-09 07:33:13 (11500): Guest Log: [INFO] Theory application starting. Check log files.


I notice that the performance of my host is rather variable, from 4.53 to 7.89, with the dynamic benchmark, but I didn't change anything in my configuration.
Is that normal?
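
To put numbers on that variability, here is a small Python sketch that pulls the benchmark values out of log lines like the ones above. The sample lines are copied from this post; the regular expression assumes the "Machine performance ... HEPSEC06" wording stays the same in other logs.

```python
import re
from statistics import mean

# Matches the benchmark value in lines such as
# "... Guest Log: [INFO] Machine performance 7.09 HEPSEC06"
BENCH = re.compile(r"Machine performance ([0-9.]+) HEPSEC06")

def benchmark_scores(log_text):
    """Return every HEPSEC06 benchmark value found in the log, in order."""
    return [float(value) for value in BENCH.findall(log_text)]

LOG = """2017-09-30 09:55:59 (780): Guest Log: [INFO] Machine performance 7.09 HEPSEC06
2017-10-03 07:43:30 (6880): Guest Log: [INFO] Machine performance 4.53 HEPSEC06
2017-10-05 07:35:04 (2176): Guest Log: [INFO] Machine performance 6.86 HEPSEC06
2017-10-07 10:32:15 (10360): Guest Log: [INFO] Machine performance 6.59 HEPSEC06
2017-10-08 09:53:52 (10556): Guest Log: [INFO] Machine performance 7.89 HEPSEC06
2017-10-09 07:33:13 (11500): Guest Log: [INFO] Machine performance 6.83 HEPSEC06"""

if __name__ == "__main__":
    scores = benchmark_scores(LOG)
    print("runs:", len(scores))
    print("min:", min(scores), "max:", max(scores))
    print("mean:", round(mean(scores), 2))
```

With only six runs this says nothing about the cause; it just makes the spread easier to compare with other hosts.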

BTW: The last task was a Sherpa job stuck in an endless loop (more than 12 hours) (updating display...).
30) Message boards : Theory Application : Feedback for multicore theory wus (Message 32588)
Posted 2 Oct 2017 by PHILIPPE
Post:
Hi Marmot, these settings were adapted to small computers, like mine.

If you have a stronger configuration, you can use higher values, of course.

It's possible to run Theory tasks with a lower RAM footprint than when using single cores. In fact, it's also possible to run a single-core task with less RAM than the 630 MB announced. But it's not a good way to run the WUs properly (more jobs end up in endless loops, which are useless for the project).

I tried the same experiment with LHCb WUs, but I understood that it's much more complicated, and Laurence explained why (message):

The specification for LHC HEP applications is 2GB per core. This value is used to build the internal computing infrastructure. The VMs also have 1GB of swap configured. For the theory application, others have done similar tests and we arrived a sensible value for the memory. We have to be careful that the observations are true for all the jobs that LHCb may wish to run. The 2250Mb was originally requested by LHCb.


It is the HEP Software Foundation that decides the roadmap to follow so that everyone works in the same way (other message).

Your app_config.xml will override the settings in your web preferences.

Here is an example that allows one 4-core Theory task, or 4 SixTrack tasks, or only one single-core LHCb, CMS or ATLAS task.

I can't see your computer. It's hidden...

<app_config>

<project_max_concurrent>4</project_max_concurrent>

<app>
<name>ATLAS</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app>
<name>CMS</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app>
<name>LHCb</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app>
<name>sixtrack</name>
<max_concurrent>4</max_concurrent>
<fraction_done_exact/>
</app>
<app>
<name>Theory</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>ATLAS</app_name>
<plan_class>vbox64_mt_mcore_atlas</plan_class>
<avg_ncpus>1.000000</avg_ncpus>
<cmdline>--nthreads 1.000000 --memory_size_mb 3400</cmdline>
</app_version>
<app_version>
<app_name>CMS</app_name>
<plan_class>vbox64</plan_class>
<avg_ncpus>1.000000</avg_ncpus>
<cmdline>--nthreads 1.000000 --memory_size_mb 2048</cmdline>
</app_version>
<app_version>
<app_name>LHCb</app_name>
<plan_class>vbox64</plan_class>
<avg_ncpus>1.000000</avg_ncpus>
<cmdline>--nthreads 1.000000 --memory_size_mb 2048</cmdline>
</app_version>
<app_version>
<app_name>Theory</app_name>
<plan_class>vbox64</plan_class>
<avg_ncpus>4.000000</avg_ncpus>
<cmdline>--nthreads 4.000000 --memory_size_mb 1410</cmdline>
</app_version>

</app_config>


It's up to you to adapt it to your needs.
By the way, save it as an XML file with Notepad and not as a text file. It's important. The result is not the same...
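
Before restarting the client, it can help to check that the file really is well-formed XML with the right name. Below is a minimal Python sketch using only the standard library. The function names and the shortened sample are hypothetical, and the checks cover only the elements used in the example above, not everything BOINC accepts.

```python
import re
import xml.etree.ElementTree as ET

def has_plain_xml_name(filename):
    """True only for app_config.xml itself, not app_config.xml.txt."""
    return filename.lower() == "app_config.xml"

def summarize_app_config(xml_text):
    """Parse app_config.xml text and return the limits it sets.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError("root element must be <app_config>")
    limit = root.findtext("project_max_concurrent")
    summary = {
        "project_max_concurrent": int(limit) if limit is not None else None,
        "apps": {},
        "versions": {},
    }
    for app in root.findall("app"):
        summary["apps"][app.findtext("name")] = int(app.findtext("max_concurrent"))
    for version in root.findall("app_version"):
        # Look for --memory_size_mb in whatever <cmdline> text is present.
        cmdline = " ".join(c.text or "" for c in version.findall("cmdline"))
        memory = re.search(r"--memory_size_mb\s+(\d+)", cmdline)
        summary["versions"][version.findtext("app_name")] = {
            "avg_ncpus": float(version.findtext("avg_ncpus")),
            "memory_mb": int(memory.group(1)) if memory else None,
        }
    return summary

# Shortened sample in the same shape as the example above.
SAMPLE = """<app_config>
<project_max_concurrent>4</project_max_concurrent>
<app>
<name>Theory</name>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>Theory</app_name>
<plan_class>vbox64</plan_class>
<avg_ncpus>4.000000</avg_ncpus>
<cmdline>--memory_size_mb 1410</cmdline>
</app_version>
</app_config>"""

if __name__ == "__main__":
    print(has_plain_xml_name("app_config.xml.txt"))
    print(summarize_app_config(SAMPLE))
```

If Notepad silently saved the file as app_config.xml.txt, the name check fails, which is the mistake warned about above.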

The app_config.xml file only takes effect on WUs downloaded by your BOINC client afterwards, so you have to wait until your previously downloaded WUs have finished.

I didn't change anything in the OS configuration, except enabling virtualization in the BIOS settings.
31) Message boards : Number crunching : How to run 4 ATLAS + 4 CMS tasks? (Message 30810)
Posted 16 Jun 2017 by PHILIPPE
Post:
Just another site (overclock.net) for people who want information on how to set up multiple BOINC instances on the same local or remote host, in order to keep better control and allow more possibilities in their preferences.

It's a little tricky, but why not... (interesting mainly for medium and big hosts)

It's an English guide for both Windows and Linux, and the BoincTasks creator, Efmer, makes a short appearance at the end of the thread, giving advice and information about his site and his cloud site.
32) Questions and Answers : Wish list : update adress project in boinc client (Message 30782)
Posted 14 Jun 2017 by PHILIPPE
Post:
Thanks for your answer.
33) Message boards : ATLAS application : Successful Atlas (Message 30768)
Posted 13 Jun 2017 by PHILIPPE
Post:
@ Tullio :

There is maybe another, easier solution to rescue your situation.

Instead of looking at your host, change your account.

1°) Remove the LHC project from your BOINC client.
2°) Add a new project "LHC" in your BOINC client, using this project address:
https://lhcathome.cern.ch/lhcathome/
and then create a new account.
3°) Select the apps in the web preferences of the new account and ask for jobs.

If the tasks succeed, then you know the problem is your old account and not your host.
If they fail, the problem is your computer and not your old account.

After this test, you may ask the site admins to merge the old account with the new one and restore your old credits in the new account (but ask them beforehand)...
34) Message boards : ATLAS application : Successful Atlas (Message 30763)
Posted 12 Jun 2017 by PHILIPPE
Post:
Tullio, you remain a mystery...
Apparently, no one is able to help you...
But to maximize the chance that someone finds the trick, I am recording all the information given in your logs in this post.

The messages in your logs seem to differ from one Theory task to another, but there are some interesting details.

With error: 206 (0x000000CE) EXIT_INIT_FAILURE

2017-06-12 10:13:42 (17522): Guest Log: [INFO] Reading volunteer information
2017-06-12 10:13:45 (17522): Guest Log: [INFO] Volunteer: tullio (96166) Host: 10454176
2017-06-12 10:13:45 (17522): Guest Log: [INFO] VMID: 11a6f992-647c-415d-969d-d7d3ca99f9ef
2017-06-12 10:13:48 (17522): Guest Log: [INFO] Using weak account key.
2017-06-12 10:13:48 (17522): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2017-06-12 10:14:06 (17522): Guest Log: [INFO] Theory application starting. Check log files.
2017-06-12 10:14:13 (17522): Guest Log: [DEBUG] HTCondor ping
2017-06-12 10:14:15 (17522): Guest Log: [DEBUG] 0
2017-06-12 10:57:18 (17522): Guest Log: [ERROR] Condor exited after 2587s without running a job.
2017-06-12 10:57:18 (17522): Guest Log: [INFO] Shutting Down.
2017-06-12 10:57:18 (17522): VM Completion File Detected.
2017-06-12 10:57:18 (17522): VM Completion Message: Condor exited after 2587s without running a job.


Documentation on the weak account key is here.

With error: 194 (0x000000C2) EXIT_ABORTED_BY_CLIENT
the previous messages disappear, but there are:

2017-06-12 10:59:28 (12763): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2017-06-12 10:59:28 (12763): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2017-06-12 10:59:28 (12763): Guest Log: BIOS: Booting from Hard Disk...
2017-06-12 10:59:31 (12763): Guest Log: BIOS: KBD: unsupported int 16h function 03
2017-06-12 10:59:31 (12763): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2017-06-12 11:09:22 (12763): VM Heartbeat file specified, but missing.
2017-06-12 11:09:22 (12763): VM Heartbeat file specified, but missing file system status. (errno = '2')
2017-06-12 11:09:22 (12763): Capturing screenshot.
2017-06-12 11:09:23 (12763): Screenshot completed.
2017-06-12 11:09:23 (12763): Powering off VM.


and

Command: VBoxManage -q showvminfo "boinc_7a8b17380ef80d34" --machinereadable
Exit Code: -2135228415
Output:
VBoxManage: error: Could not find a registered machine named 'boinc_7a8b17380ef80d34'
VBoxManage: error: Details: code VBOX_E_OBJECT_NOT_FOUND (0x80bb0001), component VirtualBoxWrap, interface IVirtualBox, callee nsISupports
VBoxManage: error: Context: "FindMachine(Bstr(VMNameOrUuid).raw(), machine.asOutParam())" at line 2781 of file VBoxManageInfo.cpp


and

2017-06-12 11:09:22 (12763):
Command: VBoxManage -q controlvm "boinc_7a8b17380ef80d34" keyboardputscancode 0x39
Exit Code: 0
Output:
VBoxManage: error: Error: '0x39' is not a hex byte!


Some docs for the controlvm command line are here:
VBoxManage controlvm <vm> keyboardputscancode <hex> [<hex>...] Sends commands using keycodes to the VM. Keycodes are documented in the public domain, e.g. http://www.win.tue.nl/~aeb/linux/kbd/scancodes-1.html

It talks about translating keyboard keys into scancodes, but it's rather difficult to understand.
----------------------------------------------------------------------------------------------------------------------------------------------------------------
To give us more details:
Why did you need to use a weak account key in the past?
Do you still need it now?
Are you sure this has been happening since the consolidation, or have you joined another project since the consolidation that may interfere with the LHC project (using another virtualizer...)?

I'm not able to help you, but maybe someone else can, if he gathers all the pieces of information given here and has encountered a similar case in his own experience.

Maybe you will be lucky...
35) Questions and Answers : Wish list : update adress project in boinc client (Message 30761)
Posted 12 Jun 2017 by PHILIPPE
Post:
While answering a cruncher, I noticed that when you want to add the LHC project in your BOINC client, the address is not the same as the one on the main page of the LHC site:

https://lhcathome.cern.ch/lhcathome/

Would it be possible to change the default in BOINC, to avoid needless trouble?

Furthermore, the former project ATLAS@home is still present in the list of available projects.

Would it be possible to remove it, now that the consolidation seems to be sufficiently reliable?
36) Questions and Answers : Unix/Linux : I have 3 projects in progress for a year, but the LHC project takes months without new tasks, I do not know what happens (Message 30740)
Posted 11 Jun 2017 by PHILIPPE
Post:
Maybe change the address, since all the sub-projects have been consolidated:

Remove the project and join again using this address:

https://lhcathome.cern.ch/lhcathome/

Then go to the web preferences of your account and select the applications you want to run.
Validate.

And update the project in your BOINC client.
37) Message boards : LHCb Application : Influence of ram memory allocation for small hosts (Message 30676)
Posted 6 Jun 2017 by PHILIPPE
Post:

Some questions:
Does the project provide jobs according to the host's resources?



The internal jobs or the BOINC task?

What do you mean exactly?
LHCb is sending Monte Carlo simulation jobs to Boinc without special selection with respect to other computing nodes. In some sense we 'sort' the load according to the computational power of the host by tuning the number of events to be generated (without exceeding the 'time-slot')

In fact, looking at the results of each test with a different amount of RAM, I didn't notice any change in the behavior of the host (except for 2250 MB of RAM) or in the duration of the internal job. There is great variability and/or volatility in the duration.
So I wondered why.
In my mind, less RAM means more swap, and so a longer duration for the job inside the VM.

But as Laurence said:
Yes, as long as the RAM is sufficient, extra RAM will not affect the job execution time.

I understand better now, even if some swap was used (above all at the end of the job, just before the upload of the result, so for a very short time).

The 2Gb requirement (+ swap) should accomodate all kind of loads.

Yes, I understand this is a HEP specification,
but do you allow volunteers with small hosts (less than 4 GBytes) to use less RAM to execute LHCb jobs?

(Maybe the ratio of failed jobs would increase slightly, but so would the number of slots running.) (2 * 1250 = 2500 MB is possible for a small host)

Or do you advise some of us to use a 2-core task with 2500 MB in this case (more data shared)?

And finally, what is the HEP specification for multicore work units with LHCb?

Sorry to be so inquisitive... But it's important... to do good science...
38) Message boards : LHCb Application : Influence of ram memory allocation for small hosts (Message 30644)
Posted 5 Jun 2017 by PHILIPPE
Post:
ERRATUM :
1750 Mbytes :
Average time job = 3h34 (3h03-3h13-4h27-.h..)
Memory used inside VM = 57% and not 38%

Sorry for the mistake...
39) Message boards : LHCb Application : Influence of ram memory allocation for small hosts (Message 30640)
Posted 5 Jun 2017 by PHILIPPE
Post:
I experimented with LHCb WUs using different amounts of allocated RAM.
I looked at the duration of the internal jobs and at the percentage of RAM used inside the VM.
After some tests on these single-core WUs, here are the results:

1250 Mbytes :
Average time job = 3h12 (3h01-2h55-3h18-3h36)
Memory used inside VM = 70%
1500 Mbytes :
Average time job = 3h45 (3h53-3h50-3h33-.h..)
Memory used inside VM = 61%
1750 Mbytes :
Average time job = 3h34 (3h03-3h13-4h27-.h..)
Memory used inside VM = 38%
2000 Mbytes :
Average time job = 3h06 (3h33-3h40-3h15-1h56)
Memory used inside VM = 47%
2250 Mbytes :
Average time job = 3h23 (4h12-2h53-3h30-2h58)
Memory used inside VM = 47%

First discovery:
In all cases, the computer behaved well and stayed responsive.
Even with less RAM, the WUs ended correctly.
The default requirement of 2048 MBytes of RAM doesn't seem appropriate.

Second discovery:
The job duration isn't correlated with the amount of RAM allocated.
The influence of the allocated RAM is not very significant.
(I have an HDD, not an SSD, so the discrepancy should be larger: with less RAM there is more swap and consequently more disk accesses)
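
As a rough check of the figures above, here is a short Python sketch that recomputes the average duration for each RAM setting and a Pearson correlation between RAM and average duration. The durations are the ones listed in this post; the incomplete ".h.." entries are left out. With only five settings and three or four jobs each, the result is indicative at best.

```python
from math import sqrt

def to_minutes(duration):
    """Convert a duration written like '3h12' into minutes."""
    hours, minutes = duration.split("h")
    return int(hours) * 60 + int(minutes)

# Job durations per allocated RAM (MBytes), as listed above.
RESULTS = {
    1250: ["3h01", "2h55", "3h18", "3h36"],
    1500: ["3h53", "3h50", "3h33"],
    1750: ["3h03", "3h13", "4h27"],
    2000: ["3h33", "3h40", "3h15", "1h56"],
    2250: ["4h12", "2h53", "3h30", "2h58"],
}

def average_minutes(durations):
    """Mean duration in minutes for one RAM setting."""
    minutes = [to_minutes(d) for d in durations]
    return sum(minutes) / len(minutes)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

averages = {ram: average_minutes(d) for ram, d in RESULTS.items()}
correlation = pearson(list(averages.keys()), list(averages.values()))

if __name__ == "__main__":
    for ram, avg in averages.items():
        print(f"{ram} MB: {int(avg // 60)}h{int(avg % 60):02d}")
    print("correlation RAM vs duration:", round(correlation, 2))
```

A coefficient close to zero would be consistent with the second discovery, but a larger sample of jobs would be needed to say anything firm.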

Some questions:
Does the project provide jobs according to the host's resources?
(that is to say, is the level of difficulty of the jobs sorted before they are sent to a particular host?)

Is the requirement of 2048 MBytes still a necessity, looking at the results?
(No error occurred during the tests)

Was 2048 MBytes originally a limit coming from the XP operating system?

What is the optimal target for the needs of the project?
(if the default setting is modified, please tell us the best amount so that we can adjust it with app_config, if our host has sufficient RAM, of course)

Have stats been made on a larger sample of jobs recently, since LHCb's new "in production" status?
(improvements in handling WUs could have changed the basis of the requirement)

Is it possible to add a progress bar inside the VM to see the percentage of the job done?
It's rather uncomfortable to follow when you don't crunch 24/7: we need to combine information from the Windows task manager and the console (ALT+F4) to shut down the computer at the right moment.
We have to wait for the message "job finished in slot 1" to be sure the work is really recorded on the server.
And with the Windows task manager, we notice the beginning of the upload, which announces that in about 18 min on average (with my bandwidth), the jobs will truly be recorded on the server.
40) Message boards : Number crunching : Imbalance between Subprojects (Message 30631)
Posted 4 Jun 2017 by PHILIPPE
Post:
The original BOINC philosophy was to use the power of the idle cores of volunteers' computers at low priority.
But it appears that the LHC projects need more than that.
The standard public computer has about 4 CPUs and 4 GBytes of RAM, even if we are starting to see 8 CPUs with between 6 and 8 GBytes of RAM.
The further the project requirements are from this target, the harder it is for volunteers to adapt and run them.
Not everyone has the money to buy a gaming configuration for personal use, or the skill to build it and/or to add RAM.
The volunteer wants, first of all, to keep his computer responsive when he uses it.
Not everyone has a dedicated host that only crunches BOINC projects.
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Maybe a public survey (sent as a BOINC message, to be completed by volunteers) could provide data and information about the people who crunch the LHC project.
For instance:
Man, woman
Student, worker, retired
Where did he learn about the possibility of running LHC projects (TV, magazines, social networks, newspapers, ...)?
Was the computer bought only for crunching?
Does he crunch at home, at work, or both?
Does he find the instructions on the site clear or not?
Does he use the forum?
How does he rate his skill (beginner, intermediate, pro)?
Does he install BOINC as a service?
Does he encounter trouble while running?
Was it about the OS platform, the VirtualBox manager, the internet service provider, the app_config?
Does he crunch other projects?
Does he know VirtualBox?
Does he use it elsewhere?
And so on...
The results of the survey may teach you some hidden facts and lead to work on solving them. The more you know about your volunteers (distribution and behavior), the better you can understand and help them.
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Theory tasks should be recommended to beginners because they tolerate shutdown and reboot without any trouble and require the least RAM.
And only 1 CPU and 1 job the first time.
By the way, some improvements should be made to the RAM requirements.
Apparently, it is possible to run the Theory, CMS and LHCb tasks with less RAM than defined by default (with no errors, and the duration of the internal jobs was not longer). (Is 2048 MBytes a remnant of the XP OS?)
Modifying and reducing the default setting may prevent the beginner volunteer's host from being saturated.
But if the default values are necessary, let the well-skilled crunchers know so that they can increase them in their app_config.xml file.
It could let more people feel more comfortable from the first moment.
They have to trust in themselves and have the good feeling that they can do it without fear. If the computer becomes unresponsive, they don't go any further...




©2024 CERN