Currently we're having trouble fulfilling the high demand for FGRPB1G "work". There are indications that it might get better by the end of October, but if you're limiting yourself to FGRPB1G right now, please consider running BRP7 instead, at least for a while.
BM
Hello Bernd! Many thanks
Hello Bernd!
Many thanks for the hint.
Best regards and happy crunching
Martin
Bernd Machenschalk
Any new indications?
Bernd Machenschalk
Has it gotten better on your end?
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
The graph on the front page
The graph on the front page about granted credit is showing a decline. I assume that's because people are shifting away from FGRPB1G to other types of tasks, which are not so generous with credit. This graph is of course a slow indicator due to the validation process.
Sorry, no good news yet for
Sorry, no good news yet for FGRPB1G.
I know that work is being done on new data, as well as on the somewhat unstable pre-processing code/pipeline. But I don't know how much longer it will take until everything is working again.
BM
Bernd Machenschalk wrote:....
Hi Bernd,
That description made me wonder if a problem I've seen a couple of times recently might be related in some way.
I have a lot of hosts crunching FGRPB1G. Most have no peripherals attached - just power and network. Many have uptimes in the hundreds of days. I use various scripts to monitor and control them. In particular, one script visits all hosts every hour and produces quite a detailed log which allows me to track any unusual events or problems that might otherwise go unnoticed for quite a while.
Recently, there have been a number of detections of an issue that seems likely to be caused by something server-side. It has just happened on two different machines on consecutive days. It results in spurious 24hr back-offs as shown in these snips from the event logs. The times are local (UTC+10).
The events happened in the middle of the night so I didn't get to see the warnings until the script itself had corrected the issue. The "got 0 new tasks" is fine - it happens regularly - but why should there be a "platform not found" plus a 24hr backoff?
The time between scheduler contacts (last RPC) is monitored by the script. If that time becomes excessive and there is no detected issue with the host itself, boinccmd is used to force a contact with the project. This happened in both cases and the backlog of completed work that had piled up was returned. One had ~50 tasks.
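For illustration only, here is a minimal sketch of that force-contact step (not my actual script). It assumes the BOINC client on each host has remote GUI RPC enabled; the hosts.txt file, the RPC password and the Einstein@Home master URL below are placeholders to adapt, and the logic for deciding when the last RPC has become "excessive" is left out because the status output format varies between client versions.

#!/usr/bin/env python3
# Minimal sketch only: force a scheduler contact on each headless host so any
# backlog of completed tasks gets reported. Assumptions: remote GUI RPC is
# enabled on every client; hosts.txt and the RPC password are placeholders.

import subprocess

PROJECT_URL = "https://einstein.phys.uwm.edu/"   # Einstein@Home master URL
RPC_PASSWORD = "secret"                          # placeholder GUI RPC password

def force_update(host: str) -> None:
    """Tell the BOINC client on `host` to contact the project scheduler now."""
    subprocess.run(
        ["boinccmd", "--host", host, "--passwd", RPC_PASSWORD,
         "--project", PROJECT_URL, "update"],
        check=True,
    )

if __name__ == "__main__":
    with open("hosts.txt") as f:                 # one host address per line
        hosts = [line.strip() for line in f if line.strip()]
    for host in hosts:
        try:
            force_update(host)
            print(f"{host}: forced scheduler contact")
        except subprocess.CalledProcessError as err:
            print(f"{host}: boinccmd failed ({err})")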
In my case, I'm not unduly bothered by these events. If it's a bug, it would be good to fix it though :-). Since I haven't seen other reports about this, maybe it's something specific to my setup.
Cheers,
Gary.
You are not the only person
You are not the only person to see this exact symptom of a server-induced 24-hour backoff. Many of my team have suffered similar backoffs.
Easy to fix with a boinccmd project connection as you said.
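For reference, on a host with a locally running client that fix is a single command, assuming the standard Einstein@Home master URL:

boinccmd --project https://einstein.phys.uwm.edu/ update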
Keith Myers wrote:....
MilkyWay has/had a similar problem for me as well, and others are complaining about it in the forums over there, but they have a brand-new admin who's still trying to get up to speed after swapping to a new server, etc.
I believe the random 24hr
I believe the random 24hr back-off is not related to the work availability issue, because it was happening well before the issues with work availability started.
Thanks for reporting this.
Thanks for reporting this. This is a sporadic error that resulted from an inconsistency during the OS upgrade (Oct 18) and apparently went unnoticed until now. Fixed; it should not occur again.
BM