
Draft: Run all harvesters on a single local set to allow keeping HTML documents alive over await points.

Closed Adam Reichold requested to merge OC000014987132/metadaten:local-set-harvesters into main

@OC000021106231 @OC000008373193 This is what would be required to avoid "error: future cannot be sent between threads safely", at the cost of running all harvesters on a single thread/CPU. It would solve the issues you were running into with the UIP and HLNUG harvesters, but it would limit the scalability of our harvester process.
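To illustrate why pinning everything to one thread makes the error go away, here is a minimal std-only sketch. It is not the code of this merge request: `Document`, `YieldOnce`, `harvest` and `block_on` are hypothetical names, `Rc` merely stands in for a parsed HTML document that is not `Send`, and the hand-written `block_on` stands in for whatever single-threaded executor (presumably the async runtime's local task set) actually drives the harvesters.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a parsed HTML document; the `Rc` makes it `!Send`.
struct Document {
    title: Rc<str>,
}

/// Returns `Pending` once to simulate an await point such as a network request.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Keeps the document alive across the await point, so this future is `!Send`
/// and could not be handed to an executor that moves tasks between threads.
async fn harvest(url: &str) -> String {
    let document = Document {
        title: Rc::from(format!("title of {url}")),
    };
    YieldOnce(false).await;
    document.title.to_string()
}

/// Waker that does nothing; sufficient because `block_on` simply polls again.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future to completion on the current thread. Note the missing
/// `Send` bound: the future never leaves the thread it was created on.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

fn main() {
    println!("{}", block_on(harvest("https://example.org")));
}
```

The trade-off is visible in the signature of `block_on`: dropping the `Send` requirement is only sound because every task stays on the one thread that polls it.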

Admittedly, the limit of a single CPU is also not as bad as it might sound:

  • We currently do not actually use more than 10% of the two CPUs allotted to the harvester VM, so this would just push us up to 20% of a single CPU for now.
  • Everything involving blocking I/O, like loading cached responses from disk and decompressing them, already runs in a separate thread pool and would still be able to use the second CPU.

One large downside I see is that this encourages a coding pattern that uses more memory than necessary: it is better to drop response text and parsed HTML as soon as possible before moving on to the next item. However, we could still nudge people in that direction during code reviews, and this issue would no longer block development of harvesters.
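For comparison, the memory-friendly pattern can be sketched as follows. This is an illustration under assumptions, not the harvesters' actual code: `Document`, `parse`, `fetch`, `harvest_scoped`, `assert_send` and `block_on` are hypothetical, and `Rc` again stands in for a parsed HTML document that is not `Send`. The document lives only inside a block that ends before the next await point, so the memory is released early and the future stays `Send`.

```rust
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a parsed HTML document; the `Rc` makes it `!Send`.
struct Document {
    links: Rc<Vec<String>>,
}

/// Hypothetical parser: treats every whitespace-separated token as a link.
fn parse(text: &str) -> Document {
    Document {
        links: Rc::new(text.split_whitespace().map(str::to_owned).collect()),
    }
}

/// Hypothetical fetch: pretends every page links to two sub-pages.
async fn fetch(url: &str) -> String {
    std::future::ready(()).await; // simulated await point
    format!("{url}/a {url}/b")
}

/// Extracts owned data inside a block so that the `!Send` document is dropped
/// before the next await point; the resulting future is therefore `Send`.
async fn harvest_scoped(url: &str) -> usize {
    let text = fetch(url).await;
    let links: Vec<String> = {
        let document = parse(&text);
        let owned: Vec<String> = document.links.iter().cloned().collect();
        owned
    }; // `document` is dropped here
    drop(text); // the response text is no longer needed either

    let mut count = 0;
    for link in &links {
        let _page = fetch(link).await;
        count += 1;
    }
    count
}

/// Compiles only if `T` is `Send`; moving `parse` out of the block above so
/// that `document` lives across an await would make this check fail.
fn assert_send<T: Send>(value: T) -> T {
    value
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal single-threaded driver, only here to make the sketch runnable.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

fn main() {
    let future = assert_send(harvest_scoped("https://example.org"));
    println!("{}", block_on(future));
}
```

With all harvesters on a single thread, the compiler would no longer enforce this scoping, which is why it would have to be caught in code review instead.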

So what do you (or anybody else) think?

To be resolved before this is more than a quick draft:

  • Revert the harvester changes here as they are only illustrative but actually increase memory usage without a good reason.
  • Update the HOWTO document to remove the whole section on non-Send futures.
  • Give this a try for an overnight harvester run to see if there actually is a performance degradation.
Edited by Falk Heße

Pipeline #39780 passed for 2fb9f6df on OC000014987132:local-set-harvesters

Closed by Adam Reichold (Mar 27, 2024 9:01am UTC)
