Did I wreck the site? Was not my intention.
10-03-2017, 08:05 PM
Post: #1
Did I wreck the site? Was not my intention.
Today I was wondering whether I could collect the MoHPC archives, so I started with wget -r with the bandwidth limited to 16 KB/s. This produced errors.

So I learned the lesson and said "OK, unlimited download speed, but wait 5 seconds between links instead of fetching them one right after another". After a while, still errors.

I raised the interval between requests to 10 seconds, then 20, and then 30. But in the end I got only 50x server errors. I could access the main page and nothing else.

Now I am not sure whether the error was faked (i.e. in reality I was the only one blocked) or whether it was a general error. If it was a general error, from my experience with webservers I would guess that the fcgi processor (Perl for the old forum?) had a memory leak that crashed the entire system. Of course, that is only a guess.

Anyway, it was not my intention. I am not sure whether a compressed version of the archives exists somewhere; if not, I would still like to collect them, even if I have to go very slowly (is 60 seconds between connections OK? 120? 180?).
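
A slow crawl of this kind could be sketched roughly as follows. This is a minimal sketch and not what wget does internally: `polite_fetch`, the `fetch` callback, and the 60-second default are hypothetical names and values chosen for illustration. The idea is to wait a fixed interval between requests and to stop at the first 5xx response instead of hitting a struggling server again.

```python
import time
from typing import Callable, Dict, Iterable


def polite_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], int],
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Fetch URLs one at a time, pausing between requests.

    `fetch` is a hypothetical callback that requests one URL and returns
    the HTTP status code. The crawl stops at the first 5xx status so a
    struggling server is not hit again.
    """
    results: Dict[str, int] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        status = fetch(url)
        results[url] = status
        if 500 <= status <= 599:
            break  # server-side error: back off instead of continuing
    return results
```

Passing `sleep` as a parameter is only there so the pacing can be checked without actually waiting; a real run would keep the default.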

Since no one posted between 17:00 and 19:00 (CEST), and although I know a site should be "solid", it is also true that I may have contributed to wrecking it. Sorry.

PS: side request. Could the archives be static? Since they won't get updated anymore, can the pages be generated once and that's it?

Wikis are great, Contribute :)
10-03-2017, 08:45 PM
Post: #2
RE: Did I wreck the site? Was not my intention.
(10-03-2017 08:05 PM)pier4r Wrote:  PS: side request. Could the archives be static? Since they won't get updated anymore, can the pages be generated once and that's it?

We already have archives 01-21 on our USB flash keys/DVDs/CDs

Why don't you get one yourself?

Greetings,
    Massimo

-+×÷ ↔ left is right and right is wrong
10-03-2017, 08:58 PM (This post was last modified: 10-03-2017 08:59 PM by pier4r.)
Post: #3
RE: Did I wreck the site? Was not my intention.
That is on the todo list (which is long, but I will get to it eventually, to support this site). But then can I put them in a p2p collection?

Anyway, the point about static pages holds. If they are no longer updated, there is no need for an application to generate them every time, and static pages mean less load on the server.
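
To illustrate the "generate once and that's it" idea, here is a minimal sketch. It assumes nothing about how the archive software really works: `export_static`, the `render` callback, and the `.html` naming are hypothetical. Each page is rendered a single time and written to disk, after which a web server could serve the files directly without running any application code.

```python
from pathlib import Path
from typing import Callable, Iterable


def export_static(
    page_ids: Iterable[str],
    render: Callable[[str], str],
    out_dir: Path,
) -> int:
    """Render each page once and write it to disk as an .html file.

    `render` stands in for whatever application currently builds a page
    on every request (hypothetical). Pages already on disk are skipped,
    so re-running the export does no extra rendering work.
    Returns the number of pages rendered in this run.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    rendered = 0
    for page_id in page_ids:
        target = out_dir / f"{page_id}.html"
        if target.exists():
            continue  # already generated; can be served as a plain file
        target.write_text(render(page_id), encoding="utf-8")
        rendered += 1
    return rendered
```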

Wikis are great, Contribute :)
10-03-2017, 10:17 PM
Post: #4
RE: Did I wreck the site? Was not my intention.
(10-03-2017 08:05 PM)pier4r Wrote:  Since no one posted between 17:00 and 19:00 (CEST), and although I know a site should be "solid", it is also true that I may have contributed to wrecking it. Sorry.

Yes, still at around 9:00am (EDT) (I don't know what CEST is) the site was not stable, I could not access most pages; it was uneven, but a short while later it was back to normal. Seems like the site throttles wget requests (as most websites do). Probably a good idea to ask first (not for permission, but to see if it is tolerated well) before trying stuff like this.

I agree with Massimo - buy the USB Drive Document Collection ASAP. It will satisfy your itch for more stuff very quickly.

(10-03-2017 08:58 PM)pier4r Wrote:  That is on the todo list (which is long, but I will get to it eventually, to support this site). But then can I put them in a p2p collection?

Anyway, the point about static pages holds. If they are no longer updated, there is no need for an application to generate them every time, and static pages mean less load on the server.

IMHO (and please note I have absolutely no say or authority) you should not share them on p2p, but rather recommend interested folks purchase the USB drive Document Set to help support and promote the site. Without support, sites like this can easily just go away.

I imagine the hosting site does not charge for server load (to generate the pages - it is not a very heavy load) but only for bandwidth and possibly storage. Also, I can't help but notice that keeping it as-is, is a natural obstacle for people trying to grab the entire site - as you discovered. I've no idea at all if this is part of the reason for keeping it as-is, but it seems a useful side-effect.

--Bob Prosperi
10-03-2017, 10:52 PM (This post was last modified: 10-03-2017 10:56 PM by pier4r.)
Post: #5
RE: Did I wreck the site? Was not my intention.
(10-03-2017 10:17 PM)rprosperi Wrote:  Also, I can't help but notice that keeping it as-is, is a natural obstacle for people trying to grab the entire site - as you discovered. I've no idea at all if this is part of the reason for keeping it as-is, but it seems a useful side-effect.

That's fine, but not if the site collapses. (A 503 server error usually means that the process rendering the pages, likely Perl, has crashed or cannot keep up. If the whole site goes down, it means that the memory or the processes are exhausted.)

One can enforce limits on the webserver itself while still serving static pages. That saves RAM/CPU and maintenance for sure.

Also, just to clarify:

1st attempt (bandwidth limited) failed after 2 minutes - site still fine
2nd attempt (5 seconds limit between trials) failed after circa 10 minutes - site still fine
3rd attempt (10 seconds limit) failed after ca. 30 minutes - site fine
4th attempt (20 seconds) failed after ca. 45 minutes - site fine except the archive tried.
5th attempt (30 seconds) failed after ca. 60 minutes - site down

Now, although scraping a site may not be that nice, one request (a single one, just one connection) every 30 seconds is next to nothing, and if that much is able to bring down the site, there is some misconfiguration somewhere. In any case I would avoid it, wget or not.

That is why I say: all the more reason, if those pages are already on the USB/DVD (so the work is already done), to go for static pages.

As for the p2p part: sure, in theory one should foster those purchases, I agree. The point is not to avoid purchasing something, but rather to make something resilient so that there is a shared backup. I have learned that digital content, especially niche content, disappears too quickly if one takes it for granted.

edit: and yes, I should have asked for permission. I only thought there was no problem whatsoever in doing it because I knew I was going to limit the download, and a single downloader with a very low request frequency is normally handled fine (as I did for the c2.wiki already).

Wikis are great, Contribute :)
10-03-2017, 11:37 PM (This post was last modified: 10-04-2017 12:24 AM by Don Shepherd.)
Post: #6
RE: Did I wreck the site? Was not my intention.
(10-03-2017 10:52 PM)pier4r Wrote:  
(10-03-2017 10:17 PM)rprosperi Wrote:  Also, I can't help but notice that keeping it as-is, is a natural obstacle for people trying to grab the entire site - as you discovered. I've no idea at all if this is part of the reason for keeping it as-is, but it seems a useful side-effect.

That's fine, but not if the site collapses. (A 503 server error usually means that the process rendering the pages, likely Perl, has crashed or cannot keep up. If the whole site goes down, it means that the memory or the processes are exhausted.)

One can enforce limits on the webserver itself while still serving static pages. That saves RAM/CPU and maintenance for sure.

Also, just to clarify:

1st attempt (bandwidth limited) failed after 2 minutes - site still fine
2nd attempt (5 seconds limit between trials) failed after circa 10 minutes - site still fine
3rd attempt (10 seconds limit) failed after ca. 30 minutes - site fine
4th attempt (20 seconds) failed after ca. 45 minutes - site fine except the archive tried.
5th attempt (30 seconds) failed after ca. 60 minutes - site down

Now, although scraping a site may not be that nice, one request (a single one, just one connection) every 30 seconds is next to nothing, and if that much is able to bring down the site, there is some misconfiguration somewhere. In any case I would avoid it, wget or not.

That is why I say: all the more reason, if those pages are already on the USB/DVD (so the work is already done), to go for static pages.

As for the p2p part: sure, in theory one should foster those purchases, I agree. The point is not to avoid purchasing something, but rather to make something resilient so that there is a shared backup. I have learned that digital content, especially niche content, disappears too quickly if one takes it for granted.

edit: and yes, I should have asked for permission. I only thought there was no problem whatsoever in doing it because I knew I was going to limit the download, and a single downloader with a very low request frequency is normally handled fine (as I did for the c2.wiki already).

I could not get on the site earlier today. If whatever you were doing caused that, cut it out. Buy the museum thumb drive, like Bob and Massimo suggested.
10-04-2017, 05:43 AM
Post: #7
RE: Did I wreck the site? Was not my intention.
CEST is probably Central European Summer Time.
10-04-2017, 05:55 AM
Post: #8
RE: Did I wreck the site? Was not my intention.
MyBB allows for a sticky Message Of The Day box, used e.g. in the Articles forum. The admin could add an entry about getting content from this site.

wget honors robots.txt, so unless stated otherwise (see above), recursively downloading the content is not an abusive action. In any case, it shouldn't damage the site. IMO, pier4r found a bug ;)
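
As a rough illustration of how a crawler can consult robots.txt before fetching, here is a sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.org URLs are made up for the example and are not what this site actually serves; `check_robots` is a hypothetical helper name. Note that `Crawl-delay` is a non-standard directive that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; not the actual file served by this site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
Disallow: /cgi-bin/
"""


def check_robots(robots_txt: str, user_agent: str, url: str):
    """Return (allowed, crawl_delay) for `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)
```

With the sample rules above, a URL under `/cgi-bin/` is reported as disallowed, anything else as allowed, and the requested delay between requests is 30 seconds.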
10-04-2017, 08:01 AM (This post was last modified: 10-04-2017 08:02 AM by HP67.)
Post: #9
RE: Did I wreck the site? Was not my intention.
(10-03-2017 10:17 PM)rprosperi Wrote:  IMHO (and please note I have absolutely no say or authority) you should not share them on p2p, but rather recommend interested folks purchase the USB drive Document Set to help support and promote the site. Without support, sites like this can easily just go away.

The doc is not downloadable from this site AFAIK. If all he wanted to do was grab the forum archives, that seems reasonable. It does not seem reasonable that the server (probably MySQL specifically, judging from the error messages I got) should crash from wget.

I agree we should support the site and I did buy a DVD set. However, given HP produced all the doc I think stamping HP Museum on all the manuals is in poor taste and we should not have to pay for them. If what we do when we buy a DVD set/USB stick is support the forum and pay for the media costs of collecting and scanning the doc I am all for that.

It ain't OVER 'till it's 2 PICK
10-04-2017, 08:44 AM
Post: #10
RE: Did I wreck the site? Was not my intention.
(10-04-2017 08:01 AM)HP67 Wrote:  If what we do when we buy a DVD set/USB stick is support the forum and pay for the media costs of collecting and scanning the doc I am all for that.

From this same site:
Quote:...Your purchase helps to offset the cost of running the website...

Greetings,
    Massimo

-+×÷ ↔ left is right and right is wrong
10-04-2017, 08:47 AM (This post was last modified: 10-04-2017 08:48 AM by pier4r.)
Post: #11
RE: Did I wreck the site? Was not my intention.
(10-03-2017 11:37 PM)Don Shepherd Wrote:  I could not get on the site earlier today. If whatever you were doing caused that, cut it out. Buy the museum thumb drive, like Bob and Massimo suggested.

I still fail to convey my message. I wonder if it is due to my poor English.

So, to clarify: after the 5th attempt (made yesterday around 15:00 Central European Time) I stopped, since the result was not good. Nor do I intend to do it again, for the same reason (I don't want to bring down the site).

Nevertheless my point is that the current setup is a bit too fragile. If one person requests 120 pages per hour and the site goes down, it is a bit unexpected.
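
The 120 pages per hour figure follows directly from the 30-second interval. A small sketch of the arithmetic, using the intervals from the attempts listed earlier in the thread (`requests_per_hour` is just an illustrative helper, and the result is an upper bound because it ignores the time each request itself takes):

```python
def requests_per_hour(interval_seconds: float) -> float:
    """Requests per hour for one client that waits `interval_seconds` between requests."""
    return 3600.0 / interval_seconds


# Intervals used in the 2nd to 5th attempts.
for interval in (5, 10, 20, 30):
    rate = requests_per_hour(interval)
    print(f"{interval:>2} s between requests -> {rate:.0f} requests/hour")
```

For the last attempt this gives 3600 / 30 = 120 requests per hour, i.e. two page loads per minute from a single connection.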

To make an analogy with calculators: imagine I buy an HP 41 and I use it a bit and it crashes, I ask and I get "yes the HP 41 works wonderfully, but if you do more than 100 key presses within an hour, it crashes" . It would surprise me.

Buying the museum DVD or whatnot is another story. Surely it is a good collection (once again, I am all for collections that keep past efforts from being lost; they are very important), but it does not address the point that the site is a bit too fragile.

In other words: if one buys 1 billion copies of the collection on the USB stick, and then runs wget again, would the site be magically more robust? No.

Also, I would have expected only the archives to go down, not the entire site. Somehow the two are connected, although MyBB should be independent of the old archives.

Last but not least. I always prefer to take responsibility for what I do, so I say as early as I can "look if that was due to me, it is my fault, sorry.", but my approach assumes an atmosphere of "ok, good to know, now let's focus on the problem and not on the blame". If the atmosphere instead is of some sort of blaming, like "why did you do that?" then there is no incentive to be honest and that won't produce good results (I experienced that directly in my region of origin, where honesty was a luxury and nothing worked).

Wikis are great, Contribute :)
10-04-2017, 09:12 AM
Post: #12
RE: Did I wreck the site? Was not my intention.
(10-04-2017 08:44 AM)Massimo Gnerucci Wrote:  
(10-04-2017 08:01 AM)HP67 Wrote:  If what we do when we buy a DVD set/USB stick is support the forum and pay for the media costs of collecting and scanning the doc I am all for that.

From this same site:
Quote:...Your purchase helps to offset the cost of running the website...

I am well aware of that thank you very much. What I am not aware of is what the truth of the matter is regardless of what is stated. Anyway, I did buy the set with the intention to support the site. So now I get to use said site to express my opinions.

It ain't OVER 'till it's 2 PICK
10-04-2017, 09:17 AM
Post: #13
RE: Did I wreck the site? Was not my intention.
(10-04-2017 08:47 AM)pier4r Wrote:  In other words: if one buys 1 billion copies of the collection on the USB stick, and then runs wget again, would the site be magically more robust? No.

Since

Quote:...Your purchase helps to offset the cost of running the website...

Maybe yes?
Who knows, with billions in budget a full team of engineers monitoring the site 24x7x365 could be hired... ;)

Greetings,
    Massimo

-+×÷ ↔ left is right and right is wrong
10-04-2017, 09:19 AM
Post: #14
RE: Did I wreck the site? Was not my intention.
(10-04-2017 09:12 AM)HP67 Wrote:  I am well aware of that thank you very much.

You're welcome!

Greetings,
    Massimo

-+×÷ ↔ left is right and right is wrong
10-04-2017, 01:45 PM
Post: #15
RE: Did I wreck the site? Was not my intention.
(10-04-2017 08:01 AM)HP67 Wrote:  If what we do when we buy a DVD set/USB stick is support the forum and pay for the media costs of collecting and scanning the doc I am all for that.
Scanning is also done by volunteers. IIRC, I scanned the French 20S manual. No explicit payment is expected, since this indirectly supports the site, which in turn is the reward to the user.
10-04-2017, 04:33 PM
Post: #16
RE: Did I wreck the site? Was not my intention.
(10-04-2017 09:17 AM)Massimo Gnerucci Wrote:  Maybe yes?
Who knows, with billions in budget a full team of engineers monitoring the site 24x7x365 could be hired... ;)

Sure, but what I meant is that until there is a change, the site will be fragile.

Wikis are great, Contribute :)
10-04-2017, 08:45 PM
Post: #17
RE: Did I wreck the site? Was not my intention.
Anyway, just for info, I bought the USB drive from the MoHPC.

As I have said many times, I am a bit surprised by the lack of documents about the 49g+/50g/Prime, but it does not matter because the forum (especially the new MyBB-based one with the software section) offsets this.

I actually bought it mostly to support the forum, since I have also spammed it 700+ times, although this upsets my todo list a bit (I should really be penny-pinching for a while, having a daughter and only one income in the family).

Wikis are great, Contribute :)
10-04-2017, 10:05 PM
Post: #18
RE: Did I wreck the site? Was not my intention.
(10-04-2017 08:45 PM)pier4r Wrote:  As I have said many times, I am a bit surprised by the lack of documents about the 49g+/50g/Prime, but it does not matter because the forum (especially the new MyBB-based one with the software section) offsets this.

This (once) was a Museum: new models weren't covered so much.

Greetings,
    Massimo

-+×÷ ↔ left is right and right is wrong
10-05-2017, 12:25 AM
Post: #19
RE: Did I wreck the site? Was not my intention.
(10-04-2017 08:45 PM)pier4r Wrote:  Anyway, just for info, I bought the USB drive from the MoHPC.

As I have said many times, I am a bit surprised by the lack of documents about the 49g+/50g/Prime, but it does not matter because the forum (especially the new MyBB-based one with the software section) offsets this.

First, thanks for supporting the MoHPC site pier4r. The more folks that do so, the longer it will remain available for us to use for our common hobby.

Second, I was not looking to assign blame, only to suggest that before trying something like a recursive wget of the whole site, it's probably best to ask the community, as many of us frankly have tried doing the same thing with similar results. I can't say if this means that the site is fragile or not; I don't run it, nor do I know anything about configuring MyBB. But I do know that many host sites will 'react' to prevent mass wget requests, and it appears that is happening here.

Third, I've no special knowledge of the arrangements Dave Hicks has with HP regarding distribution of the manuals, etc. but having been involved in similar arrangements between companies and customer/fan organizations, I can say that it is common to allow distribution of information regarding discontinued models, while simultaneously disallowing similar distribution for current products. This helps support customers of those past models at no cost to the manufacturer, while also maintaining control of support for customers of current models.

The 49g/49g+/48gII/50g are all very similar (from a manual/documentation perspective) and the last of these models (50g) was only discontinued less than a year ago, so it's reasonable to conclude the above restrictions were in place, at least until recently, and since no new versions of the MoHPC collection have been released since then, it's impossible to say if they are forthcoming.

Of course, the 50g AUR and User Guide have been available from multiple other sources, both in PDF and printed form, for many years, so the fact that they're missing here is not much of an impact; I'm sure there are multiple links in these very pages for downloading both these docs, plus others.

Hopefully, the 50g docs will be in a future release as additional incentive for fans to buy the doc set and further support the site.

--Bob Prosperi
10-05-2017, 05:45 AM (This post was last modified: 10-05-2017 05:45 AM by pier4r.)
Post: #20
RE: Did I wreck the site? Was not my intention.
@rprosperi: thanks for the explanation.

(10-04-2017 10:05 PM)Massimo Gnerucci Wrote:  This (once) was a Museum: new models weren't covered so much.

I know, and it still is (I mean, it even has awesome pages on mechanical calculators). The point is: what should a museum include?

To me it would be a pity to include a model only once it is "too late". What about all those models that were attempts and did not have enough success (HP Xpander?)? In the future they will be rarities, so collecting their documentation while they are still produced could prove interesting later.

Of course, with hindsight everything is easy.

Anyway, this is not a big problem; as I said, the MyBB forum offsets this greatly, and several collections (like the one I am trying to support via torrent) do too.

Wikis are great, Contribute :)