Collection of HHC programming contests and other useful documents
09-22-2018, 07:15 PM
Post: #137
RE: Collection of HHC programming contests and other useful documents
I finally got a chance to go through pier4r's 201805 torrent, which has a scraped copy of this current HP Museum Forum through May.

It's a really good start.

I wrote some code similar to what I wrote for the old forum, to parse all the messages to make an offline download. Unlike pier4r's torrent, I didn't try to capture all linked content, but just the embedded content. This reduced its size somewhat. Also, similar to what I did for the old forum, I went through all broken links in an attempt to find any missing images and restore them, either from the Wayback Machine or by contacting the authors. I also grabbed close to 200 of the embedded images that were classified as attachments and therefore required a login to view.
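For anyone wanting to try a similar pass, here is a minimal Python sketch. It is not the code actually used. It collects only embedded images (`<img>` tags, not linked files) from a post's HTML and, for an image that no longer loads, builds a query against the Wayback Machine availability endpoint and reads the closest snapshot out of the JSON reply. The function and class names are made up for illustration, and the actual HTTP fetch is left out.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode


class EmbeddedImageCollector(HTMLParser):
    """Collect the src of every <img> tag; plain <a href> links are ignored."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)


def embedded_images(html):
    """Return the URLs of all images embedded in one post's HTML."""
    collector = EmbeddedImageCollector()
    collector.feed(html)
    return collector.image_urls


def wayback_query_url(image_url):
    """Build an availability query for the Wayback Machine (returns JSON)."""
    return "https://archive.org/wayback/available?" + urlencode({"url": image_url})


def closest_snapshot(response_text):
    """Return the archived URL from an availability reply, or None if absent."""
    closest = json.loads(response_text).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

Any image for which `closest_snapshot` returns `None` would go on the "unable to restore" list, to be chased up with the original author instead.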

As before, I have attached a list of all the images I was unable to restore. This time I also listed the thread number for each missing image.

Unfortunately, I noticed that the snapshot that pier4r made used the "printthread" version of everything. This loses some information, most notably all the individual post numbers, but also any attachments (probably thousands of files) and some other minor things. If a forum login cookie is passed into the download process, it can pull all the attachments as well.
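As a rough illustration of the difference, the sketch below builds a request for either the full or the "printthread" view of a thread and optionally attaches a login cookie. Everything specific here is an assumption: the base URL is a placeholder, the `showthread.php?tid=` and `printthread.php?tid=` paths and the `mybbuser` cookie name are the usual MyBB conventions rather than anything confirmed for this forum, and `thread_request` is a hypothetical helper. Nothing is sent over the network.

```python
from urllib.request import Request

# Placeholder base URL (assumption); substitute the real forum address.
FORUM_BASE = "https://forum.example.org"


def thread_request(tid, login_cookie=None, printable=False):
    """Build (but do not send) a request for one thread.

    printable=True selects the "printthread" view, which drops post numbers
    and attachments; the default is the full thread view.
    """
    script = "printthread.php" if printable else "showthread.php"
    request = Request(f"{FORUM_BASE}/{script}?tid={tid}")
    if login_cookie:
        # Assumed MyBB login cookie name; the value comes from a logged-in browser.
        request.add_header("Cookie", f"mybbuser={login_cookie}")
    return request
```

The returned object can be handed to `urllib.request.urlopen`; with a valid cookie attached, attachment links that otherwise require a login should become downloadable.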

At some point I will download the whole new forum myself, this time using a login cookie and pulling the full version of the posts rather than the "printthread" version. Doing this would also produce something feasible for loading into my own MyBB instance, should the HP Museum site ever shut down. But for now, what pier4r did was good enough. And as long as Dave keeps maintaining the HP Museum Forum I see little reason to create an online-viewable mirrored version of it.

In the interest of preserving the historic record, I've now posted a torrent with my processed version of his torrent here. This is completely usable offline, though navigating between messages without using the menus can be difficult due to the lost message hierarchy. All messages are browsable in the same manner as the offline version I made of the old museum, as well as the other discussion forums and newsgroups that I have torrents for. It has all posts through May 19, 2018, plus around 93% of the embedded images.

Also at this time, all subforums are combined into a single menu, though each post indicates the forum it came from. I might consider breaking up the separate forums at some point.

Screenshot of menu:

[Image: jhXX6mf.png]

Download from here:

https://www.hpcalc.org/torrents/


Attached File(s)
.txt  HP Museum new forum missing images.txt (Size: 27.72 KB / Downloads: 7)
Messages In This Thread
RE: Collection of HHC programming contests and other useful documents - Eric Rechlin - 09-22-2018 07:15 PM


