  • #4001

    OneMadGypsy
    Participant

    In an attempt to preserve the most important information at quakeone.com, I have been converting sub-forums to PDF documents. My first installment is from quake-help/general-help/. I intend to do as many of these as I can. I may even revisit ones I’ve already done and attempt to get a better mirror to work with. I think I have better ideas on how to go about this than I did when I started, and hopefully my next installments will be better. However, this one isn’t necessarily bad. It’s over 1800 pages of important questions and answers.

    facts:

    1. this is not the entire sub-forum. It is the best I could do with what I was able to rip. It’s a whole damn lot of it though. Maybe even better than 80%
    2. around 600 pages are pretty messed up, but the info is there. It’s just ugly
    3. I hand pruned 532 pages out of this document. From practically blank pages to pages with entirely useless data (like forum indexes) to pages that had little more than a signature on it.
    4. I stripped every single solitary form out of this document (useless weight). Tens of thousands, most of them by hand cause Acrobat couldn’t handle selecting them all at once til I got about halfway through
    5. I purposely destroyed over 50,000 links. It’s not meant to be a website in a pdf. It’s meant to be a searchable document with an incredible amount of valuable information. Probably half of the links (or more) will die when quakeone finally does. We can’t rely on links for this.
    6. the final document is over 1800 pages long and only weighs a hair over 16mb
    7. some images are missing and that may be important. I’m sorry. I was working with what I had
    8. there weren’t too many 400 bad requests for actual pages. There seemed to be a decent amount for the css on a chunk of pages though. You would think a site mirror program wouldn’t act like every page has a different css but, you would be thinking wrong.
    9. this took forever and I don’t see this getting any easier as I capture more sub forums

    Overall, even though this document isn’t perfect, I’m pretty proud of it. We at least know that this much information won’t be lost. I’m going to try to do better on the next ones, but it’s really out of my hands what I end up with when I try to mirror such a large body of data.

    quakeone.com/forums/quake-help/general-help.pdf

    one day at a time

    http://www.nextgenquake.com/forum/other/galleries/
Viewing 13 replies - 1 through 13 (of 13 total)
  • #4256

    wizardmachine
    Participant
    • topics: 2
    • replies: 16

    Alright, I’m uploading it to big G and will share a link with you in a while. Let me know if the thing actually worked out and if it did, I can do another one this weekend.

    I was even thinking about setting up a Raspberry Pi to do the heavy lifting for us (running constantly and emailing us in case something gets messed up), but I don’t have a spare screen or keyboard and it would take me some time to configure SSH and all of that. Or maybe a virtual box with HTTrack on it?

    #4181

    OneMadGypsy
    Participant
    • topics: 39
    • replies: 306

    If you click cancel, it will catch up ALL of those links. I’m not saying to do that (or not to do that). I just know for a fact that it won’t really cancel, because that is what happened when I clicked it. The first cancel click is more like saying “stop looking for more shit and go get the stuff you already know about”; the second cancel click really cancels everything. However, after the first click, once it catches up the pending links it really will stop. Then it takes ALL those temp files and starts compiling them into pages. That’s gonna take a minute but it’s not too bad. I did 20,000 pages or something like that in around 10 minutes.

    I’m wondering if you are getting the whole site. Do this: Open whatever folder you have all this going to and look for quakeone.com. Is it just the subforum in that folder (directory-wise) or do you have the whole site structure? You are already way ahead of me in links. I stopped somewhere around 25,000/25,000

    Also, you don’t really have 1.74 gigs. It’s a lie. Mine said 2 gigs but as it started catching up all the links and crossing errors the gigs dropped. I think in the long run I only really had 1.2 gigs out of 2.
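One way to answer the "just the subforum or the whole site" question, and to see how much really landed on disk, is to total up each top-level folder in the mirror output. A minimal Python sketch, assuming the mirror writes one folder per host (quakeone.com, an image host, and so on) under the project folder. The function name `host_sizes` and the `mirror` path are made up for illustration.

```python
from pathlib import Path


def host_sizes(mirror_root):
    """Return {folder name: total bytes} for each top-level folder, biggest first.

    Assumption: each top-level folder corresponds to one mirrored host.
    Cache or log folders created by the mirroring tool will show up here too.
    """
    sizes = {}
    for entry in Path(mirror_root).iterdir():
        if entry.is_dir():
            # Sum the size of every regular file below this folder.
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
    # Sort so the heaviest folders come first.
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


# Example use (the path is hypothetical):
# for name, size in host_sizes("mirror").items():
#     print(f"{size / 1e6:10.1f} MB  {name}")
```

If quakeone.com is not near the top of that list, or hosts you never asked for dominate it, the scan rules or depth limits probably need another look.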

    #4180

    wizardmachine
    Participant
    • topics: 2
    • replies: 16

    Links scanned: 35677/35797
    Errors: 16k
    Bytes saved: 1.74gb

    Yeah, there’s a lot of hooyoo.com but it’s all being skipped. There’s a lot of quakeone ‘actions’ in the ‘in progress’ view so I guess it’s all good, and it seems quite close to completion.

    The thing is, the ‘links scanned’ max value keeps going up, so the value itself never reaches real completion. I still see ‘quakeone’ every once in a while so I’m pretty certain I’m not mirroring the whole internet (I hope!).

    #4145

    OneMadGypsy
    Participant
    • topics: 39
    • replies: 306

    here is an idea of what I set

    scan rules: ie… DONT get these types. It may be a bit more complicated than this though. The help file is a bit confusing to me on this point. This MAY mean don’t get these for only the top level of the domain. It sort of implies that. I was too lazy to write a million rules cause worst case scenario it would get these anyway in some cases and I simply wouldn’t include them in the pdf. Really, to me, this is more about how much bandwidth do you want to lose. I paid $50 to be lazy… or maybe this worked exactly right and it would have cost me $50 anyway. I never really searched around to see if I got any of this.

    -*.avi -*.js -*.zip -*.pdf -*.7z

    of course my full scan rule was the below. I just took my rules and dumped them in place of their +*.js rule
    +*.png +*.gif +*.jpg +*.jpeg +*.css -*.avi -*.js -*.zip -*.pdf -*.7z -ad.doubleclick.net/* -mime:application/foobar

    limits:
    max mirroring depth 4 – maybe this should be 5 just to be sure
    max external mirroring depth 0 – does this mean no external mirroring or only top level? who knows? I don’t

    flow control
    number of connections 3
    retries 3

    links
    checked: get html first
    checked: test validity of all links
    checked: attempt to detect all links

    build: I FAILED hard on this one. I don’t know how I missed this. I blame the stupid “jump-around” tab system for the options. I should have set…
    checked: no error pages
    checked: no external pages

    spider:
    I left all of it default cause IDKWTF any of that means

    proxy:
    I left all of this default too but, not because I don’t understand it

    MIME types:
    ignored this. I don’t even want “files” so this was useless to me

    browser id:
    ignored this

    Log, Index, Cache
    left it all default

    experts only:
    checked: use a cache for updates (might have already been checked)
    left everything else default
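For anyone running the command-line httrack instead of the GUI, the limits and flow control values above map onto options roughly like this. This is a sketch only: the short flags are the ones `httrack --help` lists as far as I know, so check them against your version before trusting them. The output folder name is made up, and the start URL is just one of the subforum addresses mentioned in this thread. The script only builds and prints the command; it does not run it.

```python
import shlex

# Scan rules copied from the post above.
FILTERS = [
    "+*.png", "+*.gif", "+*.jpg", "+*.jpeg", "+*.css",
    "-*.avi", "-*.js", "-*.zip", "-*.pdf", "-*.7z",
    "-ad.doubleclick.net/*", "-mime:application/foobar",
]


def build_command(url, out_dir, depth=4, ext_depth=0, connections=3, retries=3):
    """Build an httrack argument list (flags assumed from `httrack --help`)."""
    return [
        "httrack", url,
        "-O", out_dir,          # where the mirror, logs and cache go
        f"-r{depth}",           # max mirroring depth
        f"-%e{ext_depth}",      # max external mirroring depth
        f"-c{connections}",     # number of connections
        f"-R{retries}",         # retries
        *FILTERS,
    ]


cmd = build_command(
    "http://quakeone.com/forum/quake-mod-releases/finished-works/",
    "mirror",  # hypothetical output folder
)
print(shlex.join(cmd))  # quoted so the * in the filters survives a shell
```

The link, build and "experts only" checkboxes are left out on purpose, since I am not sure enough of their command-line equivalents to list them.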

    I’m not saying these are the best settings. In my understanding of everything this seemed to be the most logical for what I wanted to do. I have no experience with mirroring programs and had to rely on other experience/knowledge (and the help file) to get me through these settings. You’re basically running your own crawler bot. I don’t know shit about how that works… totally different end of my web experience. But hey, I don’t know jack shit about making PDF’s either and the one I made isn’t too shabby, considering what I had to work with. You gotta just fuck around with things until you get something out of it worth talking about. Give me the tools and materials I’ll build you a space shuttle. It’s gonna blow the fuck up for sure but, I could probably get something a few feet off the ground before all the spectators die. And I’m pretty confident that due to my awesome explosion something will end up in space. Mission Accomplished.

    Actually everything is already in space so I don’t have to do shit.

    EDIT: @scan rules

    I had it backwards on what I said above. My rules are right. What the docs are trying to say is doing something like -/*.js would mean only skip js files that are 1 level deep.

    If we wanted the rules to be really tight we could probably add
    -*.mp3 -*.mpg -*.mov -*.mpeg -*.mp4 -*.wav -*.tar -*.gz -*.bz -*.exe -*.dmg -*.txt -*.doc -*.au

    and I’m sure a lot more but that is probably already overkill for whatever is posted on quakeone.
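To sanity-check a rule list before burning bandwidth on it, the rules can be tried against a few sample URLs offline. The sketch below is a rough Python stand-in, not HTTrack's own matcher: it uses `fnmatch` wildcards, skips `mime:` rules (those test the content type, not the URL), and assumes what the HTTrack filter docs describe, that rules are read first to last and the last matching rule wins. The sample URLs are invented.

```python
from fnmatch import fnmatchcase

# The full scan rule line from the post above.
RULES = (
    "+*.png +*.gif +*.jpg +*.jpeg +*.css -*.avi -*.js -*.zip -*.pdf -*.7z "
    "-ad.doubleclick.net/* -mime:application/foobar"
).split()


def verdict(url, rules=RULES):
    """Return '+', '-', or None when no rule matches (the default applies)."""
    result = None
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if pattern.startswith("mime:"):
            continue  # MIME rules are not URL patterns; ignored in this sketch
        if fnmatchcase(url, pattern):
            result = sign  # a later matching rule overrides an earlier one
    return result


for sample in (
    "quakeone.com/forum/clientscript/main.js",
    "quakeone.com/forum/images/logo.png",
    "ad.doubleclick.net/banner.gif",
    "quakeone.com/forum/index.html",
):
    print(verdict(sample), sample)
```

Note the doubleclick image: `+*.gif` accepts it first, but the later `-ad.doubleclick.net/*` rule rejects it, which is why rule order matters when the rules get dumped in one line.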

    #4140

    OneMadGypsy
    Participant
    • topics: 39
    • replies: 306

    @13811/22859

    22859 links are cached. Eventually it will catch up and start banging those out. In my case about half of that wound up being errors. If you cancel the mirror, it will start banging all those out before it actually cancels … unless you click cancel again.

    @ 8418 errors

    No surprise there. I had almost 30,000. You can see what they are in the log. Maybe it’s broken links. Maybe it’s 400 bad requests for quakeone pages. The latter makes me sigh.
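Rather than scrolling through thousands of log lines, a quick tally by status code shows whether the errors are mostly dead links (404) or bad requests (400). A hedged Python sketch: it assumes error lines contain the word "Error" and carry the HTTP status in parentheses, like (404). The sample lines are invented to show the idea and are not copied from a real log, so adjust the pattern to whatever your log really looks like.

```python
import re
from collections import Counter

# Assumption: the status code appears as three digits in parentheses.
STATUS = re.compile(r"\((\d{3})\)")

# Invented sample lines, for illustration only.
SAMPLE = [
    '10:01:02 Error: "Not Found" (404) at link quakeone.com/forum/a.css',
    '10:01:03 Error: "Bad Request" (400) at link quakeone.com/forum/b.html',
    '10:01:04 Warning: "Moved Permanently" at link quakeone.com/forum/c.html',
    '10:01:05 Error: "Not Found" (404) at link example.com/d.png',
    '10:01:06 Error: connection failed at link example.com/e.png',
]


def tally(lines):
    """Count error lines per status code; lines without a code go to 'other'."""
    counts = Counter()
    for line in lines:
        if "Error" not in line:
            continue  # skip warnings and info lines
        match = STATUS.search(line)
        counts[match.group(1) if match else "other"] += 1
    return counts


print(tally(SAMPLE).most_common())
```

To run it on a real mirror, pass the open log file instead of `SAMPLE`, for example `tally(open("hts-log.txt", errors="replace"))`, with the file name swapped for wherever your log actually lives.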

    @ hooyoo.com

    I checked my mirror and I don’t have any hooyoo.com folder. That doesn’t necessarily mean anything though cause I tried to mirror the entire site in one go and killed it at 2gigs. Who knows how much stuff I didn’t get. I have like half of imgur saved on my computer though. So, yeah, lots and lots of images is no surprise at all.

    My only request is just do the best you can do. Whatever we get is better than nothing. Who knows, even with all those errors you might still have a pretty damn good mirror. It all depends on what the errors are. Maybe it’s 8000+ errors cause hooyoo doesn’t exist anymore or something (I didn’t check)

    —–

    OK, I did check and it’s all in Japanese, looks nothing like a quake site and I can’t imagine what it has to do with anything. IDK. I don’t know how you set the options. Maybe you missed some option like “stay on this server” or something. Maybe you are mirroring the internet. I would think not cause I doubt the program comes with defaults that would lead to something like that. I just don’t know, brother. The help file for the program is pretty nice and explains everything with images and all. When I tried my mirror I went through EVERY option, set all the most obvious ones and thumbed through the help about the others. Some of it confused me a little and in those cases I just left the default. I did not skip one single option though. I’ve been messing with “unknown” programs a long time and one thing I learned, the hard way, many times, is to set every option or at least try to determine what it does and figure the best I can if it needs to be set. If you didn’t do that, who knows what you told HTTrack to do. It’s not too late to pause it (window/pause – I think) and review/set your options. You can actually cancel the whole thing and start it back up when you feel more confident of your selections. As long as you left USE CACHE (options:experts only tab) on it should work in more of an update way and shouldn’t try to start completely over.

    #4139

    wizardmachine
    Participant
    • topics: 2
    • replies: 16

    It’s been on 99% for a few hours already (total 11h, 1.08gb, links scanned 13811/22859).

    However it says there are 8418 errors although I guess that’s kind of standard (or maybe it isn’t?). It’s also receiving lots of images from hooyoo.com, is that normal?

    #4132

    OneMadGypsy
    Participant
    • topics: 39
    • replies: 306

    Thank you. I hope it looks awesome and perfect. When this one is done if you are up for some more you can tackle “quakeone.com/forum/quake-mod-releases/finished-works/” or “http://quakeone.com/forum/quake-mod-releases/works-in-progress”

    #4125

    wizardmachine
    Participant
    • topics: 2
    • replies: 16

    Ok, it’s doing the thing and I think I set it up properly. I’ll check in the morning how it looks.

    #4058

    OneMadGypsy
    Participant
    • topics: 39
    • replies: 306

    LMAO! I found this in the history book (page 312). This poor guy was getting like 5fps ~

    RandomQuake: When playing I noticed that when I receive damage my game slowdowns, like a fps drop or sometihng like that, is very annoying and I don´t find a reason.

    ——————–

    MadGypsy: That’s supposed to happen. It’s part of the SVCSQCCGUI DMS (damage mobility system) that EZQuake uses for realism. You’re all damaged so you can’t move around very well. Just get better and it wont happen as much.

    edit: FTEQW doesn’t use the DMS.

    fun fact: FTEQW = Forget The Ez Quake Way …so it should be the most helpful of all to you.

    ———————

    If anybody else finds funny stuff feel free to post it here.

    #4045

    wizardmachine
    Participant
    • topics: 2
    • replies: 16

    Yeah, my internet is pretty stable. I’ll set it up as soon as I go to bed today so that it’s working during the night. I’ll send you some PMs if I have trouble setting up the whole thing but I guess it’s pretty straightforward. You shouldn’t be paying a dime to be saving this data from getting lost in oblivion. If anything, you should be getting paid!

    #4042

    OneMadGypsy
    Participant
    • topics: 39
    • replies: 306

    Do you have really good internet with unlimited bandwidth? I did this with my phone hotspot. It cost me $50 cause I ran out of internets. I don’t give any fucks about $50 but it would be nice to not pay money to save quakeone data.

    If so,…

    download this
    and
    mirror this

    ultra 7zip the results and send them to me.

    Regarding the app, make sure you set scan rules like -*.avi -*.zip -*.js (no AVIs, no zips, no JavaScript, etc.). We just need images, HTML and CSS. Everything else is useless to a PDF. Well, useless to the PDF I’m going to make, anyway. You should probably read about the scan rules. I think what I just wrote isn’t good enough. It may need to be a little more elaborate, like -/*.avi or some shit like that.
    Also, don’t set connections above 3 or so. I used 3, then got impatient and jacked it up to 8 … an awful lot of bad requests ensued.

    #4039

    wizardmachine
    Participant
    • topics: 2
    • replies: 16

    Outstanding work! Actions really speak for themselves. Let me know if I can help somehow.

    #4035

    talisa
    Participant
    • topics: 23
    • replies: 73

    this is great, preserving valuable threads from quakeone.

    i applaud you for going through all this effort to save anything valuable from a doomed site

    there is so incredibly much priceless info in there, like answers to frequently asked questions and help on things like problems with engines and how things work, and entire tutorial threads on how to do things like map in gtkradiant, setting up modern quake clients using the files from the gog and steam releases, and threads with instructions on using Darkplaces features like how to create RT-lights and such things.

    those kinda things are invaluable information, and it would be a terrible shame for all that info to just get lost to the void when quakeone perishes
