Hi! So I have Ubuntu installed with the latest version of Perl, Sqlite 3 and the Taparip repository. Help me, Obi-Wan Kenobi. You're my only hope.
RE: Taparip
Do you have cpanm installed? Or just cpan? Those are the module installers. Both will work, just cpan is a little older and more uh, interactive.
"Kitto daijoubu da yo." - Sakura Kinomoto
RE: Taparip
Hm, running instmodsh shows I have cpanminus installed. I can change that if needed.
RE: Taparip
Then try running what's there in the readme:
sudo cpanm Mojo::UserAgent Carp DBI Date::Manip

After that, you need to open up the script and start changing the configuration.  I was lame and lazy, so you have to edit it directly.  Maybe if I ever decide to care enough I can make it an INI or YAML or something.  Anyway, you need to edit this section:

$domain should link to the path of the forum you're trying to rip.  Change the URL to your forum.
$apipath is probably still the same -- it will break right away if run and you'll know it if it's wrong, and then I'd actually have to look.
$dbfile is where you save the output.  Pick some directory you can write to.  Or just use sudo, lol (this software makes no warranty for any purpose lol)
The next one you should set is $endtopic.  Get the thread number of the most recent thread from the URLs around the forum.
$repeat_thread is well... okay, this is how the software works:  Within the list of thread numbers, it jumps around at random to simulate organic traffic.  Therefore someone looking at server logs won't think it's obvious they're getting crawled.  However, if you restart the script, the order will be re-randomized, so it might try to download threads you've already downloaded.  If it's set to 0, you get everything once.  If 1, you can get threads again -- most useful when you need to pick up new posts.
$delay - 2 seconds seems like we're not trying to murder their servers, right?
$verbose - you like words, don't you?
$username and $password -- If there are private forums you're trying to capture, you'll need this.  I believe someone said it worked.  However, I migrated this forum without needing to login, since it's all public.
$authorz - um, some guy added this to my script later, leave it alone I guess.
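
To make the $repeat_thread description above a bit more concrete, here is a rough shell sketch of the idea: visit thread numbers in random order and, when repeats are off, skip anything already downloaded. This is not Taparip's actual code. The function name, the "done" file, and the assumption that thread numbers start at 1 are all made up for illustration.

```shell
# Sketch only: illustrates the random-order / $repeat_thread idea,
# not how Taparip itself is written.
# plan_threads END_TOPIC REPEAT DONE_FILE
#   END_TOPIC  highest thread number to try (stand-in for $endtopic)
#   REPEAT     0 = skip threads listed in DONE_FILE, 1 = fetch them again
#   DONE_FILE  hypothetical bookkeeping file, one downloaded thread number per line
plan_threads() {
    end_topic="$1"
    repeat="$2"
    done_file="$3"
    # shuf randomizes the order on every run, like restarting the script does
    seq 1 "$end_topic" | shuf | while read -r n; do
        if [ "$repeat" -eq 0 ] && grep -qxF "$n" "$done_file" 2>/dev/null; then
            continue    # already downloaded, so leave it out of the plan
        fi
        echo "$n"
    done
}
```

Calling plan_threads 543 0 done.txt would print the thread numbers up to 543 in a fresh random order, minus whatever is listed in done.txt. A real crawler would also pause for $delay seconds between requests.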

perl -c will check the code compilation without running anything, so you can check to see we have all the modules and you didn't mess up syntax before actually running it.
"Kitto daijoubu da yo." - Sakura Kinomoto
RE: Taparip
Man, I forgot about how you saved our forum with that, Lab. Much kudos, once again.

Though I still need to go through and repost a bunch of my fiction that got horribly munged in the serial migrations. I'm pretty sure it was all crapatalk's fault, anyhow.
Yasuri Nanami is my number one waifu, if only because she would horribly murder all the others if they didn't shut up and toe the line.

"They did not care about all the other attempts wizards had made on the Lone Power through history; as far as a computer is concerned, there is no program that cannot be debugged, or at worst, rewritten."
-Diane Duane, High Wizardry

"If our friendship depends on things like space and time, we've destroyed our own brotherhood! But overcome space, and all we have left is Here. Overcome time, and all we have left is Now. And in the middle of Here and Now, don't you think that we might see each other once or twice?"
-Richard Bach, Jonathan Livingston Seagull
RE: Taparip
(07-25-2020, 04:31 AM)Black Aeronaut Wrote: Man, I forgot about how you saved our forum with that, Lab.  Much kudos, once again.

Though I still need to go through and repost a bunch of my fiction that got horribly munged in the serial migrations.  I'm pretty sure it was all crapatalk's fault, anyhow.

Some of it was my fault, most of it was their fault.  I still have the originals in HTML form if you want me to give it another go.  From before I had a chance to screw it up, but after they did.

The unfortunate part of web forums is that they're really hard to monetize, so only a few companies can exist that provide them.  So we get these large mergers, as they get big enough they can provide a low enough level of service and product at scale that they can survive.  But the upside is that forums are hard to monetize because they are not outrage machines, but a place where you talk to the same people, over and over again.  The format avoids mindless scrolling, and resists bragging or quick comebacks.  Instead forums get a community to create something together.

So thanks for being the kind of person who makes me care enough to want to save a community.  Even if it's just saving it from gray-on-gray text and generally shitty UX.  That goes for the rest of you, too.
"Kitto daijoubu da yo." - Sakura Kinomoto
RE: Taparip
Hi Lab.

So running this went fine and it installed.
sudo cpanm Mojo::UserAgent Carp DBI Date::Manip

$domain is set to my forum
$apipath I left as is. (I'm assuming that's correct)
$dbfile I left as is /home/user/Downloads/Forum.db (do I need to swap "user" with my user account name?)
$endtopic  the most recent thread as in the one with the latest/last post by date? for example -t543.html
This is the thread ID with the very last post before the forums stopped being used a few weeks ago.
I don't understand this part completely. Is it jumping around the whole forum scraping all topics, or only in different sections of the forum?

How do I run perl -c
What would the command look like if I had the file on my desktop?
perl -c user/desktop/  or whatever. Sorry I'm a newb. D:
RE: Taparip
That would work, sure.  Alternatively, you can cd into the directory and run it from there.
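
A small sketch of both ways of running the check, for anyone following along. The file name script.pl is a placeholder, not the real script name. Two things worth knowing on Ubuntu: paths are case-sensitive (the folder is normally Desktop, with a capital D), and $HOME expands to your own home directory, so you don't have to type your account name.

```shell
# check_syntax FILE: compile-check a Perl script without running it.
# perl -c prints "FILE syntax OK" when the script compiles and
# exits non-zero when it does not.
check_syntax() {
    perl -c "$1"
}

# Usage (script.pl is a placeholder name):
#   check_syntax "$HOME/Desktop/script.pl"
# or, from inside the directory:
#   cd "$HOME/Desktop" && perl -c script.pl
```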

Sure, if that was the thread that was created last, do
my $endtopic = 543;
It's not the most recent thread with a new post, it's the most recent first post in a thread, made in any subforum.  If you're really out of sorts, make a new thread and see what number you get.  On the other hand, if you're not sure, just add a hundred or so to your best guess, and the bot will ignore threads not created yet.

Just general advice, and this is to everyone -- it's hard to permanently screw up your computer by accident on the command line.  Back in the elder days you could typo rm -r ./ as rm -r /. and delete everything on your hard drive, but by the year 2000 or so they realized this was really bad and started preventing you from doing it.  Don't be afraid of your computer.

Oh, and make backups!  At work our data managers accidentally drop the wrong table so often it's embarrassing.  But we always have backups, so clients keep paying us the big bucks.  Not saying that there's a risk with my software, it's just good practice.
"Kitto daijoubu da yo." - Sakura Kinomoto
RE: Taparip
So I ran "perl -c" first and got this.

:~/Downloads/taparip-master$ perl -c syntax OK

Then tried running "perl" and it returned this.

:~/Downloads/taparip-master$ perl
Gathering data from https:// failed: Can't locate DBD/ in @INC (you may need to install the DBD::SQLite module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at (eval 93) line 3.
Perhaps the DBD::SQLite perl module hasn't been fully installed,
or perhaps the capitalisation of 'SQLite' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Mem, Proxy, Sponge.
at line 258.
RE: Taparip
Huh, thought I already posted here.  I should have a real cpanfile.  Okay, let's try this, then:

cpanm DBD::SQLite

Basically the error message was right: the database interface (DBI) was installed, but not the driver for the particular database software, which in this case is SQLite.
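
If you want to confirm the driver is actually loadable before rerunning anything, something like the following should do it. The helper name have_perl_module is made up for this sketch; the perl -M switch it relies on is standard.

```shell
# have_perl_module NAME: succeeds if Perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

if have_perl_module DBD::SQLite; then
    echo "DBD::SQLite is installed"
else
    echo "DBD::SQLite is missing; try: cpanm DBD::SQLite"
fi
```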

And then retry.
"Kitto daijoubu da yo." - Sakura Kinomoto
RE: Taparip
"cpanm DBD::SQLite" worked and installed.

Running "perl -c" is still a no.  :(

:~$ perl /home/user/Desktop/
Gathering data from https:// connect('dbname=/home/user/Downloads/Forum.db','',...) failed: unable to open database file at /home/user/Desktop/ line 258.
Can't call method "do" on an undefined value at /home/user/Desktop/ line 261.
RE: Taparip
Also, would there be a way to merge the "old" database with the current one on the new board?
RE: Taparip
It's certainly possible -- my crazy jury-rig of scripts could -- and did -- do that. (After all, I was importing old threads and posts while people were using the new board already.) It all depends on how you're managing the import of your old data to the new board, but if it involves simple SQL inserts, you're fine. Inserts won't remove anything that's already in the DB, they'll just add to it. (If, on the other hand, your transfer revolves around dropping and recreating tables, or deleting contents outright, you'll need to rethink what you're doing.)
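
As a very rough illustration of the "inserts only add" point, here is what an insert-only merge could look like if both sides happened to be SQLite files. Everything specific here is an assumption: the table name posts is invented (it is not Taparip's schema), both tables are assumed to have identical columns, and the new board may well run on a different database entirely.

```shell
# merge_posts OLD_DB NEW_DB: copy rows from OLD_DB into NEW_DB
# without modifying or deleting rows NEW_DB already has.
# "posts" is a made-up table name for this sketch.
# Note: paths containing a single quote would break the ATTACH line.
merge_posts() {
    sqlite3 "$2" <<SQL
ATTACH DATABASE '$1' AS old;
INSERT OR IGNORE INTO posts SELECT * FROM old.posts;
DETACH DATABASE old;
SQL
}
```

INSERT OR IGNORE quietly skips any old row whose primary key already exists on the new side, so nothing gets overwritten. If old and new IDs collide but refer to different posts, you would need to renumber instead, which this sketch does not attempt.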
-- Bob

I have been Roland, Beowulf, Achilles, Gilgamesh, Clark Kent, Mary Sue, DJ Croft, Skysaber.  I have been 
called a hundred names and will be called a thousand more before the sun grows dim and cold....

RE: Taparip
I guess I need more information at this point.  The documentation says that if a database file is missing, it will be created.  So the possibilities I can see as to why it can't be opened are:
* There's an existing file with that name, which is a corrupt file.  If there's something there, delete it and try again.
* The permissions aren't set correctly: the script needs read/write access to the directory containing the file.  chmod 777 [the path] and try again.
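
A quick way to test those possibilities from the shell, plus one more worth ruling out: SQLite gives the same "unable to open database file" error when the directory in the path doesn't exist at all, which would happen if the path still has a placeholder like "user" in it instead of your account name. The function name check_db_path is made up for this sketch.

```shell
# check_db_path FILE: report why a database file might not be
# openable or creatable at that location.
check_db_path() {
    db="$1"
    dir=$(dirname "$db")
    if [ ! -d "$dir" ]; then
        echo "directory does not exist: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "directory is not writable: $dir"
        return 1
    fi
    if [ -e "$db" ] && [ ! -w "$db" ]; then
        echo "file exists but is not writable: $db"
        return 1
    fi
    echo "ok: $db"
}

# Usage ($HOME expands to your own home directory):
#   check_db_path "$HOME/Downloads/Forum.db"
```

If it reports a missing directory, fixing the path in $dbfile is a gentler cure than chmod 777.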

I suppose I could just send you an empty database file too, but that might not solve the problem.
"Kitto daijoubu da yo." - Sakura Kinomoto
RE: Taparip
Yeah, same thing. Did no one else have these issues? Lol. Want to send me an empty database file, or walk me through how to create one myself?
RE: Taparip
Nope, no one else had these issues.  You do have sqlite3 installed, right?  Can you run sqlite3 on the command line?  ( .quit to quit )
"Kitto daijoubu da yo." - Sakura Kinomoto
RE: Taparip
Yes. :(

SA@SA-HP-Pavilion:~$ sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
