Taparip
#1
Hi! So I have Ubuntu installed with the latest version of Perl, SQLite 3, and the Taparip repository. Help me, Obi-Wan Kenobi. You're my only hope.
RE: Taparip
#2
Do you have cpanm installed? Or just cpan? Those are the module installers. Both will work; cpan is just a little older and more, uh, interactive.
"Kitto daijoubu da yo." - Sakura Kinomoto
RE: Taparip
#3
Hm, running instmodsh shows I have cpanminus installed. I can change that if needed.
RE: Taparip
#4
Then try running what's there in the readme:
sudo cpanm Mojo::UserAgent Carp DBI Date::Manip

After that, you need to open up taparip.pl and start changing configuration.  I was lame and lazy, so you have to edit it directly.  Maybe if I ever decide to care enough I can make it an INI or YAML or something.  Anyway, you need to edit this section:

https://github.com/labster/taparip/blob/...rip.pl#L20

$domain should link to the path of the forum you're trying to rip.  Change the URL to your forum.
$apipath is probably still the same.  If it's wrong, the script will break right away and you'll know, and then I'd actually have to look.
$dbfile is where you save the output.  Pick some directory you can write to.  Or just use sudo, lol (this software makes no warranty for any purpose lol)
The next one you should set is $endtopic.  Get the thread number of the most recent thread from the URLs around the forum.
$repeat_thread is, well... okay, this is how the software works:  within the list of thread numbers, it jumps around at random to simulate organic traffic, so someone looking at the server logs won't find it obvious that they're being crawled.  However, if you restart the script, the order will be re-randomized, so it might try to download threads you've already downloaded.  If it's set to 0, you get everything once.  If 1, you can get threads again -- most useful when you need to pick up new posts.
$delay - 2 seconds between requests seems like we're not trying to murder their servers, right?
$verbose - you like words, don't you?
$username and $password -- If there are private forums you're trying to capture, you'll need this.  I believe someone said it worked.  However, I migrated this forum without needing to login, since it's all public.
$authorz - um, some guy added this to my script later, leave it alone I guess.
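For anyone curious how the $repeat_thread behaviour plays out, here is a minimal shell sketch of the idea. It is not taparip's actual code (the script is Perl); it assumes thread IDs form a plain numeric range, that already-downloaded IDs are listed one per line in a text file, and that GNU seq and shuf are available (they are on Ubuntu). The function name crawl_order and the seen-file are hypothetical.

```shell
# Sketch of the idea behind $repeat_thread; NOT taparip's actual (Perl) code.
# Assumes thread IDs are a plain numeric range and GNU seq/shuf are installed.
# crawl_order START END REPEAT SEEN_FILE
#   REPEAT 0: skip IDs listed in SEEN_FILE (hypothetical: one saved ID per line)
#   REPEAT 1: visit every ID, including ones already downloaded
crawl_order() {
    local start=$1 end=$2 repeat=$3 seen=$4
    if [ "$repeat" -eq 0 ] && [ -s "$seen" ]; then
        # -x whole-line match, -F fixed strings, -f read patterns from file, -v invert
        seq "$start" "$end" | grep -vxFf "$seen" | shuf
    else
        # Fresh random order on every run, so a restart revisits in a new order
        seq "$start" "$end" | shuf
    fi
}
```

For example, with 2 and 5 in seen.txt, crawl_order 1 6 0 seen.txt prints 1, 3, 4 and 6 in a random order; with 1 as the third argument you get all six, so threads you already saved can be fetched again.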

perl -c taparip.pl compiles the script without actually running it, so you can check that all the modules are installed and that you didn't mess up the syntax before running it for real.
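Before running for real, it can also help to eyeball the values you set. A rough sketch, assuming each setting is declared as `my $name = ...;` on a single line; the variable names come from this thread and have changed between revisions, so both $dbfile and $db_file (and $endtopic/$end_topic) are matched. show_config is a hypothetical helper, not part of taparip.

```shell
# Sketch: print the config assignments in taparip.pl with their line numbers.
# Assumes settings look like `my $name = ...;` on one line (names from this thread).
# $password is deliberately left out so it isn't echoed to the terminal.
show_config() {
    local script=$1
    grep -nE '^[[:space:]]*my[[:space:]]+\$(domain|apipath|db_?file|end_?topic|start_?topic|repeat_thread|delay|verbose|username)[[:space:]]*=' "$script"
}
```

From the taparip folder, something like show_config taparip.pl && perl -c taparip.pl shows your settings and then runs the syntax check.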
RE: Taparip
#5
Man, I forgot about how you saved our forum with that, Lab. Much kudos, once again.

Though I still need to go through and repost a bunch of my fiction that got horribly munged in the serial migrations. I'm pretty sure it was all crapatalk's fault, anyhow.
RE: Taparip
#6
(07-25-2020, 03:31 AM)Black Aeronaut Wrote: Man, I forgot about how you saved our forum with that, Lab.  Much kudos, once again.

Though I still need to go through and repost a bunch of my fiction that got horribly munged in the serial migrations.  I'm pretty sure it was all crapatalk's fault, anyhow.

Some of it was my fault, most of it was their fault.  I still have the originals in HTML form if you want me to give it another go.  They're from before I had a chance to screw them up, but after they did.

The unfortunate part of web forums is that they're really hard to monetize, so only a few companies that provide them can survive.  So we get these large mergers: once they get big enough, they can survive by providing a low level of service and product at scale.  But the upside is that forums are hard to monetize because they are not outrage machines, but places where you talk to the same people over and over again.  The format avoids mindless scrolling and resists bragging or quick comebacks.  Instead, forums get a community to create something together.

So thanks for being the kind of person who makes me care enough to want to save a community.  Even if it's just saving it from gray-on-gray text and generally shitty UX.  That goes for the rest of you, too.
RE: Taparip
#7
Hi Lab.

So running this went fine and it installed.
sudo cpanm Mojo::UserAgent Carp DBI Date::Manip

$domain is set to my forum
$apipath I left as is. (I'm assuming that's correct)
$dbfile I left as is /home/user/Downloads/Forum.db (do I need to swap "user" with my user account name?)
$endtopic: the most recent thread, as in the one with the latest/last post by date? For example -t543.html
This is the thread ID with the very last post before the forums stopped being used a few weeks ago.
I don't understand this part completely. Is it jumping around the whole forum scraping all topics, or only within certain sections?

How do I run perl -c taparip.pl?
What would the command look like if I had the taparip.pl file on my desktop?
perl -c user/desktop/taparip.pl  or whatever. Sorry I'm a newb. D:
RE: Taparip
#8
That would work, sure.  Alternatively, you can cd into the directory and run it from there.

Sure, if that was the thread that was created last, do
my $endtopic = 543;
It's not the most recent thread with a new post, it's the most recent first post in a thread, made in any subforum.  If you're really out of sorts, make a new thread and see what number you get.  On the other hand, if you're not sure, just add a hundred or so to your best guess, and the bot will ignore threads not created yet.

Just general advice, and this is to everyone: it's hard to permanently screw up your computer by accident on the command line.  Back in the elder days you could typo rm -r ./ as rm -r /. and delete everything on your hard drive, but by the year 2000 or so they realized this was really bad and started preventing it.  Don't be afraid of your computer.

Oh, and make backups!  At work our data managers accidentally drop the wrong table so often it's embarrassing.  But we always have backups, so clients keep paying us the big bucks.  Not saying that there's a risk with my software, it's just good practice.
RE: Taparip
#9
So I ran "perl -c taparip.pl" first and got this.

:~/Downloads/taparip-master$ perl -c taparip.pl
taparip.pl syntax OK


Then tried running "perl taparip.pl" and it returned this.

:~/Downloads/taparip-master$ perl taparip.pl
Gathering data from https://https://www.tapatalk.com/groups/GROUPNAME//viewtopic.phpinstall_driver(SQLite) failed: Can't locate DBD/SQLite.pm in @INC (you may need to install the DBD::SQLite module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at (eval 93) line 3.
Perhaps the DBD::SQLite perl module hasn't been fully installed,
or perhaps the capitalisation of 'SQLite' isn't right.
Available drivers: DBM, ExampleP, File, Gofer, Mem, Proxy, Sponge.
at taparip.pl line 258.
RE: Taparip
#10
Huh, thought I already posted here.  I should have a real cpanfile.  Okay, let's try this, then:

cpanm DBD::SQLite

Basically, the error message was right: the database interface (DBI) was installed, but not the driver for the particular database, which in this case is SQLite.

And then retry.
RE: Taparip
#11
"cpanm DBD::SQLite" worked and installed.

Running taparip.pl is still a no. :(


:~$ perl /home/user/Desktop/taparip.pl
Gathering data from https://https://www.tapatalk.com/groups/GROUP//viewtopic.phpDBI connect('dbname=/home/user/Downloads/Forum.db','',...) failed: unable to open database file at /home/user/Desktop/taparip.pl line 258.
Can't call method "do" on an undefined value at /home/user/Desktop/taparip.pl line 261.
RE: Taparip
#12
Also, would there be a way to merge the "old" database with the current one on the new board?
RE: Taparip
#13
It's certainly possible -- my crazy jerryrig of scripts could -- and did -- do that. (After all, I was importing old threads and posts while people were using the new board already.) It all depends on how you're managing the import of your old data to the new board, but if it involves simple SQL inserts, you're fine. Inserts won't remove anything that's already in the DB; they'll just add to it. (If, on the other hand, your transfer revolves around dropping and recreating tables, or deleting contents outright, you'll need to rethink what you're doing.)
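As an illustration of the insert-only approach, here's a sketch that writes out a merge script for the case where both the old and new data live in SQLite files. Everything specific here is an assumption: the table names are hypothetical, the tables must have the same columns in the same order in both databases, and duplicates are only skipped if they share a primary key. merge_sql is a hypothetical helper.

```shell
# Sketch: emit SQL that copies rows from an old SQLite DB into the current one
# using INSERTs only; nothing is dropped or deleted.
# Assumptions (hypothetical): both DBs are SQLite, listed tables exist in both
# with identical column order, and primary keys identify duplicate rows.
# merge_sql OLD_DB TABLE...
merge_sql() {
    local old_db=$1; shift
    local quoted t
    # Double any single quotes so the path is a valid SQL string literal
    quoted=$(printf '%s' "$old_db" | sed "s/'/''/g")
    # ATTACH/DETACH must happen outside the transaction
    echo "ATTACH DATABASE '$quoted' AS old;"
    echo "BEGIN;"
    for t in "$@"; do
        # OR IGNORE skips rows whose key already exists in the new DB
        echo "INSERT OR IGNORE INTO main.$t SELECT * FROM old.$t;"
    done
    echo "COMMIT;"
    echo "DETACH DATABASE old;"
}
```

Something like merge_sql old.db posts topics users > merge.sql, then read merge.sql over, back up the new database, and run sqlite3 new.db < merge.sql. If the new board runs on MySQL instead, the idea is the same but the syntax differs (MySQL spells it INSERT IGNORE).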
-- Bob

I have been Roland, Beowulf, Achilles, Gilgamesh, Clark Kent, Mary Sue, DJ Croft, Skysaber.  I have been called a hundred names and will be called a thousand more before the sun grows dim and cold....
RE: Taparip
#14
I guess I need more information at this point.  The documentation says that if a database file is missing, it will be created.  So the possibilities I can see as to why it can't be opened are:
* There's an existing file with that name that is corrupt.  If there's something there, delete it and try again.
* The permissions aren't set correctly: the script needs read/write access to the directory containing the file.  chmod 777 [the directory] and try again.

I suppose I could just send you an empty database file too, but that might not solve the problem.
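To narrow down which of those it is, here's a rough pre-flight check for the database path. It's a sketch under a few assumptions: it only checks the folder, the permissions, and the "SQLite format 3" header that every non-empty SQLite file begins with, so it can't catch every kind of corruption. check_db_path is a hypothetical name.

```shell
# Sketch: diagnose common causes of DBI's "unable to open database file" for SQLite.
# check_db_path PATH -> prints a diagnosis; returns 0 only if the path looks usable.
check_db_path() {
    local db=$1 dir
    dir=$(dirname -- "$db")
    if [ ! -d "$dir" ]; then
        echo "parent directory '$dir' does not exist"; return 1
    fi
    # Creating a file needs write + search permission on the directory
    if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "no write permission on '$dir'"; return 1
    fi
    if [ -e "$db" ] && [ ! -w "$db" ]; then
        echo "'$db' exists but is not writable"; return 1
    fi
    # Non-empty SQLite files start with "SQLite format 3" plus a NUL byte;
    # a 0-byte file is fine, SQLite treats it as a new, empty database.
    if [ -s "$db" ] && [ "$(head -c 15 -- "$db")" != "SQLite format 3" ]; then
        echo "'$db' exists but is not an SQLite database"; return 1
    fi
    echo "ok: $db"
}
```

For example, check_db_path /home/user/Downloads/Forum.db will report that the parent directory doesn't exist if there is no /home/user on your machine.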
RE: Taparip
#15
Yeah, same thing. Did no one else have these issues? Lol. Want to send me an empty database file, or walk me through how to create one myself?
RE: Taparip
#16
Nope, no one else had these issues.  You do have sqlite3 installed, right?  Can you run sqlite3 on the command line?  ( .quit to quit )
RE: Taparip
#17
Yes. :(

SA@SA-HP-Pavilion:~$ sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>
RE: Taparip
#18
I do still exist here but real life demands are taking hits on my time.  And I tend to solve the technical problems of the people paying me $40/hour before random people on the internet, I'm sure you understand.

Okay, pushed a new revision of the software, with a slightly updated schema.  Download it again, either through git stash; git pull; git stash pop or by downloading a new ZIP, depending on how you got the software originally.  If you do the second, you'll have to copy all of the config you changed into the new file.

Try running it first and see if you get a different error.  If you do, report back here.  If no new error, then try running
sqlite3 --init schema.sql <path to your $db_file>
with the appropriate file path added.
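Note that sqlite3 --init runs the file and then leaves you at the sqlite> prompt (.quit gets you out). Feeding schema.sql on standard input does the same job non-interactively. A sketch, assuming schema.sql is the one from the repo and that you run it once on a fresh file (if the schema uses plain CREATE TABLE, a second run will complain that the tables already exist). init_db is a hypothetical helper.

```shell
# Sketch: create the database from schema.sql without the interactive prompt.
# init_db DB_FILE SCHEMA_FILE  (run once, on a fresh database file)
init_db() {
    local db=$1 schema=$2
    # SQLite can create the file, but not the folder it lives in
    if [ ! -d "$(dirname -- "$db")" ]; then
        echo "parent folder of '$db' does not exist; create it first" >&2
        return 1
    fi
    if ! command -v sqlite3 >/dev/null 2>&1; then
        echo "sqlite3 command not found" >&2
        return 1
    fi
    # sqlite3 executes the SQL it reads on stdin, creating the file if needed
    sqlite3 "$db" < "$schema"
}
```

Usage: init_db /path/to/Forum.db schema.sql from inside the taparip folder.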
RE: Taparip
#19
I totally understand and greatly appreciate all your help and time you’ve given me! I tried running it but still get the same error.


SA@SA-HP:~/Downloads$ perl taparip.pl
Gathering data from https://https://www.tapatalk.com/groups/GROUP//viewtopic.phpDBI connect('dbname=/home/user/Downloads/Forum.db','',...) failed: unable to open database file at taparip.pl line 258.
Can't call method "do" on an undefined value at taparip.pl line 261.
SA@SA-HP:~/Downloads$


------------------------
I went back and tried chmod 777 as you suggested earlier as well.

SA@SA-HP:~$ chmod 777 /home/user/Downloads/taparip-master/taparip.pl
SA@SA-HP:~$ perl /home/user/Downloads/taparip-master/taparip.pl
Gathering data from https://https://www.tapatalk.com/groups/GROUP//viewtopic.php
parent directory of $db_file '/home/user/Downloads/Forum.db' does not exist, cannot create schema at /home/user/Downloads/taparip-master/taparip.pl line 261.
SA@SA-HP:~$


Also, what about giving me a blank Forum.db that I could drop in place to see if it works? Or the steps to create my own if it's a simple command I can just copy/paste?
RE: Taparip
#20
That's not the same error. You're trying to save the database to a folder that doesn't exist. You need to change $db_file so it points into a folder that actually exists. You can name the file anything you want -- Forum.db is just a suggestion. But it has to live in a folder.

(In theory I could mkdir -p that folder, but what if I don't have privilege for that? Seems like madness to go down that path, pun not intended.)
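If you want a folder that definitely exists and that you can write to, one option is to make it yourself under your home directory. $HOME expands to your own home folder (for example /home/SA), which also sidesteps the question of whether to replace "user" in the path. A sketch: db_config_line is a hypothetical helper, and it assumes the newer revision's variable name $db_file.

```shell
# Sketch: create a folder you own and print the $db_file line to paste into taparip.pl.
# db_config_line DIR [NAME]   (NAME defaults to Forum.db, which is just a suggestion)
# Assumes the config variable is named $db_file, as in the newer revision.
db_config_line() {
    local dir=$1 name=${2:-Forum.db}
    mkdir -p -- "$dir" || return 1
    # Turn a relative path into an absolute one so the script works from any folder
    dir=$(cd -- "$dir" && pwd) || return 1
    printf "my \$db_file = '%s/%s';\n" "$dir" "$name"
}
```

db_config_line "$HOME/taparip-data" prints a line of the form my $db_file = '/home/<you>/taparip-data/Forum.db'; to paste over the existing one.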
RE: Taparip
#21
Ahh, I’ve been screwing up the whole time. Which way should I try executing this command, with or without doing chmod 777? I also copied and pasted the lines noted.



Code:
SA@SA-HP:~$ perl /home/SA/Desktop/taparip.pl
syntax error at /home/SA/Desktop/taparip.pl line 29, near "my "
Global symbol "$start_topic" requires explicit package name (did you forget to declare "my $start_topic"?) at /home/SA/Desktop/taparip.pl line 29.
Global symbol "$start_topic" requires explicit package name (did you forget to declare "my $start_topic"?) at /home/SA/Desktop/taparip.pl line 101.
Execution of /home/SA/Desktop/taparip.pl aborted due to compilation errors.



Line 29 looks like this:
my $start_topic = 1;

Line 101 looks like this:
my @topic_list = scalar(@get_topics) ? @get_topics : ($start_topic .. $end_topic );




Code:
SA@SA-HP:~$ chmod 777 /home/SA/Downloads/taparip-master/taparip.pl
SA@SA-HP:~$ perl /home/SA/Downloads/taparip-master/taparip.pl
Gathering data from https://https://www.tapatalk.com/groups/group//viewtopic.php
DBI connect('dbname=/home/SA/Downloads/ForumRip','',...) failed: unable to open database file at /home/SA/Downloads/taparip-master/taparip.pl line 267.
Can't call method "selectall_arrayref" on an undefined value at /home/SA/Downloads/taparip-master/taparip.pl line 90.
SA@SA-HP:~$


Line 90:
my %seen_users = map {$_->[0] => 1} @{ $dbh->selectall_arrayref("SELECT username FROM users") };

Line 267:
my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file","","");
RE: Taparip
#22
> syntax error at /home/SA/Desktop/taparip.pl line 29, near "my "

That means there's no semicolon at the end of the previous (non-comment) line. Look up to line 25.
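Since Perl reports the line where it noticed the problem rather than where the bad statement started, it helps to look at the few lines before it. A small sketch using awk (show_context is a hypothetical helper name):

```shell
# Sketch: print the lines leading up to a reported line number, numbered,
# to spot a missing semicolon on an earlier statement.
# show_context FILE LINE [COUNT]   (COUNT lines before LINE, default 5)
show_context() {
    local file=$1 line=$2 count=${3:-5}
    awk -v l="$line" -v n="$count" '
        NR > l { exit }                                   # past the target: stop reading
        NR >= l - n { printf "%4d  %s\n", NR, $0 }        # print with right-aligned line number
    ' "$file"
}
```

show_context taparip.pl 29 prints lines 24 to 29 with their numbers; the statement missing its semicolon is usually just above the reported line.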
RE: Taparip
#23
Ah, fixed it, but I'm still getting this:

SA@SA-HP:~$ perl /home/SA/Desktop/taparip.pl
Gathering data from https://https://www.tapatalk.com/groups/group//viewtopic.php
DBI connect('dbname=/home/SA/Downloads/ForumRip','',...) failed: unable to open database file at /home/SA/Downloads/taparip-master/taparip.pl line 267.
Can't call method "selectall_arrayref" on an undefined value at /home/SA/Downloads/taparip-master/taparip.pl line 90.
SA@SA-HP:~$
RE: Taparip
#24
Hi. I keep forgetting to come by here. Got busy with life, work and school. Any ideas for my previous post? 
Could I possibly give you the URL and user/pw for the boards and pay a nominal fee for your time? :)

Thanks!
RE: Taparip
#25
Nah, the money doesn't help at all.  It's the free time.  Do you know any dynamic language devs?  My workplace is totally hiring.

Possibly I could just do it myself -- are you looking for a backup, or are you looking to migrate over to a new forum?