Scrapbook Photo Backups

I don't really believe all the "LJ is going to close" drama brought about by yesterday's sudden layoffs. (I do feel for those laid off with so little notice or severance, though - my thoughts go out to you all! If you have no idea what I'm talking about, check out this post, which lays it out quite well.) Still, I did realize it would be prudent to back up my LJ and Scrapbook files, which I have been quite remiss in doing to date. LJ entry backup is quite well documented at these links:

However, Scrapbook backup is poorly documented (if at all, really). The best resource I've come across is this entry in lj_dev, which has solutions that basically don't work. So I took it upon myself to fix this situation. :)

The basic problem with the Perl client is that it was never updated to deal with LJ's split-authentication system after there were some security problems that prompted changes a long while back. Thanks to the wonders of the WWW::Mechanize Perl module, I believe I've fixed that client so that it works perfectly fine, and so, I present to you:

I will try to keep this entry up-to-date with the latest checksums if I make any further changes - if you download a version with any other checksums, it may not be the most up-to-date. This client is intended for people who are somewhat technical and have Perl installed on their systems. You'll probably also need to be able to install Perl modules. I apologize, but I don't have time to write up detailed instructions on any of that right now. Once you have those bits done, though, all you need to do is create a ".fotoup.conf" file with your configuration in it. The minimum configuration looks like:

server: pics.livejournal.com
username: your LJ username
password: your LJ password
backupdir: a directory on your system that you want to back up to
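
For anyone curious how a file like this can be read, here is a minimal sketch in Python (not the Perl that fotoup.pl is written in). It assumes plain "key: value" lines and treats the four keys above as required. The names parse_conf and REQUIRED_KEYS are made up for this illustration, and the real client's parser may well behave differently.

```python
REQUIRED_KEYS = ("server", "username", "password", "backupdir")  # the four keys shown above


def parse_conf(text):
    """Parse 'key: value' lines into a dict (hypothetical helper, not fotoup.pl's parser)."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: blank and '#' lines are ignored
            continue
        # Split on the first colon only, so values such as C:\backup stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        conf[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if k not in conf]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    return conf


# Placeholder values, for illustration only.
sample = """\
server: pics.livejournal.com
username: exampleuser
password: examplepass
backupdir: /home/exampleuser/scrapbook-backup
"""
conf = parse_conf(sample)
print(conf["server"])
```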

That file should be in your UNIX home directory. If you're on Windows, I'm not sure of the correct location, sorry. Now just run "./fotoup.pl --backup" and wait. You should get output that looks something like this:

$ ./fotoup.pl --backup
To upload: 0 from data, 0 from receipt

Fetching export.xml from server...
Total pictures: 3563
Already backed up: 0
Pictures to backup: 3563
Bytes to fetch over network: 4889620910
  Fetching image 1/3563 ...  0.1%
  Fetching image 2/3563 ...  0.1%
  Fetching image 3/3563 ...  0.2%
  Fetching image 4/3563 ...  0.3%
  Fetching image 5/3563 ...  0.3%
...

Sometimes you will see an "MD5 of downloaded file doesn't match, retrying." error. Don't panic - the server probably just had a hiccup, and the script will automatically retry once before failing fatally. If it does fail again, you can restart without losing the files you already downloaded - just run the exact same command again. It'll then look something like this:

$ ./fotoup.pl --backup
To upload: 0 from data, 0 from receipt

Fetching export.xml from server...
Total pictures: 3563
Already backed up: 5
Pictures to backup: 3558
Bytes to fetch over network: 4874557302
  Fetching image 1/3558 ...  0.1%
  Fetching image 2/3558 ...  0.1%
  Fetching image 3/3558 ...  0.2%
  Fetching image 4/3558 ...  0.2%
  Fetching image 5/3558 ...  0.2%
...

Note the "Already backed up" number. I added the retry functionality because I was seeing this error quite often. Hopefully it won't occur twice in a row very often, so you shouldn't have to restart repeatedly, but you might want to keep an eye on things anyway and restart if necessary.
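
The retry-and-resume behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the actual Perl in fotoup.pl. The names backup_pictures and fetch and the (name, md5) tuple layout are hypothetical, and treating an existing file as "already backed up" is an assumption about how the real client decides what to skip.

```python
import hashlib
import os


def backup_pictures(pictures, backupdir, fetch, max_attempts=2):
    """Download each (name, expected_md5) pair, verifying the MD5 of what arrives.

    fetch(name) must return the picture's bytes; it stands in for the HTTP GET.
    Files that already exist in backupdir are skipped, so a re-run resumes.
    Returns the number of pictures fetched during this run.
    """
    fetched = 0
    for name, expected_md5 in pictures:
        path = os.path.join(backupdir, name)
        if os.path.exists(path):  # assumption: an existing file means "already backed up"
            continue
        for attempt in range(1, max_attempts + 1):
            data = fetch(name)
            if hashlib.md5(data).hexdigest() == expected_md5.lower():
                with open(path, "wb") as out:  # binary mode, so the bytes land on disk unchanged
                    out.write(data)
                fetched += 1
                break
            if attempt < max_attempts:
                print("MD5 of downloaded file doesn't match, retrying.")
        else:
            # Every attempt failed: stop here, keeping the files saved so far.
            raise RuntimeError(f"MD5 of downloaded file doesn't match: {name}")
    return fetched
```

Because a file is only written after its checksum matches, a run that dies partway leaves no bad files behind, and running the same call again only fetches what is still missing.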

I'm sorry these instructions aren't more non-geek-friendly. Please feel free to play around and write up better instructions if you'd like. If you want to link to the updated version of the client, please consider linking to this entry to ensure that the MD5/SHA1 checksums are included as well. Feel free to put links to your instructions in the comments here as well, so that others can also benefit from them. Thanks!

Edit 2009-01-08 08:37 EST: Updated fotoup.pl on the server. New MD5/SHA1 sums: abeedb22e0937ab0f4d9da62347993e3/188f0506ed8be5dbb4b7ed833dfa6a2599bffeba. The new version should fix the Windows problems with MD5; I'm still looking at the Windows 404 loops.
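
To check a downloaded copy against the sums in the edit above, something like this Python sketch would do (Python rather than Perl, purely for illustration). The helper name file_digests is hypothetical, and the script assumes fotoup.pl sits in the current directory.

```python
import hashlib
import os

# Checksums quoted in the edit note above.
EXPECTED_MD5 = "abeedb22e0937ab0f4d9da62347993e3"
EXPECTED_SHA1 = "188f0506ed8be5dbb4b7ed833dfa6a2599bffeba"


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) for a file, read in binary mode in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


if os.path.exists("fotoup.pl"):  # assumption: the download is in the current directory
    ok = file_digests("fotoup.pl") == (EXPECTED_MD5, EXPECTED_SHA1)
    print("checksums match" if ok else "checksums differ - maybe a newer or altered copy")
```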

Comments

( 23 comments — Leave a comment )
tallblue
Jan. 7th, 2009 08:24 pm (UTC)
Thank you so much!!!! I am getting the "MD5 of download file doesn't match, retrying." errors, but I will keep trying :)
krellis
Jan. 7th, 2009 08:26 pm (UTC)
Is it retrying and then failing every time, or is it just retrying and then succeeding? If it's failing every time, there could be a different type of problem. In my (continuing) backup, I'm just seeing an MD5 failure every 20-30 downloads, and when it retries, it usually succeeds.
tallblue
Jan. 7th, 2009 09:28 pm (UTC)
It goes up to file 11 and the previous files are only at 0.1 percent. It is giving me this error at file 11 even if I delete everything and start over.
krellis
Jan. 7th, 2009 09:31 pm (UTC)
If you have a lot of files the 0.1 percent is normal. It'll eventually get higher. :) When you get to that file, do you get something like this:

Fetching image 13/2782 ... MD5 of downloaded file doesn't match, retrying.
Fetching image 13/2782 ... 0.7%
... (continues with more files)

or something like this:

Fetching image 13/2782 ... MD5 of downloaded file doesn't match, retrying.
Fetching image 13/2782 ... MD5 of downloaded file doesn't match.
(script stops running)

(Obviously the numbers would be different, but hopefully you get the idea.)
tallblue
Jan. 8th, 2009 04:09 am (UTC)
Actually I keep getting the MD5 errors and nothing has progressed.


BTW I am running XP with active perl.

Edited at 2009-01-08 04:11 am (UTC)
krellis
Jan. 8th, 2009 01:39 pm (UTC)
Okay, I believe I found the MD5 error on Windows - I've updated the copy of fotoup.pl on my web server, just use the link in the entry to download it again (new checksums have been updated in the entry as well). Give this new version a try and see if it gets you past the MD5 errors.
tallblue
Jan. 8th, 2009 05:54 pm (UTC)
So far it is working great! I have a lot of files. Thank you so much!
bychoice
Jan. 7th, 2009 08:34 pm (UTC)
I'm getting the following error:
Bytes to fetch over network: 98117639
Fetching image 1/639 ... Error GETing http://pics.livejournal.com/bychoice/pic/000tq8ek: Not Found at ./fotoup.pl line 644

Now, it is true that this file doesn't exist. Any chance we can get some code to skip over errors or otherwise deal with them? I don't see any way to get around this on my end. Help would be appreciated.
krellis
Jan. 7th, 2009 08:35 pm (UTC)
It should skip over a 404 error. I ran into one of those myself, and it just went right along. In your case it's dying after that and not going on?
bychoice
Jan. 8th, 2009 12:51 am (UTC)
Yes, it is dying and not going on. It downloaded a bunch of images before that and now each time I run it, it dies.
leprosy
Jan. 7th, 2009 10:22 pm (UTC)
Discovered your patch via a websearch
I'm cursed with a WinXP system and I've discovered that the .fotoup.pl needs to be in the root dir.

I get the MD5 error immediately and every time. I'm inclined to fire up Linux and try it there and/or wait until the backup panic of '09 dies down a bit.

EDIT: It turns out that if I just print out the second MD5 check instead of dying, the files are all downloaded OK. I'm not a good enough Perl hacker to see why the hash check is actually failing, though.

Edited at 2009-01-07 11:07 pm (UTC)
krellis
Jan. 7th, 2009 11:11 pm (UTC)
Re: Discovered your patch via a websearch
If you print both the $md5 variable and $p->{'md5'} that it's being compared to, does one have upper-case letters and the other lower-case?
leprosy
Jan. 8th, 2009 12:26 am (UTC)
Re: Discovered your patch via a websearch
They don't match at all.

$md5 contained 8b3e2ec3465e9ea696e8a5d9ecb171a2

$p->{'md5'} and the actual downloaded file's MD5 was cb07cff00b99073c8c038b5f219f74d5
krellis
Jan. 8th, 2009 02:20 am (UTC)
Re: Discovered your patch via a websearch
Hmm, that's just quite odd. I'll have to fire up Perl on a Windows box to see if I can replicate that. Are you using ActiveState Perl, Cygwin, or something else? Also, if the photo in question is public, can you send me the link (either here or to my username @ livejournal dot com)? You can find the link in the export.xml by searching for the MD5. Thanks!
leprosy
Jan. 8th, 2009 02:46 am (UTC)
Re: Discovered your patch via a websearch
Both cygwin and active state are on my system.

perl -v says the active state version is the one in use.
(I got the same md5 issue after renaming the cygwin dir just now)

All three of the photos I downloaded failed the MD5 check.

cb07cff00b99073c8c038b5f219f74d5
http://pics.livejournal.com/leprosy/pic/00002hr5

fe499e186e4ef9b5b0bfe0fbb3214bda
http://pics.livejournal.com/leprosy/pic/000039c3

3da731b9b2875af3e7bea014954bc385
http://pics.livejournal.com/leprosy/pic/00004rqx

It's actually not a big deal for me, but we're looking into your code to give some nervous deadjournal users an easier way to get their stuff backed up.

krellis
Jan. 8th, 2009 01:40 pm (UTC)
Re: Discovered your patch via a websearch
I figured out the MD5 issue: the file handle wasn't being switched into binary mode before the MD5 hash was taken. I've placed a new copy of fotoup.pl on my web server at the link in the original entry, and updated the MD5/SHA1 checksums there as well. Download that and see if it works for you - I've verified that I was getting repeated MD5 errors before the fix on my Windows box, and it's working now after the fix, so hopefully it'll work for you as well!
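
The effect is easy to reproduce outside Perl. In this Python sketch (an illustration only, not the actual change made to fotoup.pl), the same bytes are hashed once through a text-mode handle with newline translation and once through a binary handle. The latin-1 encoding is only there so that arbitrary bytes survive decoding, and md5_text_mode and md5_binary_mode are made-up names. In Perl, the counterpart of the second function is calling binmode on the file handle before hashing.

```python
import hashlib
import os
import tempfile


def md5_text_mode(path):
    # Text mode: "\r\n" is translated to "\n" on read, so the hashed bytes change.
    with open(path, "r", encoding="latin-1", newline=None) as fh:
        return hashlib.md5(fh.read().encode("latin-1")).hexdigest()


def md5_binary_mode(path):
    # Binary mode: the bytes are hashed exactly as stored on disk.
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "image.bin")
    payload = b"\xff\xd8fake\r\nimage\r\ndata"  # stand-in for image bytes containing CR LF
    with open(path, "wb") as out:
        out.write(payload)
    text_md5 = md5_text_mode(path)
    binary_md5 = md5_binary_mode(path)

print("binary read matches the original bytes:", binary_md5 == hashlib.md5(payload).hexdigest())
print("text read matches the binary read:", text_md5 == binary_md5)
```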
amymarr
Jan. 8th, 2009 04:42 pm (UTC)
Re: Discovered your patch via a websearch
Just so you know, I don't understand a word of this comment. Not one. :)
krellis
Jan. 8th, 2009 04:57 pm (UTC)
Re: Discovered your patch via a websearch
Not even "I"? :)
leprosy
Jan. 8th, 2009 08:03 pm (UTC)
Re: Discovered your patch via a websearch
It works like a charm!

Great work,
jackal
Jun. 15th, 2009 01:31 am (UTC)
Any help for MacOS X ?
What about MacOS X users?

$ bin/fotoup.pl-krellis --backup
Can't locate WWW/Mechanize.pm in @INC (@INC contains: /Library/Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread-multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level /Library/Perl/5.8.8 /Library/Perl /Network/Library/Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 .) at bin/fotoup.pl-krellis line 22.
BEGIN failed--compilation aborted at bin/fotoup.pl-krellis line 22.
jackal
Jun. 15th, 2009 01:52 am (UTC)
Re: Any help for MacOS X ?
Actually I think I have it.

# sudo -H /usr/bin/cpan -i WWW::Mechanize


cpan comes with the XCode development package.
cmshaw.dreamwidth.org
Apr. 20th, 2010 10:04 pm (UTC)
fotoup
Hello, I don't know if you're still supporting/caring about this, but I figure it can't hurt to ask!

I installed this and ran it all the way to 95.5% completion, but now I am getting this error:

  Fetching image 569/581 ...  95.5%
  Fetching image 570/581 ... Error GETing http://pics.livejournal.com/cmshaw/pic/000sdgka: Not Found at fotoup.pl line 646

C:\Users\xxx\Downloads>perl fotoup.pl --backup
To upload: 0 from data, 0 from receipt

Fetching export.xml from server...
Total pictures: 586
Already backed up: 574
Pictures to backup: 12
Bytes to fetch over network: 2973494
  Fetching image 1/12 ... Error GETing http://pics.livejournal.com/cmshaw/pic/000sdgka: Not Found at fotoup.pl line 646

The thing is, I'm looking through the export.xml file that's downloaded and there isn't an "000sdgka" anywhere in that file. Do you know what's going on?

In any case, thank you for helping me download the first 569 (or 574?) image files!
joecarnahan
Jun. 3rd, 2012 03:11 am (UTC)
The new Scrapbook
I tried getting fotoup.pl to work this week and found myself thwarted by the new Scrapbook implementation. I was able to hack a way to download my pictures using wget (described here), but I figured I should also give you a heads-up that the new image hosting means that pics.livejournal.com doesn't work anymore. :-/