Automatic downloading of Hi Rez images.

Peter Jason

I have Firefox, Win7 SP1.

Is there any software that can take a list of URLs from a particular
folder, download them from the Internet, and then store the resulting
images in that or another folder? I need this to run automatically so
I can use the off-peak ISP rates.

Say I want to download images that are normally reached through
thumbnails. I can download these small thumbnails and their file names
(there are hundreds) and store them in a folder. With the "ReNamer"
software...
http://www.den4b.com/
I can manipulate their filenames to match the URLs of the full-size
images on the main site, and so download the hi-res images. This works
manually, but I need something to do it automatically.
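
To show what I mean, the whole job is roughly this (a rough Python
sketch; the folder paths, base URL, and renaming rule are only
placeholders for what ReNamer actually works out on my site):

    import time
    from pathlib import Path
    from urllib.request import urlopen

    THUMB_DIR = Path(r"C:\thumbs")    # folder holding the saved thumbnails
    OUT_DIR = Path(r"C:\hirez")       # folder for the full-size images
    BASE_URL = "http://www.example.com/images/full/"   # hypothetical hi-res location

    OUT_DIR.mkdir(exist_ok=True)

    for thumb in THUMB_DIR.glob("*.jpg"):
        # e.g. "photo123_tn.jpg" -> "photo123.jpg"; the real rule is site-specific
        hirez_name = thumb.name.replace("_tn", "")
        target = OUT_DIR / hirez_name
        if target.exists():
            continue                  # don't re-download on a later run
        with urlopen(BASE_URL + hirez_name) as response, open(target, "wb") as out:
            out.write(response.read())
        time.sleep(2)                 # be gentle with the server

Something along those lines could then be started by the Windows Task
Scheduler at the hour the off-peak rate begins.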

Peter
 
Paul

Peter said:
Thanks, I'll check this out.
You might also want to look for someone building the latest version
(and fixing its bugs). While the title of this page says 1.12, there
are links to 1.13.4 within.

http://opensourcepack.blogspot.ca/2010/05/wget-112-for-windows.html

Commands like that have a gazillion options, but at least one useful
one is changing the UserAgent string that WGET sends to the server.
Some servers control content based on the reported user agent, so
faking it and telling the server "I'm Firefox" might be required in
some cases. You can verify the UserAgent string with the Wireshark
packet tracer, as long as the web session is unencrypted. Note how
your Firefox sends the string, then adjust WGET to do the same. (The
Firefox about:config might have the info, but a packet sniffer shows
what's really being sent. It also lets you see what WGET sends.)
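
The same idea in script form, if the OP ends up rolling his own
downloader instead. A rough Python sketch (the UserAgent string below
is only an example of the sort of thing Firefox reports, and the URL
is a placeholder; copy the real string out of Wireshark or
about:config):

    from urllib.request import Request, urlopen

    # Example Firefox-style string -- replace it with what your own browser sends.
    USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0"

    def fetch(url):
        # Send the request with a spoofed User-Agent header so the server
        # believes it is talking to an ordinary browser.
        req = Request(url, headers={"User-Agent": USER_AGENT})
        with urlopen(req) as response:
            return response.read()

    data = fetch("http://www.example.com/images/full/photo123.jpg")  # placeholder URL

WGET itself has a --user-agent option that does the same job from the
command line.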

Using the latest version is in case there's some security issue with
an older one. WGET is like a web browser in some respects, and certain
server behaviors can result in a WGET command that sits in a loop or
does other bad things. Tools like that become less bulletproof as they
gain more of the features of a real browser.

The only time I've used WGET is in the middle of a Gentoo install,
where there's no GUI, only the command line, and the instructions say
to download a couple of files off the web. There the WGET command
comes in handy (since there's no GUI to run a browser in).

Paul
 
VanguardLH

Peter Jason said:
Is there any software that can take a list of URLs from a particular
folder, download them from the Internet, and then store the resulting
images in that or another folder? I need this to run automatically so
I can use the off-peak ISP rates.
Search online for "web crawler" or "web spider". These browse a site,
navigating through all the links in each page to retrieve the pages
for that site and cache them locally. They usually have a setting that
determines how deep they navigate into a site, since a site could have
thousands of pages that you may never see or don't want to see; that
is, you would be wasting time and bandwidth navigating to and
retrieving deeply buried pages that you are not likely to care about.
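
A toy sketch of that depth setting, in Python (standard library only;
the starting URL is a placeholder, and a real crawler would also save
each page to disk and stay within one site):

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkParser(HTMLParser):
        """Collect the href of every <a> tag on a page."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(start_url, max_depth=2):
        seen = set()
        frontier = [(start_url, 0)]       # (page URL, how deep it sits)
        while frontier:
            url, depth = frontier.pop(0)
            if url in seen or depth > max_depth:
                continue                  # the depth limit stops the descent here
            seen.add(url)
            try:
                with urlopen(url) as response:
                    html = response.read().decode("utf-8", errors="replace")
            except OSError:
                continue                  # skip pages that fail to load
            parser = LinkParser()
            parser.feed(html)
            for link in parser.links:
                frontier.append((urljoin(url, link), depth + 1))
        return seen

    pages = crawl("http://www.example.com/", max_depth=2)   # placeholder start page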

Some sites don't like these robots crawling around their web sites
because they know a human isn't the one doing the visiting. Search
engines roam through web sites to gather their statistics, but there
are also users who use these robots to steal content and reuse it
elsewhere. Web sites can use a robots.txt file to tell robots not to
roam through the site and copy its content, but compliance is
voluntary and some robots ignore the request.

http://www.robotstxt.org/
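
Python's standard library happens to ship a parser for that file, so a
well-behaved script can check it before fetching anything. A small
sketch (the site URL and the user-agent name are placeholders):

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("http://www.example.com/robots.txt")   # placeholder site
    rp.read()        # fetch and parse the site's robots.txt

    # Fetch a page only if the rules allow our user agent to do so.
    if rp.can_fetch("MyDownloader/1.0", "http://www.example.com/images/full/"):
        print("allowed")
    else:
        print("disallowed by robots.txt")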

I've seen some sites that, knowing many robots will misbehave and not
honor the request, stall or delay the delivery of linked pages down to
the speed expected of a human visitor. That is, they won't deliver the
next page until some time after delivering the prior one: they
throttle the delivery of their web pages. Humans won't notice the
difference, but robots certainly will. Think of it like a foot race
between you and a robot: the robot could run a lot faster, but it gets
slowed down so it can't run any faster than you. This also reduces the
load a web server has to endure when misbehaving robots hit its site.
It means your robot won't be able to retrieve their pages any faster
than you could.

If a search at one of the common or well-known download sites
(download.com, softpedia.com) doesn't turn up any robots that you
like, the robots help site above has a list of known robots you could
look at.

I haven't used a web crawler in well over a decade. My recollection is
that it stored the web site being crawled so that it built up a copy
of the web site in a local folder. Nowadays that may not work as well
as you would like. Streamed media isn't stored on your local hard
disk, so you won't have a copy of it (unless you use stream capture
software). That content won't get saved in the local copy of the web
site. It will still be externally linked and you'll have to be online
to see it, even when loading the local copies of their web pages.
Anything externally linked at the web site may end up externally
linked in your copy of their web pages.

Also, many sites now employ dynamic web pages. Their scripts (some of
which are server-side scripts that you will NEVER be able to see or to
retrieve) decide what content to generate in a web page on your visit,
and that content may change depending on when you visit and how you
navigated through their site. So the web crawler might show different
content each time you have it crawl through a site. Dynamic content,
AJAX, and streamed media are becoming much more popular and prevalent,
so you'll have to find out how your choice of web crawler handles
those.
 
Stan Brown

Peter Jason said:
I have Firefox, Win7 SP1.

Is there any software that can take a list of URLs from a particular
folder, download them from the Internet, and then store the resulting
images in that or another folder? I need this to run automatically so
I can use the off-peak ISP rates.
Yes. Wget comes to mind.
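
For the list-of-URLs part of the question specifically, WGET can read
the list from a text file and drop everything into one folder. A
sketch of driving it from a script, in case the OP wants to chain it
with the renaming step (the paths and filenames here are made up):

    import subprocess

    # urls.txt is assumed to hold one image URL per line.
    subprocess.run([
        r"C:\wget\wget.exe",
        "--input-file", r"C:\downloads\urls.txt",   # read the URL list from a file
        "--directory-prefix", r"C:\hirez",          # store the images in this folder
        "--wait", "2",                              # pause between downloads
    ], check=True)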
 
Nil

Oops -- great minds think alike, I see.

You made it *too* easy by giving the URL, though. :)
According to Paul, there may be some bugs to deal with, the OP will
have to write a batch file or script to make it do what he wants, and
the web site may present some obstacles. Still, that's what WGET was
made for: retrieving files off a web site unattended. I've used it for
several simple tasks, but not for anything as complex as what the OP
wants, if I understand him (which I'm not sure I do).
 
