Figure out if a website has a restricted/password-protected area


I have a big list of websites and I need to know whether they have areas that are password protected.

I am thinking about doing this: downloading all of them with httrack, then writing a script that looks for keywords like "log in" and "401 forbidden". The problem is that these websites are all different: some are static, some are dynamic (HTML, CGI, PHP, Java applets...), and most of them won't use the same keywords...

Do you have any better ideas?

Thanks a lot!

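Looking for password fields will get you so far, but won't help with sites that use HTTP authentication. Looking for 401s will help with HTTP authentication, but won't get you the sites that don't use it, or the ones that don't return a 401. Looking for links like "log in" or for "username" fields will get you some more.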
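As a rough sketch of combining those three checks, something like the following could work. It uses only the Python standard library; the function name `detect_signals`, the signal labels and the keyword list are my own assumptions, not anything standard, and the heuristics are deliberately crude.

```python
from html.parser import HTMLParser

# Assumed keyword list; real sites will need more (and other languages).
LOGIN_WORDS = ("log in", "login", "sign in")


class _LoginHintParser(HTMLParser):
    """Collects hints of a login form or login link from one HTML page."""

    def __init__(self):
        super().__init__()
        self.has_password_field = False
        self.has_username_field = False
        self.has_login_link = False
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input":
            if (attrs.get("type") or "").lower() == "password":
                self.has_password_field = True
            if "user" in (attrs.get("name") or "").lower():
                self.has_username_field = True
        elif tag == "a":
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        # Only text inside <a>...</a> counts as a login link.
        if self._in_link and any(w in data.lower() for w in LOGIN_WORDS):
            self.has_login_link = True


def detect_signals(status, headers, body):
    """Return a list of reasons why a response hints at a protected area."""
    signals = []
    header_names = {name.lower() for name in headers}
    if status == 401 or "www-authenticate" in header_names:
        signals.append("http-auth")
    parser = _LoginHintParser()
    parser.feed(body)
    if parser.has_password_field:
        signals.append("password-field")
    if parser.has_username_field:
        signals.append("username-field")
    if parser.has_login_link:
        signals.append("login-link")
    return signals
```

An empty list does not prove a site has no protected area; it only means none of these particular hints showed up on the page you looked at.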

I don't think you'll be able to do this entirely automatically and still be sure that you're actually detecting all the password-protected areas.

You'll probably want to take a library that is good at web automation and write a little program yourself that reads the list of target sites from a file, checks each one, and writes the results to one file of "these are definitely passworded" and one of "these are not". Then you might want to go and manually check the ones that are not, and make modifications to your program to accommodate what you find. Using httrack is great for grabbing data, but it's not going to help with detection. If you write your own "check for password-protected area" program in a general-purpose high-level language, you can do more checks, and you can avoid generating more requests per site than are necessary to determine that a password-protected area exists.
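A minimal sketch of such a program, using only the Python standard library, might look like this. The file paths, the User-Agent string, and the names `fetch`, `looks_protected` and `sort_sites` are all made up for illustration, and `looks_protected` is a deliberately crude placeholder for whatever checks you end up with. It makes exactly one request per listed URL.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Fetch one URL and return (status, headers, body).

    HTTP error responses such as 401 are returned rather than raised,
    because the status code is exactly what we want to look at.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "protected-area-check/0.1"}  # assumed name
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return response.status, dict(response.headers.items()), body
    except urllib.error.HTTPError as error:
        body = error.read().decode("utf-8", "replace")
        return error.code, dict(error.headers.items()), body


def looks_protected(status, headers, body):
    """Crude placeholder check; swap in richer heuristics as needed."""
    lowered = body.lower()
    return status == 401 or 'type="password"' in lowered or "log in" in lowered


def sort_sites(list_path, protected_path, unknown_path, fetcher=fetch):
    """Read URLs (one per line) and split them into two output files."""
    with open(list_path) as sites, \
            open(protected_path, "w") as protected, \
            open(unknown_path, "w") as unknown:
        for line in sites:
            url = line.strip()
            if not url:
                continue
            try:
                status, headers, body = fetcher(url)
            except OSError:
                # Connection failures and timeouts: leave for manual review.
                unknown.write(url + "\n")
                continue
            if looks_protected(status, headers, body):
                protected.write(url + "\n")
            else:
                unknown.write(url + "\n")
```

You would call it as something like `sort_sites("sites.txt", "passworded.txt", "not_detected.txt")`, then go through the second file by hand and tighten `looks_protected` based on what it missed. The `fetcher` parameter is only there so the sorting logic can be tried out without touching the network.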

You may need to ignore robots.txt.
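To see why, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the URLs are invented for illustration; the point is only that a client which honors rules like these would never request the paths where a login area might live.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /members/
"""


def allowed_by_robots(robots_text, url, user_agent="*"):
    """Return True if a robots.txt-honoring client may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the sample above, `allowed_by_robots(SAMPLE_ROBOTS, "http://site.example/admin/login")` is false while the same call for `/index.html` is true, so a well-behaved crawler would skip the former. Whether ignoring robots.txt is acceptable for your list of sites is something you have to decide yourself.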

I recommend using the Python port of Perl's Mechanize, or whatever nice web automation library your preferred language has. Almost all modern languages have a nice library for opening and searching through web pages and for looking at HTTP headers.

If you are not capable of writing this yourself, you're going to have a rather difficult time using httrack or wget or similar and then searching through the responses.

