How do I check for valid (not dead) links programmatically using PHP?
Given a list of URLs, I would like to check that each URL:
- returns a 200 OK status code
- returns a response within X amount of time
The end goal is a system capable of flagging URLs as potentially broken so that an administrator can review them.
The script will be written in PHP and will run on a daily basis via cron.
The script will be processing approximately 1000 URLs at a go.
The question has two parts:
- Are there any big-time gotchas with an operation like this, and what issues have you run into?
- What is the best method for checking the status of a URL in PHP, considering both accuracy and performance?
Use the PHP cURL extension. Unlike fopen(), it can also make HTTP HEAD requests, which are sufficient to check the availability of a URL and save you a ton of bandwidth, since you don't have to download the entire body of the page to check it.
As a starting point you could use a function like this:
    function is_available($url, $timeout = 30) {
        $ch = curl_init(); // get cURL handle

        // set cURL options
        $opts = array(
            CURLOPT_RETURNTRANSFER => true,     // do not output to browser
            CURLOPT_URL            => $url,     // set URL
            CURLOPT_NOBODY         => true,     // do a HEAD request only
            CURLOPT_TIMEOUT        => $timeout, // set timeout (seconds)
        );
        curl_setopt_array($ch, $opts);

        curl_exec($ch); // do it!

        $retval = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200; // check if HTTP OK

        curl_close($ch); // close handle

        return $retval;
    }
However, there are a ton of possible optimizations: you might want to re-use the cURL instance and, if you are checking more than one URL per host, even re-use the connection.
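One way the handle re-use could look is sketched below. The name check_urls() is made up for illustration and is not part of the answer's function; the sketch assumes the URLs are checked one after another. Because the same handle is kept for the whole batch, cURL can keep a connection open and use it again when consecutive URLs point at the same host.

```php
<?php
// Hypothetical helper (not from the original answer): checks a batch of
// URLs sequentially while sharing a single cURL handle.
function check_urls(array $urls, $timeout = 30) {
    $ch = curl_init(); // one handle for the whole batch

    // Options that stay the same for every request.
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,     // do not echo the response
        CURLOPT_NOBODY         => true,     // HEAD request only
        CURLOPT_TIMEOUT        => $timeout, // per-request limit in seconds
    ));

    $results = array();
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url); // only the URL changes
        curl_exec($ch);
        // A failed transfer (timeout, refused connection) reports code 0,
        // so it ends up flagged as unavailable as well.
        $results[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200;
    }

    // The handle is released when $ch goes out of scope on return.
    return $results;
}
```

The result is an array keyed by URL with true or false values, so the cron job could collect the false entries for the administrator to review. Sorting the list by host beforehand would probably give connection re-use the best chance to kick in.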
Oh, and this code checks strictly for the HTTP response code 200. It does not follow redirects (302), but there is also a cURL option for that.
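The option in question is CURLOPT_FOLLOWLOCATION, usually paired with CURLOPT_MAXREDIRS so that a redirect loop cannot run forever. A rough sketch of a redirect-following variant follows; the function names and the default limit of 5 redirects are assumptions chosen for illustration. With redirects enabled, CURLINFO_HTTP_CODE reports the status of the last response in the chain.

```php
<?php
// Hypothetical helper: builds the option set for a HEAD check that
// follows redirects. The default limit of 5 is an arbitrary assumption.
function redirect_check_options($url, $timeout = 30, $max_redirects = 5) {
    return array(
        CURLOPT_RETURNTRANSFER => true,           // do not echo the response
        CURLOPT_URL            => $url,           // URL to check
        CURLOPT_NOBODY         => true,           // HEAD request only
        CURLOPT_TIMEOUT        => $timeout,       // limit in seconds
        CURLOPT_FOLLOWLOCATION => true,           // follow Location headers
        CURLOPT_MAXREDIRS      => $max_redirects, // give up after this many
    );
}

// Hypothetical variant of is_available() that accepts a URL as available
// when the final response after any redirects is 200.
function is_available_following_redirects($url, $timeout = 30, $max_redirects = 5) {
    $ch = curl_init();
    curl_setopt_array($ch, redirect_check_options($url, $timeout, $max_redirects));
    curl_exec($ch);

    // With CURLOPT_FOLLOWLOCATION set, this is the code of the last
    // response received, not of the first redirect.
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // The handle is released when $ch goes out of scope on return.
    return $code == 200;
}
```

Whether a redirected URL should count as healthy is a policy decision. For a link checker it may be worth also recording CURLINFO_EFFECTIVE_URL so the administrator can see where a link now points.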