Why is this returning a "Not Found" with PHP and cURL?


My script works for the other links I've tried, and I get the same response with cURL (this version is a lot smaller, so here's the code):

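<?php
// Superglobals are case-sensitive: $_GET, not $_get.
$url = $_GET['url'];
// Passing 1 returns the headers as an associative array.
$header = get_headers($url, 1);
print_r($header);

function get_url($u, $h) {
    if (preg_match('/200/', $h[0])) {
        echo file_get_contents($u);
    } elseif (preg_match('/301/', $h[0])) {
        // Keys keep the server's casing (usually "Location").
        // Request associative headers again so the recursive call can read them.
        $nh = get_headers($h['Location'], 1);
        get_url($h['Location'], $nh);
    }
}

get_url($url, $header);
?>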

But for: http://www.anthropologie.com/anthro/catalog/productdetail.jsp?subcategoryid=home-tabletop-utensils&id=78110&catid=home-tabletop&pushid=home-tabletop&popid=home&sortproperties=&navcount=355&navaction=top&fromcategorypage=true&selectedproductsize=&selectedproductsize1=&color=sil&colorname=silver&isproduct=true&isbigimage=&templatetype=

and: http://www.urbanoutfitters.com/urban/catalog/productdetail.jsp?itemdescription=true&itemcount=80&startvalue=1&selectedproductcolor=&sortby=&id=14135412&parentid=a_furn_bath&sortproperties=+subcategoryposition,&navcount=56&navaction=poppushpush&color=&pushid=a_furn_bath&popid=a_decorate&prepushid=&selectedproductsize=

(and other Anthropologie product links), the page comes back as "Not Found". I'm assuming there are other sites I haven't found yet that act this way too. Here is the header response:

array
(
    [0] => http/1.1 200 ok
    [server] => apache
    [x-powered-by] => servlet 2.4; jboss-4.2.0.ga_cp05 (build: svntag=jbpapp_4_2_0_ga_cp05 date=200810231548)/jbossweb-2.0
    [x-atg-version] => version=rentlufeqyxbvedqbgf0zm9ybs85ljfwmsxbremgwybeufnmawnlbnnllzagif0=
    [content-type] => text/html;charset=iso-8859-1
    [date] => sat, 24 jul 2010 23:47:47 gmt
    [content-length] => 21669
    [connection] => keep-alive
    [set-cookie] => array
        (
            [0] => jsessionid=65ca111adbf267a3b405c69a325576f8.app46-node2; path=/
            [1] => visitcount=1; expires=fri, 29-may-2026 00:41:07 gmt; path=/
            [2] => uoccii:=; expires=mon, 23-aug-2010 23:47:47 gmt; path=/
            [3] => lastvisited=2010-07-24; expires=fri, 29-may-2026 00:41:07 gmt; path=/
        )
)

I'm guessing maybe it has something to do with cookies? Any ideas?

Install Fiddler and see what is being sent.

You can try setting the User-Agent to that of a real browser. Some sites try to prevent scraping by checking this.
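
As a rough sketch of that suggestion, the request could be made with cURL, sending a browser-like User-Agent and keeping the cookies the server sets (the header dump above shows several Set-Cookie lines). The function names browser_like_options and fetch_page, the cookie file path, and the User-Agent string are all assumptions made for illustration. Nothing here is known to be what these particular sites check for.

```php
<?php
// Hypothetical helper: cURL options that make the request resemble a browser visit.
// The User-Agent value is only an example string, not one the site is known to require.
function browser_like_options(string $cookieFile): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow 301/302 redirects automatically
        CURLOPT_MAXREDIRS      => 5,     // give up after five redirects
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0',
        CURLOPT_COOKIEJAR      => $cookieFile,  // write cookies the server sets to this file
        CURLOPT_COOKIEFILE     => $cookieFile,  // send stored cookies back on later requests
    ];
}

// Hypothetical helper: fetch a URL and return the final status code and body.
// 'body' is false if the transfer itself failed.
function fetch_page(string $url, string $cookieFile): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, browser_like_options($cookieFile));
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ['status' => $status, 'body' => $body];
}
```

Calling fetch_page($url, sys_get_temp_dir() . '/cookies.txt') and comparing the result with and without the User-Agent line is one way to tell whether the site is filtering on that header or on the cookies.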

