2016-04-23

How to pull data from another website with curl in PHP

I want to pull data from the Amazon best deals URL and show only the product section, not the whole page (i.e. without the header and sidebar), limited to 8 products. I am using curl and Simple HTML DOM in PHP.

<?php
include_once("php/simple_html_dom.php");

// Use curl to fetch the HTML content of a page.
function getHTML($url, $timeout)
{
    $ch = curl_init($url); // initialize curl with the given URL
    // Forward the visitor's user agent if there is one (it is not set on the command line).
    $userAgent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "Mozilla/5.0";
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to wait for the connection
    curl_setopt($ch, CURLOPT_FAILONERROR, true); // treat HTTP status codes >= 400 as failures
    $html = curl_exec($ch); // false on failure
    curl_close($ch);
    return $html;
}

echo $html = getHTML("http://www.amazon.in/gp/goldbox/ref=nav_topnav_deals", 10);

?> 

The problem is that this pulls the entire page, but I only want the product div part.
Amazon's div container for a single product is:

<div id="100_dealView_0" class="a-section a-spacing-none tallCellView gridColumn4 singleCell"> 

     <div class="a-section dealContainer"> 

    <div class="a-section backGround layer"> 
    </div> 

    <div class="a-section layer"> 

      <div class="a-row dealContainer dealTile"> 


     <a id="dealImage" class="a-link-normal" href="https://www.amazon.in/s/ref=gbps_img_s-4_0227_af8a024a?fst=as%3Aoff&amp;rh=n%3A1571283031%2Cn%3A1983396031%2Ck%3A23rdApril_runningshoes_dotdlist%2Cp_76%3A1318482031%2Cp_6%3AA14FG3FHN6HO9H&amp;keywords=23rdApril_runningshoes_dotdlist&amp;ie=UTF8&amp;qid=1460093112&amp;rnid=1318474031&amp;smid=A14FG3FHN6HO9H&amp;pf_rd_p=900470227&amp;pf_rd_s=slot-4&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A1VBAL9TL5WCBF&amp;pf_rd_r=13ED3AZVD21FX9VX9SS1"> 
      <div class="a-row a-spacing-base a-spacing-top-base imageBlock"> 
       <div class="a-row dealContainer"> 
        <div class="a-row layer"> 
         <img alt="" src="https://images-na.ssl-images-amazon.com/images/I/51%2BpumuEs%2BL._AA210_.jpg" data-a-hires="https://images-na.ssl-images-amazon.com/images/I/51%2BpumuEs%2BL._AA420_.jpg"> 
        </div> 
        <div class="a-row layer backGround"> 
        </div> 
       </div> 
      </div> 
     </a> 



        <div class="a-row a-spacing-mini"> 


     <span class="a-size-mini a-color-base dotdBadge">DEAL OF THE DAY</span> 

</div> 

       <div class="a-row a-spacing-mini"> 

      <div class="a-row priceBlock unitLineHeight"> 
       <span class="a-size-medium a-color-base inlineBlock unitLineHeight">₹549 - ₹5,399</span> 
      </div> 

</div> 
       <div class="a-row a-spacing-mini"> 

     <div class="a-row unitLineHeight"> 
      <span class="a-size-mini a-color-secondary inlineBlock unitLineHeight"> 
       Ends in 
      </span> 

      <span id="100_dealView_0_dealClock" class="a-size-mini a-color-secondary inlineBlock unitLineHeight">12:13:59</span> 
     </div> 

</div> 
       <div class="a-row a-spacing-mini"> 

    <a class="a-link-normal" href="https://www.amazon.in/s/ref=gbps_tit_s-4_0227_af8a024a?fst=as%3Aoff&amp;rh=n%3A1571283031%2Cn%3A1983396031%2Ck%3A23rdApril_runningshoes_dotdlist%2Cp_76%3A1318482031%2Cp_6%3AA14FG3FHN6HO9H&amp;keywords=23rdApril_runningshoes_dotdlist&amp;ie=UTF8&amp;qid=1460093112&amp;rnid=1318474031&amp;smid=A14FG3FHN6HO9H&amp;pf_rd_p=900470227&amp;pf_rd_s=slot-4&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A1VBAL9TL5WCBF&amp;pf_rd_r=13ED3AZVD21FX9VX9SS1"> 
     <span class="a-declarative" data-action="gbdeal-actionrecord" data-gbdeal-actionrecord="{&quot;actionType&quot;:&quot;TITLE&quot;,&quot;position&quot;:&quot;0&quot;,&quot;widgetID&quot;:&quot;100&quot;,&quot;dealID&quot;:&quot;af8a024a&quot;}"> 

      <span id="dealTitle" class="a-size-base a-color-base dealTitleTwoLine hoverVisible visibleCss singleCellTitle autoHeight" style="width: 210px;"> 
       Men's Shoes: Minimum 40% Off for Sports Shoes 
      </span> 
      <span id="dealTitle" class="a-size-base a-color-link dealTitleTwoLine restVisible singleCellTitle autoHeight"> 
       Men's Shoes: Minimum 40% Off for Sports Shoes 
      </span> 

     </span> 
    </a> 

</div> 

        <div class="a-row a-spacing-mini"> 

     <div class="a-row reviewStars"> 
      <a class="a-link-normal touchAnchor" href="/gp/product-reviews/B00593XQS6/ref=gbps_rvw_s-4_0227_af8a024a?pf_rd_p=900470227&amp;pf_rd_s=slot-4&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A1VBAL9TL5WCBF&amp;pf_rd_r=13ED3AZVD21FX9VX9SS1"> 
       <span class="a-declarative" data-action="gbdeal-actionrecord" data-gbdeal-actionrecord="{&quot;actionType&quot;:&quot;REVIEWS&quot;,&quot;position&quot;:&quot;0&quot;,&quot;widgetID&quot;:&quot;100&quot;,&quot;dealID&quot;:&quot;af8a024a&quot;}"> 

          <i class="a-icon a-icon-star a-star-5"><span class="a-icon-alt">Avg. Customer Review</span></i> 

        1 
      </span> 
     </a> 

</div> 

          <div class="a-row buttonOuterContainer "> 


    <div class="a-row a-spacing-medium"> 

         <span class="a-declarative" data-action="gbdeal-actionrecord" data-gbdeal-actionrecord="{&quot;actionType&quot;:&quot;SEE_MORE&quot;,&quot;position&quot;:&quot;0&quot;,&quot;widgetID&quot;:&quot;100&quot;,&quot;dealID&quot;:&quot;af8a024a&quot;}"> 
          <span class="a-button a-button-span12 a-button-primary fixedWidth210"><span class="a-button-inner"><a href="https://www.amazon.in/s/ref=gbps_ulm_s-4_0227_af8a024a?fst=as%3Aoff&amp;rh=n%3A1571283031%2Cn%3A1983396031%2Ck%3A23rdApril_runningshoes_dotdlist%2Cp_76%3A1318482031%2Cp_6%3AA14FG3FHN6HO9H&amp;keywords=23rdApril_runningshoes_dotdlist&amp;ie=UTF8&amp;qid=1460093112&amp;rnid=1318474031&amp;smid=A14FG3FHN6HO9H&amp;pf_rd_p=900470227&amp;pf_rd_s=slot-4&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A1VBAL9TL5WCBF&amp;pf_rd_r=13ED3AZVD21FX9VX9SS1" class="a-button-text a-text-center" role="button"> 
           View Deal 
          </a></span></span> 
         </span> 

    </div> 


          </div> 
      </div> 

    </div> 
</div> 

</div></div> 

There are 60+ such divs, but I want only the first 8, with the content of each one scraped into its respective field.


You've included the Simple HTML DOM library. It has methods for parsing the HTML and searching for elements. Why aren't you using it? – Barmar


You don't need to use curl. You can use 'file_get_html($url)' from the Simple HTML DOM library. – Barmar


Thank you @Barmar, but it shows an undefined function error. And how do I get just those 8 divs? –

Answer


You can use XPath. Take a look at this tutorial on scraping the web in PHP. You haven't posted the whole HTML here, but I guess you want to grab the first div.

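// $output is assumed to hold the HTML string returned by curl (e.g. from getHTML() in the question).
$document = new DOMDocument;

// Collect parse warnings internally; real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$document->loadHTML($output);

$xpath = new DOMXPath($document);

// Select the div whose id is exactly "100_dealView_0", i.e. the first deal.
$data = $xpath->query("//div[@id='100_dealView_0']");

foreach ($data as $d) { // in case there are multiple (there shouldn't be)
    echo $d->nodeValue; // text content only, without the markup
}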
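
To get the first 8 deals rather than a single div, the same XPath approach can be extended. The following is only a rough sketch under several assumptions: that every deal container has an id starting with "100_dealView_" (as in the markup posted in the question), that those divs are actually present in the HTML curl returns and are not added later by JavaScript, and that the title, price, link and image sit where they do in the posted sample. The function name extractDeals and the generated sample HTML are made up for illustration.

```php
<?php
// Hypothetical sample standing in for the HTML returned by getHTML();
// the real page is assumed to contain one such div per deal.
$html = '';
for ($i = 0; $i < 10; $i++) {
    $html .= '<div id="100_dealView_' . $i . '" class="a-section singleCell">'
           . '<a id="dealImage" href="https://www.amazon.in/deal' . $i . '">'
           . '<img alt="" src="https://example.com/img' . $i . '.jpg"></a>'
           . '<div class="a-row priceBlock unitLineHeight"><span>₹' . (100 + $i) . '</span></div>'
           . '<span id="dealTitle"> Deal ' . $i . ' </span>'
           . '</div>';
}

// Hypothetical helper: returns up to $limit deals as associative arrays.
function extractDeals($html, $limit = 8)
{
    $document = new DOMDocument;
    libxml_use_internal_errors(true); // real-world markup is rarely valid
    // The XML encoding hint asks libxml to read the input as UTF-8.
    $document->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($document);
    // Match deal containers by id prefix, then keep only the first $limit of them.
    $query = "(//div[starts-with(@id, '100_dealView_')])[position() <= " . (int) $limit . "]";

    $deals = array();
    foreach ($xpath->query($query) as $node) {
        // Expressions starting with "." search inside the current deal div only.
        $deals[] = array(
            'title' => trim($xpath->evaluate("string(.//span[@id='dealTitle'][1])", $node)),
            'price' => trim($xpath->evaluate("string(.//div[contains(@class, 'priceBlock')]//span[1])", $node)),
            'link'  => $xpath->evaluate("string(.//a[@id='dealImage']/@href)", $node),
            'image' => $xpath->evaluate("string(.//a[@id='dealImage']//img/@src)", $node),
        );
    }
    return $deals;
}

foreach (extractDeals($html) as $deal) {
    echo $deal['title'], ' | ', $deal['price'], ' | ', $deal['link'], "\n";
}
```

With the real page you would pass the string returned by curl instead of the generated sample. If the result comes back empty, check the raw HTML first: the deal divs may not be part of the initial response at all.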