2016-08-08 38 views
Scrapy shell returning empty array with Steam website?

I have gotten Scrapy to work with craigslist before, but now that I am trying to scrape usernames from Steam, I always get an empty array back in the Scrapy shell.

The username element (for example, xempy) is contained in:

<a class="searchPersonaName" href="https://steamcommunity.com/id/zxZEmpy">xempy</a> 

The command I am using to scrape the actual username from the page is:

response.select('//*[@id="search_results"]/div[3]/div[3]/a/text()').extract() 

The URL I am trying to scrape is:

https://steamcommunity.com/search/users/#filter=users&text=xempy 

I used Chrome to copy the XPath of the element I am interested in, rather than typing it by hand, to make sure there were no typos. Even when I type everything by hand using absolute paths, I still get an empty array when I try to extract a simple string containing the username "xempy".

What am I doing wrong? I used the same process to scrape craigslist successfully, but it does not seem to work on Steam's website, and I cannot find any real examples of Scrapy scripts for Steam.

+1

Run 'view(response)' from the shell, and also look at the source in your browser: right-click and choose View Source.

Answer

If you look at the actual source in your browser (right-click and choose View Source), you will see no sign of the results. The data is added dynamically via an Ajax request to https://steamcommunity.com/search/SearchCommunityAjax.
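
One way to see why the initial page cannot contain the results is to split the search URL: everything after the `#` is a fragment, which HTTP clients keep locally and do not send to the server. The sketch below uses only the standard library. The variable names are illustrative, and the idea that the page's JavaScript reads the fragment to build the Ajax call is an assumption consistent with the behaviour described above.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://steamcommunity.com/search/users/#filter=users&text=xempy"
parts = urlsplit(url)

# What the server actually receives: the URL without its fragment.
server_side = parts._replace(fragment="").geturl()

# What stays on the client: the search terms live entirely in the fragment.
client_side = parse_qs(parts.fragment)

print(server_side)   # the plain search page, with no query string
print(client_side)   # the filter and text values from the fragment
```

Since the search text never reaches the server in the first request, the HTML that Scrapy downloads has no results for the XPath to match.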

You need to mimic the Ajax request. I used requests, but the steps will be the same for Scrapy:

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}
params = {"text": "xempy", "filter": "users", "sessionid": "", "steamid_user": "false", "page": "1"}
ajax_url = "https://steamcommunity.com/search/SearchCommunityAjax"

with requests.Session() as s:
    # apply the browser-like headers to every request made through the session
    s.headers.update(headers)
    r = s.get("https://steamcommunity.com/search/users/#filter=users&text=xempy")
    # need to update the session id, which we get from the previous get's headers
    params["sessionid"] = next(
        c.split("=", 1)[1] for c in r.headers["set-cookie"].split(";") if c.startswith("sessionid"))
    # need to update the session headers
    s.headers.update(r.headers)
    # and also the cookies from the previous request
    s.cookies.update(r.cookies)
    result = s.get(ajax_url, params=params).json()
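
The generator expression that reads the session id only matches when `sessionid` is the very first cookie in the `set-cookie` header; if another cookie comes first, `next()` raises `StopIteration`. As a hedged alternative, the helper below (a hypothetical function, not part of requests) searches the whole header string. The sample header is made up for illustration and assumes several cookies joined by commas.

```python
import re

def cookie_value(set_cookie_header, name):
    """Return the value of cookie `name` from a raw Set-Cookie string, or None."""
    # The cookie name must start the string or follow a ';' or ',' separator.
    pattern = r"(?:^|[;,])\s*" + re.escape(name) + r"=([^;,]*)"
    match = re.search(pattern, set_cookie_header)
    return match.group(1) if match else None

# Made-up header: another cookie precedes sessionid.
sample = "other=1; Path=/; Secure, sessionid=0123456789abcdef; Path=/; Secure"

print(cookie_value(sample, "sessionid"))
print(cookie_value("sessionid=abc; Path=/", "sessionid"))
print(cookie_value(sample, "missing"))
```

With a real response, the same value should also be available from the parsed cookie jar (`r.cookies.get("sessionid")`), which avoids parsing the header by hand.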

If we run the code, you can see we get some JSON back:

In [5]: with requests.Session() as s:
   ...:     s.headers.update(headers)
   ...:     r = s.get("https://steamcommunity.com/search/users/#filter=users&text=xempy")
   ...:     params["sessionid"] = next(
   ...:         c.split("=", 1)[1] for c in r.headers["set-cookie"].split(";") if c.startswith("sessionid"))
   ...:     s.headers.update(r.headers)
   ...:     s.cookies.update(r.cookies)
   ...:     result = s.get(ajax_url, params=params).json()
   ...:     print(result)
   ...:
{u'html': u'\t\t<div style="float: right; padding-bottom: 2px">\r\n\t\t\t\t\t\tShowing 1 - 11 of 11\t\t\t</div>\r\n\t<div style="clear: both"></div>\r\n\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="16183171" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/zxZEmpy"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/b9/b9c886a08cf17c4f1f31ea19148d8b3bbd748762_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/zxZEmpy">xempy</a><br />\r\n\t\t\t\t\t\t\t\t&nbsp;\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">zxZEmpy</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">trill</span>, <span style="color: whitesmoke">[TGIF] Mario Batali</span>, <span style="color: whitesmoke">[TGIF] Mario \xdfatali</span>, <span style="color: whitesmoke">Mario \xdfatali</span>, <span style="color: whitesmoke">[TGIF\'</span>, <span style="color: whitesmoke">[TGIF] Mario \u03b2atali</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="280326130" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/xempyjecar"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/89/8928b324ba9c12859283e8be3f11f19d9232033c_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/xempyjecar">Xempy -A-</a><br />\r\n\t\t\t\t\tIgor<br />\t\t\tSerbia&nbsp;<img style="margin-bottom:-2px" 
src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/rs.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">xempyjecar</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">Xempy -A- NEW SEASON HYPEE</span>, <span style="color: whitesmoke">Brekija</span>, <span style="color: whitesmoke">FAIRPLAY ORGANISATION</span>, <span style="color: whitesmoke">Xempy | csgoshit.com</span>, <span style="color: whitesmoke">Xempy | csgorage.com</span>, <span style="color: whitesmoke">\u2500\u2500\u2500\u2554\u2550\u2550\u2550\u2557</span>, <span style="color: whitesmoke">XempyTheCupcake</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="315139919" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/filipppp"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/ca/caa5747851b5255a2d76699d855bf20e709af3d1_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/filipppp">Xempy -A-</a><br />\r\n\t\t\t\t\tIgor<br />\t\t\tSerbia&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/rs.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">filipppp</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">Extreeemeeee</span>, <span style="color: 
whitesmoke">Ratatatatatata</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="258386073" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/lenyagoglov"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/71/71ee8d0519c74cea0352836b188c747b36224f8f_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/lenyagoglov">Xempys</a><br />\r\n\t\t\t\t\tTed<br />\t\t\tLuxembourg&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/lu.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">lenyagoglov</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="257927191" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/rostislavtseychuk85"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/86/8641de85a283f0d23d1cbeb35ee0c0d5ca87a83b_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/rostislavtseychuk85">Xempys</a><br />\r\n\t\t\t\t\tGabriel<br />\t\t\tLebanon&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/lb.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: 
whitesmoke">rostislavtseychuk85</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="252811169" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/mochulskayaa"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/76/76c10b0744403468aaf8090f56ca8ddd61338925_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/mochulskayaa">Xempys</a><br />\r\n\t\t\t\t\tRichard<br />\t\t\tGuatemala&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/gt.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">mochulskayaa</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="260028611" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/katerukhina"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/24/24241e97a6caf3bd932a01ea22afc6b3d758f1a1_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/katerukhina">Xempys</a><br />\r\n\t\t\t\t\tChristian<br />\t\t\tFiji&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/fj.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: 
whitesmoke">katerukhina</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="292454844" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/purdenkos"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/5c/5c7f9d1b71a68ab8599ae0fe8f2c4e0445348eaa_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/purdenkos">Xempys</a><br />\r\n\t\t\t\t\tPatrik<br />\t\t\tCote D\'ivoire (Ivory Coast)&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/ci.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">purdenkos</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="56000172" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/v2incent"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/ac/ac45a256e0a14712efff255db0105fedd80a4f0e_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/v2incent">Ext4ze ` ^0| \'Xempy^0\'</a><br />\r\n\t\t\t\t\tv2incent<br />\t\t\t&nbsp;\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">v2incent</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div 
class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="297670812" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/xempy"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/62/62ea583f7f838562c73cb70e3993e27acd583aef_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/xempy">xempsanity `\xb4</a><br />\r\n\t\t\t\t\tIgor<br />\t\t\tSerbia&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/rs.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">xempy</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">XEMPYKiNGOFNOTHiNG</span>, <span style="color: whitesmoke">X3MPY</span>, <span style="color: whitesmoke">X3MPY * brother\'s on acc</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="121633219" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/Empyrk"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/6b/6b87d7a04bf211a2665b828436ad34e549f2b193_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/Empyrk">Empyrk</a><br />\r\n\t\t\t\t\tMatteo<br />\t\t\tToscana, Italy&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/it.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom 
URL: steamcommunity.com/id/<span style="color: whitesmoke">Empyrk</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t<div style="clear: both"></div>\r\n\t\t<div style="float: right; padding-bottom: 2px">\r\n\t\t\t\t\t\tShowing 1 - 11 of 11\t\t\t</div>\r\n\t<div style="clear: both"></div>\r\n\r\n\r\n', u'search_filter': u'users', u'search_text': u'xempy', u'success': 1, u'search_page': 1} 

You just need to access result["html"] to get the source.
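
Once the HTML fragment is in hand, the usernames can be pulled out of the `searchPersonaName` links. The following is a minimal sketch using only the standard library's `html.parser`. The class name `PersonaNameParser` is hypothetical, and `sample_html` is a shortened fragment of the response shown above, standing in for result["html"].

```python
from html.parser import HTMLParser

class PersonaNameParser(HTMLParser):
    """Collect (name, profile_url) pairs from <a class="searchPersonaName"> links."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_link = False   # True while inside a matching <a> element
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "searchPersonaName":
            self._in_link = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.results.append(("".join(self._text).strip(), self._href))
            self._in_link = False

# Shortened stand-in for result["html"]
sample_html = (
    '<div class="searchPersonaInfo">'
    '<a class="searchPersonaName" href="https://steamcommunity.com/id/zxZEmpy">xempy</a><br />'
    '</div>'
    '<div class="searchPersonaInfo">'
    '<a class="searchPersonaName" href="https://steamcommunity.com/id/xempyjecar">Xempy -A-</a><br />'
    '</div>'
)

parser = PersonaNameParser()
parser.feed(sample_html)
parser.close()
print(parser.results)
```

In a Scrapy project the same fragment could instead be wrapped in a selector, for example `Selector(text=result["html"]).xpath('//a[@class="searchPersonaName"]/text()')`, so the usual XPath workflow from the question still applies once the Ajax response is used as the source.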

+0

Thank you very much! That worked well! – isaacprograms