2016-03-29 4 views
1

Scraping headlines and dates from Yahoo Finance with R

I am trying to scrape news from the Yahoo Finance website with R in order to build a table with two columns: date and headline. Following the instructions from here, I can correctly create the headline column; the next step is to retrieve the date and add it to the table as a second column.

I believe I only need to change this command:

out_dt <- xpathSApply(d, "//ul[contains(@class,'newsheadlines')]/following::ul/li/a", xmlValue) 

so that it returns the date instead of the headlines from, for example, this page source:

<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><title>BMPS.MI Headlines | BANCA MPS Stock - Yahoo! Finance</title><script type="text/javascript" src="http://l.yimg.com/a/i/us/fi/03rd/yg_csstare_nobgcolor.js"></script><link rel="stylesheet" href="http://l.yimg.com/zz/combo?kx/yucs/uh3/uh/1138/css/uh_non_mail-min.css&amp;kx/yucs/uh3s/atomic/84/css/atomic-min.css&amp;kx/yucs/uh_common/meta/3/css/meta-min.css&amp;kx/yucs/uh3/top-bar/366/css/no_icons-min.css&amp;kx/yucs/uh3/search/css/588/blue_border-min.css&amp;kx/yucs/uh3/get-the-app/151/css/get_the_app-min.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yfi_yoda_legacy_lego_concat.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yfi_symbol_suggest.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yui_helper.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yfi_theme_teal.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yfi_follow_quote.css&amp;bm/lib/fi/common/p/d/static/css/2.0.356981/2.0.0/mini/yfi_follow_stencil.css" type="text/css"><script language="javascript"> 
ll_js = new Array(); 
</script><script type="text/javascript" src="http://l1.yimg.com/bm/combo?fi/common/p/d/static/js/2.0.356981/2.0.0/mini/yui-min-3.9.1.js&amp;fi/common/p/d/static/js/2.0.356981/yui_2.8.0/build/yuiloader-dom-event/2.0.0/mini/yuiloader-dom-event.js&amp;fi/common/p/d/static/js/2.0.356981/yui_2.8.0/build/container/2.0.0/mini/container.js&amp;fi/common/p/d/static/js/2.0.356981/yui_2.8.0/build/datasource/2.0.0/mini/datasource.js&amp;fi/common/p/d/static/js/2.0.356981/yui_2.8.0/build/autocomplete/2.0.0/mini/autocomplete.js"></script><script language="javascript"> 
YUI.YUICfg = {"base":"http:\/\/l.yimg.com\/","comboBase":"http:\/\/l.yimg.com\/zz\/combo?","combine":true,"allowRollup":true,"maxURLLength":"2000"} 
YUI.YUICfg.root = 'yui:'+YUI.version+'/build/'; 
YUI.applyConfig(YUI.YUICfg); 
</script><script language="javascript"> 
ll_js.push({ 
    'success_callback' : function() { 
      YUI().use('stencil', 'follow-quote', 'node', function (Y) { 
       var conf = {'xhrBase': '/', 'lang': 'en-US', 'region': 'US', 'loginUrl': 'https://login.yahoo.com/config/login_verify2?&.done=http://finance.yahoo.com/q?s=BMPS.MI&.intl=us'}; 

       Y.Media.FollowQuote.init(conf, function() { 
        var exchNode = null, 
         followSecClass = "", 
         followHtml = "", 
         followNode = null; 

        followSecClass = Y.Media.FollowQuote.getFollowSectionClass(); 
        followHtml = Y.Media.FollowQuote.getFollowBtnHTML({ ticker: 'BMPS.MI', addl_classes: "follow-quote-always-visible", showFollowText: true }); 
        followNode = Y.Node.create(followHtml); 
        exchNode = Y.one(".wl_sign"); 
        if (!Y.Lang.isNull(exchNode)) { 
         exchNode.append(followNode); 

        } 

       }); 
      }); 
    } 
}); 

Any suggestions?

Answer

3

You can use rvest as follows:

library(rvest)

# Load the headlines page and select every list item in the news tab
doc <- read_html("http://finance.yahoo.com/q/h?s=AAPL+Headlines")
scope <- doc %>% html_nodes("#yfncsumtab li")

# Build a one-row data frame (date, headline) for each list item
res <- lapply(scope, function(li) {
  data.frame(
    date = li %>% html_node("cite span") %>% html_text(),
    headline = li %>% html_node("a") %>% html_text(),
    stringsAsFactors = FALSE
  )
})

# Stack the rows into a single table
do.call(rbind, res)

This gives you:

   date                     headline 
1 (Tue 3:49AM EDT)         US hacks iPhone, ends legal battle but questions linger 
2 (Tue 1:27AM EDT)       Amazon Echo turns into a sleeper hit, offsetting Fire's failure 
3 (Tue 1:00AM EDT)          Why Everyone Loses in Apple’s Fight Against the FBI 
4 (Tue 12:36AM EDT) [$$] US drops Apple case, Japan's negative rate bounty and the criminals paid not to kill 
5 (Tue 12:25AM EDT)        U.S. succeeds in cracking Apple's iPhone, drops legal action 
6 (Tue 12:00AM EDT) [$$] Brussels Attacks: Belgium Turns to U.S. for Help in Scouring Seized Laptops, Phones 
7  (Mon, Mar 28)    [$$] FBI Opens San Bernardino Shooter’s iPhone; U.S. Drops Demand on Apple 
8  (Mon, Mar 28)            Wolverton: Encyption debate isn't going away 
9  (Mon, Mar 28)           [$$] US drops Apple case after cracking iPhone 
10  (Mon, Mar 28)   Words of warning — not celebration — in Silicon Valley after FBI ends Apple fight 
11  (Mon, Mar 28)        [$$] FBI Opens Shooter's iPhone; U.S. Drops Demand on Apple 
12  (Mon, Mar 28)           FBI hacks into terrorist’s iPhone without Apple 
13  (Mon, Mar 28)         Justice Department cracks iPhone; withdraws legal action 
14  (Mon, Mar 28)        Apple responds: 'This case should have never been brought' 
15  (Mon, Mar 28)       IPhone Security Is the Casualty in Apple's Victory Over the FBI 
16  (Mon, Mar 28)       Cracked Apple iPhone By F.B.I. Puts Spotlight On Apple Security 
17  (Mon, Mar 28)         DOJ Drops Apple Case: Bloomberg West (Full Show 03/28) 
18  (Mon, Mar 28)           Apple, Inc.'s New iPhone SE: Off to a Big Start? 
19  (Mon, Mar 28)            AP Explains: Apple vs. FBI _ What Happened? 
20  (Mon, Mar 28)             PRESS DIGEST- Financial Times - March 29 

I'll leave the date parsing to you.
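For the date parsing that is left open here, one possible approach is sketched below using base R only. The function name `parse_cite_date` and its `ref_date` argument are hypothetical. The sketch assumes that time-only entries such as "(Tue 3:49AM EDT)" belong to the day the page was scraped, and that entries such as "(Mon, Mar 28)" fall in the same year as that day; neither assumption is confirmed by the page itself.

```r
# Hypothetical helper: turn the scraped "cite span" strings into Date values.
# Assumptions (not confirmed by the source page):
#   - "(Tue 3:49AM EDT)" style entries are dated on ref_date
#   - "(Mon, Mar 28)" style entries share the year of ref_date
parse_cite_date <- function(x, ref_date = Sys.Date()) {
  # Drop the surrounding parentheses and stray whitespace
  x <- trimws(gsub("[()]", "", x))
  # Capture abbreviated month and day from "Mon, Mar 28"
  parts <- regmatches(x, regexec("^[A-Za-z]{3}, ([A-Za-z]{3}) ([0-9]{1,2})$", x))
  ref_year <- format(ref_date, "%Y")
  iso <- vapply(parts, function(p) {
    # No match means a time-only entry: fall back to the reference date
    if (length(p) != 3) return(format(ref_date, "%Y-%m-%d"))
    # month.abb is a built-in English constant, so this ignores the locale
    sprintf("%s-%02d-%02d", ref_year, match(p[2], month.abb), as.integer(p[3]))
  }, character(1))
  as.Date(iso, format = "%Y-%m-%d")
}

parse_cite_date(c("(Tue 3:49AM EDT)", "(Mon, Mar 28)"),
                ref_date = as.Date("2016-03-29"))
# [1] "2016-03-29" "2016-03-28"
```

Passing the scrape date explicitly as `ref_date` keeps the result reproducible when the data are processed on a later day.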

Another alternative is to take the date from the h3 heading instead:

library(rvest)

doc <- read_html("http://finance.yahoo.com/q/h?s=AAPL+Headlines")
scope <- doc %>% html_nodes("#yfncsumtab")

# Each h3 heading holds a date; the ul right after it holds that day's headlines
dates <- scope %>% html_nodes("h3 span") %>% html_text()
headlines <- scope %>% html_nodes("h3 + ul") %>% lapply(. %>% html_nodes("li a") %>% html_text)

# Pair each date with its headlines and stack everything into a two-column matrix
do.call(rbind, Map(cbind, dates, headlines))

This results in the following matrix:

 [,1]      [,2]                      
[1,] "Tuesday, March 29, 2016" "March 29 Premarket Briefing: 10 Things You Should Know"         
[2,] "Tuesday, March 29, 2016" "You might soon be able to pay for goods in-store using Facebook Messenger"     
[3,] "Tuesday, March 29, 2016" "FBI unlocks iPhone"                  
[4,] "Tuesday, March 29, 2016" "US hacks iPhone, ends legal battle but questions linger"         
[5,] "Tuesday, March 29, 2016" "Amazon Echo turns into a sleeper hit, offsetting Fire's failure"       
[6,] "Tuesday, March 29, 2016" "Why Everyone Loses in Apple’s Fight Against the FBI"          
[7,] "Tuesday, March 29, 2016" "[$$] US drops Apple case, Japan's negative rate bounty and the criminals paid not to kill" 
[8,] "Tuesday, March 29, 2016" "U.S. succeeds in cracking Apple's iPhone, drops legal action"        
[9,] "Tuesday, March 29, 2016" "[$$] Brussels Attacks: Belgium Turns to U.S. for Help in Scouring Seized Laptops, Phones" 
[10,] "Monday, March 28, 2016" "[$$] FBI Opens San Bernardino Shooter’s iPhone; U.S. Drops Demand on Apple"    
[11,] "Monday, March 28, 2016" "Wolverton: Encyption debate isn't going away"            
[12,] "Monday, March 28, 2016" "[$$] US drops Apple case after cracking iPhone"           
[13,] "Monday, March 28, 2016" "Words of warning — not celebration — in Silicon Valley after FBI ends Apple fight"   
[14,] "Monday, March 28, 2016" "[$$] FBI Opens Shooter's iPhone; U.S. Drops Demand on Apple"        
[15,] "Monday, March 28, 2016" "FBI hacks into terrorist’s iPhone without Apple"           
[16,] "Monday, March 28, 2016" "Justice Department cracks iPhone; withdraws legal action"         
[17,] "Monday, March 28, 2016" "Apple responds: 'This case should have never been brought'"        
[18,] "Monday, March 28, 2016" "IPhone Security Is the Casualty in Apple's Victory Over the FBI"       
[19,] "Monday, March 28, 2016" "Cracked Apple iPhone By F.B.I. Puts Spotlight On Apple Security"       
[20,] "Monday, March 28, 2016" "DOJ Drops Apple Case: Bloomberg West (Full Show 03/28)" 
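
Since the question asks for a table rather than a character matrix, the same pairing can also be written so that it yields a data frame. This is a minimal sketch using small stand-in vectors copied from the output above in place of the scraped `dates` and `headlines`; the name `news` is hypothetical.

```r
# Stand-in values mimicking the scraped objects: one date per h3 heading,
# and one character vector of headlines per date
dates <- c("Tuesday, March 29, 2016", "Monday, March 28, 2016")
headlines <- list(
  c("FBI unlocks iPhone",
    "US hacks iPhone, ends legal battle but questions linger"),
  c("Wolverton: Encyption debate isn't going away")
)

# Repeat each date once per headline of that day, then flatten the headlines
news <- data.frame(
  date = rep(dates, lengths(headlines)),
  headline = unlist(headlines),
  stringsAsFactors = FALSE
)
news
```

A data frame keeps the two columns named and lets the date column be converted to a proper date type later without touching the headlines.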

In this second case, too, I'll leave the date parsing to you.
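
For heading dates of the form "Tuesday, March 29, 2016", a base R sketch along the following lines may be enough. The function name `parse_heading_date` is hypothetical. It looks the month up in the built-in `month.name` constant rather than relying on `%B`, so it should not depend on the session locale; strings that do not fit the pattern come back as `NA`.

```r
# Hypothetical helper: convert "Tuesday, March 29, 2016" style strings to Date
parse_heading_date <- function(x) {
  x <- trimws(x)
  # Capture month name, day and year; the leading weekday is ignored
  pattern <- "^[A-Za-z]+, ([A-Za-z]+) ([0-9]{1,2}), ([0-9]{4})$"
  parts <- regmatches(x, regexec(pattern, x))
  iso <- vapply(parts, function(p) {
    # Full match plus three groups expected; anything else is unparseable
    if (length(p) != 4) return(NA_character_)
    sprintf("%s-%02d-%02d", p[4], match(p[2], month.name), as.integer(p[3]))
  }, character(1))
  as.Date(iso, format = "%Y-%m-%d")
}

parse_heading_date(c("Tuesday, March 29, 2016", "Monday, March 28, 2016"))
# [1] "2016-03-29" "2016-03-28"
```

Applied to the first column of the matrix above, this would give one Date per headline row.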