Get specific tags inside a parent tag with BeautifulSoup4

I'm using BeautifulSoup4 with Python to scrape content from the web, and I want to extract the content of specific HTML tags while ignoring others.

I have the following HTML:

<div class="the-one-i-want"> 
    <p> 
     "random text content here and about" 
    </p> 
    <p> 
     "random text content here and about" 
    </p> 
    <p> 
     "random text content here and about" 
    </p> 
    <div class="random-inserted-element-i-dont-want"> 
     <content> 
    </div> 
    <p> 
     "random text content here and about" 
    </p> 
    <p> 
     "random text content here and about" 
    </p> 
</div>

My goal is to understand how to instruct Python to get only the <p> elements from within the parent <div class="the-one-i-want">, ignoring any other <div>s nested inside it.

Currently I'm locating the content of the parent div with the following:

content = soup.find('div', class_='the-one-i-want')

However, I can't seem to figure out how to narrow that down further so that only the <p> tags are extracted, without getting an error.


2016-06-24 theeastcoastwest

h = """<div class="the-one-i-want"> 
    <p> 
     "random text content here and about" 
    </p> 
    <p> 
     "random text content here and about" 
    </p> 
    <p> 
     "random text content here and about" 
    </p> 
    <div class="random-inserted-element-i-dont-want"> 
     <content> 
    </div> 
    <p> 
     "random text content here and about" 
    </p> 
    <p> 
     "random text content here and about" 
    </p> 
</div>"""

You can just call find_all("p") after you use find:

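from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(h, "html.parser")

# Locate the parent div by class, then collect the <p> tags inside it
print(soup.find("div", "the-one-i-want").find_all("p"))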

Or use a CSS selector with select:

print(soup.select("div.the-one-i-want p"))

Both give you:

[<p>\n  "random text content here and about"\n </p>, <p>\n  "random text content here and about"\n </p>, <p>\n  "random text content here and about"\n </p>, <p>\n  "random text content here and about"\n </p>, <p>\n  "random text content here and about"\n </p>]

find_all only finds descendants of the div with the class the-one-i-want, and the same applies to select.
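One caveat: "descendants" means any depth, so a <p> nested inside the unwanted inner div would be matched as well. If that matters, a minimal sketch of restricting the search to direct children might look like the following. The markup is hypothetical (the unwanted div is given a <p> of its own to show the difference), and the None check is an added precaution, since find returns None when nothing matches and chaining onto None raises AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: here the unwanted div also contains a <p>.
html = """
<div class="the-one-i-want">
    <p>first</p>
    <div class="random-inserted-element-i-dont-want">
        <p>nested, unwanted</p>
    </div>
    <p>second</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
parent = soup.find("div", class_="the-one-i-want")

# find() returns None when nothing matches, so guard before chaining.
if parent is None:
    raise ValueError("no div with class 'the-one-i-want' found")

# Every descendant <p>, including the one inside the nested div.
all_paragraphs = parent.find_all("p")

# Direct children only: recursive=False stops the search at the first level.
direct_paragraphs = parent.find_all("p", recursive=False)

# The same restriction expressed with the CSS child combinator.
direct_via_css = soup.select("div.the-one-i-want > p")

texts = [p.get_text(strip=True) for p in direct_paragraphs]
print(texts)
```

With the HTML from the question, both variants return the same five paragraphs, because none of the <p> tags sit inside the nested div.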


2016-06-24 20:49:34

I swear I had already tried that, but I guess not. Problem solved. Thanks – theeastcoastwest
