UnicodeDecodeError: 'ascii' codec can't decode byte 0x92?
I'm trying to read data from a .txt file and then find and print the 30 most frequently used words. Every time I try to read the file, I get this error:
"UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 338: ordinal not in range(128)".
Here is my code:
filename = 'wh_2015_national_security_strategy_obama.txt'
#extracts the year named in the file ('2015' sits at positions 3 to 6)
year = filename[3:7]
ecount = 30
#opens the file and reads it
file = open(filename,'r').read() #THIS IS WHERE THE ERROR IS
#counts the characters, then counts the lines, strips some punctuation, changes it all to lower case and splits it into a list of words.
numchar = len(file)
numlines = file.count('\n')
file = file.replace(",","").replace("'s","").replace("-","").replace(")","")
words = file.lower().split()
dictionary = {}
#this is a set of all the words not to count among the most commonly used.
dontcount = {"the", "of", "in", "to", "a", "and", "that", "we", "our", "is", "for", "at", "on", "as", "by", "be", "are", "will","this", "with", "or",
"an", "-", "not", "than", "you", "your", "but","it","a","and", "i", "if","they","these","has","been","about","its","his","no",
"because","when","would","was", "have", "their","all","should","from","most", "were","such","he", "very","which","may","because","--------",
"had", "only", "no", "one", "--------", "any", "had", "other", "those", "us", "while",
"..........", "*", "$", "so", "now","what", "who", "my","can", "who","do","could", "over", "-",
"...............","................", "during","make","************",
"......................................................................", "get", "how", "after",
"..................................................", "...........................", "much", "some",
"through","though","therefore","since","many", "then", "there", "–", "both", "them", "well", "me", "even", "also", "however"}
for w in words:
    if not w in dontcount:
        if w in dictionary:
            dictionary[w] +=1
        else:
            dictionary[w] = 1
num_words = sum(dictionary[w] for w in dictionary)
#This sorts the dictionary and makes it so that the most popular is at the top.
x = [(dictionary[w],w) for w in dictionary]
x.sort()
x.reverse()
#This prints out the number of characters, lines, and words (not including stop words).
print(str(filename))
print('The file has ',numchar,' number of characters.')
print('The file has ',numlines,' number of lines.')
print('The file has ',num_words,' number of words.')
#This provides the structure for how the most common words should be printed out
i = 1
for count, word in x[:ecount]:
    print("{0}, {1}, {2}".format(i,count,word))
    i+=1
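As a side note on the counting part, the manual dictionary loop and the sort/reverse step could also be sketched with `collections.Counter`. The sample text and the shortened stop-word set below are hypothetical stand-ins for the file contents and `dontcount`, and ties may come out in a different order than with the sort-and-reverse approach above.

```python
from collections import Counter

# Hypothetical input text and stop-word set, standing in for the file
# contents and the dontcount set from the question.
text = "The strategy of the nation and the security of the nation"
stop_words = {"the", "of", "and"}

# Lowercase, split on whitespace, and drop the stop words.
words = [w for w in text.lower().split() if w not in stop_words]
counts = Counter(words)

# most_common(n) returns the n highest counts, largest first.
top = counts.most_common(2)
for rank, (word, count) in enumerate(top, start=1):
    print("{0}, {1}, {2}".format(rank, count, word))
```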
Possible duplicate of http://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte & http://stackoverflow.com/questions/26619801/unicodedecodeerror-ascii-codec-cant-decode-byte-0x92-in-position-47-ordinal – Jaimes
See the post I linked to and the [Python 3 docs for `open`](https://docs.python.org/3/library/functions.html#open), in particular its `encoding` parameter. For Python 2, the "new" version of `open` is in [`io.open`](https://docs.python.org/2/library/io.html#io.open). PS: That byte is most likely a non-standard (Microsoft) right quotation mark, frequently misused as a "curly" apostrophe.
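A quick way to check that guess about byte 0x92: decoding it as ASCII fails, while Windows-1252 (`cp1252`) maps it to U+2019. That the file really is cp1252 is an assumption here; the error message only shows the byte value.

```python
import unicodedata

raw = b"\x92"  # the byte reported in the error message

# ASCII only covers 0x00-0x7F, so this byte cannot be decoded.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Assumption: the file was saved as Windows-1252 (cp1252).
char = raw.decode("cp1252")
char_name = unicodedata.name(char)

print(ascii_ok)        # False
print(hex(ord(char)))  # 0x2019
print(char_name)       # RIGHT SINGLE QUOTATION MARK
```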
**It is none of the above** - all of those questions and answers deal with Python 2. Not one of them will help the OP with the very simple problem of Python 3's TextIOWrapper throwing an exception, which has to be fixed by choosing the correct encoding.
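A minimal sketch of what choosing the encoding looks like in Python 3. The temporary file and its contents are hypothetical stand-ins for the real document, and `cp1252` is an assumption based on the 0x92 byte, not something the error message confirms.

```python
import os
import tempfile

# Hypothetical sample standing in for the real file: the bytes contain 0x92,
# which is a curly apostrophe in Windows-1252.
sample = b"America\x92s security strategy\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Assumption: the file is cp1252-encoded. Passing encoding= explicitly
    # keeps open() from using the locale-dependent default encoding.
    with open(path, "r", encoding="cp1252") as f:
        text = f.read()

    # If the encoding is unknown, errors="replace" substitutes U+FFFD for
    # undecodable bytes instead of raising UnicodeDecodeError.
    with open(path, "r", encoding="ascii", errors="replace") as f:
        lossy = f.read()
finally:
    os.remove(path)

print(text == "America\u2019s security strategy\n")
print("\ufffd" in lossy)
```

Applied to the question's code, that would mean `open(filename, 'r', encoding='cp1252').read()`, provided the file really is cp1252. The `errors="replace"` variant loses the original character, so it is only a fallback.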