2016-05-25 7 views
0

I'm new to Python. I'm trying to organize a CSV file into a readable grid. When I converted my Excel file to CSV, the output came out garbled: a jumble of commas and scattered values. I tried using a list, but the data was not organized the way I wanted. I want the data laid out in a pandas grid by category, such as Ethnic and Racial Roots. How can I organize data with pandas?

Here is part of the file saved as CSV (which unfortunately comes out garbled):

Ethnic and Racial Roots                    Jobs Held            Identity       Reason for Latino Identity       Latino ID    With Whom Gets Together-Major Group         With Whom Gets Together---Specific Group                            Transnational Behaviors               Perceptions of Opportunity, Inequality, Discrimination    
Subject Code Gen Place Age Male Country African European Indian Other Color Docs Reason Return 1st Occup 1st Oc Code 1st Wage Cur Occup Cur Oc Code Cur Wage Cur Hours/Day Father Occ Mother Occ Identity ID as Latino Ethnicity Culture Language Politics Values Emotions Everything Among Imms Mexican Cen Amer Caribbean South Amer Latinos-Gen Mex Gua Nic SS Hon CR PR DR Ecu Col Ven Bra Per Arg USYrs Contact R-Remits P-Remits Quantity Freq Sent How Sent Use 1 Use 2 US Bank OS Bank Type Com 1 Type Com 2 Presents Educ EngAbil EconOpps OthOpps Ineqaulity Discrim Context 
F-001 1 1 28 1 2 0 1 1 0 1 2 3 4 serv sk park 8 7.5 serv sk park 8 14 10 99 99 1 1 1 0 0 0 0 0 0 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 2 1 1 8 1 1 1 1 2 1 1 1 0 9 13 1 1 0 1 1 3 
F-002 1 2 35 1 15 1 1 1 0 3 9 6 4 sales work uns 7 7 music artist 10 7 99 9 9 1 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 1 8 3 2 5 3 2 1 1 1 0 1 13 2 2 0 1 9 9 
F-003 1 1 30 0 10 0 1 1 0 1 2 1 1 restfood unsk 7 2.9 inspect arq skill 8 2.9 10 99 99 2 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 8 3 1 1 2 0 2 2 1 0 2 6 0 1 0 3 1 2 
F-007 1 3 19 1 10 0 0 1 0 3 2 1 4 cleanserv unsk 7 8 restfood unsk 7 8 10 3 3 1 1 1 0 0 0 0 0 0 1 9 9 9 9 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 1 1 8 1 1 1 5 1 2 1 1 0 1 6 1 1 0 3 1 1 
F-008 1 3 20 1 10 0 0 1 0 3 2 1 1 professional 10 8.75 restfood skill 8 8.75 10 3 3 1 1 0 0 0 0 1 0 0 1 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 4 1 1 8 1 1 1 4 5 2 1 1 0 2 11 1 1 0 1 1 8 
F-010 1 2 21 0 5 0 1 1 0 1 1 5 1 serv sk cashier 8 6.75 serv skill libra 8 10 10 8 1 1 1 0 1 0 0 0 0 0 3 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 3 1 1 8 1 1 1 2 3 1 2 4 0 1 13 2 1 0 1 0 3 
F-013 1 3 29 1 5 1 1 0 0 1 2 2 4 manufa unsk 4 4 manufa unsk 4 4 8 10 10 2 1 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 8 1 2 8 9 9 9 9 9 9 9 1 4 1 18 2 2 0 3 1 4 
F-014 1 1 25 1 10 0 1 1 0 3 2 1 4 restfood unsk 7 3.5 restfood unsk 7 3.5 9 6 1 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 5 1 3 8 2 4 1 2 0 2 1 1 0 1 6 0 1 0 3 0 0 
F-015 1 3 23 1 5 1 1 0 0 3 9 6 4 unknown 99 99 unknwon 99 99 99 99 99 9 9 9 9 9 9 9 9 9 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0 9 9 9 9 9 9 9 9 9 9 9 9 99 9 9 9 9 9 9 
F-016 1 3 30 0 5 1 1 1 0 2 3 3 2 clean serv unsk 7 7 clean serv unsk 7 7 10 5 1 1 1 0 0 0 0 1 0 0 1 0 1 1 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 4 1 1 8 2 1 1 4 2 1 2 3 0 1 9 1 1 0 1 1 3 
F-017 1 3 21 0 10 0 1 1 0 3 2 1 1 domest garden 7 5 homekeeper 1 5 8 6 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 2 8 3 2 1 3 2 2 2 4 0 1 9 0 1 0 2 1 5 
F-018 1 3 23 1 10 1 1 1 0 3 2 3 2 ambulant unsk 7  restfood unsk 7  99 9 1 1 1 0 1 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 1 8 3 2 1 2 0 2 2 3 0 1 12 2 9 9 2 1 4 
F-019 1 3 34 1 4 0 1 1 0 1 1 2 4 domest garden 7 3 professional 10 3 99 10 9 1 1 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1 2 8 9 9 9 9 9 9 9 1 0 2 20 1 1 0 1 1 8 
F-020 1 3 33 1 3 1 1 0 0 1 2 1 4 domestic serv 7 1.25 sales work unsk 7 1.25 12 5 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 1 8 1 1 1 4 0 1 1 1 4 1 14 1 1 0 1 1 4 
F-021 1 3 33 0 5 1 0 1 1 4 3 2 2 clean serv unsk 7 9 clean serv unsk 7 9 10 3 1 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 1 3 8 3 4 1 2 3 2 2 1 1 1 14 1 1 0 2 1 3 
F-022 1 3 33 1 3 1 1 1 0 1 2 2 1 sales work uns 7 99 clean serv unsk 7 99 8 99 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 8 1 1 1 1 5 1 1 2 3 2 12 1 1 0 1 1 8 
F-024 1 3 26 1 15 1 1 1 0 3 2 2 4 restfood unsk 7 8.75 sales work unsk 7 8.75 99 5 7 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 8 3 1 1 2 0 2 1 1 0 2 13 1 1 0 9 0 0 
F-025 1 2 31 1 6 0 1 1 0 1 3 5 2 serv rest skill 8 7.5 restfood unsk 7 7.5 12 9 1 2 1 1 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 13 0 3 8 3 4 3 2 0 2 2 1 0 2 12 2 1 0 1 1 1 
F-026 1 3 31 0 6 0 1 1 0 3 3 5 4 serv hotel skill 8 8 manager proffes 10 8 8 5 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 1 3 8 3 5 5 2 0 1 2 1 0 1 13 2 1 0 3 1 3 
F-027 1 3 20 1 14 0 1 1 0 1 1 3 4 adm asist NGO 10 3.75 superv rest skill\ 8 3.75 8 8 8 1 1 0 1 0 0 0 0 0 9 1 0 1 1 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 3 1 2 8 9 9 9 9 9 1 1 1 0 1 12 2 1 0 9 1 4 
F-028 1 1 20 0 10 0 1 1 0 3 1 5 1 manufcloth unsk 7 2.5 adm asist NGO 10 2.5 8 7 1 2 1 1 0 0 0 0 0 0 4 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 1 1 1 8 3 1 1 2 0 2 2 1 0 1 12 0 1 0 3 1 4 
F-032 1 3 22 1 6 0 1 1 0 1 2 2 1 restfood unsk 7 6.25 restfood unsk 7 6.25 12 9 1 2 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 1 8 2 1 1 1 2 2 2 1 0 9 9 1 1 0 1 0 0 
F-033 1 1 20 1 10 0 1 1 0 1 2 3 1 restfood unsk 7 12 servworker skil; 8 12 10 6 1 2 1 0 0 1 0 0 0 0 1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 2 1 2 8 1 1 1 2 0 2 2 1 0 2 12 1 1 0 1 1 3 
F-034 1 3 30 0 4 1 1 1 0 1 3 2 3 manufa unsk 4 99 domestic serv 7 99 5 11 1 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 2 1 2 8 9 9 9 9 9 1 2 2 0 1 16 2 1 0 2 1 4 
F-035 1 3 22 1 10 0 1 1 0 1 2 5 1 cleanserv unsk 7 10 restfood unsk 7 10 10 9 9 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 8 1 2 7 4 0 2 2 1 0 1 7 1 1 0 1 1 6 
F-036 1 3 26 0 3 0 1 1 0 2 2 1 1 salesfood unsk 7 6 domerstserv uns 7 6 99 99 99 1 1 0 0 0 0 1 0 0 2 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 5 1 1 8 3 1 1 1 2 1 1 9 9 9 12 1 9 9 9 9 9 
F-037 1 3 25 1 10 0 0 1 0 3 2 5 1 restfood unsk 7 99 restfood unsk 7 99 4 3 1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 8 2 1 2 1 2 2 2 1 0 2 7 1 1 0 1 0 0 
F-038 1 1 19 0 5 1 1 1 0 5 1 5 2 salespharm uns 7 7.5 restfood unsk 7 7.5 5 6 8 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 1 1 8 3 4 1 3 2 2 2 1 0 1 13 1 1 0 3 1 8 
F-039 1 3 21 1 13 0 1 1 1 3 2 5 4 manufac unskil 4 5.25 salespharm uns 7 5.25 99 9 1 1 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 4 1 3 8 3 4 7 1 0 1 2 1 0 1 12 2 1 0 2 1 3 
F-040 1 3 20 0 5 1 1 0 0 4 1 5 1 manufac unskill 4 5.5 clean serv unsk 7 5.5 8 5 9 1 1 0 0 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0 2 1 2 8 3 2 1 3 0 1 2 1 0 1 12 0 1 0 2 1 8 
F-041 1 2 25 0 6 0 1 1 0 3 2 5 1 manufac unskill 4 3 restfood unsk 7 3 8 99 99 1 1 0 0 0 0 1 0 0 1 

Here are the codes used for this data (which I want to put into a pandas grid):

Codes                                                                        
Generation  1= First 2=Second                                                                      
Location  1=New York 2=New Jersey 3=Pennsylvania                                                                       
Age  Age at Last Birthday                                                                       
Gender  0=Female 1=Male                                                                      
Country  1=Arg 2=Bol 3=Bra 4=Col 5=DR 6=Ecu 7= El Sal 8=Gua 9=Hon 10=Mex 11=Nic 12=Pan 13=Peru 14=PR 15=Ven                                                                      
African Roots  0=No 1=Yes                                                                       
European Roots  0=No 1=Yes                                                                       
Indian Roots  0=No 1=Yes                                                                       
Other Roots  0=No 1=Yes                                                                       
Skin Color  1=Light 2=Medium Light 3=Medium 4=Mediium Dark 5=Dark                                                                      
Legal Status  1=Documents 2=No Documents 3=Questionable Documents 9=Missing                                                                      
Reason for Migration  1=supply-side economics 2=demand-side economics 3=network links 4=violence at origin 5=family reasons 6=other                                                                      
Return Plans  1=Yes 2=No 3=Don't Know 4=No Answer 9=Not Asked                                                                      
Occupation  1=Unpaid 2=Student 3=Agrigulture 4=Unskilled Operative 5=Skilled Operative 6=Transport Worker                                                                      
     7=Unsilled Services 8=Skilled Services 9=Small Business 10=Professional 11=Retired 99=Unknown                                                                      
Wage  Wage in U.S. Dollars; 88=Not applicable; 99=Unknown                                                                      
Hours Worked  Hours Worked; 88=Not Applicable; 99=Unknown                                                                       
Identity  1=Latino 2=American 3=Both 9=Unknown                                                                       
Latino Identity Among Immigrants  1=Yes 2=No 3=Yes-No 4=Don't Know 9=Missing                                                                      
Reasons for Latino Identity  1=Yes 0=No 9=Unknown                                                                       
With Whom Gets Together  1=Yes 0=No 9=Unknown                                                                       
USYrs  Number of Years in US; 88=Not Applicable; 99 Missing                                                                       
In Contact with Home Community  1=Yes 0=No 9=Unknown                                                                       
R Sends Money Home  1=Yes 2=No 3=Send Other 9=Unknown                                                                      
Parent Sends Money Home (Second Generation Only)  1=Yes 2=No 8=Not Applicable 9=Unknown                                                                      
Quantity Sent by Respondent or Parent  1=Half of Paycheck 2=20% of Paycheck 3=Varies Month to Month                                                                      
How Money Sent  1=Moneygram 2=Paisano 3=Friend 4=Self 5=Bank 6=Moneygram and Paisano 7=Moneygram and Friend                                                                      
Frequency Money Sent  1=Once a Month 2=Twice a Year 3=Once a Year 4=Once in a While 5=Holidays                                                                       
How Money Used  0=No Use 1=Buy House 2=Family Expenses 3=Health 4=Education 5=Savings 6=Pay a Debt                                                                       
Bank in US  1=Yes 2=No 9=Unknown                                                                       
Bank Overseas  1=Yes 2=No 9=Unknown                                                                       
Type of Communication  1=Land Phone 2=Cell Phone 3=Calling Card 4=Email 5=Regular Mail 6=No Communication 9=Unknwn                                                                      
Presents Sent  1=Yes 2=No 9=Unknown                                                                       
Education  In Years                                                                       
EngAbil  0=None 1=Some English 2=Good English 9=Missing                                                                      
EconOpps  1=More in US 2=More at Origin 3=Same at Both 9=Missing                                                                      
OthOpps  0=Just Earnings 1=Personal 2=Work 3=Study 4=Political 9=Missing                                                                      
Inequality  1=More at Origin 2=More in US 3=Same in Both 9=Missing                                                                      
Discrim  1=Yes 0=No 9=Missing                                                                      
Context  1=Work/School 2=On Street 3=Language 4=Race/Ethnicity 5=Medical 6=Violence 7=Poverty 8=Other 9=Missing                                                                      

Here is my code so far:

import numpy as np
import csv
import pandas as pd

Lat_pro = open('Identity.Codes.Datafile.csv')
# list() of a DataFrame yields only its column labels, not its rows
Lat_reader = list(pd.read_csv(Lat_pro))

print(Lat_reader)

Here is my output:

['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4', 
'Unnamed: 5', 'Ethnic and Racial Roots', 'Unnamed: 7', 'Unnamed: 8', 'Unnamed: 
9', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', ' Jobs Held', 
'Unnamed: 15', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 
'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', ' Identity', 'Unnamed: 24', 
'Reason for Latino Identity ', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 
'Unnamed: 29', 'Unnamed: 30', 'Unnamed: 31', 'Latino ID', 'With Whom Gets 
Together-Major Group', 'Unnamed: 34', 'Unnamed: 35', 'Unnamed: 36', 'Unnamed: 
37', ' With Whom Gets Together---Specific Group', 'Unnamed: 39', 'Unnamed: 40', 
'Unnamed: 41', 'Unnamed: 42', 'Unnamed: 43', 'Unnamed: 44', 'Unnamed: 45', 
'Unnamed: 46', 'Unnamed: 47', 'Unnamed: 48', 'Unnamed: 49', 'Unnamed: 50', 
'Unnamed: 51', 'Unnamed: 52', 'Transnational Behaviors', 'Unnamed: 54', 
'Unnamed: 55', 'Unnamed: 56', 'Unnamed: 57', 'Unnamed: 58', 'Unnamed: 59', 
'Unnamed: 60', 'Unnamed: 61', 'Unnamed: 62', 'Unnamed: 63', 'Unnamed: 64', 
'Unnamed: 65', 'Unnamed: 66', 'Unnamed: 67', 'Perceptions of Opportunity, 
Inequality, Discrimination', 'Unnamed: 69', 'Unnamed: 70', 'Unnamed: 71', 
'Unnamed: 72'] 
+0

You only show us the first lines of the CSV file, so it is hard to say. A quick look at [pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) shows a range of options. I would suggest working with just a 2- or 3-line file until the data comes in the way you like.

Answer

1

This may work better if the data were comma-separated. You can specify the delimiter used in the data with the delimiter option (a.k.a. sep).

Check out the docs

For example:

pandas.read_csv('file.csv', delimiter=',') 

As Peter said, just make sure your data is delimited correctly; you can then pass the delimiter explicitly to be sure it is read correctly.

Also, that first header row will mess things up in the first data file. It is probably best to just remove it, but you can also ignore it by using the skiprows option.

pandas.read_csv('file.csv', delimiter=',', skiprows=1) 
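As a minimal sketch of what skiprows does, the snippet below reads a small in-memory CSV whose first line is a group-header row, similar to the 'Ethnic and Racial Roots' line in the question. The three-column sample text is made up for illustration; only pandas.read_csv, delimiter and skiprows come from the answer.

```python
import io
import pandas as pd

# Hypothetical sample: the first line is a group header spanning the columns
sample = io.StringIO(
    ",Ethnic and Racial Roots,\n"
    "Subject Code,African,European\n"
    "F-001,0,1\n"
    "F-002,1,1\n"
)

# skiprows=1 drops the group-header line, so the second line supplies the column names
df = pd.read_csv(sample, delimiter=",", skiprows=1)
print(df.columns.tolist())
print(df)
```

Without skiprows, the group-header line would be taken as the column names, which is where the many 'Unnamed: N' labels in the question's output come from.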

Update:

After some cleanup of the data, it reads fine without using delimiter or skiprows.

Data

Ethnic,and,Racial,Roots,Jobs,Held,Identity,Reason,for,Latino,Identity,Latino,ID,With,Whom,Gets,Together-Major,Group,With,Whom,Gets,Together---Specific,Group,Transnational,Behaviors,Perceptions,of,Opportunity,,Inequality,,Discrimination, 
Subject,Code,Gen,Place,Age,Male,Country,African,European,Indian,Other,Color,Docs,Reason,Return,1st,Occup,1st,Oc,Code,1st,Wage,Cur,Occup,Cur,Oc,Code,Cur,Wage,Cur,Hours/Day,Father,Occ,Mother,Occ,Identity,ID,as,Latino,Ethnicity,Culture,Language,Politics,Values,Emotions,Everything,Among,Imms,Mexican,Cen,Amer,Caribbean,South,Amer,Latinos-Gen,Mex,Gua,Nic,SS,Hon,CR,PR,DR,Ecu,Col,Ven,Bra,Per,Arg,USYrs,Contact,R-Remits,P-Remits,Quantity,Freq,Sent,How,Sent,Use,1,Use,2,US,Bank,OS,Bank,Type,Com,1,Type,Com,2,Presents,Educ,EngAbil,EconOpps,OthOpps,Ineqaulity,Discrim,Context 
F-001,1,1,28,1,2,0,1,1,0,1,2,3,4,serv,sk,park,8,7.5,serv,sk,park,8,14,10,99,99,1,1,1,0,0,0,0,0,0,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,2,1,1,8,1,1,1,1,2,1,1,1,0,9,13,1,1,0,1,1,3 
F-002,1,2,35,1,15,1,1,1,0,3,9,6,4,sales,work,uns,7,7,music,artist,10,7,99,9,9,1,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,1,8,3,2,5,3,2,1,1,1,0,1,13,2,2,0,1,9,9 
F-003,1,1,30,0,10,0,1,1,0,1,2,1,1,restfood,unsk,7,2.9,inspect,arq,skill,8,2.9,10,99,99,2,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,8,3,1,1,2,0,2,2,1,0,2,6,0,1,0,3,1,2 
F-007,1,3,19,1,10,0,0,1,0,3,2,1,4,cleanserv,unsk,7,8,restfood,unsk,7,8,10,3,3,1,1,1,0,0,0,0,0,0,1,9,9,9,9,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,1,8,1,1,1,5,1,2,1,1,0,1,6,1,1,0,3,1,1 
F-008,1,3,20,1,10,0,0,1,0,3,2,1,1,professional,10,8.75,restfood,skill,8,8.75,10,3,3,1,1,0,0,0,0,1,0,0,1,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,4,1,1,8,1,1,1,4,5,2,1,1,0,2,11,1,1,0,1,1,8 
F-010,1,2,21,0,5,0,1,1,0,1,1,5,1,serv,sk,cashier,8,6.75,serv,skill,libra,8,10,10,8,1,1,1,0,1,0,0,0,0,0,3,0,0,1,1,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,3,1,1,8,1,1,1,2,3,1,2,4,0,1,13,2,1,0,1,0,3 
F-013,1,3,29,1,5,1,1,0,0,1,2,2,4,manufa,unsk,4,4,manufa,unsk,4,4,8,10,10,2,1,0,1,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,8,1,2,8,9,9,9,9,9,9,9,1,4,1,18,2,2,0,3,1,4 
F-014,1,1,25,1,10,0,1,1,0,3,2,1,4,restfood,unsk,7,3.5,restfood,unsk,7,3.5,9,6,1,1,1,1,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,5,1,3,8,2,4,1,2,0,2,1,1,0,1,6,0,1,0,3,0,0 
F-015,1,3,23,1,5,1,1,0,0,3,9,6,4,unknown,99,99,unknwon,99,99,99,99,99,9,9,9,9,9,9,9,9,9,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,9,9,9,9,9,9,9,9,9,9,9,99,9,9,9,9,9,9 
F-016,1,3,30,0,5,1,1,1,0,2,3,3,2,clean,serv,unsk,7,7,clean,serv,unsk,7,7,10,5,1,1,1,0,0,0,0,1,0,0,1,0,1,1,1,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,4,1,1,8,2,1,1,4,2,1,2,3,0,1,9,1,1,0,1,1,3 
F-017,1,3,21,0,10,0,1,1,0,3,2,1,1,domest,garden,7,5,homekeeper,1,5,8,6,1,1,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,2,8,3,2,1,3,2,2,2,4,0,1,9,0,1,0,2,1,5 
F-018,1,3,23,1,10,1,1,1,0,3,2,3,2,ambulant,unsk,7,restfood,unsk,7,99,9,1,1,1,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,1,8,3,2,1,2,0,2,2,3,0,1,12,2,9,9,2,1,4 
F-019,1,3,34,1,4,0,1,1,0,1,1,2,4,domest,garden,7,3,professional,10,3,99,10,9,1,1,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,1,2,8,9,9,9,9,9,9,9,1,0,2,20,1,1,0,1,1,8 
F-020,1,3,33,1,3,1,1,0,0,1,2,1,4,domestic,serv,7,1.25,sales,work,unsk,7,1.25,12,5,1,1,1,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,1,8,1,1,1,4,0,1,1,1,4,1,14,1,1,0,1,1,4 
F-021,1,3,33,0,5,1,0,1,1,4,3,2,2,clean,serv,unsk,7,9,clean,serv,unsk,7,9,10,3,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,1,3,8,3,4,1,2,3,2,2,1,1,1,14,1,1,0,2,1,3 
F-022,1,3,33,1,3,1,1,1,0,1,2,2,1,sales,work,uns,7,99,clean,serv,unsk,7,99,8,99,1,1,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,8,1,1,1,1,5,1,1,2,3,2,12,1,1,0,1,1,8 
F-024,1,3,26,1,15,1,1,1,0,3,2,2,4,restfood,unsk,7,8.75,sales,work,unsk,7,8.75,99,5,7,1,1,0,0,1,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,8,3,1,1,2,0,2,1,1,0,2,13,1,1,0,9,0,0 
F-025,1,2,31,1,6,0,1,1,0,1,3,5,2,serv,rest,skill,8,7.5,restfood,unsk,7,7.5,12,9,1,2,1,1,0,0,0,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,13,0,3,8,3,4,3,2,0,2,2,1,0,2,12,2,1,0,1,1,1 
F-026,1,3,31,0,6,0,1,1,0,3,3,5,4,serv,hotel,skill,8,8,manager,proffes,10,8,8,5,1,1,1,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,1,3,8,3,5,5,2,0,1,2,1,0,1,13,2,1,0,3,1,3 
F-027,1,3,20,1,14,0,1,1,0,1,1,3,4,adm,asist,NGO,10,3.75,superv,rest,skill\,8,3.75,8,8,8,1,1,0,1,0,0,0,0,0,9,1,0,1,1,0,1,0,0,0,0,0,1,1,0,0,1,0,0,0,3,1,2,8,9,9,9,9,9,1,1,1,0,1,12,2,1,0,9,1,4 
F-028,1,1,20,0,10,0,1,1,0,3,1,5,1,manufcloth,unsk,7,2.5,adm,asist,NGO,10,2.5,8,7,1,2,1,1,0,0,0,0,0,0,4,1,1,1,0,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,1,1,1,8,3,1,1,2,0,2,2,1,0,1,12,0,1,0,3,1,4 
F-032,1,3,22,1,6,0,1,1,0,1,2,2,1,restfood,unsk,7,6.25,restfood,unsk,7,6.25,12,9,1,2,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,1,8,2,1,1,1,2,2,2,1,0,9,9,1,1,0,1,0,0 
F-033,1,1,20,1,10,0,1,1,0,1,2,3,1,restfood,unsk,7,12,servworker,skil;,8,12,10,6,1,2,1,0,0,1,0,0,0,0,1,1,1,0,1,0,1,0,0,1,1,0,0,0,1,1,0,0,0,0,2,1,2,8,1,1,1,2,0,2,2,1,0,2,12,1,1,0,1,1,3 
F-034,1,3,30,0,4,1,1,1,0,1,3,2,3,manufa,unsk,4,99,domestic,serv,7,99,5,11,1,1,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,2,1,2,8,9,9,9,9,9,1,2,2,0,1,16,2,1,0,2,1,4 
F-035,1,3,22,1,10,0,1,1,0,1,2,5,1,cleanserv,unsk,7,10,restfood,unsk,7,10,10,9,9,1,1,1,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,8,1,2,7,4,0,2,2,1,0,1,7,1,1,0,1,1,6 
F-036,1,3,26,0,3,0,1,1,0,2,2,1,1,salesfood,unsk,7,6,domerstserv,uns,7,6,99,99,99,1,1,0,0,0,0,1,0,0,2,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,5,1,1,8,3,1,1,1,2,1,1,9,9,9,12,1,9,9,9,9,9 
F-037,1,3,25,1,10,0,0,1,0,3,2,5,1,restfood,unsk,7,99,restfood,unsk,7,99,4,3,1,1,1,0,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,8,2,1,2,1,2,2,2,1,0,2,7,1,1,0,1,0,0 
F-038,1,1,19,0,5,1,1,1,0,5,1,5,2,salespharm,uns,7,7.5,restfood,unsk,7,7.5,5,6,8,1,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,1,1,8,3,4,1,3,2,2,2,1,0,1,13,1,1,0,3,1,8 
F-039,1,3,21,1,13,0,1,1,1,3,2,5,4,manufac,unskil,4,5.25,salespharm,uns,7,5.25,99,9,1,1,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,4,1,3,8,3,4,7,1,0,1,2,1,0,1,12,2,1,0,2,1,3 
F-040,1,3,20,0,5,1,1,0,0,4,1,5,1,manufac,unskill,4,5.5,clean,serv,unsk,7,5.5,8,5,9,1,1,0,0,0,0,1,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,0,0,0,2,1,2,8,3,2,1,3,0,1,2,1,0,1,12,0,1,0,2,1,8 
F-041,1,2,25,0,6,0,1,1,0,3,2,5,1,manufac,unskill,4,3,restfood,unsk,7,3,8,99,99,1,1,0,0,0,0,1,0,0,1 
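One way a cleanup of this kind could be done in Python is sketched below, assuming the fields in the raw text are separated by runs of whitespace. The helper name spaces_to_commas is hypothetical, and this is not necessarily how the data above was produced.

```python
import re

def spaces_to_commas(line):
    """Replace each run of whitespace between fields with a single comma."""
    return re.sub(r"\s+", ",", line.strip())

# The first few fields of the F-001 row from the question
row = "F-001 1 1 28 1 2 0 1 1 0 1 2 3 4 serv sk park 8 7.5"
print(spaces_to_commas(row))
```

Note that this splits multi-word values as well: 'serv sk park' becomes three separate fields, which is also visible in the data above, so rows can end up with differing numbers of fields.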

For the codes, I would probably prefer to use a dictionary of dictionaries here.

E.g.

codes = {'Generation': {1: 'First', 2: 'Second'},
         'Location': {1: 'New York', 2: 'New Jersey', 3: 'Pennsylvania'}
         }

Then you can reference the values like this:

codes['Generation'][1] # yields 'First'
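To apply such a lookup to a whole column rather than a single value, pandas' Series.map accepts a dictionary. The sketch below is an illustration under assumptions: the small DataFrame is made up, its 'Gen' and 'Place' column names are taken from the header row in the question, and the labels follow the code key listed there.

```python
import pandas as pd

codes = {
    "Generation": {1: "First", 2: "Second"},
    "Location": {1: "New York", 2: "New Jersey", 3: "Pennsylvania"},
}

# Hypothetical frame holding the numeric codes for three respondents
df = pd.DataFrame({"Gen": [1, 1, 1], "Place": [1, 2, 3]})

# Series.map replaces each numeric code with its label from the dictionary
df["Generation"] = df["Gen"].map(codes["Generation"])
df["Location"] = df["Place"].map(codes["Location"])
print(df)
```

Codes that are missing from the dictionary come out as NaN, so sentinel values such as 9 or 99 would need their own entries if they should be labeled.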
+0

Thanks. I'm curious, how did you clean up the data? I tried, but oddly I ran into NaN values. – dabberson567

+0

I simply copied it into 'vim' (a Linux text editor) and did [this](http://stackoverflow.com/a/13761703/943773), which replaced all the spaces between the elements with commas. – ryanjdillon

+0

Also, I noticed that some of the header fields contained spaces, and these were replaced by commas as well (e.g. 'Ethnic,and,Racial,Roots,...'). You could do this in an Excel-like program by importing the file and then saving it as CSV. – ryanjdillon