2016-08-03

Answers


Here is a vectorized approach using NumPy broadcasting -

import numpy as np 

def filter_rows(arr): 
    # Detect matches along same columns for both cols 
    samecol_mask1 = arr[:,None,0] == arr[:,0] 
    samecol_mask2 = arr[:,None,1] == arr[:,1] 
    samecol_mask = np.triu(samecol_mask1 | samecol_mask2,1) 

    # Detect matches across the two cols 
    diffcol_mask = arr[:,None,0] == arr[:,1] 

    # Get the combined matching mask 
    mask = samecol_mask | diffcol_mask 

    # Get the indices of the mask which gives us the row IDs that have matches 
    # across either same or different columns. Delete those rows for output. 
    dup_rowidx = np.unique(np.argwhere(mask)) 
    return np.delete(arr,dup_rowidx,axis=0) 
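As a sketch of what the broadcasting step computes, the snippet below (run on the Case #1 array shown further down) prints the pairwise mask for column 0; the variable name mirrors the one in `filter_rows`:

```python
import numpy as np

arr = np.array([[0, 1],
                [2, 3],
                [4, 0],
                [0, 4]])

# arr[:,None,0] has shape (4,1); compared against arr[:,0] (shape (4,))
# it broadcasts to a (4,4) matrix of pairwise equalities within column 0.
samecol_mask1 = arr[:, None, 0] == arr[:, 0]

# np.triu(..., 1) keeps the strict upper triangle, so each pair of rows
# is counted once and the trivially True diagonal is discarded.
print(np.triu(samecol_mask1, 1))
# only entry (0, 3) is True: rows 0 and 3 share the value 0 in column 0
```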

Sample runs presenting various scenarios

Case #1: Multiple matches across the same and different columns

In [313]: arr 
Out[313]: 
array([[0, 1], 
     [2, 3], 
     [4, 0], 
     [0, 4]]) 

In [314]: filter_rows(arr) 
Out[314]: array([[2, 3]]) 

Case #2: Matches along the same columns

In [319]: arr 
Out[319]: 
array([[ 0, 1], 
     [ 2, 3], 
     [ 8, 10], 
     [ 0, 4]]) 

In [320]: filter_rows(arr) 
Out[320]: 
array([[ 2, 3], 
     [ 8, 10]]) 

Case #3: Matches across different columns

In [325]: arr 
Out[325]: 
array([[ 0, 1], 
     [ 2, 3], 
     [ 8, 10], 
     [ 7, 0]]) 

In [326]: filter_rows(arr) 
Out[326]: 
array([[ 2, 3], 
     [ 8, 10]]) 

Case #4: Matches within the same row

In [331]: arr 
Out[331]: 
array([[ 0, 1], 
     [ 3, 3], 
     [ 8, 10], 
     [ 7, 0]]) 

In [332]: filter_rows(arr) 
Out[332]: array([[ 8, 10]]) 

Just an alternative to @Divakar's impressive solution. This approach is worse in every respect (especially efficiency), but may be easier to follow for non-NumPy gurus.

import numpy as np 

def filter_(x): 
    unique = np.unique(x) # 1 
    unique_mapper = [np.where(x == z)[0] for z in unique] # 2 
    filtered_unique_mapper = [m if len(m) > 1 else np.array([], dtype=int) for m in unique_mapper] # 3 
    collisions = np.concatenate(filtered_unique_mapper) # 4 
    to_delete = np.unique(collisions) # 5 
    return np.delete(x, to_delete, axis=0) 

# 1 get the global unique values 
# 2 for each unique value: get all rows containing this value 
# -> multiple entries for one unique value means the rows collide! 
# 3 drop the entries from above if <= 1 row holds that unique value 
# 4 collect all rows which collided somehow 
# 5 remove duplicate entries from above
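As a quick sanity check, here is a condensed, self-contained version of `filter_` run on the Case #1 array from the first answer; it produces the same output as `filter_rows`:

```python
import numpy as np

def filter_(x):
    # for each globally unique value, collect the rows containing it;
    # any value held by more than one row marks those rows for deletion
    unique_mapper = [np.where(x == z)[0] for z in np.unique(x)]
    filtered = [m if len(m) > 1 else np.array([], dtype=int) for m in unique_mapper]
    to_delete = np.unique(np.concatenate(filtered))
    return np.delete(x, to_delete, axis=0)

arr = np.array([[0, 1],
                [2, 3],
                [4, 0],
                [0, 4]])

# only the row [2, 3] shares no value with any other row
print(filter_(arr))  # -> [[2 3]]
```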