How do you create a sparse matrix in PySpark?

I am new to Spark. I would like to build a sparse matrix, specifically a user ID by item ID matrix, for a recommendation engine. I know how I would do this in Python. How do you do it in PySpark? The table currently looks like this, and below it is how I would have built the matrix with NumPy.

Session ID | Item ID | Rating
1          | 2       | 1
1          | 3       | 5

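    import numpy as np

    # df is assumed to be a pandas DataFrame with these three columns
    data = df[['session_id', 'item_id', 'rating']].values

    # Map each distinct ID to a 0-based row/column position
    rows, row_pos = np.unique(data[:, 0], return_inverse=True)
    cols, col_pos = np.unique(data[:, 1], return_inverse=True)

    # Dense pivot table: one row per session, one column per item
    pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)
    pivot_table[row_pos, col_pos] = data[:, 2]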


2016-06-30 ashish trehan

Take a look at SparseVector: https://spark.apache.org/docs/1.1.0/api/python/pyspark.mllib.linalg.SparseVector-class.html – Gopala

Like this:

>>> from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry 
>>> table = sqlContext.createDataFrame(
...  sc.parallelize([[1, 2, 1], [1, 3, 5]]) 
...) 
>>> mat = CoordinateMatrix(table.rdd.map(lambda row: MatrixEntry(*row)))
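To make it clearer what a coordinate-format matrix holds, here is a minimal sketch in plain NumPy, without Spark. It keeps only the non-zero cells as (row, column, value) triplets, which is the same shape of data that each MatrixEntry carries. The names `entries` and `to_dense` are made up for illustration, and the sketch assumes the raw IDs are used directly as indices, so the matrix size is taken as the largest ID plus one in each dimension.

```python
import numpy as np

# Same (session_id, item_id, rating) rows as in the example above.
data = np.array([[1, 2, 1], [1, 3, 5]])

# Coordinate (COO) form: store only the non-zero cells as (row, col, value).
entries = [(int(i), int(j), float(v)) for i, j, v in data]

# Assumption of this sketch: raw IDs are used as indices, so each
# dimension is (largest ID + 1).
n_rows = int(data[:, 0].max()) + 1
n_cols = int(data[:, 1].max()) + 1


def to_dense(entries, n_rows, n_cols):
    """Expand COO triplets into a dense array (only sensible for small data)."""
    dense = np.zeros((n_rows, n_cols))
    for i, j, v in entries:
        dense[i, j] = v
    return dense


dense = to_dense(entries, n_rows, n_cols)
print(entries)
print(dense.shape)
```

Only two triplets are stored here, however large the ID ranges get; the dense expansion is included just to check the result on toy data. If the IDs are large or not contiguous, remapping them to 0-based positions first (as `np.unique(..., return_inverse=True)` does in the question) keeps the dimensions small.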


2016-06-30 23:27:31
