You will first have to decide how you want to cluster your data. scikit-learn's plain KMeans clustering is designed to work on numeric feature vectors. However, scikit-learn can also be used to cluster documents by topic using a bag-of-words approach. This is done by extracting the text features into a scipy.sparse matrix instead of a standard numpy array, since most words do not occur in most documents and a dense array would be mostly zeros.
One demo example is given here:
http://scikit-learn.org/stable/auto_examples/text/document_clustering.html
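
For a rough idea of what the bag-of-words approach looks like in code, here is a minimal sketch. The four toy documents, the choice of TF-IDF weighting and `n_clusters=2` are my own assumptions for illustration and are not taken from the linked demo; you would swap in your own texts and tune the number of clusters.

```python
from scipy.sparse import issparse
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, made up for illustration only.
docs = [
    "the cat sat on the mat",
    "cats and dogs are popular pets",
    "the stock market fell sharply today",
    "investors worry about the stock market",
]

# Turn the raw text into a bag-of-words matrix with TF-IDF weights.
# fit_transform returns a scipy.sparse matrix, not a dense numpy array.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
print(issparse(X), X.shape)

# KMeans accepts the sparse matrix directly.
# n_clusters=2 is an assumption that fits this toy corpus.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

# One cluster label per document.
for doc, label in zip(docs, labels):
    print(label, doc)
```

On a real corpus you will probably want to experiment with the vectorizer options (for example `max_df` and `min_df`) and the number of clusters, since KMeans needs that number fixed up front.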