Different resolutions of the visualization are available:
TIMIT is a phoneme recognition data set of 510222 samples in 49 classes (48 phonenmes and a miscellaneous class) (shown by colors). The visualization was created with the following steps:
- Each sample is a 50ms segment of speech, where three sliding windows of 30ms are represented by 39 MFCC features, in total 117 dimensions. The resulting data matrix X is sized 510222×117.
- Calculate 20-Nearest-Neighbor graph, using the Euclidean distances; A=fastknn(X,20);
- Supply A to NE using wtsne_p, with over-attraction initialilzation, i.e. Y=wtsne_p(A, true);
- The class names are placed in the median location of each class in the 2-D space.