Update

This is an old, barebones, very fast v0 of SlowMoMan. A slightly slower version with more quality-of-life features, built on the D3 library, is also available.

We keep this version around because it is substantially more efficient for extremely large data sets.

Deol K, Weber GM, Yu YW. SlowMoMan: A web app for discovery of important features along user-drawn trajectories in 2D embeddings. bioRxiv. 2022 Aug 25:2022-08.


Greetings!

Welcome to SlowMoMan (slow motions on manifolds), a JavaScript web tool for feature selection on 2D embeddings of manifolds. It finds the features that have the lowest-frequency variations along a user-drawn path.

Nonlinear low-dimensional embeddings (such as van der Maaten and Hinton's t-SNE) are great for visualizing high-dimensional data, allowing humans to see shapes and clusters in the data. Unfortunately, interpreting those embeddings can be a bit trickier because the axes of the embedding cannot be directly related to the original features of interest. We can see patterns in the embedding, but figuring out what those patterns correspond to in the original data is much harder.

SlowMoMan addresses this problem by letting the user draw a path (or, in fancier mathematical terms, a 1D manifold) directly onto a 2D embedding. It then back-projects the path, using a nearest-neighbors algorithm, up to the high-dimensional space where each dimension/variable is a potential feature of interest. SlowMoMan runs a Fast Fourier Transform on each variable as it varies along that manifold and computes an FFT score, defined as the harmonic sum of the FFT magnitudes. This weights the low frequencies much more heavily, favoring features that vary slowly along the manifold.
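The steps in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not SlowMoMan's actual code: the function names are hypothetical, the nearest-neighbor search is brute force, the transform is a naive DFT rather than a true FFT, and we assume that "harmonic sum" means weighting the magnitude at frequency k by 1/k after removing the mean.

```python
import cmath
import math

def nearest_index(point, embedding):
    """Index of the embedded sample closest to `point` (brute-force search)."""
    px, py = point
    return min(range(len(embedding)),
               key=lambda i: (embedding[i][0] - px) ** 2 + (embedding[i][1] - py) ** 2)

def backproject(path, embedding, features):
    """Map each path point to the feature row of its nearest embedded sample."""
    return [features[nearest_index(p, embedding)] for p in path]

def harmonic_fft_score(signal):
    """Sum of |F_k| / k over k = 1..n//2, so low frequencies count the most.

    Uses a naive O(n^2) DFT for clarity; an FFT yields the same magnitudes.
    The 1/k weighting, mean removal, and frequency range are assumptions.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    score = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        score += abs(coeff) / k
    return score

def rank_features(path, embedding, features, names):
    """Score every feature along the path and sort from highest to lowest score."""
    rows = backproject(path, embedding, features)
    scores = {name: harmonic_fft_score([row[j] for row in rows])
              for j, name in enumerate(names)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy data: 64 samples laid out along the x-axis of a made-up embedding.
n = 64
embedding = [(float(t), 0.0) for t in range(n)]
names = ["slow", "fast"]
features = [[math.sin(2 * math.pi * t / n), math.sin(2 * math.pi * 16 * t / n)]
            for t in range(n)]
path = [(t + 0.2, 0.1) for t in range(n)]  # hypothetical user-drawn path
ranking = rank_features(path, embedding, features, names)
```

In this toy example the "slow" feature completes one cycle along the path and the "fast" feature completes sixteen, so the slow feature ranks first.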

Pre-requisites

Before getting started, note that SlowMoMan does all computation locally, so you'll need a fast enough computer. We recommend a machine with at least 4 GiB of RAM and a browser with a modern JavaScript engine. We've tested on Firefox 65, Chrome 71, and Edge 44, but any modern browser should work. Notably, this does not include Internet Explorer, which has been deprecated.

Getting started

First, we want to emphasize that you're not actually uploading any data to our servers. Everything is done on your own computer. In fact, if you're paranoid, you can disconnect from the network once you've loaded this page; everything will still work. The flip side is that you'll have to download our example data files to your own machine. We provide several sets of examples.

  1. Pick one of the examples listed below.
  2. Download both the 2D t-SNE embedding file and the high-dimensional features/variables file for that example.
  3. Choose the t-SNE file as the "2D embedding CSV file" in the first box.
  4. Choose the high-dimensional data file as the "original space CSV file".
  5. Draw a path in the box below by holding down the left mouse button and dragging.
  6. Click "Compute Important Variables via FFT."
  7. SlowMoMan then lists out the important variables in the table below, and graphs them in the box in the lower right.
  8. You can select different variables to view, either individually or together, by Shift- or Ctrl-clicking rows in the table.
  9. Hitting "Reset Path" will erase the path you drew, but keep the data loaded.
  10. If you want to try one of the other examples, you can load those data as well. Loading several data files in succession sometimes crashes the page by using up too much memory, so for safety and lower memory consumption we recommend refreshing the page before you load new data.
  11. If you want to save a path you've previously drawn, just copy and paste out the "Path points" textbox. To reload a previous path, hit "Toggle text input", paste a list of points back in, and hit "Toggle text input" one more time to draw the new path.

Swiss Roll

TSNE
Original data

Fashion MNIST

TSNE
Trained CNN node activations in final layer

Human Microbiome Project

TSNE
First 1000 OTUs data file (a smaller high-dimensional data file that contains most of the important variables, for easier downloading and faster computation. Recommended for playing around with SlowMoMan).
All 27,655 OTUs data file (file is really large, at 189 MiB, and computation will take up to a few minutes on a slow computer).

Bacterial 16S rRNA sequences

Taken from 108,413 isolated named strains of bacteria in the Greengenes database.
TSNE (9.3 MiB. TSNE generated using Hamming distance on 16S rRNA Multiple Sequence Alignment)
Sequence loci nucleotides (163 MiB. Only for variants whose consensus makes up less than 90% of the dataset. 0=empty, 1=A, 2=C, 3=G, 4=T.)
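To make the integer coding concrete, here is a small sketch of how an aligned sequence could be turned into such a row. The function name is hypothetical, and mapping gap characters and any other non-ACGT symbol to 0 ("empty") is our assumption about how the file was prepared.

```python
# Integer codes as stated above: 0=empty, 1=A, 2=C, 3=G, 4=T.
NUCLEOTIDE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}

def encode_aligned_sequence(sequence):
    """Encode one aligned sequence; anything that is not A/C/G/T becomes 0 (assumption)."""
    return [NUCLEOTIDE_CODES.get(base.upper(), 0) for base in sequence]

encoded = encode_aligned_sequence("AC-GT")
```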


Upload your 2D embedding CSV File (e.g. *-2Dtsne.csv), with columns labeled "X" and "Y" for coordinates, and optionally "class" and "desc".
Upload your feature space CSV File (e.g. *-features*.csv), with headers specifying variable names.
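If you are preparing your own data, the following sketch writes a pair of files in the layout described above. The sample values, the feature names "height" and "width", and the helper name are made up for illustration. We also assume that row i of the feature file describes the same sample as row i of the embedding file.

```python
import csv
import io

# Two hypothetical samples: 2D coordinates plus optional "class" and "desc" columns.
embedding_rows = [
    {"X": 1.5, "Y": -0.25, "class": "a", "desc": "sample 1"},
    {"X": 2.0, "Y": 0.75, "class": "b", "desc": "sample 2"},
]
# The same two samples in the original feature space (assumed to be in the same order).
feature_rows = [
    {"height": 10, "width": 3},
    {"height": 12, "width": 4},
]

def to_csv(rows):
    """Serialize a list of dicts to CSV text, with a header row taken from the first dict."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

embedding_csv = to_csv(embedding_rows)  # save as, e.g., example-2Dtsne.csv
features_csv = to_csv(feature_rows)     # save as, e.g., example-features.csv
```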