Help with homework


Meowmeow

I have 40 columns of data with 10,000 rows. I want to reduce the dimensionality to 2 or 3 dimensions such that the clustering is based not on Euclidean distance but on the probability distribution of the cluster. How can I do this? R or Python is fine.

2 minutes ago, Meowmeow said:

I have 40 columns of data with 10,000 rows. I want to reduce the dimensionality to 2 or 3 dimensions such that the clustering is based not on Euclidean distance but on the probability distribution of the cluster. How can I do this? R or Python is fine.

PCA (Principal Component Analysis)
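A minimal scikit-learn sketch of the PCA suggestion, assuming the table is already a numeric NumPy array `X` of shape (10000, 40). The random `X` is only a placeholder, `pca_2d` is a hypothetical helper, and standardizing the columns first is an added assumption (PCA is sensitive to column scale).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_2d(X, n_components=2):
    """Standardize each column, then project onto the top principal components.

    Returns the reduced data and the fraction of total variance each component explains.
    """
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components)
    X_reduced = pca.fit_transform(X_scaled)
    return X_reduced, pca.explained_variance_ratio_


# Placeholder data with the shape from the question: 10,000 rows x 40 columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 40))

X_2d, ratio = pca_2d(X)
print(X_2d.shape)  # (10000, 2)
print(ratio)       # variance fraction explained by PC1 and PC2
```

On pure noise like this placeholder, each component explains only a few percent of the variance; on real data, a low total is a hint that two components are not enough.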

1 minute ago, Meowmeow said:

That's Euclidean distance, no?

Never mind, that's not even Euclidean distance; it's based on covariance. I need the probability distribution of the cluster.

2 minutes ago, Meowmeow said:

That's Euclidean distance, no?

Try PCA with n_components=2, but it depends on how much of the variance in the data those two components explain.

Euclidean distance is used in KNN.
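Whether two components are enough can be checked directly: fit PCA with every component and look at the cumulative explained-variance ratio. This sketch, with a hypothetical helper `components_needed` and an arbitrary 90% target, returns the smallest number of components that reaches the target. The low-rank synthetic `X` is placeholder data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def components_needed(X, target=0.90):
    """Smallest number of principal components whose cumulative
    explained-variance ratio is at least `target`."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA().fit(X_scaled)  # keep every component
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # First index where the cumulative ratio reaches the target, converted to a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative)), cumulative


# Placeholder data: 40 columns driven by 3 hidden factors plus a little noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(2_000, 3))
loadings = rng.normal(size=(3, 40))
X = factors @ loadings + 0.1 * rng.normal(size=(2_000, 40))

k, cumulative = components_needed(X, target=0.90)
print(k, cumulative[:4].round(3))
```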

Just now, Meowmeow said:

Never mind, that's not even Euclidean distance; it's based on covariance. I need the probability distribution of the cluster.

Clustering and dimensionality reduction are two different things. Do you want to reduce the 40 columns to 2 or 3 columns,

or do you want to assign each of the 10K rows to a cluster number that you specify?
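The difference shows up in what each step returns. In this scikit-learn sketch (random placeholder data; PCA and k-means are just stand-ins for "a reducer" and "a clusterer"), dimensionality reduction returns new columns for every row, while clustering returns one label per row and leaves the 40 columns alone.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 40))  # placeholder data, same shape as the question

# Dimensionality reduction: 40 columns -> 3 new columns, one row per input row.
X_3d = PCA(n_components=3).fit_transform(X)

# Clustering: every row gets a single integer label; the columns are untouched.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(X_3d.shape)    # (10000, 3)
print(labels.shape)  # (10000,)
```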

Just now, kathanayaka said:

Try PCA with n_components=2, but it depends on how much of the variance in the data those two components explain.

Euclidean distance is used in KNN.

PCA only captures the variance in the data. t-SNE is for Euclidean distance between clusters. I need something based on the probability distribution of the cluster.
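One standard technique that clusters by probability distribution rather than by plain Euclidean distance is a Gaussian mixture model (not named in the thread; offered here as a suggestion). Each cluster is modelled as a multivariate Gaussian with its own mean and covariance, and every row gets a probability of belonging to each cluster. (t-SNE does turn pairwise Euclidean distances into probabilities, but those describe similarities between individual points, not one distribution per cluster.) A minimal sketch with scikit-learn's `GaussianMixture`; `gmm_cluster` is a hypothetical helper, and the three synthetic blobs are placeholder data standing in for the real 40-column table.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_cluster(X, n_components, seed=0):
    """Fit a Gaussian mixture; return the model, soft memberships, and hard labels."""
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="full",  # each cluster gets its own full covariance matrix
        random_state=seed,
    )
    gmm.fit(X)
    probs = gmm.predict_proba(X)   # shape (n_rows, n_components); each row sums to 1
    labels = probs.argmax(axis=1)  # hard label = most probable component
    return gmm, probs, labels


# Placeholder data: three 5-column blobs with different spreads.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(300, 5)),
    rng.normal(loc=6.0, scale=0.5, size=(300, 5)),
    rng.normal(loc=-6.0, scale=2.0, size=(300, 5)),
])

gmm, probs, labels = gmm_cluster(X, n_components=3)
print(probs[:3].round(3))  # membership probabilities for the first three rows
```

`predict_proba` gives the per-row cluster probabilities, and the fitted `means_` and `covariances_` describe each cluster's distribution. One option with 40 columns is to fit the mixture on the full data and use PCA only for plotting.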

Your wording is not correct.

"I want to reduce the dimensionality to 2 or 3 dimensions": does this mean you want to divide the dataset into 2 or 3 clusters?

1 minute ago, ring_master said:

Clustering and dimensionality reduction are two different things. Do you want to reduce the 40 columns to 2 or 3 columns,

or do you want to assign each of the 10K rows to a cluster number that you specify?

I am doing clustering, with the distance between clusters representing the probability distribution of finding the cluster.
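If "distance between clusters" is meant as a distance between their probability distributions, one option (an assumption about the intent, not something settled in the thread) is to fit a Gaussian mixture and measure how much each pair of fitted Gaussians overlaps. This sketch computes the closed-form Bhattacharyya distance between two multivariate normals, converts it to the Hellinger distance (a true metric, bounded by 1), and then places the clusters in 2 dimensions with classical multidimensional scaling so that plotted distances approximate those Hellinger distances. All helper names are hypothetical, and the blobs are placeholder data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    cov = (cov1 + cov2) / 2.0
    diff = mu1 - mu2
    mean_term = diff @ np.linalg.solve(cov, diff) / 8.0
    # slogdet avoids overflow/underflow of raw determinants in higher dimensions.
    logdet = np.linalg.slogdet(cov)[1]
    logdet1 = np.linalg.slogdet(cov1)[1]
    logdet2 = np.linalg.slogdet(cov2)[1]
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return mean_term + cov_term


def hellinger_matrix(means, covs):
    """Pairwise Hellinger distances, sqrt(1 - exp(-D_B)), between Gaussian components."""
    k = len(means)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d_b = bhattacharyya_gaussian(means[i], covs[i], means[j], covs[j])
            H[i, j] = H[j, i] = np.sqrt(max(0.0, 1.0 - np.exp(-d_b)))
    return H


def classical_mds(D, n_dims=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate D."""
    k = D.shape[0]
    J = np.eye(k) - np.ones((k, k)) / k  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_dims]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Placeholder data: three 5-column blobs with different spreads.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(300, 5)),
    rng.normal(loc=3.0, scale=0.5, size=(300, 5)),
    rng.normal(loc=-3.0, scale=2.0, size=(300, 5)),
])

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)
H = hellinger_matrix(gmm.means_, gmm.covariances_)
coords = classical_mds(H, n_dims=2)  # one 2-D point per cluster
print(H.round(3))
print(coords.round(3))
```

Hellinger is used instead of the raw Bhattacharyya distance because the latter does not satisfy the triangle inequality, which makes it a poorer fit for an embedding.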

1 minute ago, kathanayaka said:

Your wording is not correct.

"I want to reduce the dimensionality to 2 or 3 dimensions": does this mean you want to divide the dataset into 2 or 3 clusters?

The number of clusters should depend on how different the data is (ideally, the 40 columns would become 7-8 clusters). At least, that's my solution to the problem based on Euclidean distance.
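The number of clusters reflects how the 10,000 rows group together rather than how many columns there are, so it is usually chosen from the data. With a Gaussian mixture, one common approach (an assumption here, since the thread works with Euclidean k-means) is to fit several candidate counts and keep the one with the lowest Bayesian information criterion (BIC). `choose_k_by_bic` is a hypothetical helper and the blobs are placeholder data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def choose_k_by_bic(X, k_values, seed=0):
    """Fit one Gaussian mixture per candidate k; return the k with the lowest BIC."""
    bics = {}
    for k in k_values:
        gmm = GaussianMixture(n_components=k, covariance_type="full", random_state=seed)
        # Lower BIC = better fit after penalizing the extra parameters of each component.
        bics[k] = gmm.fit(X).bic(X)
    best_k = min(bics, key=bics.get)
    return best_k, bics


# Placeholder data: three 5-column blobs, so the "true" number of clusters is 3.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(300, 5)),
    rng.normal(loc=6.0, scale=0.5, size=(300, 5)),
    rng.normal(loc=-6.0, scale=2.0, size=(300, 5)),
])

best_k, bics = choose_k_by_bic(X, range(1, 7))
print(best_k)
```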

Just now, Meowmeow said:

I am doing clustering, with the distance between clusters representing the probability distribution of finding the cluster.

Use K-means, but K-means assigns each point to a cluster based on the Euclidean distance to the cluster centroids.
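That point can be checked directly: k-means puts each row in the cluster whose centroid has the smallest (squared) Euclidean distance, which ignores each cluster's spread and shape. This sketch recomputes scikit-learn's `KMeans` assignments by hand on random placeholder data; a Gaussian mixture, by contrast, weighs each point by every cluster's fitted covariance.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 40))  # placeholder data

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Squared Euclidean distance from every row to every centroid: shape (1000, 4).
sq_dist = ((X[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
# Nearest centroid per row, i.e. the k-means assignment rule.
manual_labels = sq_dist.argmin(axis=1)

print(np.array_equal(manual_labels, km.predict(X)))
```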
