Summary Notes on Unsupervised Learning

Target: Clustering | Anomaly detection


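In contrast to supervised learning, clustering works on unlabeled data: the algorithm sees only the inputs, with no target labels, and looks for groups of similar points.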



Applications of clustering:
Grouping similar news
Market segmentation
DNA analysis
Astronomical data analysis



K-means intuition:






Repeat the two steps: assign each point to its nearest cluster centroid, then move each centroid to the mean of the points assigned to it.


Continue until no further changes occur, indicating convergence.
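
The loop described above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names, the toy points, and the choice to start from K randomly picked data points are assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign_points(points, centroids):
    """Step 1: for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points]

def move_centroids(points, labels, centroids):
    """Step 2: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            new_centroids.append(old)  # assumption: an empty cluster stays where it is
    return new_centroids

def k_means(points, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from K random data points
    labels = None
    for _ in range(max_iters):
        new_labels = assign_points(points, centroids)
        if new_labels == labels:  # no assignment changed: converged
            break
        labels = new_labels
        centroids = move_centroids(points, labels, centroids)
    return centroids, labels

# Toy data (made up for illustration): two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
print(centroids)
```

On this toy data the first three points end up in one cluster and the last three in the other, whichever two points the run happens to start from.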




K-means algorithm:


If a cluster ends up with zero points, either eliminate it (leaving K-1 clusters) or randomly reinitialize its centroid.
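
Both ways of handling an empty cluster can be sketched in the centroid update step. The helper below is hypothetical: the name update_centroids and the on_empty switch are assumptions for illustration, not part of any library.

```python
import random

def mean_point(members):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    return tuple(sum(x) / len(members) for x in zip(*members))

def update_centroids(points, labels, centroids, rng, on_empty="reinit"):
    """Recompute centroids, handling clusters that received no points.

    on_empty="reinit": restart the empty centroid at a randomly chosen data point.
    on_empty="drop":   leave the empty centroid out, so K shrinks by one.
    """
    if on_empty not in ("reinit", "drop"):
        raise ValueError("on_empty must be 'reinit' or 'drop'")
    new_centroids = []
    for k in range(len(centroids)):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(mean_point(members))
        elif on_empty == "reinit":
            new_centroids.append(rng.choice(points))
        # on_empty == "drop": append nothing for this cluster
    return new_centroids

# Toy data (made up): centroid 1 is far from every point, so its cluster is empty.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centroids = [(1.0, 1.5), (50.0, 50.0), (9.0, 9.5)]
labels = [0, 0, 2, 2]

dropped = update_centroids(points, labels, centroids, random.Random(0), on_empty="drop")
reinitialized = update_centroids(points, labels, centroids, random.Random(0), on_empty="reinit")
print(len(dropped), len(reinitialized))
```

After a drop the centroid indices shift, so the labels have to be recomputed in the next assignment step before they are used again.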


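K-means optimizes the distortion cost function: the average squared distance between each point and the centroid of the cluster it is assigned to.



The distortion should go down, or at least never go up, on every iteration. An increase points to a bug in the implementation.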
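
As a rough check of this behaviour, the sketch below records the distortion at every iteration, taking distortion to be the average squared distance from each point to its assigned centroid (the usual definition, assumed here). The helper names and the toy data are made up for the example.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def distortion(points, labels, centroids):
    """Average squared distance from each point to its assigned centroid."""
    total = sum(squared_distance(p, centroids[c]) for p, c in zip(points, labels))
    return total / len(points)

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points]

def update(points, labels, centroids):
    """Mean of each cluster's points; an empty cluster keeps its old centroid."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, labels) if c == k]
        if members:
            new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids = [points[0], points[1]]  # deliberately poor start: both in the same group

history = []
for _ in range(20):
    labels = assign(points, centroids)
    history.append(distortion(points, labels, centroids))
    new_centroids = update(points, labels, centroids)
    if new_centroids == centroids:  # centroids stopped moving: converged
        break
    centroids = new_centroids

print(history)
```

Each assignment step and each centroid move can only lower the distortion or leave it unchanged, so the recorded values never increase from one iteration to the next.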

How to initialize k-means?



Always use multiple random initializations, as this significantly improves K-means' ability to minimize the distortion cost function and select better cluster centroids.
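
One way to sketch this: run K-means several times from different random starting centroids and keep the run with the lowest distortion. The names run_k_means and best_of_n, the default of 50 runs, and the toy data are assumptions for illustration.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def run_k_means(points, k, rng, max_iters=100):
    """One K-means run from K randomly chosen data points; returns (centroids, labels, cost)."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    cost = sum(squared_distance(p, centroids[c]) for p, c in zip(points, labels)) / len(points)
    return centroids, labels, cost

def best_of_n(points, k, n_init=50, seed=0):
    """Repeat K-means n_init times and keep the run with the lowest distortion."""
    rng = random.Random(seed)
    runs = [run_k_means(points, k, rng) for _ in range(n_init)]
    best = min(runs, key=lambda run: run[2])
    return best, [run[2] for run in runs]

# Toy data (made up): three small groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]

(best_centroids, best_labels, best_cost), all_costs = best_of_n(points, k=3)
print(best_cost, max(all_costs))
```

Individual runs can get stuck in a poor local optimum when two starting centroids land in the same group; taking the minimum over many runs makes that much less likely to decide the final answer.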


Choosing the number of clusters:

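The elbow method (plot the distortion against K and look for the bend in the curve) is not frequently used, because in practice the curve often has no clear elbow.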





You can use K-means to determine t-shirt sizes by clustering data. Using three clusters may categorize sizes as small, medium, and large, while five clusters could define extra small, small, medium, large, and extra large. Both approaches are valid, but the choice depends on the trade-off between better fit with more sizes and higher manufacturing and shipping costs. A practical approach is to run K-means with K=3 and K=5, compare the results, and decide based on what best balances fit and cost for the business.
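
The comparison could be sketched as follows. The customer measurements are synthetic, generated only for illustration, and the helper names and the choice of 20 restarts per K are assumptions.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, rng, max_iters=100):
    """One K-means run; returns (centroids, labels, distortion)."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    cost = sum(squared_distance(p, centroids[c]) for p, c in zip(points, labels)) / len(points)
    return centroids, labels, cost

def best_fit(points, k, n_init=20, seed=0):
    """Lowest-distortion result over n_init random initializations."""
    rng = random.Random(seed)
    return min((k_means(points, k, rng) for _ in range(n_init)), key=lambda run: run[2])

# Synthetic (height in cm, weight in kg) pairs; not real customer data.
data_rng = random.Random(42)
customers = []
for _ in range(200):
    height = data_rng.gauss(170, 10)
    weight = 0.9 * (height - 100) + data_rng.gauss(0, 6)
    customers.append((height, weight))

results = {k: best_fit(customers, k) for k in (3, 5)}
for k, (centroids, labels, cost) in results.items():
    print(f"K={k}: distortion={cost:.1f}")
    for height, weight in sorted(centroids):
        print(f"  size centre: {height:.0f} cm, {weight:.0f} kg")
```

The distortion only measures fit. The cost of manufacturing and shipping more sizes is not in the data, so the final choice between K=3 and K=5 stays a business decision.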
