Tk-merge: Computationally Efficient Robust Clustering Under General Assumptions

Abstract

We address general-shaped clustering problems under very weak parametric assumptions with a two-step hybrid robust clustering algorithm based on trimmed k-means and hierarchical agglomeration. The algorithm has low computational complexity and effectively identifies clusters even in the presence of data contamination. We also present generalizations of the algorithm and an adaptive procedure to estimate the amount of contamination.
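
The two steps named in the abstract can be illustrated with a short Python/NumPy sketch. This is a minimal sketch under stated assumptions, not the authors' implementation. Step 1 runs trimmed k-means with a deliberately large number of centers `k`, discarding a known fraction `alpha` of the points farthest from their nearest center (concentration steps in the spirit of classical trimmed k-means). Step 2 merges the resulting centers into `g` final groups. The use of single linkage on the centers, the values of `k`, `g` and `alpha`, and the function names `trimmed_kmeans`, `single_linkage_groups` and `tk_merge_sketch` are hypothetical choices made for illustration. `alpha` is taken as given rather than estimated adaptively.

```python
import numpy as np

# Illustrative sketch only: single linkage, fixed alpha and all names are assumptions.


def _assign(X, centers, h):
    """Nearest-center labels, squared distances, and indices of the h closest points."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    dmin = d2[np.arange(X.shape[0]), labels]
    keep = np.argsort(dmin)[:h]  # the h points closest to their nearest center
    return labels, dmin, keep


def trimmed_kmeans(X, k, alpha, n_starts=20, n_iter=100, seed=0):
    """Trimmed k-means via concentration steps: trim the floor(alpha*n) points
    farthest from their nearest center, then recompute each center as the mean
    of its retained points. Returns (centers, labels, keep)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    h = n - int(np.floor(alpha * n))  # number of retained (untrimmed) points
    best = None
    for _ in range(n_starts):
        centers = X[rng.choice(n, size=k, replace=False)].astype(float)
        for _ in range(n_iter):
            labels, _, keep = _assign(X, centers, h)
            new_centers = centers.copy()
            for j in range(k):
                members = keep[labels[keep] == j]
                if members.size > 0:  # a center with no retained points stays put
                    new_centers[j] = X[members].mean(axis=0)
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        labels, dmin, keep = _assign(X, centers, h)
        obj = dmin[keep].sum()  # trimmed within-cluster sum of squares
        if best is None or obj < best[0]:
            best = (obj, centers, labels, keep)
    return best[1], best[2], best[3]


def single_linkage_groups(centers, g):
    """Merge centers by single linkage until g groups remain
    (union-find over center pairs sorted by Euclidean distance)."""
    k = len(centers)
    parent = list(range(k))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted(
        (float(np.linalg.norm(centers[i] - centers[j])), i, j)
        for i in range(k) for j in range(i + 1, k)
    )
    n_groups = k
    for _, i, j in pairs:
        if n_groups == g:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            n_groups -= 1
    roots = sorted({find(i) for i in range(k)})
    return np.array([roots.index(find(i)) for i in range(k)])


def tk_merge_sketch(X, k, g, alpha, seed=0):
    """Step 1: trimmed k-means with k >= g small clusters.
    Step 2: agglomerate their centers into g groups. Trimmed points get -1."""
    centers, labels, keep = trimmed_kmeans(X, k, alpha, seed=seed)
    group_of_center = single_linkage_groups(centers, g)
    out = np.full(X.shape[0], -1)
    out[keep] = group_of_center[labels[keep]]
    return out
```

For example, on two long parallel bands plus a few far-away outliers, a call such as `tk_merge_sketch(X, k=10, g=2, alpha=0.05)` is expected to trim the outliers (label `-1`) and give each band its own label. The idea behind over-segmenting first is that many small, roughly spherical pieces can tile an elongated or curved cluster, agglomerating their centers then recovers the overall shape, and trimming keeps outliers from pulling any center away.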

Luca Insolia
Postdoctoral Researcher

My primary research interests concern robust statistics and high-dimensional modeling. During my PhD, I developed statistical methodologies for analyzing sparse regression problems affected by different forms of adversarial data contamination. These methodologies encompass continuous optimization methods as well as mixed-integer programming techniques. I applied these tools to analyze biomedical data and to investigate the main potential drivers of honey bee colony loss.