Topology and Data

“Topology and Data” by Gunnar Carlsson

Introduction

Today I asked ChatGPT how much data Google stores, and was given the non-specific response that the volume would be measured in exabytes (billions of gigabytes) or zettabytes (trillions of gigabytes). Problem spaces with such a preponderance of data motivate my interest in this article, which explores how topology can aid us in finding signal in a world of noise. One of the challenges Carlsson points out immediately in this paper is that data is often high-dimensional, which restricts our ability to visualize it. Whenever I want to make high-dimensional datasets tractable in my imagination, I turn to topology. For instance, an analogy for simplicial complexes might be throwing a bag of toothpicks on the ground, mostly blue but a handful of red. If all the red toothpicks were to move towards their nearest blue toothpick, the resulting constellation of toothpick clusters would be a deformation retraction of the initial unorganized arrangement.
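
For reference, the toothpick picture leans on the term "deformation retraction", which has a standard textbook definition. The statement below is a sketch in the usual notation; the symbols X, A, and H are my own labels here and are not taken from Carlsson's paper.

```latex
Let $A \subseteq X$ be a subspace. A deformation retraction of $X$ onto $A$
is a continuous map $H \colon X \times [0,1] \to X$ such that
\[
  H(x, 0) = x, \qquad H(x, 1) \in A, \qquad H(a, 1) = a
  \quad \text{for all } x \in X,\ a \in A.
\]
```

In the analogy, X would roughly play the role of the whole scattered arrangement, A the blue toothpicks, and H the gradual sliding of each red toothpick towards its nearest blue one. The match is loose, since the definition requires the motion to be continuous and to stay inside X the whole time.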