To group the temporal segments of an audio piece, which is structured into
main parts that recur within the piece, into different segment classes, a
similarity representation for the segments is first provided. For each
segment, the similarity representation comprises an associated plurality
of similarity values, which indicate how similar that segment is to every
other segment of the audio piece. A similarity threshold value for a
segment is then calculated from the similarity values associated with that
segment, and a segment is associated with a segment class when its
similarity value meets a predetermined relation with respect to the
similarity threshold value. This yields a clustering that works
efficiently and correctly even where segments have strongly different or
almost equal combined similarity values.
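The abstract does not specify how the threshold is computed, what the predetermined relation is, or how class membership is seeded. The following Python sketch fills these gaps with stated assumptions and is not the claimed method. The similarity representation is taken to be a symmetric matrix `sim`. Each segment's threshold is a hypothetical fraction `alpha` of its strongest similarity to any other segment. The relation is assumed to be "similarity greater than or equal to the threshold." Classes are seeded greedily, starting from the segment with the highest combined (row-sum) similarity. The function name `group_segments` and the parameter `alpha` are illustrative.

```python
from typing import List, Sequence


def group_segments(sim: Sequence[Sequence[float]], alpha: float = 0.8) -> List[int]:
    """Return a class label per segment (hypothetical sketch).

    sim[i][j] is the similarity of segment i to segment j (assumed symmetric).
    alpha is an assumed scaling factor for the per-segment threshold.
    """
    n = len(sim)

    # Per-segment threshold derived only from that segment's own similarity
    # values (assumption: a fraction of its best match to another segment).
    thresholds = []
    for i in range(n):
        others = [sim[i][j] for j in range(n) if j != i]
        thresholds.append(alpha * max(others) if others else 0.0)

    # Combined similarity value per segment: sum of similarities to all others.
    combined = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]

    labels = [-1] * n  # -1 marks a segment not yet assigned to a class
    next_class = 0
    # Greedy seeding (assumption): most "central" segments open classes first.
    for seed in sorted(range(n), key=lambda i: combined[i], reverse=True):
        if labels[seed] != -1:
            continue
        labels[seed] = next_class
        for j in range(n):
            # Assumed predetermined relation: similarity >= seed's threshold.
            if labels[j] == -1 and sim[seed][j] >= thresholds[seed]:
                labels[j] = next_class
        next_class += 1
    return labels


# Usage: four segments in an A B A B pattern.
sim = [
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.15, 0.85],
    [0.9, 0.15, 1.0, 0.2],
    [0.1, 0.85, 0.2, 1.0],
]
print(group_segments(sim))  # [0, 1, 0, 1]
```

Each threshold is derived from the segment's own similarity values rather than from one global cut-off. Segments whose overall similarity levels differ strongly are therefore judged relative to their own best matches.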