pattern_clustering.boost.pattern_clustering_without_preprocess

pattern_clustering_without_preprocess(lines: list, map_name_dfa: Optional[dict] = None, densities: Optional[list] = None, max_dist: float = 0.6, use_async: bool = True, make_mg: Optional[callable] = None) → list

Computes the pattern clustering of input lines without aggregating duplicated PAs.

Parameters

lines – A list(str) gathering the input lines.
map_name_dfa – A dict{str : Automaton} mapping each pattern name to the corresponding Automaton.
densities – A density vector. See make_densities().
max_dist – The maximum distance between an element of a cluster and the cluster representative. As distances are normalized, this value should be between 0.0 and 1.0.
use_async – Pass True to run the computations using asynchronous calls, which speeds them up.
make_mg – A MultiGrepFunctor instance.

Returns

A list(int) mapping each line index to its corresponding cluster identifier.
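To illustrate what max_dist and the returned list(int) mean, here is a toy sketch. It is not this library's algorithm. It assumes a greedy "leader" clustering, in which each line joins the first cluster whose representative lies within max_dist and otherwise founds a new cluster. It also uses 1 - difflib.SequenceMatcher.ratio() as a hypothetical stand-in for the library's normalized distance, and it ignores map_name_dfa, densities, use_async and make_mg. The names normalized_distance and toy_clustering are hypothetical.

```python
from difflib import SequenceMatcher


def normalized_distance(a: str, b: str) -> float:
    """Hypothetical stand-in distance in [0.0, 1.0]; 0.0 means identical strings.
    (Not the distance used by pattern_clustering.)"""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def toy_clustering(lines: list, max_dist: float = 0.6) -> list:
    """Toy greedy clustering: each line joins the first cluster whose
    representative is within max_dist, otherwise it starts a new cluster.
    Returns a list(int) mapping each line index to its cluster identifier."""
    representatives = []  # representatives[k] is the representative line of cluster k
    clusters = []         # clusters[i] is the cluster identifier of lines[i]
    for line in lines:
        for k, rep in enumerate(representatives):
            if normalized_distance(line, rep) <= max_dist:
                clusters.append(k)
                break
        else:
            # No representative is close enough: this line founds a new cluster.
            representatives.append(line)
            clusters.append(len(representatives) - 1)
    return clusters
```

For example, toy_clustering(["user alice logged in", "user bob logged in", "disk /dev/sda1 full"], max_dist=0.3) returns [0, 0, 1]. The first two lines are about 0.21 apart, so they share cluster 0. Lowering max_dist toward 0.0 yields more, smaller clusters.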