pattern_clustering.boost.pattern_clustering_without_preprocess
- pattern_clustering_without_preprocess(lines: list, map_name_dfa: Optional[dict] = None, densities: Optional[list] = None, max_dist: float = 0.6, use_async: bool = True, make_mg: Optional[callable] = None) -> list
Computes the pattern clustering of input lines without aggregating duplicated PAs.
- Parameters
lines – A list(str) gathering the input lines.
map_name_dfa – A dict{str : Automaton} mapping each pattern name to its corresponding Automaton.
densities – A density vector. See make_densities().
max_dist – The maximum distance between an element of a cluster and the cluster representative. As distances are normalized, this value should be between 0.0 and 1.0.
use_async – Pass True to run the computations using async calls, which accelerates them.
make_mg – A MultiGrepFunctor instance.
- Returns
A list(int) mapping each line index to its corresponding cluster identifier.
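The sketch below illustrates the return contract (one cluster identifier per input line) and the role of max_dist as a threshold on a normalized distance. It is not the library's implementation. Everything in it is a hypothetical stand-in: toy_pattern_clustering is an illustrative name, compiled regular expressions replace the Automaton objects of map_name_dfa, a Jaccard distance between the sets of matched pattern names replaces whatever normalized distance the library actually uses, and densities, use_async and make_mg are omitted. Clusters are formed greedily: a line joins the first cluster whose representative lies within max_dist, otherwise it becomes the representative of a new cluster.

```python
import re
from typing import Dict, List


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Normalized distance in [0.0, 1.0] between two sets of pattern names."""
    union = a | b
    if not union:
        return 0.0  # two lines matching no pattern are treated as identical
    return 1.0 - len(a & b) / len(union)


def toy_pattern_clustering(
    lines: List[str],
    map_name_regex: Dict[str, re.Pattern],  # hypothetical stand-in for map_name_dfa
    max_dist: float = 0.6,
) -> List[int]:
    """Hypothetical greedy (leader) clustering returning one cluster id per line."""
    if not 0.0 <= max_dist <= 1.0:
        raise ValueError("max_dist must be between 0.0 and 1.0")
    # Represent each line by the set of pattern names found in it.
    signatures = [
        frozenset(name for name, rx in map_name_regex.items() if rx.search(line))
        for line in lines
    ]
    representatives: List[frozenset] = []  # one signature per cluster
    assignment: List[int] = []
    for sig in signatures:
        for cluster_id, rep in enumerate(representatives):
            if jaccard_distance(sig, rep) <= max_dist:
                assignment.append(cluster_id)
                break
        else:
            # No representative is close enough: start a new cluster.
            representatives.append(sig)
            assignment.append(len(representatives) - 1)
    return assignment


# Usage example with hypothetical patterns.
patterns = {
    "ipv4": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
    "int": re.compile(r"\b\d+\b"),
    "word": re.compile(r"\b[a-z]+\b"),
}
lines = ["connect 10.0.0.1", "connect 192.168.1.1", "42", "7"]
print(toy_pattern_clustering(lines, patterns, max_dist=0.6))  # prints [0, 0, 1, 1]
```

Raising max_dist to 1.0 in this toy version merges every line into cluster 0, while lowering it can only split clusters further, which mirrors why the documented parameter is bounded by the normalized distance range.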