pattern_clustering.boost.pattern_clustering_with_preprocess
- pattern_clustering_with_preprocess(lines: list, map_name_dfa: Optional[dict] = None, densities: Optional[list] = None, max_dist: float = 0.6, use_async: bool = True, make_mg: Optional[callable] = None) -> list
Computes the pattern clustering of input lines by grouping lines whose PatternAutomaton (PA) match.
As a result, lines with matching PatternAutomaton always fall in the same cluster, which accelerates the computation. This may occasionally produce odd clusters, especially when unrelated lines conform to the same PatternAutomaton.
- Parameters
lines – A list(str) gathering the input lines.
map_name_dfa – A dict{str : Automaton} mapping each pattern name to the corresponding Automaton.
densities – A density vector. See make_densities().
max_dist – The maximum distance between an element of a cluster and the cluster representative. As distances are normalized, this value should be between 0.0 and 1.0.
use_async – Pass True to run computations using async calls. This accelerates computations.
make_mg – A MultiGrepFunctor instance.
- Returns
A list(int) mapping each line index to its corresponding cluster identifier.
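
The preprocessing idea described above (group lines that share the same pattern signature, cluster one entry per group, then map every line index back to a cluster identifier) can be illustrated with a small standalone sketch. It does not use the pattern_clustering package and is not its implementation: toy_signature, toy_distance and cluster_with_preprocess are hypothetical names, the token-based signature is a simplified stand-in for a PatternAutomaton, and the distance is a generic normalized ratio from the standard library rather than the library's metric.

```python
import re
from difflib import SequenceMatcher


def toy_signature(line: str) -> tuple:
    """Hypothetical stand-in for a PatternAutomaton: abstract integers and words."""
    tokens = []
    for token in line.split():
        if re.fullmatch(r"\d+", token):
            tokens.append("<int>")
        elif re.fullmatch(r"[A-Za-z]+", token):
            tokens.append("<word>")
        else:
            tokens.append(token)
    return tuple(tokens)


def toy_distance(sig_a: tuple, sig_b: tuple) -> float:
    """Normalized distance in [0.0, 1.0] (assumption: not the library's metric)."""
    return 1.0 - SequenceMatcher(None, sig_a, sig_b).ratio()


def cluster_with_preprocess(lines: list, max_dist: float = 0.6) -> list:
    """Return a list(int) mapping each line index to a cluster identifier."""
    # Step 1: group line indices by signature, so that lines with the same
    # signature are processed only once.
    groups = {}
    for index, line in enumerate(lines):
        groups.setdefault(toy_signature(line), []).append(index)

    # Step 2: greedy clustering over the unique signatures. A signature joins
    # the first cluster whose representative is within max_dist, otherwise it
    # becomes the representative of a new cluster.
    representatives = []
    clusters = [None] * len(lines)
    for signature, indices in groups.items():
        for cluster_id, representative in enumerate(representatives):
            if toy_distance(signature, representative) <= max_dist:
                break
        else:
            representatives.append(signature)
            cluster_id = len(representatives) - 1
        # Step 3: every line of the group inherits the cluster identifier.
        for index in indices:
            clusters[index] = cluster_id
    return clusters


example_lines = [
    "error 42 occurred",
    "error 7 occurred",
    "10.0.0.1 - GET /index.html",
]
example_clusters = cluster_with_preprocess(example_lines)
```

In this sketch the first two example lines share a signature and therefore always receive the same cluster identifier, which mirrors both the speed-up and the caveat noted in the description: any two lines that reduce to the same signature are clustered together, whether or not they are actually related.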