This tokenizer uses stringi::stri_split_boundaries() to tokenize a character vector. To be used with explain.character().
Usage:

default_tokenize(text)

Arguments:

text: The text to tokenize, as a character vector.

Value:

A character vector of tokens.
Examples:

data('train_sentences')
default_tokenize(train_sentences$text[1])
#> [1] "although" "the" "internet" "as" "level"
#> [6] "topology" "has" "been" "extensively" "studied"
#> [11] "over" "the" "past" "few" "years"
#> [16] "little" "is" "known" "about" "the"
#> [21] "details" "of" "the" "as" "taxonomy"
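As the output above shows, tokens come back lower-cased and punctuation is dropped (note "as" and "level" from "AS-level"). Below is a minimal sketch of what such a tokenizer could look like with stringi; `sketch_tokenize` is a hypothetical name, and the lower-casing and skip_word_none settings are assumptions inferred from the example output, not necessarily the package's exact implementation.

library(stringi)

# Hypothetical sketch (assumed settings, not the package's verbatim
# source): lower-case the input, then split on ICU word boundaries,
# dropping segments that contain no word characters (punctuation,
# whitespace).
sketch_tokenize <- function(text) {
  unlist(stri_split_boundaries(
    stri_trans_tolower(text),
    type = 'word',
    skip_word_none = TRUE
  ))
}

sketch_tokenize("Although the Internet AS-level topology has been studied")
# returns a lower-cased character vector of word tokens, e.g.
# "although" "the" "internet" "as" "level" "topology" ...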