This tokenizer uses stringi::stri_split_boundaries() to tokenize a character vector. It is intended for use with explain.character().

default_tokenize(text)

Arguments

text

The text to tokenize, given as a character vector.

Value

A character vector of tokens.

Examples

data('train_sentences')
default_tokenize(train_sentences$text[1])
#>  [1] "although"    "the"         "internet"    "as"          "level"
#>  [6] "topology"    "has"         "been"        "extensively" "studied"
#> [11] "over"        "the"         "past"        "few"         "years"
#> [16] "little"      "is"          "known"       "about"       "the"
#> [21] "details"     "of"          "the"         "as"          "taxonomy"