This tokenizer uses stringi::stri_split_boundaries() to tokenize a character vector. To be used with explain.character().
Usage:

default_tokenize(text)

Arguments:

text: The text to tokenize, as a character vector.

Value:

A character vector of tokens.
Examples:

data('train_sentences')
default_tokenize(train_sentences$text[1])
#> [1] "although" "the" "internet" "as" "level"
#> [6] "topology" "has" "been" "extensively" "studied"
#> [11] "over" "the" "past" "few" "years"
#> [16] "little" "is" "known" "about" "the"
#> [21] "details" "of" "the" "as" "taxonomy"
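As the output above shows, tokens come back lower-cased and punctuation is dropped (note "as" and "level" from "AS-level"). Below is a minimal sketch of what such a tokenizer could look like with stringi; `sketch_tokenize` is a hypothetical name, and the lower-casing and skip_word_none settings are assumptions inferred from the example output, not necessarily the package's exact implementation.

library(stringi)

# Hypothetical sketch (assumed settings, not the package's verbatim
# source): lower-case the input, then split on ICU word boundaries,
# dropping segments that contain no word characters (punctuation,
# whitespace).
sketch_tokenize <- function(text) {
  unlist(stri_split_boundaries(
    stri_trans_tolower(text),
    type = 'word',
    skip_word_none = TRUE
  ))
}

sketch_tokenize("Although the Internet AS-level topology has been studied")
# returns a lower-cased character vector of word tokens, e.g.
# "although" "the" "internet" "as" "level" "topology" ...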