VPTrees.jl
Implementation of Vantage Point Trees in Julia. A Vantage Point Tree is a metric tree which can be used for nearest neighbor or radius searches in any metric space. See Data structures and algorithms for nearest neighbor search in general metric spaces.
Example:
using VPTrees
using StringDistances
data = ["bla", "blub", "asdf", ":assd", "ast", "baube"]
vptree = VPTree(data, Levenshtein()))
query = "blau"
radius = 2
data[find(vptree, query, radius)]
# 2-element Array{String,1}:
# "bla"
# "blub"
n_neighbors = 3
data[find_nearest(vptree, query, n_neighbors)]
# 3-element Array{String,1}:
# "baube"
# "blub"
# "bla"
Related Packages
The following packages implement other data structures for use in nearest neighbor and radius search in metric space:
- BKTrees.jl
- NearestNeighbors.jl implementing Ball Trees
API
VPTrees.VPTree
— TypeVPTree(data::Vector{InputType}, metric; threaded=nothing)
Construct Vantage Point Tree with a vector of data
and given a callable metric
. threaded
uses threading is only avaible in Julia 1.3+ to parallelize construction of the Tree. When not explicitly set, is set to true when the necessary conditions are met.
Example:
using VPTrees
using StringDistances
data = ["bla", "blub", "asdf", ":assd", "ast", "baube"]
vptree = VPTree(data, Levenshtein())
query = "blau"
radius = 2
data[find(vptree, query, radius)]
# 2-element Array{String,1}:
# "bla"
# "blub"
n_neighbors = 3
data[find_nearest(vptree, query, n_neighbors)]
# 3-element Array{String,1}:
# "baube"
# "blub"
# "bla"
VPTrees.find
— Methodfind(vptree::VPTree{InputType, MetricReturnType}, query::InputType, radius::MetricReturnType, skip=nothing)::Vector{Int}
Find all items in vptree
within radius
of query
with respect to the metric defined in the VPTree. Returns indices into VPTree.data. The optional skip
argument is a function f(::Int)::Bool
which can be used to omit points in the tree from the search based on their index.
Example
using VPTrees
using StringDistances
data = ["bla", "blub", "asdf", ":assd", "ast", "baube"]
metric = (a, b) -> evaluate(Levenshtein(),a,b)
vptree = VPTree(data, metric)
query = "blau"
radius = 2
data[find(vptree, query, radius)]
# 2-element Array{String,1}:
# "bla"
# "blub"
VPTrees.find_nearest
— Methodfind_nearest(vptree::VPTree{InputType, MetricReturnType}, query::InputType, n_neighbors::Int, skip=nothing)::Vector{Int}
Find n_neighbors
items in vptree
closest to query
with respect to the metric defined in the VPTree. Returns indices into VPTree.data. The optional skip
argument is a function f(::Int)::Bool
which can be used to omit points in the tree from the search based on their index.
Example:
using VPTrees
using StringDistances
data = ["bla", "blub", "asdf", ":assd", "ast", "baube"]
metric = (a, b) -> evaluate(Levenshtein(),a,b)
vptree = VPTree(data, metric)
query = "blau"
n_neighbors = 3
data[find_nearest(vptree, query, n_neighbors)]
# 3-element Array{String,1}:
# "baube"
# "blub"
# "bla"