Public API
Neighborhood.jl is a Julia package that provides a unified interface for doing neighbor searches in Julia. This interface is described in this page.
Search Structures
Neighborhood.searchstructure
— Functionsearchstructure(S, data, metric; kwargs...) → ss
Create a search structure ss
of type S
(e.g. KDTree, BKTree, VPTree
etc.) based on the given data
and metric
. The data types and supported metric types are package-specific, but typical choices are subtypes of <:Metric
from Distances.jl. Some common metrics are re-exported by Neighborhood.jl.
All currently supported search structures are:
BruteForce KDTree
The following functions are defined for search structures:
Neighborhood.datatype
— Functiondatatype(::Type{S}) :: Type
Get the type of the data points (arguments to the metric function) of a search struture of type S
.
Neighborhood.getmetric
— Functiongetmetric(ss)
Get the metric function used by the search structure ss
.
Search functions
Neighborhood.search
— Functionsearch(ss, query, t::SearchType [, skip]; kwargs... ) → idxs, ds
Perform a neighbor search in the search structure ss
for the given query
with search type t
(see SearchType
). Return the indices of the neighbors (in the original data) and the distances from the query.
Optional skip
function takes as input the index of the found neighbor (in the original data) skip(i)
and returns true
if this neighbor should be skipped.
Package-specific keywords are possible.
Neighborhood.isearch
— Functionisearch(args...; kwargs... ) → idxs
Same as search
but only return the neighbor indices.
Neighborhood.inrange
— Functioninrange(ss, query, r::Real [, skip]; kwargs...)
search
for WithinRange(r)
search type.
Neighborhood.knn
— Functionknn(ss, query, k::Int [, skip]; kwargs...)
search
for NeighborNumber(k)
search type.
Search types
Neighborhood.SearchType
— TypeSupertype of all possible search types of the Neighborhood.jl common API.
Neighborhood.WithinRange
— TypeWithinRange(r::Real) <: SearchType
Search type representing all neighbors with distance ≤ r
from the query (according to the search structure's metric).
Neighborhood.NeighborNumber
— TypeNeighborNumber(k::Int) <: SearchType
Search type representing the k
nearest neighbors of the query (or approximate neighbors, depending on the search structure).
Bulk searches
Some packages support higher performance when doing bulk searches (instead of individually calling search
many times).
Neighborhood.bulksearch
— Functionbulksearch(ss, queries, t::SearchType [, skip]; kwargs... ) → vec_of_idxs, vec_of_ds
Same as search
but many searches are done for many input query points.
In this case skip
takes two arguments skip(i, j)
where now j
is simply the index of the query that we are currently searching for (j
is the index in queries
and goes from 1 to length(queries)
).
Neighborhood.bulkisearch
— Functionbulkisearch(ss, queries, t::SearchType [, skip]; kwargs... ) → vec_of_idxs
Same as bulksearch
but return only the indices.
Brute force searches
The BruteForce
"search structure" performs a linear search through its data array, calculating the distance from the query to each data point. This is the slowest possible implementation but can be used to check results from other search structures for correctness. The Neighborhood.bruteforcesearch
function can be used instead without having to create the search structure.
Neighborhood.bruteforcesearch
— Functionbruteforcesearch(data, metric, query, t::SearchType[, skip])
Perform a brute-force search of type t
against data array data
(by calculating the metric for query
and and every point in data
).
Neighborhood.BruteForce
— TypeBruteForce
A "search structure" which simply performs a brute-force search through the entire data array.
Theiler window
Neighborhood.Theiler
— TypeTheiler(w::Int, nidxs = nothing)
Struct that generates skip functions representing a Theiler window of size w ≥ 0
. This is useful when the query of a search is also part of the data used to create the search structure, typical in timeseries analysis. In this case, you do not want to find the query itself as its neighbor, and you typically want to avoid points with indices very close to the query as well. Giving w=0
excludes the query itself as its neighbor, see the boolean expressions below.
Let theiler = Theiler(w)
. Then, theiler
by itself can be used as a skip function in bulksearch
, because theiler(i, j) ≡ abs(i-j) ≤ w
. In addition, if the given argument nidxs
is not nothing
, then theiler(i, j) ≡ abs(i-nidxs[j]) ≤ w
. (useful to give as nidxs
the indices of the queries in the original data)
However theiler
can also be used in single searches. theiler(n)
(with one argument) generates the function i -> abs(i-n) ≤ w
. So theiler(n)
can be given to search
as the skip
argument. Notice that theiler
is an instance of Theiler
, and in summary you'd have to do Theiler(w)(n)
and give that to search
.