Public API
Neighborhood.jl is a Julia package that provides a unified interface for doing neighbor searches in Julia. This interface is described in this page.
Search Structures
Neighborhood.searchstructure — Functionsearchstructure(S, data, metric; kwargs...) → ssCreate a search structure ss of type S (e.g. KDTree, BKTree, VPTree etc.) based on the given data and metric. The data types and supported metric types are package-specific, but typical choices are subtypes of <:Metric from Distances.jl. Some common metrics are re-exported by Neighborhood.jl.
All currently supported search structures are:
BruteForce KDTree
The following functions are defined for search structures:
Neighborhood.datatype — Functiondatatype(::Type{S}) :: TypeGet the type of the data points (arguments to the metric function) of a search struture of type S.
Neighborhood.getmetric — Functiongetmetric(ss)Get the metric function used by the search structure ss.
Search functions
Neighborhood.search — Functionsearch(ss, query, t::SearchType [, skip]; kwargs... ) → idxs, dsPerform a neighbor search in the search structure ss for the given query with search type t (see SearchType). Return the indices of the neighbors (in the original data) and the distances from the query.
Optional skip function takes as input the index of the found neighbor (in the original data) skip(i) and returns true if this neighbor should be skipped.
Package-specific keywords are possible.
Neighborhood.isearch — Functionisearch(args...; kwargs... ) → idxsSame as search but only return the neighbor indices.
Neighborhood.inrange — Functioninrange(ss, query, r::Real [, skip]; kwargs...)search for WithinRange(r) search type.
Neighborhood.knn — Functionknn(ss, query, k::Int [, skip]; kwargs...)search for NeighborNumber(k) search type.
Search types
Neighborhood.SearchType — TypeSupertype of all possible search types of the Neighborhood.jl common API.
Neighborhood.WithinRange — TypeWithinRange(r::Real) <: SearchTypeSearch type representing all neighbors with distance ≤ r from the query (according to the search structure's metric).
Neighborhood.NeighborNumber — TypeNeighborNumber(k::Int) <: SearchTypeSearch type representing the k nearest neighbors of the query (or approximate neighbors, depending on the search structure).
Bulk searches
Some packages support higher performance when doing bulk searches (instead of individually calling search many times).
Neighborhood.bulksearch — Functionbulksearch(ss, queries, t::SearchType [, skip]; kwargs... ) → vec_of_idxs, vec_of_dsSame as search but many searches are done for many input query points.
In this case skip takes two arguments skip(i, j) where now j is simply the index of the query that we are currently searching for (j is the index in queries and goes from 1 to length(queries)).
Neighborhood.bulkisearch — Functionbulkisearch(ss, queries, t::SearchType [, skip]; kwargs... ) → vec_of_idxsSame as bulksearch but return only the indices.
Brute force searches
The BruteForce "search structure" performs a linear search through its data array, calculating the distance from the query to each data point. This is the slowest possible implementation but can be used to check results from other search structures for correctness. The Neighborhood.bruteforcesearch function can be used instead without having to create the search structure.
Neighborhood.bruteforcesearch — Functionbruteforcesearch(data, metric, query, t::SearchType[, skip])Perform a brute-force search of type t against data array data (by calculating the metric for query and and every point in data).
Neighborhood.BruteForce — TypeBruteForceA "search structure" which simply performs a brute-force search through the entire data array.
Theiler window
Neighborhood.Theiler — TypeTheiler(w::Int, nidxs = nothing)Struct that generates skip functions representing a Theiler window of size w ≥ 0. This is useful when the query of a search is also part of the data used to create the search structure, typical in timeseries analysis. In this case, you do not want to find the query itself as its neighbor, and you typically want to avoid points with indices very close to the query as well. Giving w=0 excludes the query itself as its neighbor, see the boolean expressions below.
Let theiler = Theiler(w). Then, theiler by itself can be used as a skip function in bulksearch, because theiler(i, j) ≡ abs(i-j) ≤ w. In addition, if the given argument nidxs is not nothing, then theiler(i, j) ≡ abs(i-nidxs[j]) ≤ w. (useful to give as nidxs the indices of the queries in the original data)
However theiler can also be used in single searches. theiler(n) (with one argument) generates the function i -> abs(i-n) ≤ w. So theiler(n) can be given to search as the skip argument. Notice that theiler is an instance of Theiler, and in summary you'd have to do Theiler(w)(n) and give that to search.