Usage examples
Building an IVFADC index
Building an index can be performed using an outer constructor
julia> using IVFADC
julia> data = rand(10, 100); # 100 points, 10 dimensions
julia> ivfadc = IVFADCIndex(data, kc=5, k=8, m=2)
IVFADCIndex, naive coarse quantizer, 6-byte encoding (4 + 1×2), 100 Float64 vectors
The coarse quantizer, used in the first level coarse neighbor search, can be specified using the coarse_quantizer
keyword argument
julia> ivfadc = IVFADCIndex(data, kc=5, k=8, m=2, coarse_quantizer=:hnsw) # fast!
IVFADCIndex, HNSW coarse quantizer, 6-byte encoding (4 + 1×2), 100 Float64 vectors
julia> ivfadc = IVFADCIndex(data, kc=5, k=8, m=2, coarse_quantizer=:naive) # simple
IVFADCIndex, naive coarse quantizer, 6-byte encoding (4 + 1×2), 100 Float64 vectors
The HNSW coarse quantizer is recommended is 'many' clusters are being used and coarse search dominates search time as opposed to in-cluster search.
Searching the index
Searching into the index is done with knn_search
for multiple queries
julia> points = [rand(10) for _ in 1:3];
julia> idxs, dists = knn_search(ivfadc, points, 5)
(Array{UInt32,1}[[0x0000002b, 0x0000004c, 0x00000025, 0x0000005a, 0x00000028], [0x00000052, 0x00000063, 0x00000014, 0x00000037, 0x00000027], [0x00000011, 0x00000029, 0x00000010, 0x0000001c, 0x0000004a]], Array{Float64,1}[[1.16005, 1.33942, 1.37424, 1.38669, 1.53475], [0.536787, 0.607866, 0.680027, 0.683599, 0.711491], [1.43329, 1.43329, 1.4974, 1.4974, 1.4974]])
as well as single queries
julia> point = data[:, 55];
julia> idxs, dists = knn_search(ivfadc, point, 5)
(UInt32[0x00000036, 0x0000003a, 0x0000003e, 0x00000044, 0x00000057], [0.926899, 1.04755, 1.05666, 1.05666, 1.27094])
Internally, the IVFADC index uses 0-based indexing; to retrieve the actual 1-based neighbor indexes that correspond to indexes in data
, a simple transform has to be performed:
julia> int_idxs = Int.(idxs) .+ 1
5-element Array{Int64,1}:
55
59
63
69
88
Results may vary depending on how many clusters are being used to search into, option configurable through the keyword argument w
of knn_search
julia> knn_search(ivfadc, point, 5, w=10) # search into 10 clusters
(UInt32[0x00000036, 0x0000003a, 0x0000003e, 0x00000044, 0x00000057], [0.926899, 1.04755, 1.05666, 1.05666, 1.27094])
Updating the index
Adding and removing points to and from the index is done with push!
, pop!
, pushfirst!
and popfirst!
methods. As they imply, point can be added (and quantized) or removed (and reconstructed) at the beginning or end of the index. In practice, this implies updating the point indexes in the index, if the case.
julia> for i in 1:5
push!(ivfadc, rand(10))
end
julia> ivfadc
IVFADCIndex, naive coarse quantizer, 6-byte encoding (4 + 1×2), 105 Float64 vectors
julia> pop!(ivfadc)
10-element Array{Float64,1}:
0.4278865893757469
0.48768402317231363
0.5669178941501075
0.4967803874304485
0.2512302354041912
0.5013444839659885
0.9092147732548748
0.10947415853711279
0.3032012403226358
0.17239422778439623
julia> pushfirst!(ivfadc, 0.1*collect(1:10))
julia> popfirst!(ivfadc)
10-element Array{Float64,1}:
0.30934330887804384
0.3870051821983087
0.21908583060157394
0.5562288311655249
0.3008518663887265
0.7256142520362612
0.4263482326184592
0.7082323645729623
0.7540262753194262
0.5544077840927342
When adding a new point, its index will always be the number of already existing points. When deleting points, the indexes of all points are updated so that they are consecutive.
The function delete_from_index!
removes the points indicated in the vector of indexes.
julia> delete_from_index!(ivfadc, [10, 21, 32]);
julia> ivfadc
IVFADCIndex, naive coarse quantizer, 6-byte encoding (4 + 1×2), 101 Float64 vectors
Deleting from the index is a slow operation as the all indexes of the points contained need to be properly updated depending on the positions of the points that are being deleted.
Limits and advanced usage of the index
It is not possible to add more points than the maximum value of the indexing type
julia> ivfadc = IVFADCIndex(rand(2,256), kc=2, k=16, m=1, index_type=UInt8)
IVFADCIndex, naive coarse quantizer, 2-byte encoding (1 + 1×1), 256 Float64 vectors
Adding a new point to an index that already has 256 points (which is the maximum for the UInt8
) throws an error
julia> push!(ivfadc, rand(2))
ERROR: AssertionError: Cannot index, exceeding index capacity of 256 points
It is however possible to add the point after deleting another one
julia> popfirst!(ivfadc)
2-element Array{Float64,1}:
0.18229975483475844
0.04730535031025618
julia> push!(ivfadc, rand(2))
julia> ivfadc
IVFADCIndex, naive coarse quantizer, 2-byte encoding (1 + 1×1), 256 Float64 vectors
In the example above, the index is being used as a FIFO buffer where the first point is removed and a new one is appended to the buffer.