Usage examples

Usage examples

Building an IVFADC index

Building an index can be performed using an outer constructor

julia> using IVFADC

julia> data = rand(10, 100);  # 100 points, 10 dimensions

julia> ivfadc = IVFADCIndex(data, kc=5, k=8, m=2)
IVFADCIndex, naive coarse quantizer, 6-byte encoding (4 + 1×2), 100 Float64 vectors

The coarse quantizer, used in the first level coarse neighbor search, can be specified using the coarse_quantizer keyword argument

julia> ivfadc = IVFADCIndex(data, kc=5, k=8, m=2, coarse_quantizer=:hnsw)  # fast!
IVFADCIndex, HNSW coarse quantizer, 6-byte encoding (4 + 1×2), 100 Float64 vectors

julia> ivfadc = IVFADCIndex(data, kc=5, k=8, m=2, coarse_quantizer=:naive) # simple
IVFADCIndex, naive coarse quantizer, 6-byte encoding (4 + 1×2), 100 Float64 vectors

The HNSW coarse quantizer is recommended is 'many' clusters are being used and coarse search dominates search time as opposed to in-cluster search.

Searching the index

Searching into the index is done with knn_search for multiple queries

julia> points = [rand(10) for _ in 1:3];

julia> idxs, dists = knn_search(ivfadc, points, 5)
(Array{UInt32,1}[[0x0000002b, 0x0000004c, 0x00000025, 0x0000005a, 0x00000028], [0x00000052, 0x00000063, 0x00000014, 0x00000037, 0x00000027], [0x00000011, 0x00000029, 0x00000010, 0x0000001c, 0x0000004a]], Array{Float64,1}[[1.16005, 1.33942, 1.37424, 1.38669, 1.53475], [0.536787, 0.607866, 0.680027, 0.683599, 0.711491], [1.43329, 1.43329, 1.4974, 1.4974, 1.4974]])

as well as single queries

julia> point = data[:, 55];

julia> idxs, dists = knn_search(ivfadc, point, 5)
(UInt32[0x00000036, 0x0000003a, 0x0000003e, 0x00000044, 0x00000057], [0.926899, 1.04755, 1.05666, 1.05666, 1.27094])

Internally, the IVFADC index uses 0-based indexing; to retrieve the actual 1-based neighbor indexes that correspond to indexes in data, a simple transform has to be performed:

julia> int_idxs = Int.(idxs) .+ 1
5-element Array{Int64,1}:
 55
 59
 63
 69
 88

Results may vary depending on how many clusters are being used to search into, option configurable through the keyword argument w of knn_search

julia> knn_search(ivfadc, point, 5, w=10)  # search into 10 clusters
(UInt32[0x00000036, 0x0000003a, 0x0000003e, 0x00000044, 0x00000057], [0.926899, 1.04755, 1.05666, 1.05666, 1.27094])

Updating the index

Adding and removing points to and from the index is done with push!, pop!, pushfirst! and popfirst! methods. As they imply, point can be added (and quantized) or removed (and reconstructed) at the beginning or end of the index. In practice, this implies updating the point indexes in the index, if the case.

julia> for i in 1:5
           push!(ivfadc, rand(10))
       end

julia> ivfadc
IVFADCIndex, naive coarse quantizer, 6-byte encoding (4 + 1×2), 105 Float64 vectors

julia> pop!(ivfadc)
10-element Array{Float64,1}:
 0.4278865893757469
 0.48768402317231363
 0.5669178941501075
 0.4967803874304485
 0.2512302354041912
 0.5013444839659885
 0.9092147732548748
 0.10947415853711279
 0.3032012403226358
 0.17239422778439623

julia> pushfirst!(ivfadc, 0.1*collect(1:10))

julia> popfirst!(ivfadc)
10-element Array{Float64,1}:
 0.30934330887804384
 0.3870051821983087
 0.21908583060157394
 0.5562288311655249
 0.3008518663887265
 0.7256142520362612
 0.4263482326184592
 0.7082323645729623
 0.7540262753194262
 0.5544077840927342
Note

When adding a new point, its index will always be the number of already existing points. When deleting points, the indexes of all points are updated so that they are consecutive.

The function delete_from_index! removes the points indicated in the vector of indexes.

julia> delete_from_index!(ivfadc, [10, 21, 32]);

julia> ivfadc
IVFADCIndex, naive coarse quantizer, 6-byte encoding (4 + 1×2), 101 Float64 vectors
Note

Deleting from the index is a slow operation as the all indexes of the points contained need to be properly updated depending on the positions of the points that are being deleted.

Limits and advanced usage of the index

It is not possible to add more points than the maximum value of the indexing type

julia> ivfadc = IVFADCIndex(rand(2,256), kc=2, k=16, m=1, index_type=UInt8)
IVFADCIndex, naive coarse quantizer, 2-byte encoding (1 + 1×1), 256 Float64 vectors

Adding a new point to an index that already has 256 points (which is the maximum for the UInt8) throws an error

julia> push!(ivfadc, rand(2))
ERROR: AssertionError: Cannot index, exceeding index capacity of 256 points

It is however possible to add the point after deleting another one

julia> popfirst!(ivfadc)
2-element Array{Float64,1}:
 0.18229975483475844
 0.04730535031025618

julia> push!(ivfadc, rand(2))

julia> ivfadc
IVFADCIndex, naive coarse quantizer, 2-byte encoding (1 + 1×1), 256 Float64 vectors

In the example above, the index is being used as a FIFO buffer where the first point is removed and a new one is appended to the buffer.