Lightning.Invocation.DataclipSearchVectorWorker (Lightning v2.16.8-pre)

Backfills the full-text search_vector on dataclips rows.

Dataclips are inserted with search_vector left NULL; the vector is built here rather than on the insert path. Building it inline was risky: jsonb_to_tsvector over a large dataclip body is slow and runs inside the transaction that persists the run, so a slow (or failing) vector build could roll back the dataclip insert and lose the run (#4800). Deferring it keeps jsonb_to_tsvector off that hot path. Search is eventually consistent as a result, typically catching up within a minute.

Two database objects support this: safe_jsonb_to_tsvector(regconfig, jsonb), which builds the vector from the dataclip body while tolerating NULL and oversized input, and a partial index over search_vector IS NULL, which keeps locating pending rows cheap as the table grows. Vectors use the english_nostop config to match the read side (Lightning.Invocation), which queries with to_tsquery('english_nostop', ...).

Each run drains pending rows newest-first, in batches, up to a per-run budget (batch size and max batches are configurable via Lightning.Config). A run that exhausts its budget leaves backlog behind and enqueues an immediate follow-up job (a "snowball"); otherwise the once-a-minute cron tick keeps pace. The worker shares the search_indexing queue with Lightning.LogLines.SearchVectorWorker; that queue runs at concurrency 2, so the two workers each get a slot and their snowball chains never starve one another. The cron tick and the snowball carry distinct trigger args, so job uniqueness allows one of each to be queued at a time but never a duplicate of either.
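The drain-and-snowball behaviour described above can be modelled in a few lines. This is a minimal, language-neutral sketch in Python rather than the worker's actual Elixir code. Every name in it (Dataclip, build_vector, drain, enqueue) is hypothetical, and it makes three assumptions: "newest" means highest id, a NULL body yields an empty vector so the row leaves the pending set, and a snowball is enqueued only when the budget is spent and rows are still pending.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataclip:
    id: int
    body: Optional[dict]
    search_vector: Optional[str] = None  # None models SQL NULL (pending)


def build_vector(body: Optional[dict]) -> str:
    # Hypothetical stand-in for safe_jsonb_to_tsvector: never raises on a
    # missing body and always returns a non-None value (assumption).
    if body is None:
        return ""
    return " ".join(sorted(str(value).lower() for value in body.values()))


def drain(rows: list, batch_size: int, max_batches: int) -> tuple:
    """Fill vectors newest-first; return (rows_processed, snowball_needed)."""
    processed = 0
    for _ in range(max_batches):
        # Models the lookup that the partial index keeps cheap:
        # only rows whose search_vector is still NULL.
        pending = sorted(
            (row for row in rows if row.search_vector is None),
            key=lambda row: row.id,  # assumption: higher id means newer
            reverse=True,
        )[:batch_size]
        if not pending:
            return processed, False  # caught up within budget
        for row in pending:
            row.search_vector = build_vector(row.body)
        processed += len(pending)
    # Budget exhausted: ask for a follow-up only if backlog remains.
    return processed, any(row.search_vector is None for row in rows)


def enqueue(queue: set, trigger: str) -> bool:
    """Model job uniqueness keyed on the trigger arg."""
    if trigger in queue:
        return False  # a job with this trigger is already queued
    queue.add(trigger)
    return True
```

With five pending rows, a batch size of 2 and a budget of 2 batches, the first run fills the four newest rows and requests a snowball. The follow-up run fills the last row and stops without requesting another. Keying uniqueness on the trigger lets one cron job and one snowball job sit in the queue together while rejecting a second copy of either.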