## Getting started

You need a copy of https://kaikki.org/frwiktionary/raw-wiktextract-data.jsonl.gz

## Initial import speed

Problem: current import speed is too slow.

Current import speed with encoding/json:        (1780000-990000)/(22:37:09-20:46:10)
                                                = 790000/((22*3600+37*60+9)-(20*3600+46*60+10))
                                                = 790000/6659
                                                ≈ 119 inserts per second
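
The arithmetic above can be reproduced with a few lines of Go. This is only a sketch for checking the numbers: `insertsPerSecond` is a hypothetical helper, not part of the importer, and it assumes both clock readings were taken on the same day.

```go
package main

import (
	"fmt"
	"time"
)

// insertsPerSecond returns the average insert rate between two row counts
// observed at two wall-clock readings (HH:MM:SS, same day).
// Hypothetical helper, written only to double-check the figures in this README.
func insertsPerSecond(startRows, endRows int, startClock, endClock string) (float64, error) {
	const layout = "15:04:05"
	start, err := time.Parse(layout, startClock)
	if err != nil {
		return 0, err
	}
	end, err := time.Parse(layout, endClock)
	if err != nil {
		return 0, err
	}
	elapsed := end.Sub(start).Seconds()
	if elapsed <= 0 {
		return 0, fmt.Errorf("end %s is not after start %s", endClock, startClock)
	}
	return float64(endRows-startRows) / elapsed, nil
}

func main() {
	rate, err := insertsPerSecond(990000, 1780000, "20:46:10", "22:37:09")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.0f inserts per second\n", rate) // prints "119 inserts per second"
}
```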

What if we:

1) use goccy/go-json for decoding?
    40000/((46*60+9)-(40*60+25)) = 40000/344 ≈ 116 inserts per second
    Looks like the database is our bottleneck.
2) parallelize?
3) other performance optimizations?
    - https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite
    - wrap all inserts in one transaction:
        410000/(29-13) = 25,625 inserts per second!! Much, much better!
        (using plain old encoding/json instead of goccy: about 20,000 per second)

Decided on using goccy to unmarshal, and doing everything in one SQLite transaction.