## Getting started
You need a copy of https://kaikki.org/frwiktionary/raw-wiktextract-data.jsonl.gz
## Initial import speed
Problem: current import speed is too slow.
Current import speed with encoding/json:
(1780000-990000)/(22:37:09-20:46:10)
= 790000/((22*3600+37*60+9)-(20*3600+46*60+10))
= 790000/6659
≈ 119 inserts per second
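
The rate is rows inserted divided by elapsed wall-clock seconds. A small Go helper can reproduce the arithmetic; this is only a sketch, the name `insertsPerSecond` is hypothetical, and it assumes both clock readings fall on the same day:

```go
package main

import (
	"errors"
	"time"
)

// insertsPerSecond returns the average insert rate between two progress
// readings. Clock times use the "HH:MM:SS" form and are assumed to be
// taken on the same day (an assumption of this sketch).
func insertsPerSecond(startCount, endCount int, startClock, endClock string) (float64, error) {
	const layout = "15:04:05"
	start, err := time.Parse(layout, startClock)
	if err != nil {
		return 0, err
	}
	end, err := time.Parse(layout, endClock)
	if err != nil {
		return 0, err
	}
	elapsed := end.Sub(start).Seconds()
	if elapsed <= 0 {
		return 0, errors.New("end time must be after start time")
	}
	return float64(endCount-startCount) / elapsed, nil
}
```

Calling `insertsPerSecond(990000, 1780000, "20:46:10", "22:37:09")` divides 790000 rows by 6659 seconds, which gives roughly 118.6 inserts per second.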
What if we:
1) use goccy/go-json for decoding?
40000/((46*60+9)-(40*60+25)) = 40000/344 ≈ 116 inserts per second
The rate barely changes, so JSON decoding is not the bottleneck. Looks like the database is.
2) parallelize?
3) other performance optimizations?
- https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite
- wrap all inserts in one transaction:
410000/(29-13) = 25,625 inserts per second!! Much, much better!
(using plain old encoding/json instead of goccy: about 20,000 per second)
Decided on using goccy to unmarshal, and doing everything in one SQLite transaction.