# Georges Hatem

## Requirements
- mORMot2 library
- 64-bit compilation

## Hardware + Environment

host:
- Dell XPS 15 (9560, 2017)
- OS: ArchLinux

VM (VirtualBox):

note about the hash:
run with the DEBUG compiler directive to write from the stream directly to file; otherwise the hash will not match.

## Baseline

The initial implementation (the Delphi baseline found in /baseline) aimed to get a correct output, regardless of performance:
"Make it work, then make it work better."
It turns out even the baseline caused some trouble: namely, the `Ceil` implementation was yielding different results between FPC and Delphi (and different results between Delphi Win32/Win64).
After input from several peers including gcarreno, abouchez and paweld (thanks!), this last detail was ironed out, and the baseline yielded a matching hash.

## Single-Threaded Attempt (2024-04-03)

In this first attempt, the implementation is broken down into 3 major steps:
1. read the input file
2. process the file
3. output the results

a key point:
- the reading / writing (steps 1 and 3) will be done on the main thread.
- the processing (step 2) is where a future submission will attempt to parallelize the work done.

## 1. Read the Input File

#### v1. File Stream
In the baseline implementation, a file stream is used to read line by line, which is far from optimal.

#### v2. Memory-Mapped File
An improvement was to read the whole file into memory in one shot, using memory mapping.
In this implementation I use `GetFileSize` and `CreateFileMapping`, a procedure found online (need to find URL).
The first thing to note: the memory usable by a Win32 process is limited to ~1.5-2 GB of RAM. Exceeding this limit yields an out-of-memory exception, so we must compile for Win64.
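
For reference, the classic Win32 pattern this approach is built on looks roughly like this (a sketch, not the submission's exact `FileReader.pas` code; error handling trimmed):

```pascal
uses
  Windows;

// Map an entire file read-only into the address space in one shot.
// On Win32 this fails with out-of-memory for a multi-GB input, hence Win64.
function MapWholeFile(const FileName: string; out Size: Int64): PAnsiChar;
var
  hFile, hMap: THandle;
  SizeHi, SizeLo: DWORD;
begin
  Result := nil;
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
                      OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    Exit;
  SizeLo := GetFileSize(hFile, @SizeHi);
  Size := (Int64(SizeHi) shl 32) or SizeLo;
  // size arguments 0, 0 => map the whole file
  hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
  if hMap <> 0 then
  begin
    Result := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
    CloseHandle(hMap);  // the view keeps the mapping alive
  end;
  CloseHandle(hFile);
end;
```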
Some issues with this implementation (see unit FileReader.pas):
- if we wanted to move forward with this implementation, we would need to call `CreateFileMapping` in 4 or 5 batches, which would take 1.7 x 5 ~= 8.5 seconds just to read the data.
- attempt aborted, see v3.

#### v3. Memory-Mapped File, Provided by `mORMot2`
A v3 attempt at reading the file used a ready-made implementation of file memory-mapping provided by synopse/mORMot2 (big thanks @abouchez!).
The function returns a `pAnsiChar` and the size of the mapped data as an `Int64`. Performance-wise, it all happens in under 0.1 seconds, but now we must delve into pointers.
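
A minimal sketch of how this could look, assuming mORMot2's `TMemoryMap` record from `mormot.core.os` (unit and member names from memory, so check them against the library):

```pascal
uses
  mormot.core.os;  // assumption: TMemoryMap lives here in mORMot2

procedure ReadWholeFile;
var
  Map: TMemoryMap;
  P: PAnsiChar;
  Size: Int64;
begin
  // map the entire input file into memory in one call
  if Map.Map('measurements.txt') then
  try
    P    := pointer(Map.Buffer);  // first byte of the mapped data
    Size := Map.Size;             // total number of bytes mapped
    // ... scan P[0 .. Size-1] for lines here ...
  finally
    Map.UnMap;                    // release the mapping
  end;
end;
```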

## 2. Process the File

Well, at a glance this is straightforward:
- look for new-line chars to delimit each line, split it to extract StationName / Temperature.
A few optimizations were done here, to the best of my knowledge:

#### For Each Line, Iterate Backwards
`Length(StationName) > Length(Temperature)`, so for each line it is better to look for the `;` starting from the end.
Given the below input:
```
Rock Hill;-54.3
```
the last 3 characters are guaranteed to be present, so we can skip them while iterating.
I tried unrolling the loop over the last 2-3 characters that must be checked, but measurements showed it to be slower; I don't know why.
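
The backwards scan can be sketched as follows (a hypothetical helper, not the submission's exact code):

```pascal
// Find the ';' separator by scanning backwards from the end of the line.
// The temperature is at least 3 chars ("x.y"), so those can be skipped.
function FindSemicolonBackwards(Line: PAnsiChar; LineLen: Integer): Integer;
begin
  Result := LineLen - 4;  // skip the 3 mandatory trailing characters
  while (Result >= 0) and (Line[Result] <> ';') do
    Dec(Result);
  // Result is the 0-based index of ';', or -1 if the line is malformed
end;
```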

#### Extract the Strings Using `SetString`
Manual string concatenation and splitting proved to be very slow.
Using `SetString` yielded a noticeable improvement. Remaining in the realm of pointers would probably be much faster, but I haven't ventured there (yet; maybe if time allows).
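
A sketch of the `SetString` extraction, given a line in the mapped buffer and the position of the `;` (helper name is hypothetical):

```pascal
// SetString allocates the destination and copies the bytes in one call,
// avoiding the slow concatenation / Copy-based splitting of the baseline.
procedure SplitLine(Line: PAnsiChar; LineLen, SemiPos: Integer;
  out StationName, TempStr: AnsiString);
begin
  SetString(StationName, Line, SemiPos);                          // before ';'
  SetString(TempStr, Line + SemiPos + 1, LineLen - SemiPos - 1);  // after ';'
end;
```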

#### Decode Temperature Into Numerical Value
The first attempt was to use `StrToFloat`, which was pretty catastrophic. Using `Val` was still slow, but definitely a step up. `Val` with `SmallInt` proved to be faster than with `Double`, even though there are extra operations to be done.
So now we need to get rid of the `.` character.
Again, string functions being very slow, replicating the last character at length-1 (then shortening the string by one) is how the `.` is removed.
Finally, assuming temperatures in the range `-100` to `+100`, with 1 decimal figure, there should be 2000 different possible temperatures.
Instead of decoding the same temperature values with `Val` over and over, do it once and store the result in a `TDictionary` (TemperatureString -> TemperatureSmallInt). There were, I believe, 1998 different temperature values, so we only call `Val` 1998 times instead of 1 billion times. Over an input size of 100M, the gain was 4-5 seconds (total 28s -> 23s).
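
The caching idea can be sketched like this (for clarity this sketch uses `Copy` to drop the `.`, where the text above describes replicating the last character in place; names are hypothetical):

```pascal
uses
  SysUtils, Generics.Collections;

var
  TempCache: TDictionary<AnsiString, SmallInt>;  // create once at startup

// Decode '-54.3' into the SmallInt -543, calling Val only once per
// distinct temperature string (~2000 of them) instead of a billion times.
function DecodeTemp(const S: AnsiString): SmallInt;
var
  V, Err: Integer;
begin
  if not TempCache.TryGetValue(S, Result) then
  begin
    // drop the '.' (there is always exactly 1 decimal figure)
    Val(string(Copy(S, 1, Length(S) - 2) + S[Length(S)]), V, Err);
    Result := V;
    TempCache.Add(S, Result);
  end;
end;
```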

#### Accumulate Data Into a Dictionary of Records
- the records are packed, with minimal size
- the dictionary maps StationName -> pointer to record, to avoid passing around full records
- records are pre-allocated in an array of 45,000, instead of being allocated on the fly
- when a station is not found in the dictionary, we point to the next element in the records array
- with an input of size 100M, this accumulation step takes a considerable amount of time (9 seconds out of 23 total). I haven't yet identified whether it is `dict.Add` that takes the time, `dict.TryGetValue`, or just dictionary hash collisions in general. The dictionary is pre-allocated with a capacity of 45,000, but that did not seem to help much. I also tried the dictionary implementation of Spring4D, with no improvement either.
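
The accumulation scheme described above can be sketched as follows (field choices and names are illustrative, not the submission's exact layout):

```pascal
uses
  Generics.Collections;

type
  TStationRec = packed record   // packed, minimal size
    Min, Max: SmallInt;
    Sum: Int64;
    Count: Integer;
  end;
  PStationRec = ^TStationRec;

var
  Recs: array of TStationRec;                  // SetLength(Recs, 45000) once
  Dict: TDictionary<AnsiString, PStationRec>;  // name -> pointer into Recs
  Used: Integer = 0;

procedure Accumulate(const Station: AnsiString; Temp: SmallInt);
var
  P: PStationRec;
begin
  if not Dict.TryGetValue(Station, P) then
  begin
    P := @Recs[Used];           // next free pre-allocated slot
    Inc(Used);
    P^.Min := High(SmallInt);
    P^.Max := Low(SmallInt);
    Dict.Add(Station, P);
  end;
  if Temp < P^.Min then P^.Min := Temp;
  if Temp > P^.Max then P^.Max := Temp;
  Inc(P^.Sum, Temp);
  Inc(P^.Count);
end;
```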

## 3. Output the Results
Since I started using pointers (`pAnsiChar`), getting a matching hash was a bit of a pickle:
some Unicode characters were messed up in their display, or in their ordering.
Eventually, the ordering issue was resolved by using `AnsiStrings.CompareStr` instead of `SysUtils.CompareStr`. This step will clearly remain single-threaded, but it takes 0.15 seconds for all 45,000 stations, so it is not a big deal.
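
The byte-wise comparison can be plugged into a generic sort like this (a Delphi sketch, assuming the station names were collected into an array first):

```pascal
uses
  System.AnsiStrings, Generics.Collections, Generics.Defaults;

procedure SortStations(var Names: TArray<AnsiString>);
begin
  TArray.Sort<AnsiString>(Names, TComparer<AnsiString>.Construct(
    function(const L, R: AnsiString): Integer
    begin
      // byte-wise AnsiString comparison, matching the reference ordering
      Result := System.AnsiStrings.CompareStr(L, R);
    end));
end;
```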