
Commit 705d2ba
docs(README): A bit of styling for ghatem
1 parent 0d9fd37

1 file changed: entries/ghatem/README.md (15 additions, 15 deletions)
# Georges Hatem

## Requirements
- mORMot2 library
- 64-bit compilation

## Hardware + Environment
host:
- Dell XPS 15 (9560, 2017)
- OS: ArchLinux
@@ -20,13 +20,13 @@ VM (VirtualBox):
note about the hash:
run with the DEBUG compiler directive to write from the stream directly to file, otherwise the hash will not match.

## Baseline
The initial implementation (the Delphi baseline found in /baseline) aimed to get a correct output, regardless of performance:
"Make it work, then make it work better".
It turns out even the baseline caused some trouble: namely, the `Ceil` implementation was yielding different results between FPC and Delphi (and different results between Delphi Win32/Win64).
After input from several peers, including gcarreno, abouchez and paweld (thanks!), this last detail was ironed out and the baseline yielded a matching hash.

## Single-Threaded Attempt (2024-04-03)

In this first attempt, the implementation is broken down into 3 major steps:
1. read the input file
@@ -37,12 +37,12 @@ a key point:
- the reading / writing (steps 1 and 3) will be done on the main thread.
- the processing (step 2) is where a future submission will attempt to parallelize the work.

## 1. Read the Input File

#### v1. File Stream
In the baseline implementation, a file stream is used to read line by line, which is far from optimal.

#### v2. Memory-Mapped File
An improvement was to read the whole file into memory in one shot, using memory mapping.
In this implementation, I use `GetFileSize` and `CreateFileMapping`, a procedure found online (need to find URL).
First thing to note: the usable memory of a Win32 process is limited to ~1.5-2 GB of RAM. Exceeding this limit yields an out-of-memory exception, so we must compile for Win64.
@@ -54,12 +54,12 @@ Some issues with this implementation (see unit FileReader.pas):
- if we wanted to move forward with this implementation, we would need to call `CreateFileMapping` in 4 or 5 batches, which would take 1.7 x 5 ~= 8.5 seconds just to read the data.
- attempt aborted, see v3.

#### v3. Memory-Mapped File, Provided by `mORMot2`
A v3 attempt at reading the file used a ready-made implementation of file memory-mapping, provided by synopse/mORMot; big thanks @abouchez!
The function returns a pAnsiChar and the size of the mapped data as an Int64. Performance-wise, it all happens in under 0.1 seconds, but now we must delve into pointers.


## 2. Process the File

Well, at a glance this is straightforward:
- look for new-line chars to delimit each line, split it to extract StationName / Temperature.
@@ -68,7 +68,7 @@ Well, at a glance this is straightforward:

A few optimizations were done here, to the best of my knowledge:

#### For Each Line, Iterate Backwards
`Length(StationName) > Length(Temperature)`, so for each line, it is better to look for the `;` starting from the end.
Given the below input:
```
Rock Hill;-54.3
```
the last 3 characters will necessarily be present, so we can skip them while iterating.
I tried unrolling the loop over the last 2-3 characters that must be checked, but measurements showed it to be slower; I don't know why.

#### Extract the Strings Using `SetString`
Manual string concatenation and splitting proved to be very slow.
Using `SetString` yielded a noticeable improvement. Remaining in the realm of pointers would probably be much faster, but I haven't ventured there (yet; maybe, if time allows).

#### Decode Temperature Into Numerical Value
The first attempt was to use `StrToFloat`, which was pretty catastrophic. Using `Val` was still slow, but definitely a step up. `Val` with SmallInt proved to be faster than with Double, even though there are extra operations to be done.
So now we need to get rid of the `.` character.

@@ -91,14 +91,14 @@ Again, string functions being very slow, replicating the last character at lengt
Finally, assuming temperatures between `-100` and `+100` with 1 decimal figure, there should be ~2000 different temperature values possible.
Instead of repeatedly decoding the same temperature values using `Val`, decode each one once and store it in a TDictionary (TemperatureString -> TemperatureSmallInt). There were, I believe, 1998 distinct temperature values, so we only call `Val` 1998 times instead of 1 billion times. Over an input size of 100M, the gain was 4-5 seconds (total 28s -> 23s).

#### Accumulate Data Into a Dictionary of Records
- the records are packed, with minimal size
- the dictionary maps StationName -> pointer to record, to avoid passing around full records
- records are pre-allocated in an array of 45,000, instead of being allocated on the fly.
- when a station is not found in the dictionary, we point it to the next element in the records array.
- with an input of size 100M, this accumulation step takes a considerable amount of time (9 seconds out of 23 total). I haven't yet identified whether it is the `dict.Add` that takes time, the `dict.TryGetValue`, or just generally the dictionary hash collisions. The dictionary is pre-allocated with a capacity of 45,000, but that did not seem to improve much. I also tried the dictionary implementation of Spring4D, but saw no improvement either.

## 3. Output the Results
Since I started using pointers (pAnsiChar), getting a matching hash was a bit of a pickle:
some Unicode characters were messed up in their display, or messed up in their ordering.
Eventually, the ordering issue was resolved by using `AnsiStrings.CompareStr` instead of `SysUtils.CompareStr`. This step will clearly remain single-threaded, but it takes 0.15 seconds for all 45,000 stations, so it is not a big deal.
