entries/ghatem-fpc/README.md (28 additions, 0 deletions)
@@ -247,3 +247,31 @@ Another trial with various hash functions, a simple modulus vs. a slightly more
Can be tested with the HASHMULT build option.

Finally, choosing a prime number for the dictionary size also seems to be recommended: it shaves 1 second out of 20 on my PC.

## v.6 (2024-05-04)

As of the latest results run by Paweld, two main bottlenecks throttle the entire implementation, according to Callgrind and KCachegrind:
- the function `ExtractLineData`, at 23% of total cost, of which 9% is due to `fpc_stackcheck`
- the hash lookup function, at 40% of total cost

Currently, the hash lookup is done on an array of records. Increasing the array size slows the lookup down (presumably through more cache misses), and reducing it causes more collisions.
I will try to reduce collisions (by increasing the array size) while minimizing the cost of cache misses.

Edit:
The goal is to both:
- minimize collisions on the hashes (keys) by having a good hash function, but also by increasing the size of the key storage
- minimize the size of the array of packed records
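
To make the trade-off between these two goals concrete, here is a small back-of-the-envelope sketch. It compares the memory used when every hash slot holds a full record with the memory used when every slot holds only a 4-byte index into a dense record array. The record layout, `cMaxStations`, and both function names are assumptions made for illustration, not values taken from the actual entry.

```pascal
program FootprintSketch;
{$mode objfpc}

type
  // Hypothetical aggregate record; the real layout in the entry may differ.
  TStationData = packed record
    Min, Max: SmallInt;  // 2 + 2 bytes
    Count: UInt32;       // 4 bytes
    Sum: Int64;          // 8 bytes
  end;

const
  cMaxStations = 45000;  // assumed upper bound on distinct stations

// Bytes used when each hash slot holds a full record.
function RecordsInTable(aSlots: Int64): Int64;
begin
  Result := aSlots * SizeOf(TStationData);
end;

// Bytes used when each hash slot holds a 4-byte index into a dense record array.
function IndexPlusDenseArray(aSlots: Int64): Int64;
begin
  Result := aSlots * SizeOf(LongInt) + Int64(cMaxStations) * SizeOf(TStationData);
end;

begin
  WriteLn('record size:  ', SizeOf(TStationData), ' bytes');
  WriteLn('65536 slots:  ', RecordsInTable(65536), ' vs ', IndexPlusDenseArray(65536));
  WriteLn('262144 slots: ', RecordsInTable(262144), ' vs ', IndexPlusDenseArray(262144));
end.
```

Under these assumptions, quadrupling the slot count from 65536 to 262144 grows the records-in-table layout from 1048576 to 4194304 bytes, while the index layout only grows from 982144 to 1768576 bytes, because the dense record array stays at 720000 bytes. Whether this actually reduces cache misses still has to be measured.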

The idea:
- the dictionary will no longer map to a `PStationData` pointer, but rather to an index between 0 and `StationCount`, which is where the record is stored in the array.
  - data about the same station will be stored at the same index in every thread's data array
  - the name will also be stored at that same index upon first encounter, and is common to all threads
- no locking needs to occur when the key is already found, since no concurrent writes to shared data occur in that case
- the data arrays are pre-allocated, and an atomic counter will be incremented to determine where the next element will be stored.
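
The idea above can be sketched roughly as follows. This is a minimal illustration, not the entry's actual code: the record layout, the FNV-1a placeholder hash, the table sizes, and every identifier other than `PStationData` are assumptions. The driver is single-threaded and only stands in for two worker threads. The insert path takes a critical section, so the atomic increment is redundant there and is kept only to mirror the text. The lock-free read path additionally assumes that aligned 32-bit loads and stores are atomic, as on x86-64.

```pascal
program IndexDictSketch;
{$mode objfpc}{$H+}{$Q-}{$R-}

{$IFDEF UNIX}
uses cthreads;  // thread manager, needed once real worker threads are added
{$ENDIF}

const
  cSlots       = 1024;  // hash table size (hypothetical)
  cMaxStations = 512;   // capacity of the pre-allocated arrays (hypothetical)
  cThreads     = 2;
  cEmpty       = -1;

type
  TStationData = packed record  // hypothetical layout
    Min, Max: SmallInt;
    Count: UInt32;
    Sum: Int64;
  end;
  PStationData = ^TStationData;

var
  gSlots: array[0..cSlots - 1] of LongInt;           // hash slot -> station index
  gNames: array[0..cMaxStations - 1] of AnsiString;  // shared, written once per station
  gData: array[0..cThreads - 1, 0..cMaxStations - 1] of TStationData;  // one row per thread
  gNext: LongInt = 0;                                // atomic counter: next free index
  gLock: TRTLCriticalSection;

// Placeholder hash (FNV-1a), not the entry's actual hash function.
function Hash(const aName: AnsiString): Cardinal;
var
  i: Integer;
begin
  Result := 2166136261;
  for i := 1 to Length(aName) do
    Result := (Result xor Ord(aName[i])) * 16777619;
end;

// Linear probing without any lock. Returns the station index, or cEmpty when
// an empty slot is reached. Terminates because cMaxStations < cSlots.
function Probe(const aName: AnsiString; out aSlot: Integer): LongInt;
begin
  aSlot := Hash(aName) mod cSlots;
  repeat
    Result := gSlots[aSlot];
    if (Result = cEmpty) or (gNames[Result] = aName) then
      Exit;
    aSlot := (aSlot + 1) mod cSlots;
  until False;
end;

// No capacity check: the sketch assumes fewer than cMaxStations names.
function FindOrAdd(const aName: AnsiString): LongInt;
var
  vSlot: Integer;
begin
  Result := Probe(aName, vSlot);      // fast path: key already known, no locking
  if Result <> cEmpty then
    Exit;
  EnterCriticalSection(gLock);        // slow path: first encounter of this name
  try
    Result := Probe(aName, vSlot);    // re-check, another thread may have added it
    if Result = cEmpty then
    begin
      Result := InterlockedIncrement(gNext) - 1;
      gNames[Result] := aName;        // store the name before publishing the index
      WriteBarrier;
      gSlots[vSlot] := Result;
    end;
  finally
    LeaveCriticalSection(gLock);
  end;
end;

procedure AddMeasurement(aThread: Integer; const aName: AnsiString; aTemp: SmallInt);
var
  vData: PStationData;
begin
  vData := @gData[aThread, FindOrAdd(aName)];  // only thread aThread writes this row
  if (vData^.Count = 0) or (aTemp < vData^.Min) then vData^.Min := aTemp;
  if (vData^.Count = 0) or (aTemp > vData^.Max) then vData^.Max := aTemp;
  Inc(vData^.Count);
  Inc(vData^.Sum, aTemp);
end;

var
  i: Integer;
begin
  InitCriticalSection(gLock);
  for i := 0 to cSlots - 1 do
    gSlots[i] := cEmpty;
  AddMeasurement(0, 'Paris', 123);   // temperatures in tenths of a degree
  AddMeasurement(1, 'Oslo', -45);
  AddMeasurement(1, 'Paris', 98);
  AddMeasurement(0, 'Paris', 150);
  WriteLn('stations: ', gNext);
  WriteLn('Paris: ', FindOrAdd('Paris'), ', Oslo: ', FindOrAdd('Oslo'));
  WriteLn('Paris counts: ', gData[0, 0].Count, ' and ', gData[1, 0].Count);
  DoneCriticalSection(gLock);
end.
```

In this run, `Paris` gets index 0 and `Oslo` index 1, and the two rows hold 2 and 1 measurements for `Paris` respectively. A final merge step would simply combine the rows of all threads at the same index.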

Thinking again, this is likely similar to the approach mentioned by @synopse in one of his comments.

For `ExtractLineData`, three ideas to try implementing:
- avoid using a function, to get rid of the cost of stack checking
- reduce branching: I think it should be possible to go from 3 if-statements to only 1
- unroll the loop (although I tried this in the past, and it did not show any improvement)
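
As an illustration of the first two ideas, here is a hedged sketch of a temperature parser with a single `if`. It is not the actual `ExtractLineData`: `ParseTemp` and its parameters are hypothetical, and it assumes every line ends with a temperature of the form `[-]d.d` or `[-]dd.d`. Marking the routine `inline` and disabling stack checking with `{$S-}` are two possible ways to avoid the `fpc_stackcheck` call, assuming stack checking was enabled through `-Ct` or `{$S+}`. Whether the compiler really emits branch-free code for the `Ord(...)` expression should be checked in the generated assembly.

```pascal
program ParseTempSketch;
{$mode objfpc}{$H+}
{$inline on}
{$S-}  // no stack checking for the code below

// Hypothetical stand-in for the numeric part of ExtractLineData.
// Returns the temperature in tenths of a degree and the 1-based position of ';'.
function ParseTemp(const aLine: AnsiString; out aSemicolon: Integer): Integer; inline;
var
  vLen, vHasTens, vPos: Integer;
  c: AnsiChar;
begin
  vLen := Length(aLine);
  // aLine[vLen] is the tenths digit, aLine[vLen - 1] is '.', aLine[vLen - 2] the units digit
  Result := (Ord(aLine[vLen - 2]) - 48) * 10 + (Ord(aLine[vLen]) - 48);
  c := aLine[vLen - 3];                    // ';', '-' or the tens digit
  vHasTens := Ord(c in ['0'..'9']);        // 1 if there is a tens digit, else 0
  Inc(Result, vHasTens * (Ord(c) - 48) * 100);
  vPos := vLen - 3 - vHasTens;             // now points at ';' or '-'
  if aLine[vPos] = '-' then                // the single remaining if-statement
  begin
    Result := -Result;
    Dec(vPos);
  end;
  aSemicolon := vPos;
end;

var
  vSemi, vTemp: Integer;
begin
  vTemp := ParseTemp('Paris;12.3', vSemi);
  WriteLn(vTemp, ' ', vSemi);   // 123 6
  vTemp := ParseTemp('Oslo;-4.5', vSemi);
  WriteLn(vTemp, ' ', vSemi);   // -45 5
  vTemp := ParseTemp('X;5.0', vSemi);
  WriteLn(vTemp, ' ', vSemi);   // 50 2
end.
```

Parsing from the end of the line works here because the fractional part always has exactly one digit under the stated assumption, so the positions of the units and tenths digits are fixed relative to the line length.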