Expand hardware drivers to 64 bits (address issue #2331)#3932
Expand hardware drivers to 64 bits (address issue #2331)#3932
Conversation
| rtapi_s64 l; // JE001 AP001 | ||
| struct byte_tag { | ||
| signed char b0; | ||
| signed char b1; | ||
| signed char b2; | ||
| signed char b3; | ||
| } byte; |
There was a problem hiding this comment.
A rtapi_s64 has 8 bytes, not 4.
| /* check for - to + transition */ | ||
| if ((oldpos.byte.b2 & 0xc0) == 0xc0 && (pos.byte.b2 == 0)) | ||
| pos.byte.b3++; | ||
| pos.l += 0x01000000; | ||
| else | ||
| if ((oldpos.byte.b2 == 0) && (pos.byte.b2 & 0xc0) == 0xc0) | ||
| pos.byte.b3--; | ||
| pos.l -= 0x01000000; |
There was a problem hiding this comment.
Byte-testing is a bad option. Better to look at the value as a whole and mask unwanted bits using a binary and (&). Then the compiler can sort it out properly.
| @@ -616,7 +624,7 @@ static void stg_counter_capture(void *arg, long period) | |||
| stg->pos_scale[n] = 1.0; | |||
| } | |||
| /* scale count to make floating point position */ | |||
| *(stg->pos[n]) = *(stg->count[n]) / stg->pos_scale[n]; | |||
| *(stg->pos[n]) = *(stg->count64[n]) / stg->pos_scale[n]; | |||
There was a problem hiding this comment.
Writing a hal pin and reading the same pin just a bit later is very inefficient because the compiler must physically read the value from memory. The volatile modifier prevents caching. However, the value does not change here between the two lines. Therefore, a local intermediary allows the compiler to be more efficient.
This issue is seen all over this and other code, writing and then shortly thereafter again reading same pin.
| /* check for - to + transition */ | ||
| if ((oldpos.byte.b2 & 0xc0) == 0xc0 && (pos.byte.b2 == 0)) |
There was a problem hiding this comment.
What exactly is the counter's width and signedness? Masking two bits that must be set looks strange. If this two's complement, one's complement or another format?
There was a problem hiding this comment.
The value read from the FPGA is an unsigned 16 bit number. So the positive wrap looks like:
0xFFFF
0x0000
0x0001
...
And the reverse wrap is
0x0001
0x0000
0xFFFF
....
There was a problem hiding this comment.
But that means a simple unsigned or two's complement signed 16 bit value. Then the proper procedure would be the same as for the 32 bit case, but simply with a 16 bit sign-extended starting point:
rtapi_s32 oldval = _initval_;
rtapi_s64 bigcounter = _initval_;
...
rtapi_s32 newval = (((rtapi_s32)(signed char)MSB) << 8) + (LSB & 0xff);
rtapi_s32 delta = oldval - newval;
oldval = newval;
bigcounter += delta;The first MSB cast makes the value a signed type and then the next rtapi_s32 cast forces sign-extension.
And the assumption is that it does not wrap within a servo period (should be safe with just over 65 MHz at 1ms servo thread).
There was a problem hiding this comment.
That could be interpreted as signed two's complement or unsigned.
|
There are probably some more issues that need to be addressed, but my first list is probably a start ;-) |
|
BTW, the CI fail is due to a warning from the ini-file merge. That just got a fix which I merged in #3931. Rebasing/FF your code should fix that. |
| int i; | ||
| hal_s32_t temp1 = 0, temp2; | ||
| rtapi_s32 temp1 = 0, temp2, delta_counts; | ||
| hal_float_t vel; |
There was a problem hiding this comment.
Local variables must not use HAL types.
| } | ||
| else | ||
| { | ||
| if(temp2 > (device->encoder[i].last_index_latch + (hal_s32_t)(device->encoder[i].counts_per_rev/4))) |
| { | ||
| device->encoder[i].index_offset -= device->encoder[i].last_index_latch + device->encoder[i].counts_per_rev - temp2; | ||
| } | ||
| else if(temp2 < (device->encoder[i].last_index_latch - (hal_s32_t)(device->encoder[i].counts_per_rev/4))) |
| delta_counts = temp1 - device->encoder[i].old_counts; | ||
| device->encoder[i].old_counts = temp1; | ||
| *(device->encoder[i].rawcounts) += delta_counts; | ||
| *(device->encoder[i].rawcounts64) += delta_counts; |
There was a problem hiding this comment.
This looks fine. You do:
s32_deltacount = s32_counter - s32_saved_counter;
s32_saved_counter = s32_counter;
s64 += s32_deltacount;I assume that the s32_counter is two-s complement. However, it should also work for an unsigned counter because the deltacount should always be result in correct two's complement.
There was a problem hiding this comment.
The only assumption is that you cannot wrap a 32-bit counter in one servo period. But I guess we should be save there?
There was a problem hiding this comment.
I think so. The Mesa counters max out at 20MHz, the internet suggests that 500MHz is possible.
Wrapping 32 bits in 1ms is a few Teraherz.
There was a problem hiding this comment.
At 500MHz you have a comfortable 8.5 seconds before you wrap :-)
| byteindex = ENCTB; | ||
| timebase.byte.b0 = (unsigned char)slot->rd_buf[byteindex++]; | ||
| timebase.byte.b1 = (unsigned char)slot->rd_buf[byteindex]; |
There was a problem hiding this comment.
The timebase variable seems to be a 16-bit value and needs to be read from a stream in an array(rd_buf[]). The data looks to be in little endian (LSB in the lower address). We are running this code on a little endian machine. Then this method seems better (easier for the compiler to optimize):
unsigned short val;
val = slot->rd_buf[byteindex + 1] & 0xff;
val <<= 8;
val += slot->rd_buf[byteindex + 0] & 0xff;
// The byteindex seems to be overwritten subsequently,
// so this line may be dead code
byteindex += 1; // Only increments once in the code...Alternate formulation:
val = (slot->rd_buf[byteindex + 0] & 0xff) + ((slot->rd_buf[byteindex + 1] & 0xff) << 8);
// Same comment...
byteindex += 1; // Only increments once in the code...| pos.byte.b0 = (signed char)slot->rd_buf[byteindex++]; | ||
| pos.byte.b1 = (signed char)slot->rd_buf[byteindex++]; | ||
| pos.byte.b2 = (signed char)slot->rd_buf[byteindex++]; | ||
| pos.byte.b3 = oldpos.byte.b3; |
There was a problem hiding this comment.
Deletion of b3 leaves it undefined and at a random value.
| pos.byte.b0 = (signed char)slot->rd_buf[byteindex++]; | ||
| pos.byte.b1 = (signed char)slot->rd_buf[byteindex++]; | ||
| pos.byte.b2 = (signed char)slot->rd_buf[byteindex++]; |
There was a problem hiding this comment.
You are reading a value from a stream. The better method is to shift the value in byte-by-byte into the larger type instead of using a unonized type.
If the value you stream-read is a two's complement value, then there is a second advantage. You can do proper sign extension in one go by casting the MSB to a signed char and then to the signed larger type. Shifting left will preserve the sign.
|
Note that in all cases I have attempted to touch the existing code as little as possible. I am not sure that fixing long-standing inelegancies is within the scope of this change. |
Inelegancies besides, there are a lot of problems with the code. "Just touching the counter wrap" may actually break stuff in ways you didn't imagine because of the way it is written. The code is (very) fragile in my opinion. I see it as a pest verses cholera choice, you choose how to die ;-) |
|
Given a 16bit hardware counter, suppose i have a reading of 16383, followed by a reading of -16384, how is the code supposed to know if a wrap occured in this situation or not? Obviously that wrapping can only work if the maximum allowed change between sampling is less than half of the total encoder counts. That should probably be documented somewhere, although it is probably unlikely to occur on real hardware. |
That is a very good observation. The code has a problem at half the range disregarding directional information. What about this: // your example
rtapi_s32 oldval = 16383;
rtapi_s32 newval = -16384;
rtapi_s32 delta = newval - oldval; // delta is now -32767 (within range)
// Full 16 bit range forward
rtapi_s32 oldval = 32767;
rtapi_s32 newval = -32768; // moved one position
rtapi_s32 delta = newval - oldval; // delta is now -65535, should be +1
// Full 16 bit range reverse
rtapi_s32 oldval = -32768;
rtapi_s32 newval = 32767; // moved one position
rtapi_s32 delta = newval - oldval; // delta is now +65535, should be -1
// half-way problem correction
if(delta > 32768) // -32768 -> 0 is valid
delta -= 65536;
else if(delta < -32768) // 0 -> -32768 is valid
delta += 65536;It also means that 32767 -> -1 (or 32766 -> -2) will become -32768. |
|
I don't like that bit manipulation stuff. IMHO it would be better to make that stuff explicit. Then we also would not have to assume anything about endianness or how exactly stuff is represented and could deal with arbitrary number of max counts, e.g. 18 bit absolute encoders. lets assume a counter with 4000 counts going from 0 to 3999. midpoint would be 1999. calculate delta of new value and old value, this would be in the range [-3999 to +3999] wrap around occured if new value <= 1999 and old value > 1999 (or vice versa) and abs(delta) >= 2000. in case abs(delta) < 2000 no wraparound. else in case new < old, the delta will be negative, e.g. new = 100, old = 3900, delta = -3800. The correct delta in this case would be +200, so we need to add 4000. in case old < new, the delta will be positive, e.g. old = 3900, new = 100, delta = 3800. The correct delta in this case would be -200, so we need to subtract 4000. that same logic should apply no matter if counts are treated as signed or unsigned. the delta should be calculated in something large enough. |
Bear in mind that this is the PPMC driver, which gets the bytes one at a time through the EPP parallel port.
In the case of the other drivers, the values captured from the hardware are signed, and just doing "counts += delta_counts" works without any conditional execution even through the wrap, with the correctly-sized signed datatypes. |
|
I discussed the wrap-management techniques with Jeff Epler during the work on this PR. "Here's the implementation from pluto, which takes a bit count argument Way back in the day I tested it to my satisfaction. However it has a conditional branch in it which is not ideal. It's better to do like you suggest, computing a delta value .. the trick to do it on an odd size like 24 is to shift the bits up so that the 24 bits are at the top of the integer to find the correct delta value from the new low bits.. I have to credit my time working with MicroPython for teaching me this idea, they need to do it for efficient math on 30 and 31 bit signed values. This program is an input to Double check it for yourself because it's late here on a holiday night and I shouldn't be writing low level C code. But you successfully nerd-sniped me." |
|
hmm. am i missing something? in case of two's complement the sign is convention. in 24 bit signed 8388607 + 1 wraps to -8388608, in unsigned it becomes +8388608, both have representation 10000000 00000000 00000000. extension to 32bit would differ though in the uppermost byte. |
|
The first thing from jeff epler seems to be exactly what I had in mind. The second thing is quite clever, shifting avoids the need for sign extension... I wonder given MicroPython targets microcontrollers if a branch on such architectures is really worse than 3 shifts... The problem with clever code is it is clever ;) and 5 years from now nobody knows anymore. |
Mainly a re-working of the register wrapping to 64 bits.
Note that this does not yet include Hostmot2, though that has already been at least partially patched in 8a3ac3f