Skip to content

Expand hardware drivers to 64 bits (address issue #2331)#3932

Draft
andypugh wants to merge 2 commits intomasterfrom
andypugh/64_bit_drivers
Draft

Expand hardware drivers to 64 bits (address issue #2331)#3932
andypugh wants to merge 2 commits intomasterfrom
andypugh/64_bit_drivers

Conversation

@andypugh
Copy link
Copy Markdown
Collaborator

@andypugh andypugh commented Apr 12, 2026

Mainly a re-working of the register wrapping to 64 bits.
Note that this does not yet include Hostmot2, though that has already been at least partially patched in 8a3ac3f

@andypugh andypugh marked this pull request as draft April 12, 2026 22:44
@andypugh andypugh requested a review from BsAtHome April 12, 2026 22:44
Comment on lines +1081 to 1087
rtapi_s64 l; // JE001 AP001
struct byte_tag {
signed char b0;
signed char b1;
signed char b2;
signed char b3;
} byte;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A rtapi_s64 has 8 bytes, not 4.

Comment on lines 1115 to +1120
/* check for - to + transition */
if ((oldpos.byte.b2 & 0xc0) == 0xc0 && (pos.byte.b2 == 0))
pos.byte.b3++;
pos.l += 0x01000000;
else
if ((oldpos.byte.b2 == 0) && (pos.byte.b2 & 0xc0) == 0xc0)
pos.byte.b3--;
pos.l -= 0x01000000;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Byte-testing is a bad option. Better to look at the value as a whole and mask unwanted bits using a binary and (&). Then the compiler can sort it out properly.

Comment on lines 617 to +627
@@ -616,7 +624,7 @@ static void stg_counter_capture(void *arg, long period)
stg->pos_scale[n] = 1.0;
}
/* scale count to make floating point position */
*(stg->pos[n]) = *(stg->count[n]) / stg->pos_scale[n];
*(stg->pos[n]) = *(stg->count64[n]) / stg->pos_scale[n];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing a hal pin and reading the same pin just a bit later is very inefficient because the compiler must physically read the value from memory. The volatile modifier prevents caching. However, the value does not change here between the two lines. Therefore, a local intermediary allows the compiler to be more efficient.

This issue is seen all over this and other code, writing and then shortly thereafter again reading same pin.

Comment on lines 1115 to 1116
/* check for - to + transition */
if ((oldpos.byte.b2 & 0xc0) == 0xc0 && (pos.byte.b2 == 0))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is the counter's width and signedness? Masking two bits that must be set looks strange. If this two's complement, one's complement or another format?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value read from the FPGA is an unsigned 16 bit number. So the positive wrap looks like:
0xFFFF
0x0000
0x0001
...
And the reverse wrap is
0x0001
0x0000
0xFFFF
....

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that means a simple unsigned or two's complement signed 16 bit value. Then the proper procedure would be the same as for the 32 bit case, but simply with a 16 bit sign-extended starting point:

rtapi_s32 oldval = _initval_;
rtapi_s64 bigcounter = _initval_;
...
rtapi_s32 newval = (((rtapi_s32)(signed char)MSB) << 8) + (LSB & 0xff);
rtapi_s32 delta = oldval - newval;
oldval = newval;
bigcounter += delta;

The first MSB cast makes the value a signed type and then the next rtapi_s32 cast forces sign-extension.

And the assumption is that it does not wrap within a servo period (should be safe with just over 65 MHz at 1ms servo thread).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be interpreted as signed two's complement or unsigned.

@BsAtHome
Copy link
Copy Markdown
Contributor

There are probably some more issues that need to be addressed, but my first list is probably a start ;-)

@BsAtHome
Copy link
Copy Markdown
Contributor

BTW, the CI fail is due to a warning from the ini-file merge. That just got a fix which I merged in #3931. Rebasing/FF your code should fix that.

int i;
hal_s32_t temp1 = 0, temp2;
rtapi_s32 temp1 = 0, temp2, delta_counts;
hal_float_t vel;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local variables must not use HAL types.

}
else
{
if(temp2 > (device->encoder[i].last_index_latch + (hal_s32_t)(device->encoder[i].counts_per_rev/4)))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong type in cast

{
device->encoder[i].index_offset -= device->encoder[i].last_index_latch + device->encoder[i].counts_per_rev - temp2;
}
else if(temp2 < (device->encoder[i].last_index_latch - (hal_s32_t)(device->encoder[i].counts_per_rev/4)))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong type in cast

Comment on lines +1537 to +1540
delta_counts = temp1 - device->encoder[i].old_counts;
device->encoder[i].old_counts = temp1;
*(device->encoder[i].rawcounts) += delta_counts;
*(device->encoder[i].rawcounts64) += delta_counts;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine. You do:

s32_deltacount = s32_counter - s32_saved_counter;
s32_saved_counter = s32_counter;
s64 += s32_deltacount;

I assume that the s32_counter is two-s complement. However, it should also work for an unsigned counter because the deltacount should always be result in correct two's complement.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only assumption is that you cannot wrap a 32-bit counter in one servo period. But I guess we should be save there?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. The Mesa counters max out at 20MHz, the internet suggests that 500MHz is possible.
Wrapping 32 bits in 1ms is a few Teraherz.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At 500MHz you have a comfortable 8.5 seconds before you wrap :-)

Comment on lines 1102 to 1104
byteindex = ENCTB;
timebase.byte.b0 = (unsigned char)slot->rd_buf[byteindex++];
timebase.byte.b1 = (unsigned char)slot->rd_buf[byteindex];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timebase variable seems to be a 16-bit value and needs to be read from a stream in an array(rd_buf[]). The data looks to be in little endian (LSB in the lower address). We are running this code on a little endian machine. Then this method seems better (easier for the compiler to optimize):

unsigned short val;
val = slot->rd_buf[byteindex + 1] & 0xff;
val <<= 8;
val += slot->rd_buf[byteindex + 0] & 0xff;
// The byteindex seems to be overwritten subsequently,
// so this line may be dead code
byteindex += 1; // Only increments once in the code...

Alternate formulation:

val = (slot->rd_buf[byteindex + 0] & 0xff) + ((slot->rd_buf[byteindex + 1] & 0xff) << 8);
// Same comment...
byteindex += 1; // Only increments once in the code...

pos.byte.b0 = (signed char)slot->rd_buf[byteindex++];
pos.byte.b1 = (signed char)slot->rd_buf[byteindex++];
pos.byte.b2 = (signed char)slot->rd_buf[byteindex++];
pos.byte.b3 = oldpos.byte.b3;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deletion of b3 leaves it undefined and at a random value.

Comment on lines 1112 to 1114
pos.byte.b0 = (signed char)slot->rd_buf[byteindex++];
pos.byte.b1 = (signed char)slot->rd_buf[byteindex++];
pos.byte.b2 = (signed char)slot->rd_buf[byteindex++];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are reading a value from a stream. The better method is to shift the value in byte-by-byte into the larger type instead of using a unonized type.

If the value you stream-read is a two's complement value, then there is a second advantage. You can do proper sign extension in one go by casting the MSB to a signed char and then to the signed larger type. Shifting left will preserve the sign.

@andypugh
Copy link
Copy Markdown
Collaborator Author

Note that in all cases I have attempted to touch the existing code as little as possible. I am not sure that fixing long-standing inelegancies is within the scope of this change.

@BsAtHome
Copy link
Copy Markdown
Contributor

Note that in all cases I have attempted to touch the existing code as little as possible. I am not sure that fixing long-standing inelegancies is within the scope of this change.

Inelegancies besides, there are a lot of problems with the code. "Just touching the counter wrap" may actually break stuff in ways you didn't imagine because of the way it is written. The code is (very) fragile in my opinion. I see it as a pest verses cholera choice, you choose how to die ;-)

@rmu75
Copy link
Copy Markdown
Collaborator

rmu75 commented Apr 13, 2026

Given a 16bit hardware counter, suppose i have a reading of 16383, followed by a reading of -16384, how is the code supposed to know if a wrap occured in this situation or not? Obviously that wrapping can only work if the maximum allowed change between sampling is less than half of the total encoder counts. That should probably be documented somewhere, although it is probably unlikely to occur on real hardware.

@BsAtHome
Copy link
Copy Markdown
Contributor

Given a 16bit hardware counter, suppose i have a reading of 16383, followed by a reading of -16384, how is the code supposed to know if a wrap occured in this situation or not?

That is a very good observation. The code has a problem at half the range disregarding directional information.

What about this:

// your example
rtapi_s32 oldval = 16383;
rtapi_s32 newval = -16384;
rtapi_s32 delta = newval - oldval; // delta is now -32767 (within range)

// Full 16 bit range forward
rtapi_s32 oldval = 32767;
rtapi_s32 newval = -32768; // moved one position
rtapi_s32 delta = newval - oldval; // delta is now -65535, should be +1

// Full 16 bit range reverse
rtapi_s32 oldval = -32768;
rtapi_s32 newval = 32767; // moved one position
rtapi_s32 delta = newval - oldval; // delta is now +65535, should be -1

// half-way problem correction
if(delta > 32768)  // -32768 -> 0 is valid
    delta -= 65536;
else if(delta < -32768) // 0 -> -32768 is valid
    delta += 65536;

It also means that 32767 -> -1 (or 32766 -> -2) will become -32768.

@rmu75
Copy link
Copy Markdown
Collaborator

rmu75 commented Apr 13, 2026

I don't like that bit manipulation stuff. IMHO it would be better to make that stuff explicit. Then we also would not have to assume anything about endianness or how exactly stuff is represented and could deal with arbitrary number of max counts, e.g. 18 bit absolute encoders.

lets assume a counter with 4000 counts going from 0 to 3999. midpoint would be 1999.

calculate delta of new value and old value, this would be in the range [-3999 to +3999]

wrap around occured if new value <= 1999 and old value > 1999 (or vice versa) and abs(delta) >= 2000.

in case abs(delta) < 2000 no wraparound. else

in case new < old, the delta will be negative, e.g. new = 100, old = 3900, delta = -3800. The correct delta in this case would be +200, so we need to add 4000.

in case old < new, the delta will be positive, e.g. old = 3900, new = 100, delta = 3800. The correct delta in this case would be -200, so we need to subtract 4000.

that same logic should apply no matter if counts are treated as signed or unsigned. the delta should be calculated in something large enough.

@andypugh
Copy link
Copy Markdown
Collaborator Author

andypugh commented Apr 13, 2026

I don't like that bit manipulation stuff. IMHO it would be better to make that stuff explicit.

Bear in mind that this is the PPMC driver, which gets the bytes one at a time through the EPP parallel port.
Endian-ness isn't an issue in that regard.

And the value is definitely unsigned 16 bit, this has been confirmed with Jon @ Pico Systems.
Sorry, I misremembered, it is 24 bit, unsigned. But definitely unsigned.

In the case of the other drivers, the values captured from the hardware are signed, and just doing "counts += delta_counts" works without any conditional execution even through the wrap, with the correctly-sized signed datatypes.

@andypugh
Copy link
Copy Markdown
Collaborator Author

I discussed the wrap-management techniques with Jeff Epler during the work on this PR.

"Here's the implementation from pluto, which takes a bit count argument

static inline int64_t extend(int64_t old, int newlow, int nbits) {
    int64_t mask = (1<<nbits) - 1;
    int64_t maxdelta = mask / 2;
    int64_t oldhigh = old & ~mask;
    int64_t oldlow = old & mask;
    int64_t candidate1, candidate2;

    candidate1 = oldhigh | newlow;
    if(oldlow < newlow) candidate2 = candidate1 - (1<<nbits);
    else                candidate2 = candidate1 + (1<<nbits);

    if (llabs(old-candidate1) > maxdelta)
        return candidate2;
    else
        return candidate1;
}

Way back in the day I tested it to my satisfaction. However it has a conditional branch in it which is not ideal.

It's better to do like you suggest, computing a delta value .. the trick to do it on an odd size like 24 is to shift the bits up so that the 24 bits are at the top of the integer to find the correct delta value from the new low bits.. I have to credit my time working with MicroPython for teaching me this idea, they need to do it for efficient math on 30 and 31 bit signed values.

#include <stdint.h>
#include <limits.h>
#include <assert.h>

#ifdef __CPROVER__
#define assume(x) __CPROVER_assume((x))
#define assert(x) __CPROVER_assert((x), #x)
#else
#define assume(x)
#include <assert.h>
#endif

int32_t nondet_s32();
int64_t nondet_s64();

static inline int64_t extend(int64_t old, int newlow, int nbits) {
    int nshift = 64 - nbits;
    uint64_t oldlow_shifted = ((uint64_t)old << nshift);
    uint64_t newlow_shifted = ((uint64_t)newlow << nshift);
    int64_t diff_shifted = newlow_shifted - oldlow_shifted;
    return (uint64_t)old + (diff_shifted >> nshift);
}

int main() {
    int nbits = nondet_s32();
    assume(nbits > 8 && nbits < 32);

    int mask = UINT_MAX >> (32 - nbits);
    int64_t old = nondet_s64();

    int delta = nondet_s32() & (mask >> 1); // Can handle deltas up to this big

    int64_t new1 = (uint64_t)old + (uint64_t)delta;
    int64_t res1 = extend(old, new1 & mask, nbits);
    assert(new1 == res1);

    int64_t new2 = (uint64_t)old - (uint64_t)delta;
    int64_t res2 = extend(old, new2 & mask, nbits);
    assert(new2 == res2);
}

This program is an input to cbmc which proves that the assertions hold for ALL values that match the assume() clauses. And this version is branch free and free of undefined behavior due to signed integer overflow: cbmc --signed-overflow-check extend2.c

Double check it for yourself because it's late here on a holiday night and I shouldn't be writing low level C code. But you successfully nerd-sniped me."

@rmu75
Copy link
Copy Markdown
Collaborator

rmu75 commented Apr 13, 2026

hmm. am i missing something? in case of two's complement the sign is convention. in 24 bit signed 8388607 + 1 wraps to -8388608, in unsigned it becomes +8388608, both have representation 10000000 00000000 00000000. extension to 32bit would differ though in the uppermost byte.

@rmu75
Copy link
Copy Markdown
Collaborator

rmu75 commented Apr 13, 2026

The first thing from jeff epler seems to be exactly what I had in mind. The second thing is quite clever, shifting avoids the need for sign extension...

I wonder given MicroPython targets microcontrollers if a branch on such architectures is really worse than 3 shifts... The problem with clever code is it is clever ;) and 5 years from now nobody knows anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants