Skip to content

WIP: Rtapi cleanup v2#3919

Open
hdiethelm wants to merge 9 commits intoLinuxCNC:masterfrom
hdiethelm:rtapi_cleanup_v2
Open

WIP: Rtapi cleanup v2#3919
hdiethelm wants to merge 9 commits intoLinuxCNC:masterfrom
hdiethelm:rtapi_cleanup_v2

Conversation

@hdiethelm
Copy link
Copy Markdown
Contributor

@hdiethelm hdiethelm commented Apr 9, 2026

Continuation of #3908 reverted in #3918

Target: Move some classes out of the huge uspace_rtapi_app.cc

uspace_rtapi_main: Contains the main function and helpers
uspace_rtapi_app: Contains the RtapiApp class
uspace_posix: Contains the PosixApp class

Other fixes:

  • Don't start master just for exit: 371793c
  • Remove unused rtapi_task::ratio: 3097578
    • Set but only read to force a fixed ratio which probably was not the intention
  • Slightly different lock to avoid needing reference to instance: c8af827
  • All real time implementations are now library's and handled the same way: 62e5ea6
  • No hard-coded library paths e1bbaf4

@hdiethelm
Copy link
Copy Markdown
Contributor Author

Still open:

  • libuspace-xxx.so.0 -> liblinuxcnc-rtapi-xxx.so ?
    • Longest name would be liblinuxcnc-rtap-xenomai-evl.so.0
  • Check if slightly different lock to avoid needing reference to instance: c8af827 is a good idea
  • Review

And

In the Submakefile it reads:

$(call TOOBJSDEPS, $(USPACE_POSIX_SRCS)): EXTRAFLAGS += -pthread -fPIC

Shouldn't that be (not adding to EXTRAFLAGS):

$(call TOOBJSDEPS, $(USPACE_POSIX_SRCS)): EXTRAFLAGS = -pthread -fPIC

Hmm, I don't understand makefiles in depth. I just copied what was already there a few lines below and edited it to match my lib. As much as I understand this, it just adds this flags to the global EXTRAFLAGS in Makefile for one compile command only. I did not see any duplicated flags.

@NTULINUX
Copy link
Copy Markdown
Contributor

NTULINUX commented Apr 11, 2026

Works here on Gentoo, rip, system install, clang and gcc builds, whole shebang!

edit: Have not yet tested with RTAI.

@hdiethelm
Copy link
Copy Markdown
Contributor Author

hdiethelm commented Apr 11, 2026

Thanks for testing!

I renamed the library's to:

liblinuxcnc-uspace-posix.so.0 liblinuxcnc-uspace-xenomai-evl.so.0
liblinuxcnc-uspace-rtai.so.0    liblinuxcnc-uspace-xenomai.so.0

A bit long names, but I think it is fine. But i'm open for other suggestions.

Additionally, I reduced the globals.

I tested all 5 configurations in a VM and they all work. There are two issues but I don't think they are due to this PR:

  • LXRT doesn't start on an isolated CPU. Workaround: RTAPI_CPU_NUMBER=1 linuxcnc
  • RTAI doesn't unload the modules in the correct order and fails to unload all

@hdiethelm
Copy link
Copy Markdown
Contributor Author

@NTULINUX: Do you have a test setup you can share or is it all just manually setup? A series of docker files would be nice, so different OS can be tested if something starts. I messed my VM up slightly by using make install and just after the fact figured out that there is no make uninstall but I was able to restore it.
If there is nothing yet, I can create some, should not be to involved. I often use podman for similar things where I don't want to mess up my host. Podman is similar to docker but it doesn't need root.

@NTULINUX
Copy link
Copy Markdown
Contributor

I have VM images up right now but they're in flux. I'm going to post new VMs with all the right fixes in a bit. Currently tracking this PR here:

#3925

My ebuilds at the moment are broken but we're in the middle of sorting it all out for good. Will share link to new VM soon with everything tied together, cleanly.

case SIGXCPU:
// should not happen - must be handled in RTAPI if enabled
rtapi_print_msg(RTAPI_MSG_ERR, "rtapi_app: BUG: SIGXCPU received - exiting\n");
exit(0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot call exit() in a signal handler. You must call _exit()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also,using printf or puts in signal handlers is not allowed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a reference why? I now understand the difference between the two but I did not find anything why _exit() is needed in signal handlers.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reference. Corrected.


case SIGTERM:
rtapi_print_msg(RTAPI_MSG_ERR, "rtapi_app: SIGTERM - shutting down\n");
exit(0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exit() -> _exit()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected

kill(getpid(), sig);
break;
}
exit(1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exit() -> _exit()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected


default: // pretty bad
rtapi_print_msg(RTAPI_MSG_ERR, "rtapi_app: caught signal %d - dumping core\n", sig);
sleep(1); // let syslog drain
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calling sleep() in a signal handler may be problematic because it can be implemented using SIGALRM. Signals in signal is a very bad concept.

Comment on lines +540 to +544
//If called in master mode with exit command, no need to start master
//and exit again
if (args.size() == 1 && args[0] == "exit") {
return 0;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you exit here, then why perform the socket() and bind() calls?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is a bit funny: Bind is used to detect if a master is running. It returns 0 if no master is already running, so a new one is started.

Now I had the case when exiting, everything was initialized again due to master exits automatically as soon as there is no instance any more: Line 417.

Due to that, master was started again just to call exit and "Note: Using POSIX realtime" was shown a second time before exiting when I closed latency test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't that is just as race prone as any other method?

The concept "the first becomes master" is flawed when you try to exit.

break;
if (i == 0)
srand48(t0.tv_sec ^ t0.tv_usec);
usleep(lrand48() % 100000);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be unforeseen interaction with SIGALRM here? I don't think you want to involve signals. Also, wouldn't there be a minimum wait wanted here? A random wait can return close to zero all the time and you only loop three times.

return s;
}

static int get_fifo_path(char *buf, size_t bufsize) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use char* and size_t? Why not have a std::string& as argument?
And why are there two get_fifo_path() functions with different arguments? Can't they be replaced by one?

Copy link
Copy Markdown
Contributor Author

@hdiethelm hdiethelm Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first function gets the path as std::string. The second copies it to a char pointer and checks the size.
That was the original implementation, I found that somewhat convoluted and the return NULL was also not good to read, so I reordered it a bit without changing to much:
https://github.com/LinuxCNC/linuxcnc/blob/master/src/rtapi/uspace_rtapi_app.cc#L503

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the whole cascade of get_fifo_path feels wrong IMO. We should get rid of char buffers as much as we can (everywhere in the code).

pthread_cancel(queue_thread);
pthread_join(queue_thread, nullptr);
rtapi_msg_queue.consume_all([](const message_t &m) {
fputs(m.msg, m.level == RTAPI_MSG_ALL ? stdout : stderr);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why fputs()?

Comment on lines +298 to +355
static int read_number(int fd) {
int r = 0, neg = 1;
char ch;

while (1) {
int res = read(fd, &ch, 1);
if (res != 1)
return -1;
if (ch == '-')
neg = -1;
else if (ch == ' ')
return r * neg;
else
r = 10 * r + ch - '0';
}
}

static std::string read_string(int fd) {
int len = read_number(fd);
if (len < 0)
throw ReadError();
if (!len)
return std::string();
std::string str(len, 0);
if (read(fd, str.data(), len) != len)
throw ReadError();
return str;
}

static std::vector<std::string> read_strings(int fd) {
std::vector<std::string> result;
int count = read_number(fd);
if (count < 0)
return result;
for (int i = 0; i < count; i++) {
result.push_back(read_string(fd));
}
return result;
}

static void write_number(std::string &buf, int num) {
buf = buf + fmt::format("{} ", num);
}

static void write_string(std::string &buf, const std::string &s) {
write_number(buf, s.size());
buf += s;
}

static void write_strings(int fd, const std::vector<std::string> &strings) {
std::string buf;
write_number(buf, strings.size());
for (unsigned int i = 0; i < strings.size(); i++) {
write_string(buf, strings[i]);
}
if (write(fd, buf.data(), buf.size()) != (ssize_t)buf.size())
throw WriteError();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a socket communication we work with. Why isn't this passed through and handled by the standard sendmsg() and recvmsg() system calls? Wouldn't they be much better at handling this? Or is there and advantage to all this code?

@andypugh
Copy link
Copy Markdown
Collaborator

Some discussion in the Sunday video meet-up has suggested that we should look at incorporating this into the rtapi cleanup:

#918

Comment on lines +479 to +480
len = snprintf(buf + 1, bufsize - 1, "%s", s.c_str());
return len;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then we're back to snprintf(). fmt::format would fit better. Length check can be done where it is relevant.

@hdiethelm
Copy link
Copy Markdown
Contributor Author

I commented and corrected the things I have changed.
Now do we really want do do all of that in this MR?

I always prefer to have refactoring (moving code around) and bug fixing/functional changes as separated as possible. That makes review and testing easier, you just check that the moved code arrived as it was and so avoids having merge conflicts due to the branch staying open for to long.

But I have to admit, I also did some changes that I just was not able to leave it as it was and which simplified moving code due to removed dependency's and of course created a bug doing so.

@BsAtHome
Copy link
Copy Markdown
Contributor

See: https://www.man7.org/linux/man-pages/man7/signal-safety.7.html for the callyou are allowed to make in signal handlers.

@BsAtHome
Copy link
Copy Markdown
Contributor

I commented and corrected the things I have changed. Now do we really want do do all of that in this MR?

Well, that is a good question. You are doing "cleanup" and that would imply refactor and fixes, IMO.

I always prefer to have refactoring (moving code around) and bug fixing/functional changes as separated as possible. That makes review and testing easier, you just check that the moved code arrived as it was and so avoids having merge conflicts due to the branch staying open for to long.

That is a possibility too, but there are soooo (key got stuck) many problems with this code that it is a good question why not hit those two birds with one stone two lines with one keypress.

But I have to admit, I also did some changes that I just was not able to leave it as it was and which simplified moving code due to removed dependency's and of course created a bug doing so.

And I appreciate the changes! They are very necessary. Before we're done it needs to be tested,of course. But when the code gets better structured, then that should also become easier, I hope.

But, if you want to split it, then I think we need to have a very clear split where you move without actual changes and then refactor. My opinion is that moving code also classifies as refactoring and therefore it would be a missed opportunity if not fixed in one go.

@BsAtHome
Copy link
Copy Markdown
Contributor

BTW, I've been wanting to vacuum this code for a long time but have not gotten arround to that point yet. You know, INI-file reader just done, HAL types/access revamp in the pipeline, tcl9 stuff that still needs fixing, build system cleanup, and so on.

(and also got CI's -Werror merged)

@hdiethelm
Copy link
Copy Markdown
Contributor Author

I commented and corrected the things I have changed. Now do we really want do do all of that in this MR?

Well, that is a good question. You are doing "cleanup" and that would imply refactor and fixes, IMO.

Might be I did not specify my (initial) intentions well enough. I just found the rtapi_uspace hard to manage while implementing xenomai and wanted to split it up without changing the existing code if not needed. But that diverged anyway already a bit.

With the new structure, it should also be easier to implement rtapi_udp_sendto() or something similar needed for RTNet.

I always prefer to have refactoring (moving code around) and bug fixing/functional changes as separated as possible. That makes review and testing easier, you just check that the moved code arrived as it was and so avoids having merge conflicts due to the branch staying open for to long.

That is a possibility too, but there are soooo (key got stuck) many problems with this code that it is a good question why not hit those two birds with one stone two lines with one keypress.

But I have to admit, I also did some changes that I just was not able to leave it as it was and which simplified moving code due to removed dependency's and of course created a bug doing so.

And I appreciate the changes! They are very necessary. Before we're done it needs to be tested,of course. But when the code gets better structured, then that should also become easier, I hope.

But, if you want to split it, then I think we need to have a very clear split where you move without actual changes and then refactor. My opinion is that moving code also classifies as refactoring and therefore it would be a missed opportunity if not fixed in one go.

I have to look into it a bit more next week if I find time. But from my point of view it would make sense to split the more involved changes in a new PR or also two.

But there is also always some testing effort behind which I don't know how much you have to do from your side.

Also there is: #3925
Just randomly, I found also that DLSYM halpr_find_comp_by_name https://github.com/hdiethelm/linuxcnc-fork/blob/rtapi_cleanup_v2/src/rtapi/uspace_rtapi_main.cc#L112 does not match the signature (char* vs const char*),

I was now wondering if there is not a better solution for this. Often I use the following pattern to define the function and the matching type in the same header and then from then on only FUN_T* for function pointers. So you naturally change both. By some define magic, it should be possible do do both in one, but then it gets hard to read.

typedef int FUN_T(void *, void *);
int fun(void *, void *);

You know of a good way to test if both definitions match?

It would also be nicer to check at compile time if a .so has the needed main/exit functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants