
JAMES-4133 Add support for unicode addresses as defined in RFC6532. #2728

Draft
arnt wants to merge 18 commits into apache:master from arnt:parse-rfc6532-addresses

arnt commented May 22, 2025

This adds Domain tests and does enough to parse Unicode email addresses.

However, I'm not quite happy with it yet. This needs documentation changes, I wouldn't mind a few more Domain tests, and I do want to investigate what's going on with Jakarta and संपर्क@डाटामेल.भारत, but I haven't had time to do that this week.

Consider it a WIP.
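
The sketch below shows the domain side of this in a minimal form. It assumes the domain part is converted with the JDK's `java.net.IDN`. That class implements IDNA2003, not the IDNA2008 rules usually expected alongside RFC 6532, so treat it as an approximation. The class name `DomainForms` is hypothetical and not part of James. The code only shows the Unicode (U-label) to ASCII (A-label, `xn--...`) round trip for the domain in the address above.

```java
import java.net.IDN;

/**
 * Hypothetical sketch, not James code: converts the domain part of an
 * RFC 6532 address between U-labels and A-labels using java.net.IDN
 * (IDNA2003 semantics).
 */
public final class DomainForms {

    private DomainForms() {}

    /** ASCII (A-label) form, e.g. for DNS lookups; throws IllegalArgumentException if invalid. */
    public static String toAsciiDomain(String domain) {
        return IDN.toASCII(domain);
    }

    /** Unicode (U-label) form, e.g. for display; input it cannot decode is returned unchanged. */
    public static String toUnicodeDomain(String domain) {
        return IDN.toUnicode(domain);
    }

    public static void main(String[] args) {
        String unicode = "डाटामेल.भारत";
        String ascii = toAsciiDomain(unicode);
        System.out.println(ascii);                   // A-label labels joined by '.'
        System.out.println(toUnicodeDomain(ascii));  // the original Unicode domain again
    }
}
```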

chibenwa changed the title from "WIP: Add support for unicode addresses as defined in RFC6532." to "JAMES-4133 Add support for unicode addresses as defined in RFC6532." on May 23, 2025
chibenwa marked this pull request as draft on May 23, 2025 07:15
chibenwa (Contributor) commented May 23, 2025

Hello @arnt

Thanks for this major contribution.

I did update #2724 with your changes and can happily say the SMTP-IN stack in James is RFC-6532 compliant :-)

However, I ran into major issues: any modification (a header change, for instance, but also the defensive copy made in the memory mail queue) causes the UTF-8 characters to be lost. This clearly looks like a lack of RFC-6532 support in jakarta.mail.internet.MimeMessage, which James (sadly...) still relies on heavily. (Quick checks tend to confirm that this dependency does not support RFC-6532.)
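
The sketch below illustrates the kind of failure described here. It is not a diagnosis of what `MimeMessage` actually does internally. If any copy step re-encodes a header through an ASCII-only charset, the UTF-8 characters of an RFC-6532 address are replaced and cannot be recovered, whereas a UTF-8 round trip is lossless. The class `Utf8HeaderRoundTrip` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration: a lossless UTF-8 round trip versus a lossy
 * US-ASCII round trip for a header carrying an RFC 6532 address.
 */
public final class Utf8HeaderRoundTrip {

    private Utf8HeaderRoundTrip() {}

    /** Encodes and decodes with UTF-8; every character survives. */
    public static String viaUtf8(String header) {
        byte[] wire = header.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.UTF_8);
    }

    /**
     * Encodes with US-ASCII. String.getBytes(Charset) replaces each
     * unmappable character with '?', so the address is destroyed.
     */
    public static String viaAscii(String header) {
        byte[] wire = header.getBytes(StandardCharsets.US_ASCII);
        return new String(wire, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String header = "From: संपर्क@डाटामेल.भारत";
        System.out.println(viaUtf8(header).equals(header)); // true
        System.out.println(viaAscii(header));               // "From: " followed by '?' runs
    }
}
```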

BTW, I turned the PR into a draft and added the ticket number to the title to make this clear.

Best regards,

Benoit TELLIER

chibenwa and others added 18 commits June 17, 2025 07:13
This changes the handling of some noncompliant IMAP clients, which James
would not tolerate before.

Application-layer code often assumes that addresses can be compared using
String.equalsIgnoreCase(), and some of it also uses regular expressions or
substring matching on addresses. This commit provides addresses to
upper-layer code in their UTF-8 form, so that kind of code continues to
work.

This might also have security implications: if upper-layer code can be
confused about whether two addresses are the same, an attacker could
exploit that confusion. This change should close off that possibility.
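
The sketch below shows the comparability property described in this commit. It assumes, for illustration only, that "UTF-8 form" means rewriting any A-label domain to its U-labels with `java.net.IDN.toUnicode`. `AddressComparison`, `toUtf8Form` and `sameMailbox` are hypothetical names, not the commit's actual code.

```java
import java.net.IDN;

/**
 * Hypothetical sketch: canonicalize the domain to U-labels so that naive
 * string comparison in upper layers treats both spellings as one mailbox.
 */
public final class AddressComparison {

    private AddressComparison() {}

    /** Rewrites the domain part to U-labels; the local part is left untouched. */
    public static String toUtf8Form(String address) {
        int at = address.lastIndexOf('@');
        if (at < 0) {
            return address; // no domain part; nothing to rewrite
        }
        return address.substring(0, at + 1) + IDN.toUnicode(address.substring(at + 1));
    }

    /**
     * The equalsIgnoreCase() comparison application code often uses, applied to
     * canonical forms. (RFC 5321 allows local parts to be case-sensitive; this
     * mirrors common application behavior, not the RFC.)
     */
    public static boolean sameMailbox(String a, String b) {
        return toUtf8Form(a).equalsIgnoreCase(toUtf8Form(b));
    }
}
```

Without the rewrite, the A-label and U-label spellings of the same address are different strings, so `equalsIgnoreCase()` treats one mailbox as two.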

RFC 6532 says we SHOULD do this, and James is generally very careful, so I
did this as well.

This should not make a difference, but a sufficiently inventive attacker
might combine it with something else to confuse some code...
arnt (Author) commented Apr 27, 2026

@chibenwa could you possibly look at this now and fix the pom.xml, for example?

Most of it is gated on SMTPUTF8, UTF8=ACCEPT, etc., but commit 1062017 changes James's behavior in the opposite case, namely when the James user does NOT enable support and a client tries to use foo@xn--bar anyway. That part gave me a headache. The code I wrote tries to preserve the properties people generally assume addresses have (e.g. string comparability), even when a fun-loving SMTP client connects to a James server and uses features the James server author didn't know about.
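
The sketch below is a hypothetical version of the policy described above. It makes two assumptions. First, a server without SMTPUTF8 enabled refuses raw non-ASCII addresses. Second, an ASCII address with an A-label domain such as `foo@xn--bar` is accepted but handed to upper layers in U-label form. `AddressIntakePolicy` and `accept` are illustrative names only and do not reflect what commit 1062017 actually does.

```java
import java.net.IDN;

/**
 * Hypothetical policy sketch: what upper layers see for an address received
 * by a server that may or may not have SMTPUTF8 enabled.
 */
public final class AddressIntakePolicy {

    private AddressIntakePolicy() {}

    /** Returns the form upper layers should see, or throws if the address is refused. */
    public static String accept(String address, boolean smtpUtf8Enabled) {
        boolean allAscii = address.chars().allMatch(c -> c < 0x80);
        if (!smtpUtf8Enabled && !allAscii) {
            // Assumption: without SMTPUTF8, raw UTF-8 in an address is refused.
            throw new IllegalArgumentException("non-ASCII address without SMTPUTF8: " + address);
        }
        int at = address.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("missing @: " + address);
        }
        // Even with SMTPUTF8 off, "xn--" labels are rewritten to U-labels so both
        // spellings of a domain compare equal; IDN.toUnicode leaves input it
        // cannot decode unchanged.
        return address.substring(0, at + 1) + IDN.toUnicode(address.substring(at + 1));
    }
}
```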

Feel free to add comments and push it to the #2724 PR.
