I’ve just found the answer to my main question myself. I didn’t find anything in ISO-32000-1 or the ISO-32000-2 draft, but studying the Acrobat JavaScript reference, I found the cCharset
parameter that is available for the submitForm()
method. That parameter defines:
The encoding for the values submitted. String values are utf-8,
utf-16, Shift-JIS, BigFive, GBK, and UHC. If not passed, the current
Acrobat behavior applies. For XML-based formats, utf-8 is used. For
other formats, Acrobat tries to find the best host encoding for the
values being submitted. XFDF submission ignores this value and always
uses utf-8.
In other words: in my case GBK was used because it fits best to submit Chinese characters. However, one could force UTF-8 by using the submitForm()
JavaScript method using the appropriate value.
Based on this question, I have asked the ISO committee to fix this problem in ISO-32000-2.
As a result, an extra possible entry was added to the table entitled Additional entries specific to a submit-form action in section 12.7.6.2:
CharSet: string
(Optional; inheritable) Possible values include: utf-8, utf-16,
Shift-JIS, BigFive, GBK, or UHC.
Starting with PDF 2.0, this problem will no longer exist.
Update: my suggestion made ISO 32000-2 (aka PDF 2.0):
The CharSet key doesn’t exist in ISO 32000-1; it was introduced in ISO 32000-2.