Data encoding when submitting a PDF form using AcroForm technology

I’ve just found the answer to my main question myself. I didn’t find anything in ISO 32000-1 or the ISO 32000-2 draft, but while studying the Acrobat JavaScript reference, I found the cCharset parameter of the submitForm() method. The reference describes it as follows:

The encoding for the values submitted. String values are utf-8,
utf-16, Shift-JIS, BigFive, GBK, and UHC. If not passed, the current
Acrobat behavior applies. For XML-based formats, utf-8 is used. For
other formats, Acrobat tries to find the best host encoding for the
values being submitted. XFDF submission ignores this value and always
uses utf-8.

In other words: in my case, GBK was used because Acrobat judged it the best fit for submitting Chinese characters. However, one can force UTF-8 by calling the submitForm() JavaScript method with cCharset set to utf-8.
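
As a rough sketch of that last point, a script in the form could call submitForm() with cCharset set explicitly. The helper name submitAsUtf8 and the URL are placeholders I made up for illustration; cURL, cSubmitAs and cCharset are parameter names from the Acrobat JavaScript reference. I'm assuming an HTML (URL-encoded) submission here, since the reference says XFDF ignores cCharset and always uses utf-8.

```javascript
// Hypothetical helper (not part of the Acrobat API): submits the form
// with an explicit charset instead of letting Acrobat pick one.
// `doc` is the Acrobat Doc object, i.e. `this` in a document-level script.
function submitAsUtf8(doc, url) {
  var params = {
    cURL: url,          // placeholder endpoint, replace with your own
    cSubmitAs: "HTML",  // assumption: URL-encoded name/value pairs
    cCharset: "utf-8"   // one of the values listed in the reference
  };
  doc.submitForm(params);
  return params;        // returned only so the call can be inspected
}

// Usage inside Acrobat, e.g. in a button's Mouse Up action:
// submitAsUtf8(this, "https://example.com/submit");
```

With this in place, the server should receive UTF-8 regardless of which characters the user typed, so it no longer has to guess between GBK, Shift-JIS and the rest.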

Based on this question, I asked the ISO committee to fix this problem in ISO 32000-2.
As a result, an additional entry was added to the table entitled Additional entries specific to a submit-form action in section 12.7.6.2:

CharSet: string

(Optional; inheritable) Possible values include: utf-8, utf-16,
Shift-JIS, BigFive, GBK, or UHC.

Starting with PDF 2.0, this problem will no longer exist.

Update: my suggestion made it into ISO 32000-2 (aka PDF 2.0).

The CharSet key doesn’t exist in ISO 32000-1; it was introduced in ISO 32000-2.
