When rendering HTML (or XML), a small set of markup characters must be escaped (for example &, ', ", <, >). In a "text/html" view the obvious choice for markup escaping is the encodeAsHTML codec.
In addition to escaping markup, the encodeAsHTML codec converts a seemingly random choice of additional characters to HTML character entities. I say "random" because the content type of a view has two dimensions, MIME type and charset, and the need for additional conversion depends on the charset. If the charset of the view is UTF-8 there is no need to convert any additional characters. (So whether GRAILS-3321 is a bug depends on the charset.)
For instance, given UTF-8 content in a Nordic language like Swedish, the only effect of the additional conversion is to increase the content length by some 30%. The content is valid both before and after the conversion.
Given content in, say, Chinese, the chance of finding additional characters to convert is small.
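To make the distinction concrete, here is a minimal sketch (plain Java, not the actual Grails codec; the class and method names are my own) contrasting pure markup escaping with the additional conversion of non-ASCII characters to numeric entities, using a short Swedish string:

```java
public class EscapeDemo {

    // Escape only the five markup characters; sufficient for a UTF-8 page.
    static String escapeMarkup(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Escape markup first, then additionally turn every non-ASCII
    // character into a numeric character entity (roughly what the
    // extra conversion amounts to).
    static String escapeMarkupAndNonAscii(String s) {
        String escaped = escapeMarkup(s);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c > 127) {
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String swedish = "Räksmörgås är gott";               // 18 characters
        System.out.println(escapeMarkup(swedish));            // unchanged, 18 chars
        System.out.println(escapeMarkupAndNonAscii(swedish)); // 38 chars
    }
}
```

With only four non-ASCII characters the entity form is already more than twice as long; the "some 30%" figure reflects the lower density of such characters in typical running Swedish text.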
The choice of codec should reflect the two dimensions of content type. One way is a super-intelligent encodeAsHTML codec that takes the current charset into account. Considering the number of possible charsets, such a codec would turn into a monster. A better way is to provide codecs for the most common combinations of MIME type and charset and extend the convention for specifying the default codec. For instance, in Config.groovy:
grails.views.default.codec = 'html/utf8'
grails.views.default.codec = 'html/iso8859-1'
would select the new codecs encodeAsHtmlUtf8 or encodeAsHtmlIso88591, respectively.
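Under the hood, the per-charset codecs could share one implementation that asks the charset itself what it can represent. A hedged sketch in plain Java (encodeForCharset is a name I made up, not a Grails API):

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharsetAwareCodec {

    // Escape markup, and entity-encode only those characters the target
    // charset cannot represent. With UTF-8 nothing extra is converted.
    static String encodeForCharset(String s, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:
                    if (encoder.canEncode(c)) {
                        sb.append(c);
                    } else {
                        sb.append("&#").append((int) c).append(';');
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-8 represents everything, so no extra entities appear.
        System.out.println(encodeForCharset("日本語", StandardCharsets.UTF_8));
        // ISO-8859-1 cannot represent Japanese: numeric entities instead.
        System.out.println(encodeForCharset("日本語", StandardCharsets.ISO_8859_1));
    }
}
```

Per-codec classes like encodeAsHtmlUtf8 would then just be thin wrappers binding the charset argument, keeping the charset matrix out of a single monster codec.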
Since all strings in the Java world are Unicode, there is an intrinsic problem in converting into a smaller charset. For instance, there is no meaningful conversion of Japanese into ISO-8859-1. I'm not sure what the current encodeAsHTML does about that.
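For the record, plain Java's own byte conversion just substitutes a replacement character, silently losing the content; this is standard String.getBytes behavior, not anything Grails-specific:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyConversion {
    public static void main(String[] args) {
        // Characters that ISO-8859-1 cannot represent are replaced by
        // the charset's default replacement byte, '?' (0x3F).
        byte[] bytes = "日本語".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(bytes)); // [63, 63, 63]
    }
}
```

A codec that entity-encodes unrepresentable characters instead (as sketched above for the charset-aware approach) at least keeps the information in the page, at the cost of longer output.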