Grails / GRAILS-7322

encodeAsHTML oblivious to page charset

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.3.6
    • Fix Version/s: 1.3.7
    • Component/s: View technologies
    • Labels:
      None

      Description

      When rendering HTML (or XML), a small set of markup characters must be escaped (examples: &, ', ", <, >). In a "text/html" view, the obvious choice for markup escaping is the encodeAsHTML codec. (Cf. GRAILS-2606, GRAILS-6682.)

      In addition to escaping markup the encodeAsHTML codec converts a random choice of additional characters to HTML character entities. I say "random" because the content type of a view has two dimensions, MIME type and charset. The need for additional conversion depends on the charset. If the charset of the view is UTF-8 there is no need to convert any additional characters. (So whether GRAILS-3321 is a bug depends on the charset.)

      For instance, given UTF-8 content in a Nordic language like Swedish, the only effect of the additional conversion is to add some 30% to the content length. The content is valid both before and after the conversion.
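To make the size effect concrete, here is a minimal Java sketch that compares the UTF-8 byte length of a Swedish word before and after entity conversion. The class name `EntityOverhead` and the three-entry replacement table are illustrative assumptions; they are a tiny stand-in for the idea, not the table the actual encodeAsHTML codec uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Grails or Spring.
final class EntityOverhead {

    // Replace three lowercase Swedish letters with their named HTML entities.
    // A real codec covers far more characters; this table is only for illustration.
    static String withEntities(String s) {
        return s.replace("\u00e5", "&aring;")   // a with ring
                .replace("\u00e4", "&auml;")    // a with diaeresis
                .replace("\u00f6", "&ouml;");   // o with diaeresis
    }

    // Number of bytes the string occupies when sent as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String word = "R\u00e4ksm\u00f6rg\u00e5s";
        System.out.println("plain UTF-8 bytes:  " + utf8Length(word));
        System.out.println("with entities:      " + utf8Length(withEntities(word)));
    }
}
```

Each of these letters takes 2 bytes in UTF-8 but 6 or 7 bytes as a named entity. A single word packed with such letters inflates far more than running text does, so this sketch does not reproduce the 30% figure; it only shows where the overhead comes from.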

      Given content in, say, Chinese, the chance of finding additional characters to convert is small.

      The choice of codec should reflect both dimensions of the content type. One way is a super-intelligent encodeAsHTML codec that takes the current charset into account, but considering the number of possible charsets, that codec would turn into a monster. A better way is to provide codecs for the most common combinations of MIME type and charset, and to extend the convention for specifying the default codec. For instance, in Config.groovy:

      grails.views.default.codec = 'html/utf8'
      or
      grails.views.default.codec = 'html/iso8859-1'

      would select the new codecs encodeAsHtmlUtf8 or encodeAsHtmlIso88591, respectively.
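As a rough illustration of what such per-charset codecs might do, the following plain-Java sketch escapes the five markup characters and leaves every other character alone when the page charset can represent it, falling back to a numeric character reference otherwise. The class and method names are hypothetical, and this is not the Grails codec API or the existing encodeAsHTML implementation; it only assumes the JDK's `CharsetEncoder.canEncode`.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not a Grails codec.
final class CharsetAwareHtmlEscape {

    static String escape(String input, Charset pageCharset) {
        CharsetEncoder encoder = pageCharset.newEncoder();
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            int len = Character.charCount(cp);
            switch (cp) {
                // Markup characters are always escaped, whatever the charset.
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:
                    if (encoder.canEncode(input.subSequence(i, i + len))) {
                        // The page charset can carry this character as-is.
                        out.appendCodePoint(cp);
                    } else {
                        // Not representable: use a decimal character reference.
                        out.append("&#").append(cp).append(';');
                    }
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Sm\u00f6rg\u00e5s <\u65e5\u672c>";
        System.out.println(escape(text, StandardCharsets.UTF_8));
        System.out.println(escape(text, StandardCharsets.ISO_8859_1));
    }
}
```

Under this sketch, an 'html/utf8' variant would amount to `escape(s, StandardCharsets.UTF_8)` and an 'html/iso8859-1' variant to `escape(s, StandardCharsets.ISO_8859_1)`; how that would map onto the codec naming convention is left open here.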

      Since all strings in the Java world are Unicode to begin with, there is an intrinsic problem converting into a smaller charset. For instance, there is no meaningful conversion of Japanese into ISO-8859-1. I'm not sure what the current encodeAsHTML does about that.
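For reference, this is what the JDK itself does when a string is converted to a charset that cannot represent it. The sketch says nothing about encodeAsHTML's behaviour, which remains an open question above; the class name `LossyConversionDemo` and its helper methods are made up for the demonstration.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing JDK behaviour only.
final class LossyConversionDemo {

    // String.getBytes(Charset) silently substitutes the charset's replacement
    // byte for unmappable characters; for ISO-8859-1 that byte is '?'.
    static String roundTripLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // A strict encoder reports the problem instead of hiding it.
    static boolean fitsLatin1(String s) {
        CharsetEncoder strict = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c";  // two kanji
        System.out.println(roundTripLatin1(japanese));
        System.out.println(fitsLatin1(japanese));
    }
}
```

The silent substitution is the risky path: the original characters are gone after the round trip, which is why a check like `fitsLatin1` or a fallback to character references is worth considering before choosing a non-Unicode page charset.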

        Activity

        Hakan Soderstrom added a comment -

        Even I think it should have been minor, my mistake!

        Marc Palmer added a comment -

        This no longer appears to be a problem: in Spring 3.0.5 (which supplies the codec and is used in Grails 1.3.7 and higher), characters without an explicit HTML entity are not encoded at all.

        Which means you must have the correct encoding set now - but who doesn't use utf-8 now?!


          People

          • Assignee:
            Marc Palmer
            Reporter:
            Hakan Soderstrom
          • Votes:
            0
            Watchers:
            1
