
HTML-encoding lost when attribute read from input field

I’m using JavaScript to pull a value out from a hidden field and display it in a textbox. The value in the hidden field is encoded.

For example,

<input id='hiddenId' type='hidden' value='chalk &amp; cheese' />

gets pulled into

<input type='text' value='chalk &amp; cheese' />

via some jQuery to get the value from the hidden field (it’s at this point that I lose the encoding):

$('#hiddenId').attr('value')

The problem is that when I read chalk &amp; cheese from the hidden field, JavaScript seems to lose the encoding. I do not want the value to be chalk & cheese. I want the literal amp; to be retained.

Is there a JavaScript library or a jQuery method that will HTML-encode a string?

Can you show the Javascript you are using?
I have added how I get the value from the hidden field.
Do NOT use the innerHTML method (the jQuery .html() method uses innerHTML), as on some (I've only tested Chrome) browsers, this won't escape quotes, so if you were to put your value into an attribute value, you would end up with an XSS vulnerability.
in what context is chalk and cheese ever used together 0_o
@d-_-b when comparing two items. example. they are as different as chalk and cheese ;)

Chirag Soni

EDIT: This answer was posted a long time ago, and the htmlDecode function introduced an XSS vulnerability. It has been modified by changing the temporary element from a div to a textarea, which reduces the XSS risk. Nowadays, though, I would encourage you to use the DOMParser API as suggested in another answer.

I use these functions:

function htmlEncode(value){
  // Create an in-memory element, set its inner text (which is automatically encoded),
  // then grab the encoded contents back out. The element never exists on the DOM.
  return $('<textarea/>').text(value).html();
}

function htmlDecode(value){
  return $('<textarea/>').html(value).text();
}

Basically a textarea element is created in memory, but it is never appended to the document.

In the htmlEncode function I set the innerText of the element and retrieve the encoded innerHTML; in the htmlDecode function I set the innerHTML of the element and retrieve the innerText.
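The EDIT note above points to the DOMParser API as the preferred way to decode. A minimal sketch of that idea follows; the function name htmlDecodeWithDomParser is hypothetical, and the code assumes a browser (or another environment that provides a global DOMParser).

```javascript
// Hypothetical helper: decodes HTML entities by parsing the input as an
// HTML document and reading back its text content.
// Assumes a global DOMParser is available (true in browsers).
function htmlDecodeWithDomParser(input) {
  var doc = new DOMParser().parseFromString(input, 'text/html');
  // textContent returns the decoded text; any markup in the input is dropped.
  return doc.documentElement.textContent;
}

// Usage (in a browser console):
// htmlDecodeWithDomParser('chalk &amp; cheese');  // 'chalk & cheese'
```

Unlike the textarea trick, tags in the input are dropped rather than returned as text, so this fits entity-encoded strings better than arbitrary markup.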

Check a running example here.


This works for most scenarios, but this implementation of htmlDecode will eliminate any extra whitespace. So for some values of "input", input != htmlDecode(htmlEncode(input)). This was a problem for us in some scenarios. For example, if input = "\t Hi \n There", a roundtrip encode/decode will yield "Hi There". Most of the time this is okay, but sometimes it isn't. :)
It depends on the browser: Firefox keeps the whitespace and new lines, while IE strips them all.
Community

The jQuery trick doesn't encode quote marks and in IE it will strip your whitespace.

Based on the escape templatetag in Django, which I guess is heavily used/tested already, I made this function which does what's needed.

It's arguably simpler (and possibly faster) than any of the workarounds for the whitespace-stripping issue - and it encodes quote marks, which is essential if you're going to use the result inside an attribute value for example.

function htmlEscape(str) {
    return str
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// I needed the opposite function today, so adding here too:
function htmlUnescape(str){
    return str
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&amp;/g, '&');
}
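The order of the replacements matters: htmlEscape handles & first and htmlUnescape handles &amp; last. The sketch below illustrates why, using copies of the two functions above plus a deliberately wrong variant (htmlUnescapeWrongOrder is a hypothetical name used only for this comparison).

```javascript
// Copies of the two functions above, so this snippet runs on its own.
function htmlEscape(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function htmlUnescape(str) {
  return str
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

// Hypothetical wrong variant: decodes &amp; FIRST, which turns an escaped
// "&amp;lt;" into "&lt;" and then decodes it a second time.
function htmlUnescapeWrongOrder(str) {
  return str
    .replace(/&amp;/g, '&')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>');
}

var input = 'The entity for < is &lt;';
var escaped = htmlEscape(input);              // 'The entity for &lt; is &amp;lt;'
console.log(htmlUnescape(escaped) === input); // true
console.log(htmlUnescapeWrongOrder(escaped)); // 'The entity for < is <'
```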

Update 2013-06-17:
In the search for the fastest escaping I have found this implementation of a replaceAll method:
http://dumpsite.com/forum/index.php?topic=4.msg29#msg29
(also referenced here: Fastest method to replace all instances of a character in a string)
Some performance results here:
http://jsperf.com/htmlencoderegex/25

It gives a result string identical to that of the built-in replace chains above. I'd be very happy if someone could explain why it's faster!

Update 2015-03-04:
I just noticed that AngularJS uses exactly the method above:
https://github.com/angular/angular.js/blob/v1.3.14/src/ngSanitize/sanitize.js#L435

They add a couple of refinements - they appear to be handling an obscure Unicode issue as well as converting all non-alphanumeric characters to entities. I was under the impression the latter was not necessary as long as you have a UTF-8 charset specified for your document.

I will note that (4 years later) Django still does not do either of these things, so I'm not sure how important they are:
https://github.com/django/django/blob/1.8b1/django/utils/html.py#L44

Update 2016-04-06:
You may also wish to escape forward-slash /. This is not required for correct HTML encoding, however it is recommended by OWASP as an anti-XSS safety measure. (thanks to @JNF for suggesting this in comments)

        .replace(/\//g, '&#x2F;');
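As an alternative way to write the same thing, the chained replaces (including the optional forward-slash escape) can be folded into a single pass with a lookup table. This is only a sketch; the names HTML_ESCAPES and escapeHtmlSinglePass are hypothetical, and no speed claim is made for it.

```javascript
// Hypothetical lookup table covering the same characters as htmlEscape,
// plus the forward slash from the 2016 update.
var HTML_ESCAPES = {
  '&': '&amp;',
  '"': '&quot;',
  "'": '&#39;',
  '<': '&lt;',
  '>': '&gt;',
  '/': '&#x2F;'
};

// Replaces every special character in one pass over the string.
// Because each character is visited once, the order of entries does not matter.
function escapeHtmlSinglePass(str) {
  return String(str).replace(/[&"'<>\/]/g, function (ch) {
    return HTML_ESCAPES[ch];
  });
}

console.log(escapeHtmlSinglePass('chalk & cheese')); // 'chalk &amp; cheese'
console.log(escapeHtmlSinglePass('<a href="/x">'));  // '&lt;a href=&quot;&#x2F;x&quot;&gt;'
```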

ThinkingStiff

Here's a non-jQuery version that is considerably faster than both the jQuery .html() version and the .replace() version. This preserves all whitespace, but like the jQuery version, doesn't handle quotes.

function htmlEncode( html ) {
    return document.createElement( 'a' ).appendChild( 
        document.createTextNode( html ) ).parentNode.innerHTML;
};
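Because this version does not handle quotes, its output is not safe to drop into an attribute value. The sketch below shows the problem with plain strings; escapeTextOnly is a hypothetical stand-in that mimics what innerHTML does to a text node (it escapes &, < and > but leaves quotes alone), and escapeForAttribute is a hypothetical fix.

```javascript
// Hypothetical stand-in for text-node serialization: quotes are left untouched.
function escapeTextOnly(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical attribute-safe variant: additionally escapes both quote characters.
function escapeForAttribute(str) {
  return escapeTextOnly(str)
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

var userInput = '" onmouseover="alert(1)';

// The unescaped quote closes the attribute early and injects a new one.
var unsafe = '<input value="' + escapeTextOnly(userInput) + '">';
console.log(unsafe); // <input value="" onmouseover="alert(1)">

// With quotes escaped, the whole input stays inside the value attribute.
var safe = '<input value="' + escapeForAttribute(userInput) + '">';
console.log(safe);   // <input value="&quot; onmouseover=&quot;alert(1)">
```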

Speed: http://jsperf.com/htmlencoderegex/17

https://i.stack.imgur.com/NI3c4.png

Output:

https://i.stack.imgur.com/zE07Z.png

Script:

function htmlEncode( html ) {
    return document.createElement( 'a' ).appendChild( 
        document.createTextNode( html ) ).parentNode.innerHTML;
};

function htmlDecode( html ) {
    var a = document.createElement( 'a' ); a.innerHTML = html;
    return a.textContent;
};

document.getElementById( 'text' ).value = htmlEncode( document.getElementById( 'hidden' ).value );

//sanity check
var html = '<div>   &amp; hello</div>';
document.getElementById( 'same' ).textContent = 
      'html === htmlDecode( htmlEncode( html ) ): ' 
    + ( html === htmlDecode( htmlEncode( html ) ) );

HTML:

<input id="hidden" type="hidden" value="chalk    &amp; cheese" />
<input id="text" value="" />
<div id="same"></div>

This begs the question: why isn't it a global function in JS already?!
Community

I know this is an old one, but I wanted to post a variation of the accepted answer that will work in IE without removing lines:

function multiLineHtmlEncode(value) {
    var lines = value.split(/\r\n|\r|\n/);
    for (var i = 0; i < lines.length; i++) {
        lines[i] = htmlEncode(lines[i]);
    }
    return lines.join('\r\n');
}

function htmlEncode(value) {
    return $('<div/>').text(value).html();
} 

TJ VanToll

Underscore provides _.escape() and _.unescape() methods that do this.

> _.unescape( "chalk &amp; cheese" );
  "chalk & cheese"

> _.escape( "chalk & cheese" );
  "chalk &amp; cheese"

leepowers

Good answer. Note that if the value to encode is undefined or null with jQuery 1.4.2 you might get errors such as:

jQuery("<div/>").text(value).html is not a function

OR

Uncaught TypeError: Object has no method 'html'

The solution is to modify the function to check for an actual value:

function htmlEncode(value){ 
    if (value) {
        return jQuery('<div/>').text(value).html(); 
    } else {
        return '';
    }
}

jQuery('<div/>').text(value || '').html()
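One caveat with both checks above: they treat every falsy value as empty, so 0 or false would be encoded as an empty string. A minimal sketch that only maps null and undefined to '' follows. The names toEncodableString and htmlEncodeNullSafe are hypothetical, and the sketch swaps the jQuery call for plain string replacement so that it runs without a DOM.

```javascript
// Hypothetical helper: only null and undefined become '', everything else
// is converted with String(), so 0 and false survive.
function toEncodableString(value) {
  return value == null ? '' : String(value);
}

// Hypothetical null-safe encoder. Uses string replacement instead of the
// jQuery approach above (an assumption made so the sketch is self-contained).
function htmlEncodeNullSafe(value) {
  return toEncodableString(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(htmlEncodeNullSafe(undefined));        // ''
console.log(htmlEncodeNullSafe(0));                // '0'
console.log(htmlEncodeNullSafe('chalk & cheese')); // 'chalk &amp; cheese'
```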
tdog

For those who prefer plain javascript, here is the method I have used successfully:

function escapeHTML (str)
{
    var div = document.createElement('div');
    var text = document.createTextNode(str);
    div.appendChild(text);
    return div.innerHTML;
}

JAAulde

FWIW, the encoding is not being lost. The encoding is used by the markup parser (browser) during the page load. Once the source is read and parsed and the browser has the DOM loaded into memory, the encoding has been parsed into what it represents. So by the time your JS is executed to read anything in memory, the char it gets is what the encoding represented.

I may be operating strictly on semantics here, but I wanted you to understand the purpose of encoding. The word "lost" makes it sound like something isn't working like it should.
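In practical terms, the value read from the DOM is already the decoded text, so it only needs encoding again at the point where it is written back into markup. A minimal sketch of that step, using the question's example; escapeAttributeValue and renderTextInput are hypothetical names.

```javascript
// Hypothetical helper: encodes a decoded string for use inside a quoted attribute.
function escapeAttributeValue(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: builds the text input markup from the question.
function renderTextInput(value) {
  return "<input type='text' value='" + escapeAttributeValue(value) + "' />";
}

// What JavaScript sees after the browser has parsed value='chalk &amp; cheese':
var valueFromDom = 'chalk & cheese';

console.log(renderTextInput(valueFromDom));
// <input type='text' value='chalk &amp; cheese' />
```

If the value is instead assigned through a DOM property (for example the input's value property), no encoding step is needed, because no markup is being parsed.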


Dave Brown

Faster without jQuery. You can encode every character in your string:

function encode(e){return e.replace(/[^]/g,function(e){return"&#"+e.charCodeAt(0)+";"})}
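One caveat when encoding every character this way: charCodeAt works on UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) is emitted as two surrogate references, which HTML parsers do not treat as valid characters. A sketch of a code-point-aware variant follows; the names encodeByCodeUnit and encodeByCodePoint are hypothetical.

```javascript
// Same behavior as the one-liner above, written out for comparison.
function encodeByCodeUnit(str) {
  return str.replace(/[^]/g, function (ch) {
    return '&#' + ch.charCodeAt(0) + ';';
  });
}

// Hypothetical variant: Array.from iterates the string by code point,
// so each astral character yields a single numeric reference.
function encodeByCodePoint(str) {
  return Array.from(str, function (ch) {
    return '&#' + ch.codePointAt(0) + ';';
  }).join('');
}

var sample = 'a\u{1F600}'; // the letter "a" followed by U+1F600

console.log(encodeByCodeUnit(sample));  // '&#97;&#55357;&#56832;'
console.log(encodeByCodePoint(sample)); // '&#97;&#128512;'
```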

Or just target the main characters to worry about (&, linebreaks, <, >, " and ') like:

function encode(r){ return r.replace(/[\x26\x0A\<>'"]/g,function(r){return"&#"+r.charCodeAt(0)+";"}) } test.value=encode('Encode HTML entities!\n\n"Safe" escape