Filype Pereira Web developer 💥

Why is an emoji string length 2?

I am trying to understand how emojis work, and how a textarea in my browser handles what seems to be two chars displayed as one.

For example:

"πŸ‘".length
// -> 2

More examples here: https://jsbin.com/zazexenigi/edit?js,console

The explanation for this is interesting. JavaScript uses UTF-16 to manage strings.

UTF-16 can encode 1,112,064 code points. Each code point is represented by one or two code units, and a UTF-16 code unit takes two bytes (16 bits). This means that a single code unit can only distinguish 65,536 different characters.

This means some characters have to be represented with two code units, known as a surrogate pair.
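As a rough illustration of how a character beyond the first 65,536 is split into two 16-bit units, here is a minimal sketch of the UTF-16 surrogate pair arithmetic. The helper name toSurrogatePair is made up for this example, and U+1F44D (thumbs up) is only a sample input.

```javascript
// Hypothetical helper: split a code point above U+FFFF into its
// UTF-16 surrogate pair (two 16-bit code units).
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000;    // leaves a 20-bit value
  const high = 0xD800 + (offset >> 10);  // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F44D);
console.log(high.toString(16), low.toString(16));            // d83d dc4d
console.log(String.fromCharCode(high, low) === "\u{1F44D}"); // true
```

Each half of the pair occupies one slot in the string, which is where the length of 2 comes from.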

String.length returns the number of code units in the string, not the number of characters. Note that length is a property, not a method.

MDN explains String.length quite well:

This property returns the number of code units in the string. UTF-16, the string format used by JavaScript, uses a single 16-bit code unit to represent the most common characters, but needs to use two code units for less commonly-used characters, so it's possible for the value returned by length to not match the actual number of characters in the string.
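If you want a count closer to what you see on screen, one option is to iterate the string, since string iteration in JavaScript goes by code point rather than by code unit. The sketch below assumes an ES2015+ environment; countCodePoints is a name invented for this example.

```javascript
const thumbsUp = "\u{1F44D}"; // the thumbs-up emoji, written as an escape

console.log(thumbsUp.length);                      // 2 (code units)
console.log([...thumbsUp].length);                 // 1 (code points)
console.log(thumbsUp.charCodeAt(0).toString(16));  // d83d (first code unit)
console.log(thumbsUp.codePointAt(0).toString(16)); // 1f44d (whole code point)

// Hypothetical helper: count code points instead of code units.
function countCodePoints(text) {
  let count = 0;
  for (const _ of text) count += 1; // for...of walks code points
  return count;
}

console.log(countCodePoints("a\u{1F44D}b")); // 3
```

Even this is not always the number of visible characters: some emojis (flags, skin-tone variants, joined sequences) are made of several code points, so the count can still be higher than what is displayed.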

RSS specification HTML code inside RSS feed

Is HTML code inside the <description> tag compliant with RSS 2.0?

The RSS 2.0 specification says that you can include HTML in the description element so long as you properly encode the markup.

There are two ways to do this:

  1. Convert tags to escaped HTML entities:

     <description>this is &lt;b&gt;bold&lt;/b&gt;</description>
    
  2. Wrap the description content within a CDATA section:

     <description><![CDATA[this is <b>bold</b>]]></description>
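If you generate the feed from code, both options can be sketched as below. This is only an illustration: escapeForXml and wrapInCdata are made-up helper names, not part of any RSS library. The CDATA version assumes you also need to guard against the content itself containing "]]>", which would otherwise end the section early.

```javascript
// Option 1: escape the markup so it travels as plain text.
// "&" must be replaced first so the other entities are not double-escaped.
function escapeForXml(html) {
  return html
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Option 2: wrap the markup in a CDATA section.
// Any "]]>" inside the content is split across two CDATA sections.
function wrapInCdata(html) {
  return "<![CDATA[" + html.replace(/\]\]>/g, "]]]]><![CDATA[>") + "]]>";
}

const html = "this is <b>bold</b>";
console.log("<description>" + escapeForXml(html) + "</description>");
// <description>this is &lt;b&gt;bold&lt;/b&gt;</description>
console.log("<description>" + wrapInCdata(html) + "</description>");
// <description><![CDATA[this is <b>bold</b>]]></description>
```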