One of the responses to a question I asked yesterday suggested that I should make sure my database can handle UTF-8 characters correctly. How can I do this with MySQL?
CHARACTER SETs; 5.1.24 messed with the collation of German sharp-s (ß), which was rectified by adding another collation in 5.1.62 (arguably making things worse); 5.5.3 filled out utf8 with the new charset utf8mb4.
MySQL's utf8 is not full UTF-8. It only supports characters of up to 3 bytes. The character set you should use in MySQL is utf8mb4.
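To see which characters are affected by the 3-byte limit, here is a minimal Python sketch. The helper names max_utf8_bytes and fits_in_mysql_utf8 are made up for illustration; the only assumption is the limit described above, that MySQL's utf8 stores at most 3 bytes per character.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the UTF-8 size in bytes of the widest character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_in_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes (the limit of MySQL's utf8)."""
    return max_utf8_bytes(text) <= 3


if __name__ == "__main__":
    for sample in ["hello", "straße", "日本語", "😀"]:
        # Prints the sample, its widest character in bytes, and whether it fits.
        print(sample, max_utf8_bytes(sample), fits_in_mysql_utf8(sample))
```

Characters outside the Basic Multilingual Plane, such as most emoji, need 4 bytes, so they are the ones that require utf8mb4.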
Update:
Short answer: you should almost always be using the utf8mb4 charset and the utf8mb4_unicode_ci collation.
To alter a database:
ALTER DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
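Note that ALTER DATABASE only changes the default for tables created afterwards; existing tables keep their current charset until they are converted with ALTER TABLE ... CONVERT TO CHARACTER SET. A rough Python sketch that builds both kinds of statement follows. The function names and the example database and table names are hypothetical, and the statements should be reviewed (and the data backed up) before anything is run.

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def conversion_statements(database, tables,
                          charset="utf8mb4",
                          collation="utf8mb4_unicode_ci"):
    """Build the ALTER statements for one database and a list of its tables."""
    db = quote_identifier(database)
    statements = [
        f"ALTER DATABASE {db} CHARACTER SET {charset} COLLATE {collation};"
    ]
    for table in tables:
        # CONVERT TO also rewrites the existing text columns of the table.
        statements.append(
            f"ALTER TABLE {db}.{quote_identifier(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    # "dbname", "users" and "posts" are placeholder names.
    for statement in conversion_statements("dbname", ["users", "posts"]):
        print(statement)
```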
See:
Aaron's comment on this answer How to make MySQL handle UTF-8 properly
What's the difference between utf8_general_ci and utf8_unicode_ci
Conversion guide: https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html
Original Answer:
MySQL 4.1 and above supports UTF-8, but before MySQL 8.0 the server default is latin1 unless configured otherwise. You can verify the setting in your my.cnf file; remember to set both client and server (default-character-set and character-set-server).
If you have existing data that you wish to convert to UTF-8, dump your database and import it back as UTF-8, making sure that you:
use SET NAMES utf8 before you query/insert into the database
use DEFAULT CHARSET=utf8 when creating new tables
At this point your MySQL client and server should both be using UTF-8 (see my.cnf). Remember that any language you use (such as PHP) must handle UTF-8 as well. Some versions of PHP use their own MySQL client library, which may not be UTF-8 aware.
If you do want to migrate existing data, remember to back up first! Lots of weird chopping of data can happen when things don't go as planned!
Some resources:
complete UTF-8 migration (cdbaby.com)
article on UTF-8 readiness of php functions (note some of this information is outdated)
To make this 'permanent', in my.cnf:
[client]
default-character-set=utf8
[mysqld]
character-set-server = utf8
To check, go to the client and show some variables:
SHOW VARIABLES LIKE 'character_set%';
Verify that they're all utf8, except ..._filesystem, which should be binary, and ..._dir, which points somewhere in the MySQL installation.
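If you fetch those rows from a script, the check can be automated. Here is a small sketch in Python that works on (name, value) pairs as returned by SHOW VARIABLES LIKE 'character_set%'. The function name and the sample rows are invented for illustration, and I also skip character_set_system because the server fixes that value itself.

```python
# Variables that are not expected to match the chosen charset.
IGNORED = {"character_set_filesystem", "character_sets_dir", "character_set_system"}


def mismatched_charset_variables(rows, expected="utf8mb4"):
    """Return the variables whose value differs from the expected charset."""
    return {
        name: value
        for name, value in rows
        if name not in IGNORED and value != expected
    }


if __name__ == "__main__":
    # Hypothetical output of SHOW VARIABLES LIKE 'character_set%'.
    sample_rows = [
        ("character_set_client", "utf8mb4"),
        ("character_set_connection", "utf8mb4"),
        ("character_set_database", "latin1"),
        ("character_set_filesystem", "binary"),
        ("character_set_results", "utf8mb4"),
        ("character_set_server", "latin1"),
        ("character_set_system", "utf8"),
        ("character_sets_dir", "/usr/share/mysql/charsets/"),
    ]
    print(mismatched_charset_variables(sample_rows))
```

Pass expected="utf8" instead if you are following the older advice in this answer.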
create table my_name(field_name varchar(25) character set utf8);
utf8 is not "full" UTF-8.
set character_set_client=utf8;
Use a statement like the one above to set a new value.
MySQL 4.1 and above has a character set that it calls utf8 but which is actually only a subset of UTF-8 (it allows only characters of three bytes or fewer). Use utf8mb4 as your charset if you want "full" UTF-8.
utf8 doesn't include characters like emoji. utf8mb4 does. For more info on how to update, see mathiasbynens.be/notes/mysql-utf8mb4
The short answer: Use utf8mb4 in 4 places:
The bytes in your client are utf8, not latin1/cp1251/etc.
SET NAMES utf8mb4 or something equivalent when establishing the client's connection to MySQL
CHARACTER SET utf8mb4 on all tables/columns -- except columns that are strictly ascii/hex/country_code/zip_code/etc.
<meta charset=UTF-8> if you are outputting to HTML. (Yes, the spelling is different here.)
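Why the first two items matter: if the client sends UTF-8 bytes but the connection treats them as latin1, every non-ASCII character turns into two or more garbage characters. A minimal Python sketch of that effect is below. It uses Python's cp1252 codec as a stand-in for MySQL's latin1, which is an assumption on my part, and the function names are invented for illustration.

```python
def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as if they were latin1 (cp1252 here)."""
    return text.encode("utf-8").decode("cp1252")


def undo_misread(garbled: str) -> str:
    """Reverse the mistake above, for text that was garbled exactly once."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    garbled = misread_as_latin1("café")
    print(garbled)                # cafÃ©
    print(undo_misread(garbled))  # café
```

If you see pairs like Ã© in your data, a charset mismatch somewhere along this chain is a likely cause.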
The above links provide the "detailed canonical answer is required to address all the concerns". -- There is a space limit on this forum.
Edit
In addition to CHARACTER SET utf8mb4 containing "all" the world's characters, COLLATION utf8mb4_unicode_520_ci is arguably the 'best all-around' collation to use. (There are also Turkish, Spanish, etc., collations for those who want the nuances of those languages.)
The charset is a property of the database (default) and the table. You can have a look (MySQL commands):
show create database foo;
> CREATE DATABASE `foo` /*!40100 DEFAULT CHARACTER SET latin1 */
show create table foo.bar;
> lots of stuff ending with
> ) ENGINE=InnoDB AUTO_INCREMENT=252 DEFAULT CHARSET=latin1
In other words, it's quite easy to check your database charset or change it:
ALTER TABLE `foo`.`bar` CHARACTER SET utf8;
utf8 is not "full" UTF-8.
I followed Javier's solution, but I added some different lines in my.cnf:
[mysqld]
skip-character-set-client-handshake
collation_server=utf8_unicode_ci
character_set_server=utf8
I found this idea here: http://dev.mysql.com/doc/refman/5.0/en/charset-server.html in the first/only user comment on the bottom of the page. He mentions that skip-character-set-client-handshake has some importance.
skip-character-set-client-handshake was the key.
To change the character set encoding to UTF-8 for the database itself, run the following ALTER DATABASE command at the mysql> prompt, replacing DBNAME with the database name:
ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;
This is a duplicate of this question How to convert an entire MySQL database characterset and collation to UTF-8?
Set your database collation to UTF-8, then apply table collation to database default.
You can handle this through MySQL's settings. Parts of this answer may go beyond the question, but they are useful to know. Here is how to configure the character set and collation.
For applications that store data using the default MySQL character set and collation (latin1, latin1_swedish_ci), no special configuration should be needed. If applications require data storage using a different character set or collation, you can configure character set information several ways:
Specify character settings per database. For example, applications that use one database might require utf8, whereas applications that use another database might require sjis.
Specify character settings at server startup. This causes the server to use the given settings for all applications that do not make other arrangements.
Specify character settings at configuration time, if you build MySQL from source. This causes the server to use the given settings for all applications, without having to specify them at server startup.
The examples shown here set the utf8 character set, as your question asks, and also set a collation (utf8_general_ci) for completeness.
Specify character settings per database
CREATE DATABASE new_db
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
Specify character settings at server startup
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
Specify character settings at MySQL configuration time
shell> cmake . -DDEFAULT_CHARSET=utf8 \
-DDEFAULT_COLLATION=utf8_general_ci
To see the values of the character set and collation system variables that apply to your connection, use these statements:
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
This may be a lengthy answer, but it covers all the ways you can use. I hope it helps. For more information, see http://dev.mysql.com/doc/refman/5.7/en/charset-applications.html
This worked for me:
mysqli_query($connection, "SET NAMES 'utf8'");
DATABASE CONNECTION TO UTF-8
// Pass the variables themselves; single-quoted '$var' would be sent as literal text.
$connect = mysql_connect($localhost, $username, $password) or die(mysql_error());
mysql_set_charset('utf8', $connect);
mysql_select_db($database_name, $connect) or die(mysql_error());
SET NAMES UTF8
This does the trick.
While SET NAMES UTF8 (or UTF8mb4) is correct, you don't explain what it does (it sets the character set used for this connection). "This does the trick" sounds like it would solve the problem (make MySQL handle UTF-8 properly), but many MySQL databases are set to latin1 by default, so it is not a complete solution on its own. I would change the default charset and the table charsets to utf8mb4. Really, this answer is rather incomplete, so I downvoted it.
Set your database connection to UTF8:
if($handle = @mysql_connect(DB_HOST, DB_USER, DB_PASS)){
//set to utf8 encoding
mysql_set_charset('utf8',$handle);
}
Don't use the mysql_* interface. Switch to mysqli_* or PDO.
Was able to find a solution. Ran the following as specified at http://technoguider.com/2015/05/utf8-set-up-in-mysql/
SET NAMES UTF8;
set collation_server = utf8_general_ci;
-- default-character-set is a client option for my.cnf, not a variable that SET accepts
set global init_connect = 'SET NAMES utf8';
set character_set_server = utf8;
set character_set_client = utf8;
CHARACTER SET utf8. root will not execute the all-important init_connect.
utf8 within MySQL only refers to a small subset of full Unicode. You should use utf8mb4 instead to force full support. See mathiasbynens.be/notes/mysql-utf8mb4: "For a long time, I was using MySQL’s utf8 charset for databases, tables, and columns, assuming it mapped to the UTF-8 encoding described above."
MySQL uses latin1 and latin1_swedish_ci for the default charset and collation. See the "Server Character Set and Collation" page in the MySQL manual for confirmation: dev.mysql.com/doc/refman/5.1/en/charset-server.html
Don't worry about utf8mb4 taking extra storage when most text is ASCII. Although char strings are preallocated, varchar strings are not; see the last few lines on this documentation page. For example, char(10) will pessimistically reserve 40 bytes under utf8mb4, but varchar(10) will allocate bytes in keeping with the variable-length encoding.
MySQL will convert varchar(n) to the text data type if you attempt to alter a varchar(n) field to larger than the feasible byte size (while issuing a warning). An index will also have a lower worst-case upper bound, and that may present other problems.