ChatGPT解决这个技术问题 Extra ChatGPT

Change MySQL default character set to UTF-8 in my.cnf?

Currently we are using the following commands in PHP to set the character set to UTF-8 in our application.

Since this is a bit of overhead, we'd like to set this as the default setting in MySQL. Can we do this in /etc/my.cnf or in another location?

SET NAMES 'utf8'
SET CHARACTER SET utf8

I've looked for a default charset in /etc/my.cnf, but there's nothing there about charsets.

At this point, I did the following to set the MySQL charset and collation variables to UTF-8:

skip-character-set-client-handshake
character_set_client=utf8
character_set_server=utf8

Is that a correct way to handle this?

Note that a better default would be utf8mb4, i.e. real UTF-8 with full Unicode support. See How to support full Unicode in MySQL databases.
@Jorre would you object to changing this to be utf8mb4 this is kind setting a dangerous precedent as it sits?

P
PHP Guru

To set the default to UTF-8, you want to add the following to my.cnf/my.ini

[client]
default-character-set=utf8mb4

[mysql]
default-character-set=utf8mb4


[mysqld]
collation-server = utf8mb4_unicode_520_ci
init-connect='SET NAMES utf8mb4'
character-set-server = utf8mb4

If you want to change the character set for an existing DB, let me know... your question didn't specify it directly so I am not sure if that's what you want to do.

Edit: I replaced utf8 with utf8mb4 in the original answer due to utf8 only being a subset of UTF-8. MySQL and MariaDB both call UTF-8 utf8mb4.


The above my.cnf settings worked for me as well. Additionally, I had to make sure the table was set properly, such as ALTER TABLE Table CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Does not work for mysql 5.5. I used:
[mysqld] # Changes for utf-8 collation-server = utf8mb4_unicode_ci init-connect='SET NAMES utf8mb4' character-set-server = utf8mb4 and utf8mb4 elsewhere as mentioned above.
On Ubuntu 12.04, this worked for me - if I removed the first line after [mysqld].
It looks like default-character-set is no longer allowed in the [mysqld] section
Note that if UTF-8 is what you want, don’t use MySQL’s utf8 charset. Use utf8mb4 instead.
P
Peter Mortensen

For the recent version of MySQL,

default-character-set = utf8

causes a problem. It's deprecated I think.

As Justin Ball says in "Upgrade to MySQL 5.5.12 and now MySQL won’t start, you should:

Remove that directive and you should be good. Then your configuration file ('/etc/my.cnf' for example) should look like that: [mysqld] collation-server = utf8_unicode_ci init-connect='SET NAMES utf8' character-set-server = utf8 Restart MySQL. For making sure, your MySQL is UTF-8, run the following queries in your MySQL prompt: First query: mysql> show variables like 'char%'; The output should look like: +--------------------------+---------------------------------+ | Variable_name | Value | +--------------------------+---------------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/local/mysql/share/charsets/| +--------------------------+---------------------------------+ Second query: mysql> show variables like 'collation%'; And the query output is: +----------------------+-----------------+ | Variable_name | Value | +----------------------+-----------------+ | collation_connection | utf8_general_ci | | collation_database | utf8_unicode_ci | | collation_server | utf8_unicode_ci | +----------------------+-----------------+


Is your first line max_allowed_packet=64M related to this UTF8 issue in any way?
I have character_set_filesystem | utf8. I'm not sure, is it ok?
Tried on MariaDB v5.5 and it works, thanks! mysql Ver 15.1 Distrib 5.5.35-MariaDB, for debian-linux-gnu (x86_64) using readline 5.1
I think you don't need to remove default-character-set, instead you can change to loose-default-character-set = utf8 — that is, prefix with 'loose-'. This makes mysqlbinlog happy, if you need to use it — google for: "loose-default-character-set" mysqlbinlog.
Is init-connect='SET NAMES utf8' really necessary? If not we can definitely do without it for performance.
4
4 revs

This question already has a lot of answers, but Mathias Bynens mentioned that 'utf8mb4' should be used instead of 'utf8' in order to have better UTF-8 support ('utf8' does not support 4 byte characters, fields are truncated on insert). I consider this to be an important difference. So here is yet another answer on how to set the default character set and collation. One that'll allow you to insert a pile of poo (💩).

This works on MySQL 5.5.35.

Note, that some of the settings may be optional. As I'm not entirely sure that I haven't forgotten anything, I'll make this answer a community wiki.

Old Settings

mysql> SHOW VARIABLES LIKE 'char%'; SHOW VARIABLES LIKE 'collation%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)

Config

# 💩 𝌆
# UTF-8 should be used instead of Latin1. Obviously.
# NOTE "utf8" in MySQL is NOT full UTF-8: http://mathiasbynens.be/notes/mysql-utf8mb4

[client]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

[mysql]
default-character-set = utf8mb4

New Settings

mysql> SHOW VARIABLES LIKE 'char%'; SHOW VARIABLES LIKE 'collation%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database   | utf8mb4_unicode_ci |
| collation_server     | utf8mb4_unicode_ci |
+----------------------+--------------------+
3 rows in set (0.00 sec)

character_set_system is always utf8.

This won't affect existing tables, it's just the default setting (used for new tables). The following ALTER code can be used to convert an existing table (without the dump-restore workaround):

ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Edit:

On a MySQL 5.0 server: character_set_client, character_set_connection, character_set_results, collation_connection remain at latin1. Issuing SET NAMES utf8 (utf8mb4 not available in that version) sets those to utf8 as well.

Caveat: If you had a utf8 table with an index column of type VARCHAR(255), it can't be converted in some cases, because the maximum key length is exceeded (Specified key was too long; max key length is 767 bytes.). If possible, reduce the column size from 255 to 191 (because 191 * 4 = 764 < 767 < 192 * 4 = 768). After that, the table can be converted.


I followed your settings, but I find character_set_database is still utf8 and collation_database is utf8_bin`. Did I miss something?
Found what I missed. Those 2 settings are set when the database itself is created. See this question; stackoverflow.com/questions/22572558/…
add character-set-client-handshake = FALSE to [mysqld] section, so it will always use default encoding even if you will make mistake on application layer
hi I have tried this solution but still I can't show character_set_client | utf8mb4. it's utf8. just character_set_database utf8mb4 and character_set_server utf8mb4 are utf8mb4 other not changed. Please help me
@baic6 I suggest adding a note clarifying why only indexed columns can cause the issue. 767 bytes limitation is only for indexed columns, coz the limitation for "key"s. For non-indexed columns in general, this doesn't apply. If a column had value with max byte length, say 255 characters and the max 3 bytes of utf8 were used, it would still need only 255*3 bytes, utf8mb4 in these cases will use only 3-bytes, as these characters in utf8mb4 also need only 3 bytes. That is, unless a character that doesn't fit in utf8 was stored into the column in which case the info is already lost.
t
tomazzlender

On MySQL 5.5 I have in my.cnf

[mysqld] 
init_connect='SET collation_connection = utf8_unicode_ci' 
init_connect='SET NAMES utf8' 
character-set-server=utf8 
collation-server=utf8_unicode_ci 
skip-character-set-client-handshake

Result is

mysql> show variables like "%character%";show variables like "%collation%";

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

What does skip-character-set-client-handshake do exactly and under what circumstances is it required?
@Simon, when using show variables like "%collation%"; I was seeing 'collation_connection' as utf8_general_ci and collation_database and collation_server as utf8_unicode_ci. Adding the line skip-character-set-client-handshake changed the general entry to unicode, so all three were consistent.
On MySQL 5.6, the skip-character-set-client-handshake does not do what @Vaughany said. Any suggestion?
The skip-character-set-client-handshake did it for me (along with the "character-set-server" and "collation-server", of course -- on Debian 7). The init_connects did not make a difference with or without the skip and seem redundant. +1 and thank you.
skip-character-set-client-handshake ignores character set information sent by the client. It's nice for forcing a specific character set during communication, but if the client is expecting something else, it could lead to issues. I'd say you're better off specifying this on the client (your application).
М
М.Б.

Note: my.cnf file is located at /etc/mysql/

After adding these lines:

[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
skip-character-set-client-handshake

[client]
default-character-set   = utf8

[mysql]
default-character-set   = utf8

Don't forget to restart server:

sudo service mysql restart

Note: This changes defaults; it does not change the encoding for any existing columns.
J
Justin Watt

MySQL v5.5.3 and greater:

Just add three lines only in the [mysqld] section:

[mysqld]
character-set-server = utf8
collation-server = utf8_unicode_ci
skip-character-set-client-handshake

Note: Including skip-character-set-client-handshake here obviates the need to include both init-connect in [mysqld] and default-character-set in the [client] and [mysql] sections.


Wow, so many duplicative answers to this question, and yet this seems to be the simplest one. Also works for me on v5.5.44.
D
Derek

NijaCat was close, but specified overkill:

To set the default to UTF-8, you want to add the following to my.cnf

[client]
default-character-set=utf8

[mysqld]
default-character-set = utf8

Then, to verify:

mysql> show variables like "%character%";show variables like "%collation%";

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

Nice answer. Maybe a brief explanation about which parts are overkill would help people decide exactly what they want in their config.
@Derek, What does [mysql] default-character-set=utf8 do then?
The [client] option group already sets options for all MySQL clients that read the my.cnf file. The [mysql] group sets options for just the "mysql" client binary, specifically. If you want to set a mysql option that does not apply to other clients, using this group is appropriate, but setting the same option value in both locations is redundant. dev.mysql.com/doc/refman/5.6/en/option-files.html
@Mike Samuel, setting the default character set to utf8 already implies the default collation mode of utf8_general_ci. On the other hand, "init-connect='SET NAMES utf8'" is a somewhat interesting idea, as it attempts to force all clients connecting to the server to use utf8, but this is applied inconsistently depending on the connection privileges of the connecting user and I suspect that some connecting third party clients might be confused when the server overrides the charset they requested. dev.mysql.com/doc/refman/5.6/en/charset-applications.html
For [mysqld] use character-set-server instead of default-character-set
P
Peter Mortensen

I also have found out that after setting default-character-set = utf8 under [mysqld] title, MySQL 5.5.x would not start under Ubuntu 12.04 (Precise Pangolin).


That’s MySQL bug #52047. Under [mysqld], you should use character-set-server instead of default-character-set. (Extremely confusing, I agree!)
Yes. Deprecated in 5.0; removed in 5.5.
S
Sergei Danielian

All settings listed here are correct, but here are the most optimal and sufficient solution:

[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
character-set-server = utf8
collation-server = utf8_unicode_ci

[client]
default-character-set = utf8

Add these to /etc/mysql/my.cnf.

Please note, I choose utf8_unicode_ci type of collation due to the performance issue.

The result is:

mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+

And this is when you connect as non-SUPER user!

For example, the difference between connection as SUPER and non-SUPER user (of course in case of utf8_unicode_ci collation):

user with SUPER priv.:

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci | <---
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+

user with non-SUPER priv.:

mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+

I wrote a comprehensive article (rus) explaining in details why you should use one or the other option. All types of Character Sets and Collations are considered: for server, for database, for connection, for table and even for column.

I hope this and the article will help to clarify unclear moments.


I wish I could read your linked article, but can't read Russian and Google translate is not that helpful for technical texts. From what I can tell though it would be very interesting. Would you please consider posting an English version?
Could you explain which setting fails to set the SUPER user connection collation? (and is there a solution)
@gahcep, You stated you use utf8_unicode_ci due to performance issues, then, why not use utf8_bin instead?
Hi, Paceriar. Good point. Now I'm not sure if my choice was correct about using utf8_unicode_ci at all. Never spent time testing the performance.
F
Falko

Under Xubuntu 12.04 I simply added

[mysqld]
character_set_server = utf8

to /etc/mysql/my.cnf

And the result is

mysql> show variables like "%character%";show variables like "%collation%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

Also take a look at http://dev.mysql.com/doc/refman/5.6/en/charset-server.html


You also need [client]default-character-set=utf8 otherwise when you use mysql at the command line it won't use utf8 potentially destroying any backups you pipe to file.
T
Tieme

On Fedora 21

$ vi /etc/my.cnf

Add follow:

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci 
skip-character-set-client-handshake

Save and exit.

Final remember restart service mysqld with service mysqld restart.


L
Loren

The directive has changed to character-set-system=utf8

http://dev.mysql.com/doc/refman/5.6/en/charset-configuration.html


This leads to an error on MySQL 8.0: [ERROR] [MY-000067] [Server] unknown variable 'character-set-system=utf8mb4'
P
Peter Mortensen

MySQL versions and Linux distributions may matter when making configurations.

However, the changes under [mysqld] section is encouraged.

I want to give a short explanation of tomazzlender's answer:

[mysqld] 
init_connect='SET collation_connection = utf8_unicode_ci' 
init_connect='SET NAMES utf8' 
character-set-server=utf8 
collation-server=utf8_unicode_ci 
skip-character-set-client-handshake

[mysqld]

This will change collation_connection to utf8_unicode_ci

init_connect='SET collation_connection = utf8_unicode_ci'

Using SET NAMES:

init_connect='SET NAMES utf8'

The SET NAMES will influence three characters, that is:

character_set_client
character_set_results
character_set_connection

This will set character_set_database & character_set_server

character-set-server=utf8

This will only affect collation_database & collation_server

collation-server=utf8_unicode_ci

Sorry, I'm not so sure what is this for. I don't use it however:

skip-character-set-client-handshake

The documentation notes that character_set_server should not be set manually.
So what is the state-of-art approach? We can explicitly pass the character set when establishing database connection, however looks a little annoying.
Actually I made a mistake. The documentations says character_set_database should not be set dynamically. Doesn't mention character_set_server. However, I'm not sure you need to worry about character_set_server, since I think it only affects the default value for a newly created database?
As I mentioned above and I did the experiment again just now, character_set_server in [mysqld] section will affect character_set_database and character_set_server. Now I think a good practice is to explicitly point out the character set that you want to use when creating a database, a table and database connection. Honestly speaking, the documentation of mysql is not so easily understood. Do you know if there is a generic way to get all the character set and collation settings done in mysql?
G
Gajus

MySQL 5.5, all you need is:

[mysqld]
character_set_client=utf8
character_set_server=utf8
collation_server=utf8_unicode_ci

collation_server is optional.

mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

p
paiego

If you're having trouble confirming the client's character-set support using MySQL Workbench, then keep the following note in mind:

Important All connections opened by MySQL Workbench automatically set the client character set to utf8. Manually changing the client character set, such as using SET NAMES ..., may cause MySQL Workbench to not correctly display the characters. For additional information about client character sets, see Connection Character Sets and Collations.

Thus I was unable to override MySQL Workbench's character sets with my.cnf changes. e.g. 'set names utf8mb4'


m
mathielo

If you are confused by your setting for client and conn is reseted after restart mysql service. Try these steps (which worked for me):

vi /etc/my.cnf add the contents blow and :wq [client] character-sets-dir=/usr/local/mysql/share/mysql/charsets restart mysql and login mysql , use database, input command status;, you'll find the character-set for 'client' and 'conn' is set to 'utf8'.

Check the reference for more info.


S
Sudhakar Krishnan

https://i.stack.imgur.com/DSVYK.png

For utf8mb4_general_ci

[client]
default-character-set=utf8mb4

[mysql]
default-character-set=utf8mb4
    
[mysqld]
collation-server = utf8mb4_general_ci
init-connect='SET NAMES utf8mb4'
character-set-server = utf8mb4

As per symfony framework Documentation at https://symfony.com/doc/2.6/book/doctrine.html#configuring-the-database

We recommend against MySQL’s utf8 character set, since it does not support 4-byte unicode characters, and strings containing them will be truncated. This is fixed by the newer utf8mb4 character set.


郭浩伟

You can do it the way it does, and if it doesn't work, you need to restart mysql.


P
Peter Mortensen

Change MySQL character:

Client

default-character-set=utf8

mysqld

character_set_server=utf8

We should not write default-character-set=utf8 in mysqld, because that could result in an error like:

start: Job failed to start

At last:

 +--------------------------+----------------------------+
 | Variable_name            | Value                      |
 +--------------------------+----------------------------+
 | character_set_client     | utf8                       |
 | character_set_connection | utf8                       |
 | character_set_database   | utf8                       |
 | character_set_filesystem | binary                     |
 | character_set_results    | utf8                       |
 | character_set_server     | utf8                       |
 | character_set_system     | utf8                       |
 | character_sets_dir       | /usr/share/mysql/charsets/ |
 +--------------------------+----------------------------+