者:俊達
表數據出現亂碼的情況通常是由于數據的真實編碼與相關參數不一致引起的,其中包括常見的參數如character_set_client、character_set_results、字段編碼以及終端編碼等。確保這些參數保持一致,可以有效預防和解決亂碼問題。
終端的字符編碼為utf8
mysql> show variables like '%char%';
+--------------------------+------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------+
| character_set_client | utf8mb3 |
| character_set_connection | utf8mb3 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb3 |
| character_set_server | latin1 |
| character_set_system | utf8mb3 |
| character_sets_dir | /usr/local/Cellar/mysql/8.0.31/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+
8 rows in set (0.00 sec)
mysql> create table test_charset(a varchar(100)) default charset utf8;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> insert into test_charset values('列列列列列AAA');
Query OK, 1 row affected (0.01 sec)
mysql> select * from test_charset;
+--------------------+
| a |
+--------------------+
| 列列列列列AAA |
+--------------------+
1 row in set (0.00 sec)
將character_set_results設置為gbk,則無法正常顯示返回的數據:
mysql> set character_set_results=gbk;
Query OK, 0 rows affected (0.00 sec)
mysql> select *,hex(a) from test_charset;
+---------------+--------------------------------------+
| a | hex(a) |
+---------------+--------------------------------------+
| ??????????AAA | E58897E58897E58897E58897E58897414141 |
+---------------+--------------------------------------+
在另外一個GBK編碼的終端下:
mysql> set character_set_results=gbk;
Query OK, 0 rows affected (0.00 sec)
mysql> select *, hex(a) from test_charset;
+---------------+--------------------------------------+
| a | hex(a) |
+---------------+--------------------------------------+
| 列列列列列AAA | E58897E58897E58897E58897E58897414141 |
+---------------+--------------------------------------+
1 row in set (0.00 sec)
mysql> set character_set_results=utf8;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> select *, hex(a) from test_charset;
+--------------------+--------------------------------------+
| a | hex(a) |
+--------------------+--------------------------------------+
| 鍒楀垪鍒楀垪鍒桝AA | E58897E58897E58897E58897E58897414141 |
+--------------------+--------------------------------------+
1 row in set (0.01 sec)
上面這幾個例子中,數據庫中存儲的數據沒有問題(使用hex函數查看真實的數據,和字符的字符集(UTF8)編碼一致),但是因為客戶端的字符編碼設置不對,導致無法正常查看字符。
create table test_charset2(a varchar(100), b varchar(100)) default charset utf8;
-- utf8終端
mysql> set character_set_client='gbk';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into test_charset2 values('term: utf8, charset client: gbk', '列列列列列AAA');
Query OK, 1 row affected (0.00 sec)
mysql> select *, hex(b) from test_charset2;
+---------------------------------+----------------------------+------------------------------------------------------+
| a | b | hex(b) |
+---------------------------------+----------------------------+------------------------------------------------------+
| term: utf8, charset client: gbk | 鍒楀垪鍒楀垪鍒桝AA | E98D92E6A580E59EAAE98D92E6A580E59EAAE98D92E6A19D4141 |
+---------------------------------+----------------------------+------------------------------------------------------+
1 row in set (0.00 sec)
漢字“列”的utf8編碼應該是E58897,但是表中實際存儲的數據不對。
>>> u"列".encode('utf8')
'\xe5\x88\x97'
測試2.1中,由于character_set_client和實際的數據的字符集編碼不匹配,導致表里存儲的數據本身就是有問題的(漢字“列”的utf8編碼為e58897)。
使用python模擬這個過程:
-- 本來是utf8編碼的數據,使用gbk編碼來解碼,然后再以utf8(表的字符集)方式編碼
>>> u"列列列列列AAA".encode('utf8').decode('gbk').encode('utf8')
'\xe9\x8d\x92\xe6\xa5\x80\xe5\x9e\xaa\xe9\x8d\x92\xe6\xa5\x80\xe5\x9e\xaa\xe9\x8d\x92\xe6\xa1\x9dAA'
>>> print u"列列列列列AAA".encode('utf8').decode('gbk').encode('utf8')
鍒楀垪鍒楀垪鍒桝AA
-- gbk終端
mysql> set character_set_client='utf8';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> insert into test_charset2 values('term: gbk, charset client: utf8', '列列列列列AAA');
Query OK, 1 row affected, 1 warning (0.01 sec)
mysql> show warnings;
+---------+------+-------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------------------------------+
| Warning | 1300 | Cannot convert string '\xC1\xD0\xC1\xD0\xC1\xD0...' from utf8mb3 to gbk |
+---------+------+-------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select *, hex(b) from test_charset2;
+---------------------------------+--------------------+------------------------------------------------------+
| a | b | hex(b) |
+---------------------------------+--------------------+------------------------------------------------------+
| term: utf8, charset client: gbk | 鍒楀垪鍒楀垪鍒桝AA | E98D92E6A580E59EAAE98D92E6A580E59EAAE98D92E6A19D4141 |
| term: gbk, charset client: utf8 | ??????????AAA | 3F3F3F3F3F3F3F3F3F3F414141 |
+---------------------------------+--------------------+------------------------------------------------------+
2 rows in set (0.00 sec)
原因解析
本來時gbk方式編碼的數據,使用utf8編碼無法解碼,使用替換字符(3F) 替換無法解碼的數據。
如果我們將錯就錯,在utf8終端上,將character_set_results設置為gbk。
查詢返回看起來正常的數據:
-- utf8編碼的終端
mysql> set character_set_results=gbk;
Query OK, 0 rows affected (0.00 sec)
mysql> select *, hex(b) from test_charset2;
+---------------------------------+--------------------+------------------------------------------------------+
| a | b | hex(b) |
+---------------------------------+--------------------+------------------------------------------------------+
| term: utf8, charset client: gbk | 列列列列列AAA | E98D92E6A580E59EAAE98D92E6A580E59EAAE98D92E6A19D4141 |
| term: gbk, charset client: utf8 | ??????????AAA | 3F3F3F3F3F3F3F3F3F3F414141 |
+---------------------------------+--------------------+------------------------------------------------------+
2 rows in set (0.00 sec)
原因解析
數據庫中實際存儲的數據:
>>> u"列列列列列AAA".encode('utf8').decode('gbk').encode('utf8')
'\xe9\x8d\x92\xe6\xa5\x80\xe5\x9e\xaa\xe9\x8d\x92\xe6\xa5\x80\xe5\x9e\xaa\xe9\x8d\x92\xe6\xa1\x9dAA'
-- character_set_results 設置為gbk,則mysql以gbk編碼返回數據,
>>> u"列列列列列AAA".encode('utf8').decode('gbk').encode('gbk')
'\xe5\x88\x97\xe5\x88\x97\xe5\x88\x97\xe5\x88\x97\xe5\x88\x97AAA'
-- 由于終端字符集為utf8,以utf8編碼解釋返回的數據
>>> u"列列列列列AAA".encode('utf8').decode('gbk').encode('gbk').decode('utf8')
u'\u5217\u5217\u5217\u5217\u5217AAA'
u5217就是漢字“列“的unicode編碼
在gbk終端上,將character_set_results設置為utf8:
-- gbk編碼的終端
mysql> set character_set_results=utf8;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> select *, hex(b) from test_charset2;
+---------------------------------+----------------------------+------------------------------------------------------+
| a | b | hex(b) |
+---------------------------------+----------------------------+------------------------------------------------------+
| term: utf8, charset client: gbk | 閸掓鍨崚妤鍨崚妗滱A | E98D92E6A580E59EAAE98D92E6A580E59EAAE98D92E6A19D4141 |
| term: gbk, charset client: utf8 | ??????????AAA | 3F3F3F3F3F3F3F3F3F3F414141 |
+---------------------------------+----------------------------+------------------------------------------------------+
2 rows in set (0.00 sec)
原因解析
數據庫中實際存儲的數據:
>>> u"列列列列列AAA".encode('utf8').decode('gbk').encode('utf8')
'\xe9\x8d\x92\xe6\xa5\x80\xe5\x9e\xaa\xe9\x8d\x92\xe6\xa5\x80\xe5\x9e\xaa\xe9\x8d\x92\xe6\xa1\x9dAA'
-- character_set_results 設置為utf8,則mysql以utf8編碼返回數據,
>>> u"列列列列列AAA".encode('utf8').decode('gbk').encode('utf8')
'\xe9\x8d\x92\xe6\xa5\x80\xe5\x9e\xaa\xe9\x8d\x92\xe6\xa5\x80\xe5\x9e\xaa\xe9\x8d\x92\xe6\xa1\x9dAA'
-- 由于終端字符集為gbk,以gbk編碼解釋返回的數據。
>>> u"列列列列列AAA".encode('utf8').decode('gbk').encode('utf8').decode('gbk')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'gbk' codec can't decode bytes in position 4-5: illegal multibyte sequence
-- 加上error='replace'
>>> u"列列列列列AAA".encode('utf8').decode('gbk').encode('utf8').decode('gbk', errors='replace')
u'\u95b8\u6393\ufffd\u9368\ufffd\u5d1a\u59a4\ufffd\u7059\u95b8\u6393\ufffdAA'
>>> print u"列列列列列AAA".encode('utf8').decode('gbk').encode('utf8').decode('gbk', errors='replace')
閸掓?鍨?崚妤?灙閸掓?AA
-- 使gb18030,
>>> u"列列列列列AAA".encode('utf8').decode('gbk').encode('utf8').decode('gb18030', errors='replace')
u'\u95b8\u6393\ue685\u9368\ue048\u5d1a\u59a4\ufffd\u7059\u95b8\u6393\ue522AA'
>>> print u"列列列列列AAA".encode('utf8').decode('gbk').encode('utf8').decode('gb18030', errors='replace')
閸掓鍨崚妤?灙閸掓AA
-- 上面的數據和mysql終端顯示的數據(閸掓鍨崚妤鍨崚妗滱A )有一些差異。原因可能是他們對gbk無法解碼的數據處理方式有差異。
當客戶端字符編碼方式和MySQL參數character_set_client、character_set_results設置不匹配時,容易產生亂碼。有的時候,數據庫存儲的數據沒有問題,但是客戶端編碼設置不對,導致看上去有亂碼,這種情況下只需要將客戶端編碼設置正確,就可以了。有的時候,存儲到數據庫的數據就不對了,甚至原來的數據無法編碼,會被“?”等替換字符替換,這時可能就無法得到原先的數據了。
更多技術信息請查看云掣官網云掣YunChe - 可觀測運維專家 | 大數據運維托管 | 云MSP服務
mysql 中常常出現對中文支持不友好的情況常見的錯誤 “Illegal mix of collations for operation”
下面我們規整一下 mysql 數據庫中一共涉及到哪些字符集
character-set-server/default-character-set:服務器字符集,默認情況下所采用的
character-set-database:數據庫字符集
character-set-table:數據庫表字符集
一般情況下只需要設置character-set-server,而在創建數據庫和表時不特別指定字符集,這樣統一采用character-set-server字符集.
character-set-client:客戶端的字符集
character-set-results:結果字符集
在客戶端,如果沒有定義character-set-results,則采用character-set-client字符集作為默認的字符集
下面分享一個解決mysql 中文字符亂碼的解決方案:
修改mysql 配置
vim /etc/mysql/my.cnf
查找 “client” 一項:
上面黃色關鍵詞就是我們要找的 client,緊接著在下面添加一行:default-character-set=utf8
保存退出配置文件.
然后關閉mysql服務,重新啟動.
/etc/init.d/mysql stop
/etc/init.d/mysql start
然后我們進入 mysql ,查看字符集
mysql> show variables like 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
上面結果可以看到字符集默認支持 utf8,我們在涉及到漢字的時候不會有亂碼的情況.
出自:一介布衣 http://yijiebuyi.com