
PHP json_encode - Malformed UTF-8 characters, possibly incorrectly encoded

This question already has answers here: UTF-8 all the way through (13 answers)

I'm using json_encode($data) on a data array, and one field contains Russian characters.

I used mb_detect_encoding() to check the encoding of that field, and it reports UTF-8.

I think json_encode fails because of some bad characters in it, like "ра▒". I tried a lot of things; running utf8_encode() on the data bypasses the error, but then the data no longer looks correct.
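A minimal sketch to reproduce and confirm the failure, assuming the bad value is valid Cyrillic followed by a stray byte (the sample bytes are hypothetical, standing in for something like "ра▒"). json_encode() returns false, json_last_error_msg() names the problem, and mb_check_encoding() validates every byte sequence, whereas mb_detect_encoding() only guesses and may still report UTF-8 for a string that is not entirely valid.

```php
<?php
// Hypothetical sample: "ра" in valid UTF-8 followed by a stray 0xFF byte,
// which can never appear in valid UTF-8.
$field = "\xD1\x80\xD0\xB0\xFF";

$json = json_encode(['name' => $field]);
if ($json === false) {
    // Prints: Malformed UTF-8 characters, possibly incorrectly encoded
    echo json_last_error_msg(), PHP_EOL;
}

// mb_check_encoding() checks validity instead of guessing an encoding.
var_dump(mb_check_encoding($field, 'UTF-8')); // bool(false)
```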

What can be done with this issue?

"I tried a lot of things" - like what? Please show us your code/research.
Have you tried the JSON_UNESCAPED_UNICODE option?
I already tried JSON_UNESCAPED_UNICODE. It didn't work.
I tried other things too, and they either return the same error or turn the characters into something unreadable.
utf8_encode() converts ISO-8859-1 to UTF-8, so feeding it a string that is already UTF-8 will corrupt it (it gets encoded twice).
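A small sketch of that double encoding. Because utf8_encode() is deprecated as of PHP 8.2, it uses mb_convert_encoding() with 'ISO-8859-1' as the source encoding, which performs the same Latin-1 to UTF-8 conversion. The result is valid UTF-8, so json_encode() stops complaining, but the text is mojibake, which matches the "data no longer looks correct" symptom.

```php
<?php
$utf8 = "\xD0\xBF"; // "п" (Cyrillic pe), already valid UTF-8

// Treating each UTF-8 byte as a Latin-1 character re-encodes it a second time.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), PHP_EOL; // c390c2bf, i.e. "Ð¿"

// The damage is reversible: convert back from UTF-8 to ISO-8859-1.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8); // bool(true)
```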

Stan Quinn

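The issue happens when a string contains some invalid (non-UTF-8) byte sequences even though most of it is valid UTF-8. Converting the string from UTF-8 to UTF-8 replaces any invalid sequences with mbstring's substitute character, and after that json_encode works: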

$data['name'] = mb_convert_encoding($data['name'], 'UTF-8', 'UTF-8');
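A sketch of this fix on a single hypothetical value, with mb_scrub() (PHP 7.2+) shown as an equivalent shorthand. Note that the invalid bytes are replaced by mbstring's substitute character (configurable via mb_substitute_character()), so the damaged part is discarded rather than recovered.

```php
<?php
// Hypothetical value: "ра" in valid UTF-8 followed by a stray 0xFF byte.
$name = "\xD1\x80\xD0\xB0\xFF";

// Converting UTF-8 to UTF-8 makes mbstring replace invalid sequences
// with its substitute character.
$clean = mb_convert_encoding($name, 'UTF-8', 'UTF-8');

// mb_scrub() (PHP 7.2+) performs the same same-encoding conversion.
$scrubbed = mb_scrub($name, 'UTF-8');

var_dump(json_encode($name));                              // bool(false)
echo json_encode($clean, JSON_UNESCAPED_UNICODE), PHP_EOL; // now succeeds
```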

You might want to add this as well: $mysqli->set_charset("utf8");
I've tried to find that invalid string by adding the following code: `foreach ($addresses as $address) { $converted = mb_convert_encoding($address, 'UTF-8', 'UTF-8'); if ($converted !== $address) { dd($addresses); } }` Two points: 1. The $converted !== $address condition is never met. I suppose this is because === is a "binary-safe" operator… 2. I don't get the error in the end, even though I never assign $converted to anything! It's as if mb_convert_encoding() accepted the string by reference, although it doesn't…
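Rather than comparing a converted copy with the original, mb_check_encoding() can answer "is this valid UTF-8?" directly. A sketch assuming a nested array; findInvalidUtf8() is a hypothetical helper name, not a PHP function, and the sample data is made up.

```php
<?php
// Hypothetical helper: walk a (possibly nested) array and collect the key paths
// of string values that are not valid UTF-8, i.e. the ones that break json_encode().
function findInvalidUtf8(array $data, string $path = ''): array
{
    $bad = [];
    foreach ($data as $key => $value) {
        $current = $path === '' ? (string) $key : $path . '.' . $key;
        if (is_array($value)) {
            $bad = array_merge($bad, findInvalidUtf8($value, $current));
        } elseif (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            $bad[] = $current;
        }
    }
    return $bad;
}

// Hypothetical sample data: only the second address contains a stray byte.
$addresses = [
    ['city' => "\xD0\x9C\xD0\xBE\xD1\x81\xD0\xBA\xD0\xB2\xD0\xB0"], // "Москва"
    ['city' => "\xD0\x9C\xD0\xBE\xD1\x81\xFF"],                    // "Мос" + 0xFF
];
print_r(findInvalidUtf8($addresses)); // lists "1.city"
```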
Irshad Khan

If you have a multidimensional array to encode as JSON, you can use the function below.

If JSON_ERROR_UTF8 occurs:

$encoded = json_encode( utf8ize( $responseForJS ) );

The function below converts the array data recursively:

/* Recursively converts every string in $mixed from UTF-8 to UTF-8, which replaces
 * invalid byte sequences so json_encode() no longer fails with
 * "Malformed UTF-8 characters, possibly incorrectly encoded".
 */
function utf8ize( $mixed ) {
    if (is_array($mixed)) {
        foreach ($mixed as $key => $value) {
            $mixed[$key] = utf8ize($value);
        }
    } elseif (is_string($mixed)) {
        return mb_convert_encoding($mixed, "UTF-8", "UTF-8");
    }
    return $mixed;
}

mb_convert_encoding() does the recursive work itself; as the documentation says: "If val is an array, all its string values will be converted recursively." So the utf8ize() function is not needed. All you need is json_encode(mb_convert_encoding($responseForJS, "UTF-8", "UTF-8"));
Just to clarify: mb_convert_encoding() can only convert arrays on PHP 7.2 or above. On older versions, the utf8ize() function works perfectly.
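A short sketch of the PHP 7.2+ array behavior described in these comments, with hypothetical data: strings at any depth are cleaned, and non-string scalars such as integers are copied unchanged.

```php
<?php
// Hypothetical response data with invalid bytes at two levels.
$responseForJS = [
    'title' => "\xD0\xBF\xFF",         // "п" plus a stray 0xFF byte
    'items' => ['ok', "\xD1\x80\xFE"], // nested value with a stray 0xFE byte
    'count' => 2,                      // non-string values are left as they are
];

// PHP 7.2+: converts every string value in the array recursively.
$clean = mb_convert_encoding($responseForJS, 'UTF-8', 'UTF-8');
$encoded = json_encode($clean);
var_dump($encoded !== false); // bool(true)
```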
Tom Ah

Make sure to initialize your PDO object with the charset set to utf8. This should fix the problem without any re-utf8izing dance.

$pdo = new PDO("mysql:host=localhost;dbname=mybase;charset=utf8", 'user', 'password');

This solved it for me. It also works for other connection types, like dblib for MS SQL Server.
I was given an old project to fix encoding issues, and this helped me a lot. The only difference is that this project used ADOdb, so the solution was slightly different: I solved it with setCharset(); info here adodb.org/dokuwiki/…
hugsbrugs

Since PHP 7.2, two flags let json_encode handle invalid UTF-8 directly:

https://www.php.net/manual/en/function.json-encode

json_encode($text, JSON_INVALID_UTF8_IGNORE);

Or

json_encode($text, JSON_INVALID_UTF8_SUBSTITUTE);
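A quick sketch contrasting the two flags on a hypothetical string containing one invalid byte: IGNORE drops the bad byte, while SUBSTITUTE replaces it with U+FFFD (written as \ufffd unless JSON_UNESCAPED_UNICODE is also set).

```php
<?php
// Hypothetical input with a single invalid byte in the middle.
$text = "ab\xFFcd";

var_dump(json_encode($text));                                   // bool(false)
echo json_encode($text, JSON_INVALID_UTF8_IGNORE), PHP_EOL;     // "abcd"
echo json_encode($text, JSON_INVALID_UTF8_SUBSTITUTE), PHP_EOL; // "ab\ufffdcd"
```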

Thanks, it works for me; my API response has emoji in a title string. But I'm confused: I've read that emoji are UTF-8 characters, so why does an emoji in a string give this malformed UTF-8 characters error?
@HaritsinhGohil perhaps some emojis are valid UTF-8 chars and others are not ...
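A complete emoji is valid UTF-8 (four bytes), so the emoji itself should not trigger the error; one possible cause is an emoji cut in the middle, for example by a byte-based string function. A sketch with a hypothetical value:

```php
<?php
$emoji = "\xF0\x9F\x98\x80";       // U+1F600 as 4 UTF-8 bytes
$truncated = substr($emoji, 0, 3); // first 3 bytes only, as after a byte-based cut

echo json_encode($emoji), PHP_EOL; // "\ud83d\ude00" (escaped as a surrogate pair)
var_dump(json_encode($truncated)); // bool(false)
```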
M.Bilal Murtaza

Just add charset=utf8 to your PDO connection string, as in the line below:

$pdo = new PDO("mysql:host=localhost;dbname=mybase;charset=utf8", 'user', 'password');

Hope this helps.


Adam Michalik

Convert HTML entities back to plain characters before handling the JSON. I used html_entity_decode() in PHP and the problem was solved:

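// Turn HTML entities (e.g. &quot;) back into plain characters first
$json = html_entity_decode($source);
// Then decode the JSON into an associative array
$data = json_decode($json, true);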
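If you go this route, the charset argument of html_entity_decode() matters. A sketch with a hypothetical input: decoding with 'ISO-8859-1' yields Latin-1 bytes that json_encode() rejects, while 'UTF-8' (the usual default via default_charset) yields valid text.

```php
<?php
$source = 'caf&eacute;'; // hypothetical input containing an HTML entity

// The decoded bytes are in whatever charset you ask for.
$latin1 = html_entity_decode($source, ENT_QUOTES, 'ISO-8859-1'); // "caf" + 0xE9
$utf8   = html_entity_decode($source, ENT_QUOTES, 'UTF-8');      // "café" in UTF-8

var_dump(json_encode($latin1));                           // bool(false)
echo json_encode($utf8, JSON_UNESCAPED_UNICODE), PHP_EOL; // "café"
```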

Kees de Kooter

Do you by any chance have UUIDs in your result set? In that case, setting the following PDO_DBLIB attribute will help:

PDO::DBLIB_ATTR_STRINGIFY_UNIQUEIDENTIFIER => true
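Why UUIDs can trigger this: without that attribute, a uniqueidentifier may come back as 16 raw bytes, which are generally not valid UTF-8. A sketch with hypothetical bytes; bin2hex() is only a stand-in to show that a text form is JSON-safe.

```php
<?php
// Hypothetical GUID as 16 raw bytes (binary data, not text).
$rawGuid = hex2bin('6f9619ff8b86d011b42d00c04fc964ff');

var_dump(json_encode(['id' => $rawGuid])); // bool(false)

// Hex text is JSON-safe. Note: this is plain byte order, not SQL Server's
// mixed-endian GUID string format, which the DBLIB attribute is meant to produce.
echo json_encode(['id' => bin2hex($rawGuid)]), PHP_EOL;
```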

Rafa Rodríguez

If your data is correctly encoded in the database, for example, make sure to use the mb_* functions for string handling before json_encode. Byte-based functions like substr() or strlen() do not understand multibyte (utf8mb4) text: substr() can cut a character in half and leave malformed UTF-8.
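A sketch of the difference, using a hypothetical Russian word: substr() counts bytes and can split a two-byte Cyrillic character, while mb_substr() counts characters.

```php
<?php
$text = "\xD0\xBF\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82"; // "привет", 6 chars / 12 bytes

$byBytes = substr($text, 0, 3);             // 3 bytes: "п" plus half of "р"
$byChars = mb_substr($text, 0, 3, 'UTF-8'); // 3 characters: "при"

var_dump(strlen($text), mb_strlen($text, 'UTF-8'));          // int(12) int(6)
var_dump(json_encode($byBytes));                             // bool(false)
echo json_encode($byChars, JSON_UNESCAPED_UNICODE), PHP_EOL; // "при"
```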


Fernando Coelho

I know this is kind of an old topic, but it was what I needed. I just had to modify the answer by 'jayashan perera'.

//...code
$stmt->execute();
$result = $stmt->fetchAll(PDO::FETCH_ASSOC);

$arrResposta = []; // initialize so an empty result set still encodes as []
for ($i = 0; $i < count($result); $i++) {
    $tempCnpj = $result[$i]['CNPJ'];
    // Decode HTML entities in the supplier name, then JSON-encode it on its own.
    // Note: `true` is coerced to the int flag 1 (JSON_HEX_TAG).
    $tempFornecedor = json_encode(html_entity_decode($result[$i]['Nome_fornecedor']), true);
    $tempData = $result[$i]['efetivado_data'];
    $tempNota = $result[$i]['valor_nota'];
    $arrResposta[$i] = ["Status" => "true", "Cnpj" => "$tempCnpj", "Fornecedor" => $tempFornecedor, "Data" => "$tempData", "Nota" => "$tempNota"];
}

echo json_encode($arrResposta);

And in the .js file I used:

obj = JSON.parse(msg);