ChatGPT解决这个技术问题 Extra ChatGPT

How do I remove  from the beginning of a file?

I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: 

PHP removes all whitespace, so a random  in the middle of the code messes up the entire thing. As I mentioned, I can't actually see these characters when I open the file in gedit, so I can't remove them very easily.

I googled the problem, and there is clearly something wrong with the file encoding, which makes sense being as I've been shifting the files around to different Linux/Windows servers via ftp and rsync, with a range of text editors. I don't really know much about character encoding though, so help would be appreciated.

If it helps, the file is being saved in UTF-8 format, and gedit won't let me save it in ISO-8859-15 format (the document contains one or more characters that cannot be encoded using the specified character encoding). I tried saving it with Windows and Linux line endings, but neither helped.

This appears to solve the problem. 95isalive.com/expression/index.html
Somebody strip us off the BOM

C
Community

Three words for you:

Byte Order Mark (BOM)

That's the representation for the UTF-8 BOM in ISO-8859-1. You have to tell your editor to not use BOMs or use a different editor to strip them out.

To automatize the BOM's removal you can use awk as shown in this question.

As another answer says, the best would be for PHP to actually interpret the BOM correctly, for that you can use mb_internal_encoding(), like this:

 <?php
   //Storing the previous encoding in case you have some other piece 
   //of code sensitive to encoding and counting on the default value.      
   $previous_encoding = mb_internal_encoding();

   //Set the encoding to UTF-8, so when reading files it ignores the BOM       
   mb_internal_encoding('UTF-8');

   //Process the CSS files...

   //Finally, return to the previous encoding
   mb_internal_encoding($previous_encoding);

   //Rest of the code...
  ?>

Yeah I found that when I googled it, but how do I remove them?
It doesn't remove the BOM, it ignores it.
Or the other way(ignore) could be change the encoding.
Windows Notepad (ugh) adds them; suggestion from a dup of this question is to use Notepad++, which allows setting "UTF-8 without BOM" as an encoding. Or use a Real Editor... (emacs!) :-)
That's exactly the issue, different character encodings use different bytes for the same characters. Read again the third paragraph of the answer.
P
Peter Mortensen

Open your file in Notepad++. From the Encoding menu, select Convert to UTF-8 without BOM, save the file, replace the old file with this new file. And it will work, damn sure.


In Notepad++ v7.6.6 (64-bit) you need to click Convert to UTF-8.
P
Peter Mortensen

In PHP, you can do the following to remove all non characters including the character in question.

$response = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $response);

in case you just want to kill the "ï" use this $response = preg_replace('/[\x80-\xFF]//', '', $response);
@guido_nhcol.com.br_ You add an extra /, it should be: $response = preg_replace('/[\x80-\xFF]/', '', $response);
P
Peter Mortensen

For those with shell access here is a little command to find all files with the BOM set in the public_html directory - be sure to change it to what your correct path on your server is

Code:

grep -rl $'\xEF\xBB\xBF' /home/username/public_html

and if you are comfortable with the vi editor, open the file in vi:

vi /path-to-file-name/file.php

And enter the command to remove the BOM:

set nobomb

Save the file:

wq

Use grep -rlI $'\xEF\xBB\xBF' . to ignore binary files.
E
Eugene Yokota

BOM is just a sequence of characters ($EF $BB $BF for UTF-8), so just remove them using scripts or configure the editor so it's not added.

From Removing BOM from UTF-8:

#!/usr/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);

I am sure it translates to PHP easily.


Note that the BOM is not a sequence of characters, it is a single character. If the file is in UTF-8, then the character is represented in three bytes. If the file is in UTF-8, then viewing it in another encoding (i.e., one in which EF BB BF appears where the BOM should be) is an error. To remove the BOM from a UTF-8 file, one should remove the (single) charcter U+FEFF. Yeah, pedantry!
I couldn't get that working in PHP (that's just my incompetence, not yours :P), so I did a check to see if the BOM is there and remove the first 3 characters. Here's the code, if anyone needs it: if( substr($css, 0,3) == pack("CCC",0xef,0xbb,0xbf) ) { $css = substr($css, 3); }
it translates to php as $string = preg_replace('/\x{EF}\x{BB}\x{BF}/','',$string); . before you use this, reconsider if you can't fix the problem at the source instead.
J
Jeffrey L Whitledge

I don't know PHP, so I don't know if this is possible, but the best solution would be to read the file as UTF-8 rather than some other encoding. The BOM is actually a ZERO WIDTH NO BREAK SPACE. This is whitespace, so if the file were being read in the correct encoding (UTF-8), then the BOM would be interpreted as whitespace and it would be ignored in the resulting CSS file.

Also, another advantage of reading the file in the correct encoding is that you don't have to worry about characters being misinterpreted. Your editor is telling you that the code page you want to save it in won't do all the characters that you need. If PHP is then reading the file in the incorrect encoding, then it is very likely that other characters besides the BOM are being silently misinterpreted. Use UTF-8 everywhere, and these problems disappear.


N
NickWebman

For me, this worked:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

If I remove this meta, the  appears again. Hope this helps someone...


P
Peter Mortensen

You can use

vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

Replacing with awk seems to work, but it is not in place.


S
Simone

grep -rl $'\xEF\xBB\xBF' * | xargs vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'


Use grep -rlI $'\xEF\xBB\xBF' . to ignore binary files. And also . better then * here.
P
Peter Mortensen

I had the same problem with the BOM appearing in some of my PHP files ().

If you use PhpStorm you can set at hotkey to remove it in Settings -> IDE Settings -> Keymap -> Main Menu - > File -> Remove BOM.


C
Community

In Notepad++, choose the "Encoding" menu, then "Encode in UTF-8 without BOM". Then save.

See Stack Overflow question How to make Notepad to save text in UTF-8 without BOM?.


P
Problem Solved

Open the PHP file under question, in Notepad++.

Click on Encoding at the top and change from "Encoding in UTF-8 without BOM" to just "Encoding in UTF-8". Save and overwrite the file on your server.


s
stealthyninja

Same problem, different solution.

One line in the PHP file was printing out XML headers (which use the same begin/end tags as PHP). Looks like the code within these tags set the encoding, and was executed within PHP which resulted in the strange characters. Either way here's the solution:

# Original
$xml_string = "&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?&gt;";

# fixed
$xml_string = "<" . "?xml version=\"1.0\" encoding=\"UTF-8\"?" . ">";

P
Peter Mortensen

If you need to be able to remove the BOM from UTF-8 encoded files, you first need to get hold of an editor that is aware of them.

I personally use E Text Editor.

In the bottom right, there are options for character encoding, including the BOM tag. Load your file, deselect Byte Order Marker if it is selected, resave, and it should be done.

Alt text http://oth4.com/encoding.png

E is not free, but there is a free trial, and it is an excellent editor (limited TextMate compatibility).


The image link is broken.
P
Peter Mortensen

You can open it by PhpStorm and right-click on your file and click on Remove BOM...


P
Peter Mortensen

Here is another good solution for the problem with BOM. These are two VBScript (.vbs) scripts.

One for finding the BOM in a file and one for KILLING the damned BOM in the file. It works pretty fine and is easy to use.

Just create a .vbs file, and paste the following code in it.

You can use the VBScript script simply by dragging and dropping the suspicious file onto the .vbs file. It will tell you if there is a BOM or not.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' find_BOM.vbs
' ====================
' Kleines Hilfsmittel, welches das BOM finden soll
'
 Const UTF8_BOM = ""
 Const UTF16BE_BOM = "þÿ"
 Const UTF16LE_BOM = "ÿþ"
 Const ForReading = 1
 Const ForWriting = 2
 Dim fso
 Set fso = WScript.CreateObject("Scripting.FileSystemObject")
 Dim f
 f = WScript.Arguments.Item(0)
 Dim t
 t = fso.OpenTextFile(f, ForReading).ReadAll
 If Left(t, 3) = UTF8_BOM Then
     MsgBox "UTF-8-BOM detected!"
 ElseIf Left(t, 2) = UTF16BE_BOM Then
     MsgBox "UTF-16-BOM (Big Endian) detected!"
 ElseIf Left(t, 2) = UTF16LE_BOM Then
     MsgBox "UTF-16-BOM (Little Endian) detected!"
 Else
     MsgBox "No BOM detected!"
 End If

If it tells you there is BOM, go and create the second .vbs file with the following code and drag the suspicios file onto the .vbs file.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' kill_BOM.vbs
' ====================
' Kleines Hilfmittel, welches das gefundene BOM löschen soll
'
Const UTF8_BOM = ""
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
    fso.OpenTextFile(f, ForWriting).Write (Mid(t, 4))
    MsgBox "BOM gelöscht!"
Else
    MsgBox "Kein UTF-8-BOM vorhanden!"
End If

The code is from Heiko Jendreck.


G
Guillaume Renoult

In PHPStorm, for multiple files and BOM not necessarily at the beginning of the file, you can search \x{FEFF} (Regular Expression) and replace with nothing.


j
jiminy

Same problem, but it only affected one file so I just created a blank file, copy/pasted the code from the original file to the new file, and then replaced the original file. Not fancy but it worked.


C
Community

Use Total Commander to search for all BOMed files:

Elegant way to search for UTF-8 files with BOM?

Open these files in some proper editor (that recognizes BOM) like Eclipse.

Change the file's encoding to ISO (right click, properties).

Cut  from the beginning of the file, save

Change the file's encoding back to UTF-8

...and do not even think about using n...d again!


S
SkaJess

I had the same problem. The problem was because one of my php files was in utf-8 (the most important, the configuaration file which is included in all php files).

In my case, I had 2 different solutions which worked for me :

First, I changed the Apache Configuration by using AddDefaultCharsetDirective in configuration files (or in .htaccess). This solution forces Apache to use the correct encodage.

AddDefaultCharset ISO-8859-1

The second solution was to change the bad encoding of the php file.


B
Benjamin

Copy the text of your filename.css file. Close your css file. Rename it filename2.css to avoid a filename clash. In MS Notepad or Wordpad, create a new file. Paste the text into it. Save it as filename.css, selecting UTF-8 from the encoding options. Upload filename.css.


X
XisUnknown

This works for me!

def removeBOMs(fileName):
     BOMs = ['',#Bytes as CP1252 characters
    'þÿ',
    'ÿþ',
    '^@^@þÿ',
    'ÿþ^@^@',
    '+/v',
    '÷dL',
    'Ýsfs',
    'Ýsfs',
    '^Nþÿ',
    'ûî(',
    '„1•3']
     inputFile = open(fileName, 'r')
     contents = inputFile.read()
     for BOM in BOMs:
         if not BOM in contents:#no BOM in the file...
             pass
         else:
             newContents = contents.replace(BOM,'', 1)
             newFile = open(fileName, 'w')
             newFile.write(newContents)
             return None

P
Peter Mortensen

Check on your index.php, find "... charset=iso-8859-1" and replace it with "... charset=utf-8".

Maybe it'll work.