
Importing CSV with line breaks in Excel 2007

I'm working on a feature to export search results to a CSV file to be opened in Excel. One of the fields is a free-text field, which may contain line breaks, commas, quotations, etc. In order to counteract this, I have wrapped the field in double quotes (").

However, when I import the data into Excel 2007, set the appropriate delimiter, and set the text qualifier to double quote, the line breaks are still creating new records at the line breaks, where I would expect to see the entire text field in a single cell.

I've also tried replacing CR/LF (\r\n) with just CR (\r), and again with just LF (\n), but no luck.

Has anyone else encountered this behavior, and if so, how did you fix it?

TIA, -J

EDIT: Here's a quick file I wrote by hand to duplicate the problem.

ID,Name,Description
"12345","Smith, Joe","Hey.
My name is Joe."

When I import this into Excel 2007, I end up with a header row, and two records. Note that the comma in "Smith, Joe" is being handled properly. It's just the line breaks that are causing problems.

I've looked at the CSV file in Notepad++, and everything appears to be correct. I have other fields with commas, and they are being imported properly. It's just the line breaks that are causing problems.
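As a sanity check that the file itself is valid, here is a minimal sketch that parses the sample with Python's standard csv module. The sample string is typed in by hand for illustration (with \r\n line endings assumed); it is not read from the real export.

```python
import csv
import io

# Hand-typed copy of the sample: the third field contains a line break inside quotes.
sample = 'ID,Name,Description\r\n"12345","Smith, Joe","Hey.\r\nMy name is Joe."\r\n'

# newline="" passes line endings to the csv module untouched.
rows = list(csv.reader(io.StringIO(sample, newline="")))

print(len(rows))        # number of records, header included
print(repr(rows[1][2])) # the multi-line Description field
```

A standards-following parser sees a header plus one record here, so the extra record appears to come from how Excel's import handles the file rather than from the file's contents.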
I have issues with UTF8 .csv files with multi-line data and excel. I ended up just uploading the file to Google Docs, opening it into a google sheet, then downloading as a .xls file. Works well for me this way.

J Ashley

Excel (at least in Office 2007 on XP) can behave differently depending on whether a CSV file is imported by opening it from the File->Open menu or by double-clicking on the file in Explorer.

I have a CSV file that is in UTF-8 encoding and contains newlines in some cells. If I open this file from Excel's File->Open menu, the "import CSV" wizard pops up and the file cannot be correctly imported: the newlines start a new row even when quoted. If I open this file by double-clicking on it in an Explorer window, then it opens correctly without the intervention of the wizard.


Any idea how to get the same settings as with double clicking?
It's true! How strange.
If you are using german regional settings you have to use semicolon (;) instead of comma (,) in your csv for the double click to work...
it did not work for me. With "," as delimiter it opened everything in one column with double-click. With ";" as delimiter it was imported correctly except for the multi-line text fields, which were imported as several records. I have Excel 2010
@user1859022 I double that for Hungarian locale. actually any locale that uses comma as decimal separator has to use semicolon as field separator for the double-click csv open to work properly
Tim Stack

None of the suggested solutions worked for me.

What actually works (with any encoding):

Copy/paste the data from the csv-file (open in a text editor), then perform "text to columns" --> data gets transformed incorrectly.

The next step is to go to the nearest empty column or empty worksheet and copy/paste again (the same thing you already have in your clipboard) --> automagically works now.


In my case this worked, in a way: it correctly collapsed the CSV to the single records but removed all data in a field past the newline.
This worked, any ideas why it doesn't work when importing the csv from excel?
I can confirm that this works, you can even paste more data in different sheets without repeating the "text to columns" command. This is useful if you need to import several files.
Why does this work but neither opening the CSV or adding it as Text Data with all the proper settings not work? Thanks for the tip. The Copy/Paste team needs to talk to the data import team!
Holy shit. This really works, and it kinda makes sense why. When you do a "text to columns", Excel remembers the settings and will auto-transform. When you have the text already separated into rows, it will look row by row and ignore new lines. I think that MS should include a checkbox for whether to keep the behavior or rescan the data. I don't care, ... it freaking works
ketil

If you are doing this manually, download LibreOffice and use LibreOffice Calc to import your CSV. It does a much better job of stuff like this than any version of Excel I've tried, and it can save to XLS or XLSX as required if you need to transfer to Excel afterwards.

But if you're stuck with Excel and need a better fix, there seems to be a way. It seems to be locale dependent (which seems idiotic, in my humble opinion). I don't have Excel 2007, but I have Excel 2010, and the example given:

ID,Name,Description
"12345","Smith, Joe","Hey.
My name is Joe."

doesn't work. I wrote it in Notepad and chose Save as..., and next to the Save button you can choose the encoding. I chose UTF-8 as suggested, but with no luck. Changing the commas to semicolons worked for me, though. I didn't change anything else, and it just worked. So I changed the example to look like this, and chose the UTF-8 encoding when saving in Notepad:

ID;Name;Description
"12345";"Smith, Joe";"Hey.
My name is Joe."

But there's a catch! The only way it works is if you double-click the CSV file to open it in Excel. If I try to import data from text and choose this CSV, then it still fails on quoted newlines.

But there's another catch! The working field separator (comma in the original example, semicolon in my case) seems to depend on the system's Regional Settings (set under Control Panel -> Region and Language). In Norway, comma is the decimal separator. Excel seems to avoid this character and prefer a semicolon instead. I have access to another computer set to UK English locale, and on that computer, the first example with a comma separator works fine (only on doubleclick), and the one with semicolon actually fails! So much for interoperability. If you want to publish this CSV online and users may have Excel, I guess you have to publish both versions and suggest that people check which file gives the correct number of rows.

So all the details that I've been able to gather to get this to work are:

The file must be saved as UTF-8 with a BOM, which is what Notepad does when you choose UTF-8. I tried UTF-8 without BOM (can be switched easily in Notepad++), but then double-clicking the document fails.

You must use a comma or a semicolon separator, but not the one that is the decimal separator in your Regional Settings. Perhaps other characters work, but I don't know which.

You must quote fields that contain a newline with the " character.

I've used Windows line-endings (\r\n) both in the text field and as a record separator, that works.

You must double-click the file to open it, importing data from text doesn't work.

Hope this helps someone.
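For anyone generating the file from code, the checklist above could be sketched roughly like this in Python. The function name write_excel_csv and the sample rows are made up for illustration, and which delimiter is right still depends on the reader's regional settings, so it is left as a parameter.

```python
import csv
import io

def write_excel_csv(rows, delimiter=","):
    """Return CSV bytes: UTF-8 with BOM, CRLF between records, every field quoted."""
    buf = io.StringIO(newline="")
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quoting=csv.QUOTE_ALL,    # quote every field, including those with newlines
        lineterminator="\r\n",    # Windows line endings as the record separator
    )
    writer.writerows(rows)
    # "utf-8-sig" prepends the byte order mark to the encoded output.
    return buf.getvalue().encode("utf-8-sig")

# Illustrative data only; ";" matches a locale where comma is the decimal separator.
rows = [["ID", "Name", "Description"],
        ["12345", "Smith, Joe", "Hey.\r\nMy name is Joe."]]
payload = write_excel_csv(rows, delimiter=";")
print(payload[:3])
```

The bytes would then be written to disk in binary mode so nothing translates the line endings on the way out. This only covers how the file is produced; the double-click requirement is on the Excel side.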


Also, the trick mentioned by @sdplus seems to work! I think what happens is that when you first paste and do a "text to columns" maneuver, you're configuring the quoting and field separator stuff in Excel. The second time you paste, it uses this configuration, and splits the data correctly into columns based on the configuration. But this seems to be a very manual approach.
yes, each time you Import Text or do a Text to Column, you recalibrate how copy/paste will work in the given session. it is even applied to new workbooks you create, until you close Excel. it can be frustrating, too. once you use a given separator for import, it will separate your text by that even if you just want to paste a sentence in a cell. you have to redo import with tab as a separator, or restart Excel to stop it.
Your trick really seems to work. But it looks like the semicolon has nothing to do with the solution. The problem is that Excel treats CSV files differently depending on regional settings. I'm from Germany, and for me CSV files from Excel always have semicolons instead of commas (the reason for this is that in Germany the decimal separator is comma instead of point). The real solution seems to be that Excel loads CSV files totally differently than all other text files. So CSV files that contain line breaks in between quotations seem to work. All other text files don't.
@Martini, yes, I have Norwegian Excel and we also use comma as the decimal separator, so I've mentioned how this depends on the regional settings (though I referred to it as the locale). Perhaps I should rephrase for clarity.
This is the answer for all people in regions where comma is the decimal separator. Note that for these regions, Excel also uses semicolon as the formula argument separator (=FOO(1;2) instead of =FOO(1,2)), but clearly it is incorrect that Excel applies this to a file format parser (which other program parses a standard file format dependent on the locale???)
jeremyalan

I have finally found the problem!

It turns out that we were writing the file using Unicode encoding, rather than ASCII or UTF-8. Changing the encoding on the FileStream seems to solve the problem.

Thanks everyone for all your suggestions!
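The FileStream mention suggests .NET, where "Unicode" encoding means UTF-16 (that is my assumption, not something confirmed above). If the file has already been written that way, a rough Python sketch of re-encoding it could look like this; utf16_to_utf8 is a made-up helper name, and it assumes the input starts with a UTF-16 BOM.

```python
import codecs

def utf16_to_utf8(raw: bytes, with_bom: bool = True) -> bytes:
    """Re-encode UTF-16 bytes (starting with a BOM) as UTF-8."""
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    text = raw.decode("utf-16")
    return text.encode("utf-8-sig" if with_bom else "utf-8")

# Illustrative input: Python's "utf-16" encoder writes a BOM first.
original = '"12345","Smith, Joe"\r\n'.encode("utf-16")
converted = utf16_to_utf8(original)
print(converted)
```

Whether to keep the UTF-8 BOM is left as a switch because other answers here report different results with and without it.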


ASCII encoding didn't seem to fix the issue for me (on MacOS though), and I don't have a leading space and my field is quoted. The exact same doc imports fine in Google Docs. How frustrating. BTW, there is no such thing as a "Unicode" encoded text file. It has to be one of the implementations of Unicode (UTF-8, UTF-16, UTF-32, etc.)
Thanks for the solution. I was still curious what the answer is so I tried creating a csv with a line break in Excel and seeing what it saved. It turns out Excel uses only a line feed for a new line in a cell. If I try to create the same csv in Notepad, it will use a carriage return + line feed for the line break. So for line breaks in a single cell, make sure it's only using a line feed (LF or \n) and not a carriage return (CR or \r). Excel does use both to terminate a row.
ASCII encoding didn't fix the issue for me either - Excel 2000, Windows 7.
For OS X on Macintosh, save as "Windows Comma Separated (csv)". This adds newlines instead of line breaks. It will be listed in the drop down menu for formats under "Specialty Formats".
Which Unicode encoding should be used (UTF-8, UTF-16) ?
Mazzy

Use Google Sheets and import the CSV file.

Then you can export that to use in Excel


Good tip! This is the most convenient conversion method if you are ok with uploading your CSV to a third party service (i.e. non-confidential data). Note that you may have to manually set the delimiter at importing. And you may need to adjust the cell size in the resulting Excel file for it to display correctly.
Also works with Excel in Office 365 in a browser. I could not properly open a CSV with line breaks inside of cells with the desktop Excel application (trying most suggestions from this page), but Excel on office.com could properly open it.
robotik

Short Answer

Remove the newline/linefeed characters (\n with Notepad++). Excel will still recognise the carriage return character (\r) to separate records.

Long Answer

As mentioned newline characters are supported inside CSV fields but Excel doesn't always handle them gracefully. I faced a similar issue with a third party CSV that possibly had encoding issues but didn't improve with encoding changes.

What worked for me was removing all newline characters (\n). This has the effect of collapsing fields to a single record assuming that your records are separated by the combination of a carriage return and a newline (CR/LF). Excel will then properly import the file and recognise new records by the carriage return.

Obviously a cleaner solution is to first replace the real newlines (\r\n) with a temporary character combination, replacing the newlines (\n) with your separating character of choice (e.g. comma in a semicolon file) and then replacing the temporary characters with proper newlines again.


I had the opposite situation: \n between lines and \r\n inside values. Just stripped the latter in Notepad++.
I tried both and neither worked on Office pro plus 2013
Jeremy

If the field contains a leading space, Excel ignores the double quote as a text qualifier. The solution is to eliminate leading spaces between the comma (field separator) and double-quote. For example:

Broken:

Name,Title,Description
"John", "Mr.", "My detailed description"

Working:

Name,Title,Description
"John","Mr.","My detailed description"
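If the file comes from somewhere you can't change, one way to clean it up is to parse it leniently and write it back out. This is only a sketch: strip_leading_spaces is a made-up name, and it relies on the skipinitialspace option of Python's csv reader, which ignores whitespace directly after a delimiter.

```python
import csv
import io

def strip_leading_spaces(csv_text: str) -> str:
    """Rewrite CSV so that no space sits between a delimiter and an opening quote."""
    # skipinitialspace=True lets '"John", "Mr."' parse as two quoted fields.
    reader = csv.reader(io.StringIO(csv_text, newline=""), skipinitialspace=True)
    out = io.StringIO(newline="")
    csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\r\n").writerows(reader)
    return out.getvalue()

# Illustrative input with a line break inside the last field.
broken = 'Name,Title,Description\r\n"John", "Mr.", "My detailed\ndescription"\r\n'
fixed = strip_leading_spaces(broken)
print(fixed)
```

Going through a real parser instead of a search-and-replace avoids touching a comma followed by a space that happens to sit inside a quoted field.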


I agree, however, I don't have any leading spaces in my output. Any ideas?
we need the line broken :(
Rock Rico

If anyone is stumbling across this thread and looking for a definitive answer, here goes (credit to the person mentioning LibreOffice):

1) Install LibreOffice
2) Open Calc and import file
3) My txt file had the fields separated by , and character fields enclosed in "
4) save as ODS file
5) Open ODS file in Excel
6) Save as .xls(x)
7) Done.
8) This worked perfectly for me and saved me BIGTIME!


no need to save as ODS, LibreOffice can save xls(x) natively
Pikamander2

+1 on J Ashley's comment. I ran into this problem also. It turns out that Excel requires:

A newline character("\n") in the quoted string

A carriage return and newline between each row.

E.g.

"Test", "Multiline item\n
multiline item"\r\n
"Test2", "Multiline item\n
multiline item"\r\n

I used Notepad++ to delimit each row properly and to only use newlines in the string. Discovered this by creating multiline entries in a blank Excel doc and opening the csv in Notepad++.
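The same convention (LF inside a cell, CR+LF between rows) can be produced from code instead of by hand in an editor. A minimal sketch, where to_excel_line_endings is a made-up name and the sample row is illustrative:

```python
import csv
import io

def to_excel_line_endings(rows):
    """Serialise rows with LF inside cells and CRLF between records."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\r\n")
    for row in rows:
        # Collapse CRLF and bare CR inside a cell to a single LF.
        writer.writerow(
            [str(cell).replace("\r\n", "\n").replace("\r", "\n") for cell in row]
        )
    return buf.getvalue()

text = to_excel_line_endings([["Test", "Multiline item\r\nmultiline item"]])
print(repr(text))
```

The csv writer quotes the multi-line cell on its own, so only the line endings need attention here. As the comments note, not every Excel version accepts this.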


it worked for me with only newline character as both a multiline item and a row separator, once i set the field separator according to my locale
Note: this didn't work in Office Pro Plus 2013. I suspect different versions handle this differently.
Aaron Dake

Paste into Notepad++, select Encoding > Encode in ANSI, copy all again and paste into Excel :)


Dibs

I had a similar problem. I had some Twitter data in MySQL. The data had line feeds (LF or \n) within the data. I had a requirement to export the MySQL data into Excel. The LF was messing up my import of the CSV file. So I did the following:

1. From MySQL exported to CSV with Record separator as CRLF
2. Opened the data in notepad++ 
3. Replaced CRLF (\r\n) with some string I am not expecting in the Data. I used ###~###! as replacement of CRLF
4. Replaced LF (\n) with Space
5. Replaced ###~###! with \r\n, so my record separator are back.
6. Saved and then imported into Excel

NOTE: While replacing CRLF or LF, don't forget to select the Extended (\n, \r, \t, ...) search mode (look at the bottom left of the dialog box).
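Steps 3 to 5 above can also be scripted rather than done in Notepad++. A small Python sketch, where flatten_embedded_newlines is a made-up name and the placeholder default is the one used in the steps:

```python
def flatten_embedded_newlines(data: str, placeholder: str = "###~###!") -> str:
    """Replace bare LF with a space while keeping CRLF record separators."""
    if placeholder in data:
        raise ValueError("placeholder already occurs in the data; pick another one")
    protected = data.replace("\r\n", placeholder)   # step 3: hide the record separators
    flattened = protected.replace("\n", " ")        # step 4: bare LF becomes a space
    return flattened.replace(placeholder, "\r\n")   # step 5: restore the separators

# Illustrative data: one bare LF inside a quoted field.
raw = '"1","first line\nsecond line"\r\n"2","single line"\r\n'
clean = flatten_embedded_newlines(raw)
print(repr(clean))
```

When reading the export from disk, open it with newline="" (or in binary mode) so that Python does not turn \r\n into \n before the replacement runs. Like the manual steps, this loses the line breaks inside the cells.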


Kirby

My experience with Excel 2010 on WinXP with French regional settings

the separator of your imported csv must correspond to the list separator of your regional settings (; in my case)

you must double click on the file from the explorer. don't open it from Excel
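If you generate the file on the same machine that will open it, the separator could be guessed from the locale. This is a rough sketch only: pick_delimiter is a made-up name, and it uses the rule of thumb from this thread (comma as decimal separator means semicolon as field separator). Python's locale module does not expose the Windows list separator itself, so this is an approximation and a user can configure the two independently.

```python
import locale

def pick_delimiter(decimal_point: str) -> str:
    """Rule of thumb from this thread: avoid the decimal separator character."""
    return ";" if decimal_point == "," else ","

try:
    # Adopt the user's numeric conventions instead of the default "C" locale.
    locale.setlocale(locale.LC_NUMERIC, "")
except locale.Error:
    pass  # keep the default locale if the environment's locale is unavailable

delimiter = pick_delimiter(locale.localeconv()["decimal_point"])
print(delimiter)
```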


m000

Overview

Almost 10 years after the original post, Excel hasn't improved in importing CSV files. However, I found that it is much better in importing HTML tables. So, one can use Python to convert CSV to HTML and then import the resulting HTML to Excel.

The advantages of this approach are: (a) it works reliably, (b) you don't need to send your data to a third party service (e.g. Google sheets), (c) no extra "fat" installations required (LibreOffice, Numbers etc.) for most users, (d) higher level than meddling with CR/LF characters and BOM markers, (e) no need to fiddle with locale settings.

Steps

The following steps can be run on any bash-like shell as long as Python 3 is installed. Although Python can be used to directly read CSV, csvkit is used to do an intermediate conversion to JSON. This allows us to avoid having to deal with CSV intricacies in our Python code.

First, save the following script as json2html.py. The script reads a JSON file from stdin and dumps it as an HTML table:

#!/usr/bin/env python3
import sys, json, html

if __name__ == '__main__':
    header_emitted = False
    make_th = lambda s: "<th>%s</th>" % (html.escape(s if s else ""))
    make_td = lambda s: "<td>%s</td>" % (html.escape(s if s else ""))
    make_tr = lambda l, make_cell: "<tr>%s</tr>" % ( "".join([make_cell(v) for v in l]) )
    print("<html><body>\n<table>")
    for line in json.load(sys.stdin):
        lk, lv = zip(*line.items())
        if not header_emitted:
            print(make_tr(lk, make_th))
            header_emitted = True
        print(make_tr(lv, make_td))
    print("</table>\n</body></html>")

Then, install csvkit in a virtual environment and use csvjson to feed the input file to our script. It is a good idea to disable cell type guessing with the -I argument:

$ virtualenv -p python3 pyenv
$ . ./pyenv/bin/activate
$ pip install csvkit
$ csvjson -I input.csv | python3 json2html.py > output.html

Now output.html can be imported in Excel. Line breaks in cells will have been preserved.

Optionally, you may want to cleanup your Python virtual environment:

$ deactivate
$ rm -rf pyenv

Tim

On MacOS try using Numbers

If you have access to Mac OS I have found that the Apple spreadsheet Numbers does a good job of unpicking a complex multi-line CSV file that Excel could not handle. Just open the .csv with Numbers and then export to Excel.


undefined

Excel is incredibly broken when dealing with CSVs. LibreOffice does a much better job. So, I found out that:

The file must be encoded in UTF-8 with BOM, so consider this for all the points below

The best result, by far, is achieved by opening it from File Explorer

If you open it from within Excel there are two possible outcomes:

If it has only ASCII characters, it will most likely work

If it has non-ASCII characters, it will mess your line breaks

It seems to be heavily dependent on the decimal separator configured in the OS's regional settings, so you have to select the right one

I would bet that it may also behave differently depending on OS and Office version
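For the first point, an existing UTF-8 file can be given a BOM without opening an editor. A minimal sketch, with ensure_utf8_bom as a made-up name and a hand-typed sample:

```python
import codecs

def ensure_utf8_bom(raw: bytes) -> bytes:
    """Prepend the UTF-8 byte order mark unless the data already starts with it."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw
    return codecs.BOM_UTF8 + raw

# Illustrative UTF-8 data containing a non-ASCII character.
data = 'ID,Name\r\n"1","Zoë"\r\n'.encode("utf-8")
with_bom = ensure_utf8_bom(data)
print(with_bom[:3])
```

It works on raw bytes, so the line endings and quoting in the file are left exactly as they were.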


You're asserting LibreOffice is a better guesser than Excel, right? Excel asks all the right questions when importing text files, unless you tell it to guess.
Thank you! It helped to convert my CSV from "UTF-8 without BOM" to "UTF-8 with BOM" (just plain "UTF-8" in the menu) with Notepad++. Then I opened it from Explorer, and Excel showed it properly, with correct symbols and correct line breaks inside cells. When I opened it from Explorer in the default "UTF-8 without BOM" encoding, Excel imported the line breaks correctly but displayed non-Latin symbols wrongly. When I opened it from within Excel, it showed the encoding correctly but didn't cope with the line breaks.
SaSH_17

This is for Excel 2016:

Just had the same problem with line breaks inside a csv file with the Excel Wizard.

Afterwards I was trying it with the "New Query" Feature: Data -> New Query -> From File -> From CSV -> Choose the File -> Import -> Load

It was working perfectly and a very quick workaround for all of you that have the same problem.


I tested this with a tab separator and line feeds inside cells, and it doesn't seem to work (the file is OK in LibreOffice and Google Docs). The line feed inside a cell goes to the next line …
M-Peror

With Excel 2019 I had a similar problem when working with CSV files via Data -> Import from text file / CSV. Once the connection is made and the data is synced, it reported xx error(s) because of shifted columns caused by the line breaks.

I managed to solve this by

1. Edit the query (Query -> Edit). This opens the Power Query Editor.
2. Go to Start -> Advanced Editor. This opens the query in text format, where line #2 had an instruction like Source = Csv.Document(File.Contents("my.csv"),[Delimiter=",", .... , QuoteStyle=QuoteStyle.None]),
3. Change QuoteStyle.None to QuoteStyle.Csv
4. Click Finish
5. Apply & close

Documentation found here: https://docs.microsoft.com/en-us/powerquery-m/csv-document

NB. I since found where this is "hidden" in the UI. In the Power Query-editor, click Data source settings, Change source (bottom left), and the Line breaks combo should say Ignore line breaks between quotes.

NB2. Working from Dutch Excel here so my above-mentioned translations of button captions etc. may be a little off.


Martin

What just worked for me: importing into Excel directly, provided that the import is done as a text format instead of as CSV format. M/


depassage

Just create a new sheet with cells containing line breaks, save it to csv, then open it with an editor that can show the end-of-line characters (like Notepad++). By doing that you will notice that a line break in a cell is coded with LF while a "real" end of line is coded with CR LF. Voilà, now you know how to generate a "correct" csv file for Excel.
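If you'd rather not eyeball the end-of-line markers, the same check can be done by counting them. A small sketch, where count_line_endings is a made-up name and the sample bytes are typed in by hand:

```python
import re

def count_line_endings(raw: bytes) -> dict:
    """Count CRLF pairs, LF not preceded by CR, and CR not followed by LF."""
    return {
        "CRLF": len(re.findall(rb"\r\n", raw)),
        "LF": len(re.findall(rb"(?<!\r)\n", raw)),
        "CR": len(re.findall(rb"\r(?!\n)", raw)),
    }

# Illustrative bytes: one bare LF inside a quoted cell, CRLF after each record.
sample = b'"a","line one\nline two"\r\n"b","plain"\r\n'
counts = count_line_endings(sample)
print(counts)
```

Read the file in binary mode before passing it in, so the counts reflect what is really on disk.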


user3861859

I also had this problem: ie., csv files (comma delimited, double quote delimited strings) with LF in quoted strings. These were downloaded Square files. I did a data import but instead of importing as text files, imported as "from HTML". This time it ignored the LF's in the quoted strings.


2003G35

This worked on Mac, using csv and opening the file in Excel.

Using python to write the csv file.

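data = '"first line of cell a1\r 2nd line in cell a1\r 3rd line in cell a1","cell b1","1st line in cell c1\r 2nd line in cell c1"\n"first line in cell a2"\n'

# "cells.csv" is a placeholder name; newline="" writes \r and \n exactly as given.
with open("cells.csv", "w", newline="") as file:
    file.write(data)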


adax2000

In my case, opening the CSV in Notepad++ and adding SEP="," as the first line allows me to open a CSV with line breaks and UTF-8 in Excel without issues
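Adding that first line can be scripted too. A hedged sketch: prepend_sep_hint is a made-up name, and it writes the unquoted spelling sep=, which is the form I have usually seen for this Excel-specific hint (the line above uses a quoted variant, and I have not checked which spellings each Excel version accepts). The hint is not part of CSV proper, so other tools will read it as a data row.

```python
def prepend_sep_hint(csv_text: str, delimiter: str = ",") -> str:
    """Add a 'sep=' first line unless the text already starts with one."""
    if csv_text[:4].lower() == "sep=":
        return csv_text
    return f"sep={delimiter}\r\n{csv_text}"

# Illustrative content only.
body = 'ID,Name\r\n"1","Smith, Joe"\r\n'
hinted = prepend_sep_hint(body)
print(repr(hinted))
```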


Ionut

Replace the separator with TAB(\t) instead of comma(,). Then open the file in your editor (Notepad etc.), copy the content from there, then paste it in the Excel file.


Try this on large files :)
David Avikasis

Line breaks inside double quotes are perfectly fine according to CSV standard. The parsing of line breaks in Excel depends on the OS setting of list separator:

Windows: you need to set the list separator to comma (Region and language » Formats » Advanced). Source: https://superuser.com/questions/238944/how-to-force-excel-to-open-csv-files-with-data-arranged-in-columns#answer-633302

Mac: you need to change the region to US (then manually change back other settings to your preference). Source: https://answers.microsoft.com/en-us/mac/forum/macoffice2016-macexcel/line-separator-comma-semicolon-in-excel-2016-for/7db1b1a0-0300-44ba-ab9b-35d1c40159c6 (see NewmanLee's answer)

Don't forget to close Excel completely before trying again.

I've successfully replicated the issue and was able to fix it using the above on both Mac and Windows.


I don't think this works. I exported a CSV with line breaks in cells from Excel itself. Since locale didn't change, Excel should have been able to load it correctly. But it can't. It still messes with the line breaks in cells.