Splitting large csv files – the CSV Splitter
Carrying on with the theme of data files and databases, I am going to talk a bit about .csv files. “csv” stands for Comma Separated Variables, and is a popular format that is supported by many spreadsheet and database applications. I work a lot with csv files, opening them in Excel to manipulate them, or saving my Excel or Access files into csv to import them into other programs, but recently I ran into a little trouble.
I obtained a large csv file that I needed to open in Excel. The file had a total of 300,000 rows, so every time I tried to open it in Excel I would get the error: “File not loaded completely”. Excel has a limit of being able to handle only 65,536 rows in each sheet (what a strange number!). So if you have a csv file that has more rows than that, you’re in a bit of trouble. This is the help message that Excel gives you:
This error is usually encountered when an attempt to open a file with more than 65,536 rows or 256 columns is made. Excel is limited to 65,536 rows of data and 256 columns per worksheet. You can have many worksheets with this number of rows and columns, but they are usually capable of fitting into one workbook (file). The number of worksheets you can have per workbook is limited only by the amount of available memory your system has. By default, Excel can manage 3 worksheets, more if there is available memory to support the quantity of data.
Truncation of rows or columns in excess of the limit is automatic and is not configurable. This issue can usually be remedied by opening the source file with a text editor, such as Microsoft Office Word, and then saving the file off into multiple files with row or column counts within the limits of an Excel worksheet. These files can then be opened or imported into Excel worksheets.
If you are using a data format that does not support use of a text editor, it may be easier to import the data into Microsoft Office Access and then use the export feature of Access to import the data to an Excel format. Other methods of importing large source material into multiple worksheets are available, but may be more complex than using either a text editor or Access.
I was not in the mood to open up my csv file and start copying and pasting, so off I went to my old friend Google, and after some searching I found a program that was written for the express purpose of splitting large csv files into smaller, more manageable ones, and it worked perfectly for me. I was able to split my csv file of 300,000 rows into 5 files of 60,000 rows each in a flash and then open each of these files individually in Excel and play with them to my heart’s content.
The program is easy to run. You click on the exe file, named, ofcourse, csvsplitter.exe; it opens a little dialog box asking you how many rows you want in the individual pieces. When making your selection, remember to keep it under 65,536 if you’re going to be working in Excel! The default value is 50,000. You can also specify the maximum number of pieces you want. Hit Browse to select your file, check your settings, and click on Split Now! and voila, out come your csv files split into manageable pieces!
It’s a handy little utility to have in your collection, and you can get it at the creator’s website.
To download the file, you will need to register an account in the forums (it’s free to register), and then navigate to the tools and utilities thread. Click here for the thread where you can download the file .
The creator of this tool has been kind enough to share it freely, so please respect his request and don’t mirror this download on any other sites or offer it for download anywhere else without the owner’s express permission. If anyone wants to download it please direct them to the FXFisherman website.
OK, that’s it for now… I will be talking about my PHP progress soon!!