Splitting large csv files – the CSV Splitter

Carrying on with the theme of data files and databases, I am going to talk a bit about .csv files. “csv” stands for Comma Separated Variables, and is a popular format that is supported by many spreadsheet and database applications. I work a lot with csv files, opening them in Excel to manipulate them, or saving my Excel or Access files into csv to import them into other programs, but recently I ran into a little trouble.

I obtained a large csv file that I needed to open in Excel. The file had a total of 300,000 rows, so every time I tried to open it in Excel I would get the error: “File not loaded completely”. Excel has a limit of being able to handle only 65,536 rows in each sheet (what a strange number!). So if you have a csv file that has more rows than that, you’re in a bit of trouble. This is the help message that Excel gives you:

This error is usually encountered when an attempt to open a file with more than 65,536 rows or 256 columns is made. Excel is limited to 65,536 rows of data and 256 columns per worksheet. You can have many worksheets with this number of rows and columns, but they are usually capable of fitting into one workbook (file). The number of worksheets you can have per workbook is limited only by the amount of available memory your system has. By default, Excel can manage 3 worksheets, more if there is available memory to support the quantity of data.
Truncation of rows or columns in excess of the limit is automatic and is not configurable. This issue can usually be remedied by opening the source file with a text editor, such as Microsoft Office Word, and then saving the file off into multiple files with row or column counts within the limits of an Excel worksheet. These files can then be opened or imported into Excel worksheets.
If you are using a data format that does not support use of a text editor, it may be easier to import the data into Microsoft Office Access and then use the export feature of Access to import the data to an Excel format. Other methods of importing large source material into multiple worksheets are available, but may be more complex than using either a text editor or Access.

I was not in the mood to open up my csv file and start copying and pasting, so off I went to my old friend Google, and after some searching I found a program that was written for the express purpose of splitting large csv files into smaller, more manageable ones, and it worked perfectly for me. I was able to split my csv file of 300,000 rows into 5 files of 60,000 rows each in a flash and then open each of these files individually in Excel and play with them to my heart’s content.

The program is easy to run. You click on the exe file, named, ofcourse, csvsplitter.exe; it opens a little dialog box asking you how many rows you want in the individual pieces. When making your selection, remember to keep it under 65,536 if you’re going to be working in Excel! The default value is 50,000. You can also specify the maximum number of pieces you want. Hit Browse to select your file, check your settings, and click on Split Now! and voila, out come your csv files split into manageable pieces!

It’s a handy little utility to have in your collection, and you can get it at the creator’s website.

To download the file, you will need to register an account in the forums (it’s free to register), and then navigate to the tools and utilities thread. Click here for the thread where you can download the file .

The creator of this tool has been kind enough to share it freely, so please respect his request and don’t mirror this download on any other sites or offer it for download anywhere else without the owner’s express permission. If anyone wants to download it please direct them to the FXFisherman website.

OK, that’s it for now… I will be talking about my PHP progress soon!!

Some database tools – phpMyAdmin and HeidiSQL

In my last post I talked about dealing with a .sql file from the command line. This worked well, but there are other ways to interact with your databases that are GUI based, and perhaps easier for someone accustomed to dealing with the GUI interface. The two that I am currently playing with are PHPMyAdmin and Heidi SQL. Both are free to download and use, keeping with my theme of using free tools and utilities whenever I possibly can.

phpMyAdmin is a tool written in PHP intended to handle the administration of MySQL over the Web. Currently it can create and drop databases, create/drop/alter tables, delete/edit/add fields, execute any SQL statement, manage keys on fields, manage privileges,export data into various formats and is available in 52 languages.

HeidiSQL is an easy-to-use interface and a "working-horse" for web-developers using the popular MySQL-Database. It allows you to manage and browse your databases and tables from an intuitive Windows® interface.

With HeidiSQL you will also be able to
* generate nice SQL-exports
* synchronize tables between two databases
* manage user-privileges
* import text-files
* export table-data as CSV, HTML and XML
* browse and edit table-data using a comfortable grid
* batch-insert ascii or binary files into tables
* write queries with syntax-highlighting (next version will have also code-completion)
* monitor and kill client-processes
* and much more

I have worked with phpMyAdmin before on the web but now have it installed on my laptop along with the Apache Server, MySQL and PHP. I have never used Heidi SQL before so it will be interesting to see what these tools have to offer in terms of working with and manipulating databases.

As I wrote in response to Bo’s comment to my tutorial on working with .sql dump files, phpMyAdmin has been known to fail when importing larger databases, and this is where the command line comes in handy. For some, working from the command line is just preferable to using a GUI. However, not all hosts allow you to have shell access to your account, so it’s a good thing to get accustomed to using both the command line and GUI tools.

One thing about installing phpMyAdmin with PHP5 is that it’s a bit of a chore getting it to talk to MySQL. As I discovered after a lot of head banging, googling, pounding my keyboard, and a swearing, this is primarily a PHP installation problem.

Since v.5 onwards, PHP does not come inbuilt with support for MySQL as a default anymore, therefore if you have trouble connecting to MySQL from phpMyAdmin at your initial installation, chances are that the problem lies in the fact that your PHP is not talking to MySQL. There’s a lot of information in different forums about this, and google is your best friend.

My solution was to delete my manual installation of PHP, and install PHP5 using the msi installer from php.net with its default settings. If you enable all the extensions you will have trouble since some of them require other software to be installed. Your best bet is to install it as it is (selecting your version of the server of course), and then add extensions. You do this by going to “Add/Remove Programs” in Control Panel, choose Change the installation, and enable MySQL, MySQLi, Mcrypt, and Multi-Byte String to run from the hard drive. This step ensures that you don’t need to edit your php.ini file as suggested in many forums, and which you would need to do if you were installing manually. You should also add the extension folder (usually C:/Program Files/PHP/ext) to the PATH. While you are there double check that your installation folder (C:/Program Files/PHP/) is also in the path. Remember that each time you make a change to your PHP install you need to restart your server (Apache in my case) to load the new configuration. And whenever you make changes to your PATH variables, ideally you need to restart your computer for the changes to take effect.

You can test if your PHP install is talking to MySQL by creating and running a PHP file such as the one below, provided kindly to me by Bo.

mysql_connect("localhost", "mysql_username", "mysql_password") or die(mysql_error());
echo "MySQL connection successful.
";
mysql_select_db("mysql_db_name") or die(mysql_error());
echo "Database connection successful.";
?>

(You will need to insert your own values for “mysql_username”, “mysql_password” and “mysql_db_name”).

If your install was successful, you should see something like

MySQL connection successful.
Database connection successful.

After PHP is installed correctly and works with MySQL, you can then install (or run) phpMyAdmin following the instructions given on the web page.

Now that I have finally successfully installed phpMyAdmin, I have all the tools that I need to start doing some serious stuff, and I am going to start writing some code in PHP and also interacting with databases using the three tools that I have at my disposal – phpMyAdmin, Heidi SQL, and the good old command line.

Working with a .sql dump file – a newbie tutorial

I have some .sql dump files that I wanted to open and play with. After googling and doodling, I finally accepted that I was going to have to overcome my innate need for orderly progression and jump into doing stuff with mySQL before actually getting to that chapter in the book!!

So here are the steps I used. I have mySQL Server 5.0 installed on my laptop, and I will go step by step to describe what I did. You will need to use your directory path and file names to do this yourself.

My sql dump file was named directory.sql and was in a folder on my desktop. My mySQL install is under C:\Program Files\MySQL\MySQL Server 5.0. My ultimate goal is to get the data in the sql dump file into csv format. I know there are probably easier ways to do it, and many tools, but I figured going step by step through the method I am going to describe was a good way to entertain myself and learn some MySQL in the process.

So let’s do it:

1. First, of course, I got myself a cold coke.

2. I opened the mySQL command line and entered my root password. I have found that it’s easier to actually work from the Command Line Client in the mySQL menu on my Start menu than it is to actually open the command line and try to navigate to the directory where mySQL is installed.

Start -> All Programs -> mySQL -> mySQL Server 5.0 -> MySQL Command Line Client

If you have set root password, which you always should, the client will ask for your password and then proceed to the mysql> prompt.

2. Next step was to create the table that the data in my sql file would go into. To do this, I first opened the sql dump file in Wordpad and examined it to determine what to name the table. Don’t use notepad for this because it will be much harder to read. If you have Textpad installed it’s even better, but I used Wordpad. I then typed in the command prompt:

mysql> CREATE DATABASE MaryDIR;

3. If this is done correctly with no typos, you will get a success statement to the effect:

Query OK, 1 row affected <0.00 sec>

4. Switch to the database you just created using the USE command.

mysql> USE MaryDir;

You should see the text

Database Changed

5. So far so good. Next step is to determine if your sql file already has the statements to create the tables you need or if you have to do it yourself. In my case, the statements were already there, so it was simply a matter of invoking the sql file to create and populate the tables. For this you use the SOURCE command. You can make it easy and move your sql file to your mySQL directory, but to make it interesting, let’s leave the sql file on the desktop and specify the path:

mysql> SOURCE C:\Documents and Settings\Mueni\Desktop\directory\directory.sql;

Remember you need to replace these paths with your own path and filename.

You will then see on the screen a series of lines as mySQL executes the queries to create and populate your tables.

You can see the tables that have been created using:

mysql> SHOW TABLES;

Once the tables are listed, you can check the details of any particular table using the DESCRIBE command,. So for one of my tables named caterers I typed:

mysql> DESCRIBE Caterers;

So I now have a mySQL database that I can use. The mySQL databases are created, by default, in the data directory. So under C:\Program Files\MySQL\MySQL Server 5.0\data there is a folder titled MaryDir.

If you want a sample file to play with you can download one from the mySQL website here: http://dev.mysql.com/doc/

Next I will explore different ways to view and interact with the database I just created.

From the Archives

Since I have been completely swamped with work and haven’t really had time to do any more PHP learning, I decided to write about a program I wrote ages ago in 2ooo. I was looking through my old files and came across the manual that I wrote, plus all the source code which I somehow still have :-).

The program was written for a solar energy project that I was involved in. For my Masters research project I built and monitored the performance of a Compound Parabolic Concentrator for solar panels in equatorial latitudes. I will be posting more about that on my physics page and will add a link here. The purpose of the concentrator was to increase the illumination levels on a solar panel, subsequently increasing its efficiency. For the duration of my project, I collected all the data manually, which sometimes involved sitting out in the sun for hours!! There was also an element of error introduced by this method because I had to read five different meters each time (in 15 minute intervals). I then applied calculations to this data, did my analysis, etc.

After my thesis was done and I passed (of course :-)), my supervisor and I decided to try to automate the data collection process for future projects. We bought and installed a Computer Boards I/O card and wired various components to make the system work. (Computer Boards was renamed Measurement Computing). I programmed the system and tested it, and afterwards wrote a complete User and Technical Manual for it. The system was programmed in Visual Basic 3.0 on a PC running Windows 95, which I still believe to be one of the most stable OS’s MS has produced.

Quoting from the User Manual:

The PVPM system comprises three main components; the CPC/PV system, the interfacing component and the software component. The CPC/PV system comprises a PV panel equipped with low-concentration parabolic reflectors. The interfacing component includes a constant voltage electronic bridge, an IO card and a PC. The software component is a program that collects and interprets data from the CPC/PV system via the interface.
The bridge for which this program was designed is a constant voltage bridge. This means that it keeps the voltage reading across the panel at a constant value so that only the current varies. Moreover, the bridge converts the current reading into a voltage reading and steps down the voltages so that they can be fed into the computer. Conversion factors are provided with the bridge to enable the user to convert the readings back to the actual voltage and current values. These conversion values are:

Current: Multiply by 14.8
Voltage: Multiply by 8.5

The IO card for this program is a Computer Boards card. The details concerning this card are comprehensively covered in the technical manuals supplied with the card.
The output from the card is in counts, which the PVPM program then converts to voltages before the bridge conversion factors are applied.
The card comes with its own installation software, which is used for configuration purposes. Libraries for different programming languages are also provided in the Universal Library diskette. For this case, the software for the data access has been written in Visual Basic 3.0. The program has been left in its raw form (not compiled as stand alone) so as to enable future modifications when necessary. The setback with this, however, is that the program can only be run in a PC that has the Visual Basic 3.0 program installed and all the necessary files must be manually loaded. Details of this are given in the Technical manual that accompanies this user manual.

The user manual gives a screen-by-screen description of how to use the PVPM software while the technical manual gives the source code for the software.

The manual was written in MS Word, and I created all the images and icons using MS Paint.

Here are some screen shots of the program interface:
Single Channel Mode
Multi Channel Mode
Loading Image
This last image is what loaded first when the program ran, much as a flash image would load now.

The plan was to transition from VB3.0 and recode the system using LabView, and we bought and installed the software, but I left the institution before that part of the project begun. I lost contact with the research group after some time and I’m not even sure if they recoded the system and if the project is still running.

User and Technical Manual for the PVPM System

The Next Step – Collecting my tools

So now I have my laptop all set up so that I can write and test php code. My next step was to collect the remaining tools I need to start, so first I had to decide what I’m going to use to write and edit my code. I’ve always liked Notepad for HTML coding because it doesn’t leave behind any codes or stuff. It’s just a plain clean text editor. But the disadvantage with Notepad is that a lot of debuggers will report errors by line number. Notepad doesn’t insert line numbers, and the last thing I want to do is start counting lines of code. So I caved in and decided to find a better editor that’s more suited for programming.

Several options were suggested to me, but I decided to go with two of them for now, because I have commitment issues and like to have options :). My choices are Notepad++ and PHP Designer 2007 Personal, both free. I also have Programmers Notepad that I’ve used before for other stuff, I might try it for PHP too – alternatives are a good thing.

Finally, before getting down and dirty, I need some kind of goal, a project. I learn best when I have something that I am building, so I picked a book that approaches the learning process by creating something. The book is Beginning PHP5, Apache, and MySQL Web Development (Programmer to Programmer), and I’m going to start with that and see if it works for me. I do have some of my own projects in mind, but they’re still wisps of ideas, so I’m going to borrow the author’s projects for now. There are other excellent books out there, but this is one that I grabbed off the shelf at the Free Library.