Parsing CSV files into a DataTable using Regular Expressions-C#

Sometimes, you find the need to parse CSV files dynamically into a DataTable. CSV files can be simple or complex. The simple CSV file would just have the fields separated by commas. The complexity comes in when the data itself has embedded commas, quotes, or line breaks. This solution, written in C#, and using the .NET Framework’s Regular Expression object, can handle such complex data. This has been tested to work with CSV formatted files saved from Microsoft Excel.

To begin with, make sure that you have the following namespaces:

using System.Data;
using System.IO;
using System.Text.RegularExpressions;
Next, we build a function that parses any CSV input string into a DataTable:

public static DataTable ParseCSV(string inputString)
{

    DataTable dt = new DataTable();
    // declare the Regular Expression that will match versus the input string
    Regex re = new
Regex("((?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\\r\\n|\\n|$))");

    ArrayList colArray = new ArrayList();
    ArrayList rowArray = new ArrayList();

    int colCount = 0;
    int maxColCount = 0;
    string rowbreak = "";
    string field = "";

    MatchCollection mc = re.Matches(inputString);
    foreach (Match m in mc)
    {

        // retrieve the field and replace two double-quotes with a single double-quote
        field = m.Result("${field}").Replace("\"\"", "\"");

        rowbreak = m.Result("${rowbreak}");

        if (field.Length > 0)
        {
            colArray.Add(field);
            colCount++;
        }

        if (rowbreak.Length > 0)
        {

            // add the column array to the row Array List
            rowArray.Add(colArray.ToArray());

            // create a new Array List to hold the field values
            colArray = new ArrayList();

            if (colCount > maxColCount)
                maxColCount = colCount;

            colCount = 0;
        }
    }

    if (rowbreak.Length == 0)
    {
        // this is executed when the last line doesn't
        // end with a line break
        rowArray.Add(colArray.ToArray());
        if (colCount > maxColCount)
            maxColCount = colCount;
    }

    // create the columns for the table
    for (int i = 0; i < maxColCount; i++)
        dt.Columns.Add(String.Format("col{0:000}", i));
    // convert the row Array List into an Array object for easier access
    Array ra = rowArray.ToArray();
    for (int i = 0; i < ra.Length; i++)
    {

        // create a new DataRow
        DataRow dr = dt.NewRow();

        // convert the column Array List into an Array object for easier access
        Array ca = (Array)(ra.GetValue(i));

        // add each field into the new DataRow
        for (int j = 0; j < ca.Length; j++)
            dr[j] = ca.GetValue(j);

        // add the new DataRow to the DataTable
        dt.Rows.Add(dr);
    }

    // in case no data was parsed, create a single column
    if (dt.Columns.Count == 0)
        dt.Columns.Add("NoData");

    return dt;
}

Now that we have a parser for converting a string into a DataTable, all we need now is a function that will read the content from a CSV file and pass it to our ParseCSV function:

public DataTable ParseCSVFile(string path)
 {

     string inputString = "";

     // check that the file exists before opening it
     if (File.Exists(path))
     {

         StreamReader sr = new StreamReader(path);
         inputString = sr.ReadToEnd();
         sr.Close();

     }

     return ParseCSV(inputString);
 }

And now you can quickly fill a DataGrid with data coming off the CSV file:

protected System.Web.UI.WebControls.DataGrid DataGrid1;

private void Page_Load(object sender, System.EventArgs e) {
		
		// call the parser
		DataTable dt=ParseCSVFile(Server.MapPath("./demo.csv"));

		// bind the resulting DataTable to a DataGrid Web Control
		DataGrid1.DataSource=dt;
		DataGrid1.DataBind();
}
Next Post Previous Post
No Comment
Add Comment
comment url