I recently wrote a routine to export a client’s newsletter subscribers as a CSV file. There are a little over 7000 records in the database now and that is expected to increase at a steady pace. I have been wanting a good reason to reach into Java and do some fancy string manipulation, as I’ve read it can be quite a bit faster than in native CFML. So this looked like a perfect chance to try it. Here’s my first take in plain old CF. I had some additional quote and comma checking in there but I’ve left it out for readability.

<cfset csvstr = "Email Address,First Name,Last Name" & Chr(13) & Chr(10)>
<cfloop query="qrycowusers">
	<cfset csvstr = csvstr & email & ",">

	<cfset csvstr = csvstr & fname & ",">

	<cfset csvstr = csvstr & lname>

	<cfset csvstr = csvstr & Chr(13) & Chr(10)>
</cfloop>

I pushed this code to production and ran it on the larger, production database. It took just over 70 seconds!. Thats a long time for the client to be waiting for their CSV file. Next I rewrote this using Java’s StringBuffer class.

<cfset csvstr = createObject("java","java.lang.StringBuffer")>
<cfset csvstr.append("Email Address,First Name,Last Name" & Chr(13) & Chr(10) )>

<cfloop query="qrycowusers">
	<cfset csvstr.append(email & ",")>

	<cfset csvstr.append(fname & ",")>

	<cfset csvstr.append(lname)>

	<cfset csvstr.append(Chr(13) & Chr(10))>
</cfloop>

I copied this to production and ran it. It took less than 1 second. I ran both versions many times to be sure, and the results were consistent. I was expecting it to be faster, but not that much faster. I think the CF version may have been so extremely slow because I was doing 4 separate cfsets there. If I did it all with one it may have been a good deal faster. But I wanted to have a cfif around each set that checked for the existence of a comma, and if it found one to surround the value with double quotes.

The reason the Java way is faster is because ColdFusion strings are based on the Java String object. Java strings are immutable, that means they can’t be modified. So every time you do something like mystr = mystr & "appended content", ColdFusion is creating a new String object. As you can imagine, this can be very slow once to get into a something that ends up creating a lot of String objects.

I did run into one snag that took a minute to work around. I wanted to output this chunk of csv data to the browser as a downloadable file. This is usually done by by writing the data to a file, then using cfcontent to serve it. But I wanted to skip the writing to disk step. New in CFMX 7 was the “variable” attribute of cfcontent. That allows you to pass data to cfcontent and it will serve it to the browser with whatever mime type you specify. This is great for dynamically creating an image (with cfchart for example), then serving it to the browser. But it only works with binary data. Ok – we can work with that. So my pure ColdFusion version used this:

<cfheader name="Content-Disposition"
value="attachment; filename=ReportUsersExport.csv">
<cfcontent type="application/msexcel"
variable="#ToBinary( ToBase64(csvstr) )#" reset="Yes">

But that won’t work with our StringBuffer object – CF will throw an error about not being able to convert that object to binary. The solution is to use the StringBuffer’s toString() method to turn it into what ColdFusion is expecting:

<cfheader name="Content-Disposition"
value="attachment; filename=ReportUsersExport.csv">
<cfcontent type="application/msexcel"
variable="#ToBinary( ToBase64(csvstr.toString()) )#" reset="Yes">

For small amounts of data, this Java hybrid is of no consequence. On my development box with 5 records, the two files took almost the exact same amount of time. In fact the Java version might even have been slower. But when you get to a larger record set, this can save you a lot of time.

12 Comments

  1. Qasim Rasheed says:

    I developed a UDF that can convert ColdFusion Query to CSV using Java StringBuffer that you can find here.

    http://www.cflib.org/udf.cfm?ID=1197

    Thanks for an informative post.

  2. Axel says:

    That is good stuff ryan, I really like your writings

  3. Ed says:

    Good post, thanks! You can also access Java Stringbuffers natively in CF using string concatenation via cfsavecontent instead of ‘&’-driven string concatenation.

  4. tony petruzzi says:

    i remember when I first started using stringbuffer and was floored with how much performance I gained.

    @ed

    Give us an example. I’ve tried using cfsavecontent both and while it did give a performance increase it was no where’s near what stringbuffer gave.

  5. James Moberg says:

    I believe a simple cfsavecontent example would look like this (without quote and comma checking):

    Email Address,First Name,Last Name
    #email#,#fname#,#lname#

    How does this method compare to using java stringbuffer?

    On a recent project, I had to be able to export 100,000’s of records. Using CFMX, MSSQL and BCP.exe (via CFEXECUTE), the CSV export was almost instantaneous compared to trying to generate the export using CFMX or java (which was timing out and consuming too much RAM.)

    BCP (Bulk Copy Program) for Microsoft SQL Server
    http://msdn2.microsoft.com/en-us/library/aa174646(SQL.80).aspx

  6. James Moberg says:

    Sorry… it didn’t keep the HTML and the reply form doesn’t offer the ability to preview a post (like Slashdot.org does) or automatically escape HTML using HTMLEditFormat()-like function.

    <cfsavecontent variable=”csvstr”>Email Address,First Name,Last Name<cfoutput query=”qrycowusers”>
    #email#,#fname#,#lname#</cfoutput></cfsavecontent>

  7. Michael Sharman says:

    Hi Ryan,

    If you method chain the Init() method when creating your StringBuffer object, you don’t need to use ToString() when outputting your variable.

    i.e.

    csvstr = createObject(“java”,”java.lang.StringBuffer”).init();

    Great post, keep em’ coming!

  8. Ryan Stille says:

    Thats interesting Michael, can you explain why that works?

  9. Mayoorathen says:

    Well this code works with only CF 7. Can any suggest me a code will work with CF 6.1?

    Thanks

  10. Alex Ismail says:

    Great post, which covered exactly what I needed.

    Could you share how to handle fields that contain commas with the Java class solution?

    Thank you very much,

    Alex

  11. Ryan Stille says:

    Alex, if the value contains a comma you need to enclose the whole string in double quotes (that is the most common way of handling it).  I left this out of the example code for clarity.

    For example you might change this line above:
    <cfset csvstr.append(fname & ",")>

    To do this instead:
    <cfif Find("," , fname)>
        <cfset csvstr.append("""#fname#""" & ",")>
    <cfelse>
        <cfset csvstr.append(fname & ",")>
    </cfif>

  12. Alex Ismail says:

    Thank you, Ryan.

    Meanwhile I found this version, which works too:

    <cfset csvstr.append(Chr(34) & FirstName & Chr(34) & ",")>