Invalid CSV Escaping & PHP Serialization in form 'Form Fill' Activities in Bulk Activity Export | Community
Skip to main content
Level 2
July 15, 2025
Solved

Invalid CSV Escaping & PHP Serialization in form 'Form Fill' Activities in Bulk Activity Export

  • July 15, 2025
  • 2 replies
  • 1483 views

⚠️  This post has been edited by a moderator for accuracy.

 

When exporting Marketo Activity, specifically Form Fill activities, we have encountered two issues that cause problems when trying to reliably parse the exported CSV file.

 

1. 'checksumFields' Value Contains Unescaped Commas

 

Inside the attributes field, there is a Form Fields entry. This contains a checksumFields value which is a comma-separated list of field names. However, this is not properly escaped for CSV (RFC 4180). As a result:

  • Commas are interpreted as field delimiters.

  • CSV parsers break and produce misaligned columns.

  • Downstream systems cannot reliably import these exports.

Example:

"checksumFields":"FirstName,LastName,Title,Company,Email"

 Because of this, CSV readers see FirstName, LastName, etc., as separate fields.

 

2. 'Form Fields' is PHP Serialized, Not JSON

 

Unlike the rest of the attributes object (which uses JSON-like syntax), the Form Fields value is PHP serialized.
Example snippet:

""Form Fields"":""a:11:{s:6:\"module\";s:11:\"leadCapture\";s:6:\"action\";s:5:\"save2\";s:9:\"FirstName\";s:4:\"John\";s:8:\"LastName\";s:5:\"Doe\";s:5:\"Title\";s:7:\"Manager\";s:7:\"Company\";s:9:\"ExampleCo\";s:5:\"Email\";s:24:\"john.doe@example.com\";s:6:\"formid\";s:4:\"1628\";s:10:\"munchkinId\";s:11:\"000-XXX-000\";s:8:\"_mkt_trk\";s:53:\"id:000-XXX-000&token:_mch-example.com-1719224514646-57494\";s:7:\"formVid\";s:4:\"1628\";s:13:\"_mktoReferrer\";s:58:\"https://www.example.com/en/events/example-event\";s:14:\"checksumFields\";s:69:\"FirstName,LastName,Title,Company,Email,formid,munchkinId,_mkt_trk,formVid,_mktoReferrer\";s:8:\"checksum\";s:64:\"abc123def456ghi789jkl012mno345pqr678stu901vwx234yz567abc890def\";}""

Problems with this approach:

  • PHP serialization is not universally readable outside PHP.

  • Serialization outputs : and ; characters which don't interact well with CSV exports.

  • It introduces complexity for developers working in other languages.

  • It’s inconsistent with the surrounding JSON-style attributes.

 

Impact on Standard CSV Parsing

 

The current CSV export is not RFC 4180 compliant due to the unescaped commas within the 'checksumFields' data. This causes failures with any standard CSV parsing tools, which interpret these commas as new columns. We are saving the export exactly as provided by Marketo via the Bulk Activity Extract API — the issue exists in the raw export data.

 

To work around this, we have switched to using TSV exports for Activity Extracts, which avoid this issue because tabs do not appear within the serialized data.

Ideally, Marketo should ensure CSV exports follow the RFC 4180 standard by escaping commas correctly. Additionally, using JSON encoding for 'Form Fields' would bring consistency and improve interoperability.

 

 

I’m sharing this in the hopes it helps others encountering the same issues when working with these exports.

This post is no longer active and is closed to new replies. Need help? Start a new post to ask your question.
Best answer by SanfordWhiteman

You’re mistaken about RFC 4180 validation. The CSV is RFC-compliant and a proper CSV parser has no problem with it.

 

The attributes field is wrapped in unescaped double quotation marks. It’s not mandatory for a field to be wrapped in dquotes unless it contains characters that would create a delimiter collision. In this case, attributes absolutely does contain such characters, e.g. commas. That’s why it’s wrapped. Once it’s wrapped, the internal commas have no special meaning. It’s not true that “commas are interpreted as field delimiters.” 

 

attributes is a JSON-stringified object. JSON itself contains both unescaped dquotes (") and JSON-escaped dquotes (\"). In both cases the dquote is doubled to "" because that’s how you CSV-escape dquotes. As they’re now CSV-escaped, they do not function as delimiters; they’re the same as any other character. Likewise, as noted above, the commas within the value do not function as delimiters.

 

Note even if attributes were a simple string, all the above would still apply. It so happens that attributes is a JSON object that when parsed, has properties that need to be PHP-unserialized. The PHP-inside-JSON may be distracting, but it doesn’t change the fundamentals: the commas are not interpreted as field delimiters unless your CSV parser is badly broken.

2 replies

SanfordWhiteman
SanfordWhitemanAccepted solution
Level 10
July 15, 2025

You’re mistaken about RFC 4180 validation. The CSV is RFC-compliant and a proper CSV parser has no problem with it.

 

The attributes field is wrapped in unescaped double quotation marks. It’s not mandatory for a field to be wrapped in dquotes unless it contains characters that would create a delimiter collision. In this case, attributes absolutely does contain such characters, e.g. commas. That’s why it’s wrapped. Once it’s wrapped, the internal commas have no special meaning. It’s not true that “commas are interpreted as field delimiters.” 

 

attributes is a JSON-stringified object. JSON itself contains both unescaped dquotes (") and JSON-escaped dquotes (\"). In both cases the dquote is doubled to "" because that’s how you CSV-escape dquotes. As they’re now CSV-escaped, they do not function as delimiters; they’re the same as any other character. Likewise, as noted above, the commas within the value do not function as delimiters.

 

Note even if attributes were a simple string, all the above would still apply. It so happens that attributes is a JSON object that when parsed, has properties that need to be PHP-unserialized. The PHP-inside-JSON may be distracting, but it doesn’t change the fundamentals: the commas are not interpreted as field delimiters unless your CSV parser is badly broken.

Level 2
July 16, 2025

Thanks very much for clarifying — you're absolutely right regarding RFC 4180. I appreciate the correction, and I can see now that I was incorrect on that point. The CSV structure itself is compliant, and I agree that commas inside quoted fields are, in principle, not a problem for a compliant parser.

 

What prompted my original post was the practical issue we encountered when using some standard CSV parsers — notably PHP’s native fgetcsv() and SplFileObject in CSV mode. In our case, rows containing checksumFields inside the Form Fields serialized data led to the parser treating the comma-separated values as separate columns, which caused headers like Email and FirstName to show up unexpectedly as subsequent column values instead of staying within the 'attributes' field. It seems this behaviour arose due to how the inner escaping interacts with parser expectations, despite the CSV being technically valid. I also tried a few other CSV parsing libraries, with similar results (clearly not the right ones!).

 

To clarify, this was using the CSV data exactly as provided by Marketo via the Bulk Activity Extract API — we saved the response directly to a CSV file with no modification and then passed that file into the parser. The problem wasn’t introduced by any intermediate handling.

 

All other activity types have been parsing perfectly with our processes for some time now; it’s only when we started to include Form Fill activity data that the formatting caused our export processing jobs to start failing when such records were present. In my search for a solution, I was incorrectly led to believe that the serialization itself was making the CSV non-compliant. Switching to TSV resolved this issue for us because there’s no delimiter conflict — it’s been a simple and practical solution when dealing with these complex serialized fields.

 

Interestingly, opening the same CSV in Excel works without issue, which reinforces your point that the export conforms to CSV standards.

 

I’m not sure of the benefits of using PHP serialization here in this context, but it definitely adds complexity for anyone trying to work with this data. If this field were JSON-encoded like the rest of the attributes, this wouldn’t really be an issue at all.

SanfordWhiteman
Level 10
July 16, 2025

What prompted my original post was the practical issue we encountered when using some standard CSV parsers — notably PHP’s native fgetcsv() and SplFileObject in CSV mode. In our case, rows containing checksumFields inside the Form Fields serialized data led to the parser treating the comma-separated values as separate columns, which caused headers like Email and FirstName to show up unexpectedly as subsequent column values instead of staying within the 'attributes' field. It seems this behaviour arose due to how the inner escaping interacts with parser expectations, despite the CSV being technically valid. I also tried a few other CSV parsing libraries, with similar results (clearly not the right ones!).

I found it hard to believe a function as venerable as PHP’s fgetcsv() didn’t understand basic CSV escaping.

 

So I put together this bit of PHP that shows it doesn’t have a problem with the Filled Out Form export, provided you pass the right arguments of course:

<?php $row = -1; $header; if (($handle = fopen("sample_export.csv", "r")) !== FALSE) { while (($data = fgetcsv($handle, null, ",", "\"", "")) !== FALSE) { if(++$row == 0){ $header = $data; continue; } $rowObj = array_combine(array_values($header), array_values($data)); $attributes = json_decode($rowObj["attributes"]); $formFields = unserialize($attributes->{"Form Fields"}); echo "Form fields for data row ${row}" . PHP_EOL; echo str_repeat("=",80) . PHP_EOL; var_export($formFields); echo PHP_EOL . PHP_EOL; } fclose($handle); }

 

Sample output:

Form fields for data row 1 ================================================================================ array ( 'module' => 'leadCapture', 'action' => 'save2', 'Unsubscribed' => 'yes', 'Email' => 'test1@example.com', 'formid' => '1357', 'lpId' => '219', 'subId' => '203', 'munchkinId' => '410-XOR-673', 'lpurl' => '//pages.vaneck.com/AUUnsubscribe.html?cr={creative}&kw={keyword}', 'cr' => '', 'kw' => '', 'q' => '', '_mkt_trk' => 'id:410-XOR-673&token:_mch-vaneck.com.au-m9Nu6QH22eHRTnpEdjMP79qT5DFVtjbq', 'formVid' => '1357', 'mkt_tok' => 'G0rMYjuuJ3XgcjtB0WhKetYzKqQhwG39bjxPyzbC9jzvbMFiLtLCM9bYKgetx4Yed3NXNm9daAxGM3GVSiq5emxHbJeEV7Mr5VEHEcSS5RbFBUm4xrQBDuSRKi26gYq886R', '_mktoReferrer' => 'https://pages.vaneck.com.au/auunsubscribe.html?mkt_unsubscribe=1&mkt_tok=G0rMYjuuJ3XgcjtB0WhKetYzKqQhwG39bjxPyzbC9jzvbMFiLtLCM9bYKgetx4Yed3NXNm9daAxGM3GVSiq5emxHbJeEV7Mr5VEHEcSS5RbFBUm4xrQBDuSRKi26gYq886R', 'checksumFields' => 'Unsubscribed,Email,formid,lpId,subId,munchkinId,lpurl,cr,kw,q,_mkt_trk,formVid,mkt_tok,_mktoReferrer', 'checksum' => '757c309b14bc0ced72dc7969d852f77f8970bbcf43d2fbc8cc09a955cb3ae94d', 'formServiceRequestId31337' => '4498#197294d1252', ) Form fields for data row 2 ================================================================================ array ( 'module' => 'leadCapture', 'action' => 'save2', 'Unsubscribed' => 'yes', 'Email' => 'test2@example.com', 'formid' => '1357', 'lpId' => '219', 'subId' => '203', 'munchkinId' => '410-XOR-673', 'lpurl' => '//pages.vaneck.com/AUUnsubscribe.html?cr={creative}&kw={keyword}', 'cr' => '', 'kw' => '', 'q' => '', '_mkt_trk' => 'id:410-XOR-673&token:_mch-vaneck.com.au-3SCxxRf7HEf5mJ1ykuaUKQNb16wX9BEb', 'formVid' => '1357', 'mkt_tok' => '7FEuRL5uRKXPatf4HbW1yyYKGiQDn4CLivGRRjSgNhkahzC14EzjAgxNExh1pM359hLxh9DHa5mXyTqLCX33pmLA408Mf8ikggauyVb8iZU6UYQm0t6n6R6NGeMMn81k2iN', '_mktoReferrer' => 'https://pages.vaneck.com.au/auunsubscribe.html?mkt_unsubscribe=1&mkt_tok=7FEuRL5uRKXPatf4HbW1yyYKGiQDn4CLivGRRjSgNhkahzC14EzjAgxNExh1pM359hLxh9DHa5mXyTqLCX33pmLA408Mf8ikggauyVb8iZU6UYQm0t6n6R6NGeMMn81k2iN', 'checksumFields' => 'Unsubscribed,Email,formid,lpId,subId,munchkinId,lpurl,cr,kw,q,_mkt_trk,formVid,mkt_tok,_mktoReferrer', 'checksum' => 'f0766eea05c0ebd5d291a81657912451403964898f2cff659931c197d2a83101', 'formServiceRequestId31337' => '129a3#1972a189e1d', )

 

SanfordWhiteman
Level 10
July 15, 2025

Serialization outputs : and ; characters which don't interact well with CSV exports.


While the unfamiliarity of PHP serialization is undeniable, can’t see how this claim is defensible.

 

: and ; are child’s play for CSV parsers, as they have no special meaning. (And of course they exist in JSON as well.)

 

Based on your earlier comments, I think you’re using a broken CSV parser.

Jo_Pitts1
Community Advisor
Community Advisor
July 16, 2025

@ct_wc_jdownham I have to agree with @sanfordwhiteman here.

I've seen interesting issues in CSV files for a global company with different delimiters being used, and different characters being used to denote 10^3 multipliers in numbers.  An import CSV parser wasn't handling this well at all.  It wasn't the fault of the people generating the CSV files, but the fault of the parser interpreting them.

 

However a well written parser should be able to take configuration details to correctly parse this regardless of geography.

 

I wonder - where in the world are you?  If you generate a CSV file by doing a Save-As in Excel what do you see as delimiters?

 

Cheers

Jo

SanfordWhiteman
Level 10
July 16, 2025

I've seen interesting issues in CSV files for a global company with different delimiters being used, and different characters being used to denote 10^3 multipliers in numbers.  An import CSV parser wasn't handling this well at all.  It wasn't the fault of the people generating the CSV files, but the fault of the parser interpreting them.


I’fairness that problem starts with Excel still calling something a CSV even though it uses different regional delimiters. An RFC 4180-only parser is right to be confused by that.

 

Excel seems to treat “CSV” kind of like “KFC” or “SAT” where the “comma” is no longer literal. 😑