Removing Carriage Return Line Feeds from Excel Source in SSIS

For a project I'm working on, the goal is to bring in some data from an Excel file.  And for 3 of the tabs, the data flowed okay.  For the fourth tab, some of the field lengths exceeded 1000 characters.

And SSIS scans the first 8 rows to determine the correct field length and limited the rows to 255 every time, even after changing the length manually, it reverted back.

In order to get around this, there's a registry setting you can apply to let it guess the correct field length by scanning the entire column, I blogged about that here.

That wasn't the end of the story.  Next issue, almost every row of data for 5 fields contained a Carriage Return / Line Feed.  What that does, it breaks the field out into multiple rows, because it uses the CR\LF as the end of row indicator.

I worked with a team mate, our Architect on the project and found a working solution.

Basically, the Excel tab still gets read in with as usual, but then we do a data conversion to DT_WSTR instead of DT_NText:

Then we do Derived Column to REPLACE the "\r\n" fields with " ":

And that corrects the problem and the fields export correctly to the text file.  Thanks to my team mate for helping out on this one~!

So those two issues are now fixed.  On to the next set of issues.

No comments:

Post a Comment