Saturday, February 23, 2013

How MSBuild could make a file unparseable

Problem: A text file was being parsed by an assembly that is part of a system under development. In the developer environment, this worked great. After I had deployed the text file and the necessary assemblies to a different location using MSBuild, which also replaced a couple of strings inside the text file, parsing no longer worked.
  • Looking at the file in a text editor (I use Notepad++) confirmed that the edited version looked fine.
  • Editing the file in the developer environment manually (not running MSBuild on it) worked fine - the file was parseable afterwards.
Obviously, MSBuild made the file unparseable. So how could MSBuild do that to a text file? The command touching the file was
     RegExPattern="Something" Replacement="SomethingElse"
Only when I traced the code that parses the file did it become clear to me that the file now contained some extra characters at the very beginning. Then it occurred to me:

Solution: MSBuild changed the file encoding. To make sure MSBuild used the right encoding, I had to add one more key/value pair to the MSBuild tag mentioned above: a TextEncoding set to "Windows-1252".
I found this by inspecting the file encoding of the source and destination text files. My source file was reported as ANSI, while my destination file was reported as UTF-8. It was, however, not as simple as putting "ANSI" as the TextEncoding, as described in this excellent Stack Overflow article, which also led me onto the right path to "Windows-1252".
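
To illustrate what probably happened, here is a minimal Python sketch. It is not MSBuild code. It assumes the "extra characters" were a UTF-8 byte order mark (the post does not record the exact bytes), and the sample content and the helper names starts_with_utf8_bom and replace_keeping_encoding are made up for the example. In Python, the codec for Windows-1252 is called cp1252.

```python
import codecs
import re


def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the raw bytes begin with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def replace_keeping_encoding(data: bytes, pattern: str, replacement: str,
                             encoding: str = "cp1252") -> bytes:
    """Regex-replace inside the text and write it back in the SAME encoding."""
    text = data.decode(encoding)
    return re.sub(pattern, replacement, text).encode(encoding)


# Hypothetical source file content, stored as Windows-1252 ("ANSI").
original = "Something = caf\u00e9\r\n".encode("cp1252")

# Assumption: a tool that defaults to UTF-8 with a BOM would write this.
replaced_text = re.sub("Something", "SomethingElse", original.decode("cp1252"))
reencoded = codecs.BOM_UTF8 + replaced_text.encode("utf-8")

# Keeping the original encoding leaves the start of the file untouched.
preserved = replace_keeping_encoding(original, "Something", "SomethingElse")

print(starts_with_utf8_bom(reencoded))  # three extra bytes in front
print(starts_with_utf8_bom(preserved))  # no extra bytes
```

A text editor hides the byte order mark, which would explain why the file looked fine in Notepad++ while a parser reading the raw bytes stumbled over it.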

In my opinion, MSBuild should have retained the original encoding of the files it touches instead of defaulting to something unwanted. But then again, that's what keeps bread on my table... Thanks, MS...