This article addresses a common issue developers encounter when working with byte arrays that contain a byte order mark (BOM). With a basic conversion method like Encoding.UTF8.GetString(byteData), the BOM is not stripped: it ends up in the resulting text as a leading U+FEFF character, which is typically unintended.
The Problem with Simple Conversion
The straightforward approach of converting byte data to strings often fails because developers don't account for encoding metadata. Various hacks are applied to compensate, such as checking whether the byte array starts with a UTF-8 BOM and skipping it.
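To make the problem concrete, here is a minimal sketch of both the naive conversion and the manual workaround described above. The class name BomDemo and the helper DecodeSkippingUtf8Bom are hypothetical names chosen for illustration, and the sample payload "hello" is an assumption. Note that the helper only handles the UTF-8 BOM.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: the manual workaround, UTF-8 only.
    public static string DecodeSkippingUtf8Bom(byte[] byteData)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        bool hasBom = byteData.Length >= bom.Length
            && byteData.Take(bom.Length).SequenceEqual(bom);
        int offset = hasBom ? bom.Length : 0;
        return Encoding.UTF8.GetString(byteData, offset, byteData.Length - offset);
    }

    // Builds sample input: a UTF-8 BOM followed by the encoded text.
    public static byte[] WithUtf8Bom(string text)
    {
        return Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes(text))
            .ToArray();
    }

    public static void Main()
    {
        byte[] byteData = WithUtf8Bom("hello");

        // Naive conversion: the BOM survives as a leading U+FEFF character.
        string naive = Encoding.UTF8.GetString(byteData);
        Console.WriteLine(naive.Length);          // 6, not 5
        Console.WriteLine(naive[0] == '\uFEFF');  // True

        // Manual workaround: correct here, but blind to other encodings.
        Console.WriteLine(DecodeSkippingUtf8Bom(byteData)); // hello
    }
}
```

The workaround gives the right answer for this input, but it would have to be repeated for every other BOM (UTF-16, UTF-32) the data might carry.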
Understanding Encoding Context
Decoding bytes from a file or stream requires knowing which encoding was used. Files typically signal this with a BOM, whereas web content declares it in the HTTP Content-Type header, whose charset parameter specifies the text encoding.
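For the web case, the charset has to be read from the header by the caller. The following is a rough sketch of how that might look, assuming the Content-Type value is already available as a string. The class ContentTypeDemo, the method EncodingFromContentType, and the fallback parameter are hypothetical names introduced for this example.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

public static class ContentTypeDemo
{
    // Hypothetical helper: maps a Content-Type header value to an Encoding,
    // returning the supplied fallback when no usable charset is present.
    public static Encoding EncodingFromContentType(string contentType, Encoding fallback)
    {
        if (MediaTypeHeaderValue.TryParse(contentType, out var parsed)
            && !string.IsNullOrEmpty(parsed.CharSet))
        {
            try
            {
                // The charset may be quoted, e.g. charset="utf-8".
                return Encoding.GetEncoding(parsed.CharSet.Trim('"'));
            }
            catch (ArgumentException)
            {
                // Unknown charset name: fall through to the fallback.
            }
        }
        return fallback;
    }

    public static void Main()
    {
        Encoding enc = EncodingFromContentType("text/html; charset=utf-8", Encoding.ASCII);
        Console.WriteLine(enc.WebName); // utf-8
    }
}
```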
Recommended Solution
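Rather than applying workarounds, I recommend using .NET's System.IO.StreamReader class from C#. It recognizes the most common BOMs, and the string it produces does not contain the BOM.
The StreamReader reads from a stream rather than directly from a byte array, so the array is first wrapped in a MemoryStream. With BOM detection enabled, the reader inspects the first bytes, selects the matching encoding, and consumes the BOM before conversion; when no BOM is present, it falls back to the encoding passed to its constructor. It does not inspect Content-Type headers, so for web content the charset must be read from the header and supplied by the caller.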
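A minimal sketch of the StreamReader approach follows. Since StreamReader works on streams, the byte array is wrapped in a MemoryStream first. The class StreamReaderDemo and the method DecodeWithStreamReader are hypothetical names, and the choice of UTF-8 as the fallback for BOM-less input is an assumption.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class StreamReaderDemo
{
    // Hypothetical helper: decodes a byte array, honoring and removing any BOM.
    public static string DecodeWithStreamReader(byte[] byteData)
    {
        using (var stream = new MemoryStream(byteData))
        // Encoding.UTF8 is only used when no BOM is detected.
        using (var reader = new StreamReader(stream, Encoding.UTF8,
                   detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] utf8 = Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes("hello")).ToArray();
        byte[] utf16 = Encoding.Unicode.GetPreamble()
            .Concat(Encoding.Unicode.GetBytes("hello")).ToArray();

        Console.WriteLine(DecodeWithStreamReader(utf8));  // hello
        Console.WriteLine(DecodeWithStreamReader(utf16)); // hello
    }
}
```

The same call handles both the UTF-8 and the UTF-16 input, which is the main advantage over checking for one specific BOM by hand.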