validation - How to check whether a file is valid UTF-8?


I'm processing some data files that are supposed to be valid UTF-8 but aren't, which causes the parser (not under my control) to fail. I'd like to add a stage that pre-validates the data for UTF-8 well-formedness, but I've not yet found a utility that does this.

There's a web service at W3C, but it appears to be dead, and I've found a Windows-only validation tool that reports invalid UTF-8 files but doesn't report which lines/characters to fix.

I'd be happy with either a tool I can drop in and use (ideally cross-platform), or a Ruby/Perl script I can make part of my data loading process.

You can use GNU iconv:

$ iconv -f utf-8 your_file -o /dev/null 

Or, with older versions of iconv, such as the one on macOS:

$ iconv -f utf-8 your_file > /dev/null; echo $? 

The command returns 0 if the file could be converted successfully, and 1 if not. Additionally, it prints the byte offset at which the invalid byte sequence occurred.

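Edit: the output encoding doesn't have to be specified. If -t is omitted, iconv uses the current locale's encoding, which is UTF-8 on most modern systems; pass -t utf-8 explicitly if your locale differs.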
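
Since the question also asks for a Ruby script that can be part of a loading process, here is a minimal sketch of a complement to the iconv approach above. It is not part of the original answer. It relies on Ruby's built-in String#force_encoding and String#valid_encoding?; the method name invalid_utf8_lines and the output format are illustrative choices. It reports the line number and the byte offset of the start of each offending line, not the exact offending byte.

```ruby
# Sketch: list the lines of a file that are not well-formed UTF-8.
# invalid_utf8_lines is a hypothetical helper name, not a library method.

# Returns an array of [line_number, byte_offset] pairs, one per invalid line.
# byte_offset is where that line starts within the file.
def invalid_utf8_lines(path)
  bad = []
  offset = 0
  # Read raw bytes so that Ruby does not try to interpret them on input.
  # Splitting on "\n" is safe: byte 0x0A never occurs inside a multi-byte
  # UTF-8 sequence.
  File.open(path, 'rb') do |io|
    io.each_line.with_index(1) do |line, line_no|
      # Relabel the bytes as UTF-8 (no transcoding), then validate them.
      bad << [line_no, offset] unless line.force_encoding(Encoding::UTF_8).valid_encoding?
      offset += line.bytesize
    end
  end
  bad
end

# Command-line use: ruby check_utf8.rb file1 [file2 ...]
# Exits with status 1 if any file contains an invalid line.
if __FILE__ == $PROGRAM_NAME
  status = 0
  ARGV.each do |path|
    invalid_utf8_lines(path).each do |line_no, offset|
      warn "#{path}:#{line_no}: invalid UTF-8 (line starts at byte #{offset})"
      status = 1
    end
  end
  exit status unless status.zero?
end
```

Because the file is read line by line, memory use stays small even for large data files. Called from a loader, invalid_utf8_lines(path).empty? can act as the gate before handing the file to the parser.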

