encoding - Error reading UTF-8 file in Java
I am trying to read sentences from a file that contains Unicode characters. It does print out a string, but for some reason it messes up the Unicode characters.
This is the code I have:
    public static String readSentence(String resourceName) {
        String sentence = null;
        try {
            InputStream refStream = ClassLoader
                    .getSystemResourceAsStream(resourceName);
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    refStream, Charset.forName("UTF-8")));
            sentence = br.readLine();
        } catch (IOException e) {
            throw new RuntimeException("Cannot read sentence: " + resourceName);
        }
        return sentence.trim();
    }
The problem is probably in the way the string is being output.
I suggest you confirm that you are reading the Unicode characters correctly by doing something like this:
    for (char ch : sentence.toCharArray()) {
        System.err.println("char '" + ch + "' is Unicode codepoint " + ((int) ch));
    }
and see whether the Unicode codepoints are correct for the characters that are being messed up. If they are correct, the problem is on the output side; if not, it is on the input side.
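If the codepoints turn out to be correct, one possible fix on the output side is to print through a PrintStream with an explicit encoding instead of relying on the platform default. The following is only a minimal sketch: the class name CodepointCheck, the helper describe, and the sample string are made up for illustration, and it assumes the console or terminal is itself able to display UTF-8.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class CodepointCheck {

    // Hypothetical helper: lists every code point of the string as U+XXXX,
    // so the values can be compared against a Unicode chart.
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Sample text standing in for the sentence read from the file.
        String sentence = "caf\u00e9";

        // Input-side check: the code points are independent of the console encoding.
        System.err.println(describe(sentence));

        // Output-side fix (assumption: the terminal displays UTF-8).
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(sentence);
    }
}
```

Unlike the char loop, codePoints() reports characters outside the Basic Multilingual Plane as a single value rather than as two surrogates, which may make the comparison easier if the file contains such characters.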