Nokogiri, open-uri, and Unicode Characters

Summary: When feeding UTF-8 to Nokogiri through open-uri, use open(…).read and pass the resulting string to Nokogiri.

Analysis: If I fetch the page using curl, the headers properly show Content-Type: text/html; charset=UTF-8 and the file content includes valid UTF-8, e.g. “Genealogía de Jesucristo”. But even with a magic comment on the Ruby file and setting …
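
As a rough illustration of the summary, here is a minimal sketch of reading the body into a String first and only then handing it to Nokogiri. The helper names read_as_utf8 and parse_page are made up for this example, and it assumes the server really does send UTF-8, as the curl headers suggest. On current Rubies the open-uri entry point is URI.open rather than a bare open.

```ruby
require 'open-uri'

# Hypothetical helper: slurp an IO-like object into a String tagged as UTF-8.
# Assumption: the bytes really are UTF-8, as the Content-Type header claims.
def read_as_utf8(io)
  html = io.read
  # Retag (not transcode) the bytes if the string arrived with another label.
  html = html.dup.force_encoding(Encoding::UTF_8) unless html.encoding == Encoding::UTF_8
  raise ArgumentError, 'body is not valid UTF-8' unless html.valid_encoding?
  html
end

# Hypothetical wrapper: fetch a URL and give Nokogiri a String, not the IO.
def parse_page(url)
  require 'nokogiri' # loaded lazily so read_as_utf8 works without the gem
  html = URI.open(url) { |f| read_as_utf8(f) }
  # The third argument tells Nokogiri which encoding to assume for the string.
  Nokogiri::HTML(html, nil, 'UTF-8')
end
```

With something like this, parse_page(url).at('title').text should come back with the accented characters intact, provided the page is served as described above.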