XPath axis, get all following nodes until

Use:

    (//h2[. = 'Foo bar'])[1]/following-sibling::p
      [1 = count(preceding-sibling::h2[1] | (//h2[. = 'Foo bar'])[1])]

In case it is guaranteed that every h2 has a distinct value, this may be simplified to:

    //h2[. = 'Foo bar']/following-sibling::p
      [1 = count(preceding-sibling::h2[1] | ../h2[. = 'Foo bar'])]

This means: Select all p elements that are following siblings of the h2 …

Print an XML document without the XML header line at the top

The simplest way to get the XML for a Document without the leading “PI” (processing instruction) is to call to_s on the root element instead of the document itself:

    require 'nokogiri'
    doc = Nokogiri.XML('<hello world="true" />')

    puts doc
    #=> <?xml version="1.0"?>
    #=> <hello world="true"/>

    puts doc.root
    #=> <hello world="true"/>

The ‘correct’ way to do it …

How do I parse an HTML table with Nokogiri?

    #!/usr/bin/ruby1.8

    require 'nokogiri'
    require 'pp'

    html = <<-EOS
      (The HTML from the question goes here)
    EOS

    doc = Nokogiri::HTML(html)
    rows = doc.xpath('//table/tbody[@id="threadbits_forum_251"]/tr')
    details = rows.collect do |row|
      detail = {}
      [
        [:title, 'td[3]/div[1]/a/text()'],
        [:name, 'td[3]/div[2]/span/a/text()'],
        [:date, 'td[4]/text()'],
        [:time, 'td[4]/span/text()'],
        [:number, 'td[5]/a/text()'],
        [:views, 'td[6]/text()'],
      ].each do |name, xpath|
        detail[name] = row.at_xpath(xpath).to_s.strip
      end
      detail
    end
    pp details

…

nokogiri gem installation error

2020 April 6th Update:

macOS Catalina 10.15

    gem install nokogiri -- --use-system-libraries=true --with-xml2-include=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include/libxml2/

macOS Mojave 10.14

    gem install nokogiri -- --use-system-libraries=true --with-xml2-include=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/libxml2/

macOS High Sierra 10.13

    gem install nokogiri -- --use-system-libraries=true --with-xml2-include=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/libxml2/

macOS Sierra 10.12

    gem install nokogiri -- --use-system-libraries=true --with-xml2-include=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libxml2/

OS X El Capitan 10.11

    gem install nokogiri -- --use-system-libraries=true --with-xml2-include=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2/

Consider to add …

Error to install Nokogiri on OSX 10.9 Maverick?

You can also install Nokogiri on Mac OS X 10.9 Mavericks with a full Xcode install using:

    gem install nokogiri -- --with-xml2-include=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/usr/include/libxml2

Update: for those using Yosemite, the following command will work:

    gem install nokogiri -- --with-xml2-include=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/libxml2 --use-system-libraries

or, it might actually be in your MacOSX10.11.sdk folder (mine was as of 18-Sep-2015) anyways, so even if …

'require': cannot load such file -- 'nokogiri\nokogiri' (LoadError) when running `rails server`

Nokogiri doesn’t support Ruby 2.2 on Windows yet. The next release will. See https://github.com/sparklemotion/nokogiri/issues/1256 Nokogiri doesn’t support native builds (e.g. with devkit) on Windows. Instead it provides gems containing prebuilt DLLs. There’s a discussion which you may want to join or watch on the topic of devkit build support here: https://github.com/sparklemotion/nokogiri/issues/1190

How do I pretty-print HTML with Nokogiri?

The answer by @mislav is somewhat wrong. Nokogiri does support pretty-printing if you:

1. Parse the document as XML
2. Instruct Nokogiri to ignore whitespace-only nodes (“blanks”) during parsing
3. Use to_xhtml or to_xml to specify pretty-printing parameters

In action:

    html = "<section>
    <h1>Main Section 1</h1><p>Intro</p>
    <section>
    <h2>Subhead 1.1</h2><p>Meat</p><p>MOAR MEAT</p>
    </section><section>
    <h2>Subhead 1.2</h2><p>Meat</p>
    </section></section>"

    require 'nokogiri'
    doc = Nokogiri::XML(html, &:noblanks)
    puts …