How to retrieve comments from within an XML Document in PHP

SimpleXML cannot handle comments, but the DOM extension can. Here's how you can extract all the comments. You just have to adapt the XPath expression to target the node you want.

    $doc = new DOMDocument;
    $doc->loadXML(
        '<doc>
            <node><!-- First node --></node>
            <node><!-- Second node --></node>
        </doc>'
    );
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//comment()') as $comment) …

Javascript: extract URLs from string (inc. querystring) and return array

I just use URI.js, which makes it easy.

    var source = "Hello www.example.com,\n"
        + "http://google.com is a search engine, like http://www.bing.com\n"
        + "http://exämple.org/foo.html?baz=la#bumm is an IDN URL,\n"
        + "http://123.123.123.123/foo.html is IPv4 and "
        + "http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html is IPv6.\n"
        + "links can also be in parens (http://example.org) "
        + "or quotes »http://example.org«.";
    var result = URI.withinString(source, function(url) …

Extracting image from PDF with /CCITTFaxDecode filter

Actually, vbcrlfuser's answer did help me, but the code was not quite correct for the version of BitMiracle.LibTiff.NET that I was able to download. In the current version, equivalent code looks like this:

    using iTextSharp.text.pdf;
    using BitMiracle.LibTiff.Classic;
    …
    Tiff tiff = Tiff.Open("C:\\test.tif", "w");
    tiff.SetField(TiffTag.IMAGEWIDTH, UInt32.Parse(pd.Get(PdfName.WIDTH).ToString()));
    tiff.SetField(TiffTag.IMAGELENGTH, UInt32.Parse(pd.Get(PdfName.HEIGHT).ToString()));
    tiff.SetField(TiffTag.COMPRESSION, Compression.CCITTFAX4);
    tiff.SetField(TiffTag.BITSPERSAMPLE, UInt32.Parse(pd.Get(PdfName.BITSPERCOMPONENT).ToString()));
    tiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);
    tiff.WriteRawStrip(0, raw, …

PHP String Manipulation: Extract hrefs

You can use PHP's DOMDocument class to parse XML and/or HTML. Something like the following should do the trick to get the href attributes from a string of HTML. The HTML is wrapped in single quotes so that the double-quoted attribute values do not end the PHP string early.

    $html = '<h1>Doctors</h1>
    <a title="C – G" href="https://stackoverflow.com/questions/4702987/linkl.html">C – G</a>
    <a title="G – K" href="link2.html">G – K</a>
    <a title="K – M" href="link3.html">K – M</a>';
    $hrefs = array();
    $dom …

Extract string before “|” [duplicate]

We can use sub:

    sub("\\|.*", "", str1)
    #[1] "ABC"

Or strsplit:

    strsplit(str1, "[|]")[[1]][1]
    #[1] "ABC"

Update: if we use the data from @hrbrmstr:

    sub("\\|.*", "", df$V1)
    #[1] "ABC"   "ABCD"  "ABCDE" "DEF"   "GHI"   "BCDE"

These are all base R methods. No external packages are used.

Data:

    str1 <- "ABC|DEF|GHI ABCD|EFG|HIJK ABCDE|FGHI|JKL DEF|GHIJ|KLM GHI|JKLM|NO|PQRS BCDE|FGHI|JKL"

Extract files from zip without keeping the structure using python ZipFile?

This opens file handles of members of the zip archive, extracts the filename and copies it to a target file (that's how ZipFile.extract works, without taking care of subdirectories).

    import os
    import shutil
    import zipfile

    my_dir = r"D:\Download"
    my_zip = r"D:\Download\my_file.zip"

    with zipfile.ZipFile(my_zip) as zip_file:
        for member in zip_file.namelist():
            filename = os.path.basename(member)
            # skip directories …
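Since the excerpt above breaks off partway through the loop, here is a minimal, self-contained sketch of the same flattening idea. The function name `extract_flat`, its parameters, and its return value are assumptions made for illustration and are not part of the original answer.

```python
import os
import shutil
import zipfile


def extract_flat(zip_path, target_dir):
    """Copy every file in the archive into target_dir, dropping subdirectories.

    Hypothetical helper: returns the list of paths that were written.
    """
    os.makedirs(target_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zip_file:
        for member in zip_file.namelist():
            filename = os.path.basename(member)
            if not filename:
                # Directory entries end with "/", so their basename is empty.
                continue
            target_path = os.path.join(target_dir, filename)
            # Stream the member to disk; a later member with the same
            # base name overwrites an earlier one.
            with zip_file.open(member) as source, open(target_path, "wb") as target:
                shutil.copyfileobj(source, target)
            written.append(target_path)
    return written
```

Because the directory part of each member name is discarded, two members such as `a/readme.txt` and `b/readme.txt` would land on the same target file. If that can happen in your archives, some renaming scheme would be needed on top of this sketch.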

How do you extract a url from a string using python?

There may be a few ways to do this, but the cleanest would be to use a regex (shown here with Python 3 print calls):

    >>> import re
    >>> myString = "This is a link http://www.google.com"
    >>> print(re.search(r"(?P<url>https?://[^\s]+)", myString).group("url"))
    http://www.google.com

If there can be multiple links, you can use something similar to the following:

    >>> myString = "These are the links http://www.google.com and http://stackoverflow.com/questions/839994/extracting-a-url-in-python"
    >>> print(re.findall(r'(https?://[^\s]+)', …
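The two regex calls above can be wrapped into small helpers, sketched here for Python 3. The names `URL_PATTERN`, `first_url`, and `all_urls` are assumptions for illustration. The pattern is the one from the answer: it only recognizes links that start with `http://` or `https://` and treats everything up to the next whitespace as part of the URL.

```python
import re

# Same pattern as in the answer: "http://" or "https://" followed by
# every character up to the next whitespace.
URL_PATTERN = re.compile(r"https?://[^\s]+")


def first_url(text):
    """Return the first URL found in text, or None if there is none."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None


def all_urls(text):
    """Return every URL found in text, in order of appearance."""
    return URL_PATTERN.findall(text)


if __name__ == "__main__":
    print(first_url("This is a link http://www.google.com"))
    # prints: http://www.google.com
```

Returning `None` instead of calling `.group()` directly avoids an `AttributeError` when the string holds no link. Note that punctuation directly after a URL, such as a closing parenthesis or a full stop, would be captured as part of the match with this pattern.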