How to get a list of files/directories from a directory URL?

There are some conditions:

  1. The server must have directory listing enabled, otherwise you cannot see the directory's content at all.
  2. As far as I know, there is no API or HTTP verb that retrieves the listing; the server generally just returns it as a normal HTML page.
  3. You will have to parse this HTML page in order to find the entries.

The parsing can be done easily using a library like JSoup.

For example, using JSoup you can list the documents at the URL http://howto.unixdev.net/ like this:

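import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Sample {
    public static void main(String[] args) throws IOException {
        // Fetch the listing page and parse it into a DOM
        Document doc = Jsoup.connect("http://howto.unixdev.net").get();
        // This selector matches the markup of this particular listing page;
        // other servers render listings differently, so adjust it as needed
        for (Element file : doc.select("td.right td a")) {
            // Each href is an entry name, relative to the listing URL
            System.out.println(file.attr("href"));
        }
    }
}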

This prints:

beignets.html
beignets.pdf
bsd-pam-ldap.html
ddns-updates.html
Debian_on_HP_dv6z.html
dextop-slackware.html
dirlist.html
downloads/
ldif/
Linux-SharePoint.html
rhfc3-apt.html
rhfc3-apt.tar.bz2
SUNWdsee-Debian.html
SUNWdtdte-b69.html
SUNWdtdte-b69.tar.bz2
tcshrc.html
Test_LVM_Trim_Ext4.html
Tru64-CS20-HOWTO.html
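
The entries above are relative hrefs, and the ones ending in a slash (downloads/, ldif/) are subdirectories. If you need full URLs, or want to recurse into subdirectories, something like the following might help. It is only a sketch using the standard java.net.URI class; the class name ListingEntries and the helpers resolve and isDirectory are made up for illustration, and the trailing-slash test is a convention most listing pages follow, not a guarantee.

```java
import java.net.URI;
import java.util.List;

public class ListingEntries {
    /** Resolves a relative href from a listing page against the listing's own URL. */
    public static URI resolve(String baseUrl, String href) {
        // Ensure the base ends with '/', otherwise URI.resolve would replace
        // the last path segment of the base instead of appending to it
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        // Assumes the href is already a valid URI reference (e.g. spaces are
        // percent-encoded); URI.create throws IllegalArgumentException otherwise
        return URI.create(base).resolve(href);
    }

    /** Heuristic (assumption): listing pages usually mark directories with a trailing slash. */
    public static boolean isDirectory(String href) {
        return href.endsWith("/");
    }

    public static void main(String[] args) {
        String base = "http://howto.unixdev.net";
        // A few entries taken from the output above
        for (String href : List.of("beignets.html", "downloads/", "ldif/")) {
            String kind = isDirectory(href) ? "dir" : "file";
            System.out.println(kind + " " + resolve(base, href));
        }
    }
}
```

If you are already using JSoup, file.absUrl("href") should give you the absolute URL directly, so the resolve helper is mainly useful when you only have the raw href strings.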

As for your sample URL, http://java.sun.com/j2se/1.5/pdf, it returns a page-not-found error, so I think you’re out of luck.