One method for doing this would be to use Java's Set
interface: read each line as a string, add it to a set, then call removeAll()
on the first set with the second set as the argument, retaining the rows which differ. This, of course, assumes that there are no duplicate rows within a file.
// using Commons IO's FileUtils to read in the files
// (readLines takes a File and an encoding, not a bare filename)
HashSet<String> f1 = new HashSet<String>(FileUtils.readLines(new File("file1.csv"), "UTF-8"));
HashSet<String> f2 = new HashSet<String>(FileUtils.readLines(new File("file2.csv"), "UTF-8"));
f1.removeAll(f2); // f1 now contains only the lines which are not in f2
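If you don't want the Commons IO dependency, the same idea works with just the JDK. Here's a self-contained sketch; the two tiny CSV files and their contents are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetDiff {
    public static void main(String[] args) throws IOException {
        // Write two tiny CSV files so the example runs on its own.
        Path dir = Files.createTempDirectory("diff");
        Path file1 = dir.resolve("file1.csv");
        Path file2 = dir.resolve("file2.csv");
        Files.write(file1, List.of("1,alice", "2,bob", "3,carol"));
        Files.write(file2, List.of("1,alice", "3,caroline"));

        // Read each file into a set of lines (Files.readAllLines is JDK 7+).
        Set<String> f1 = new HashSet<>(Files.readAllLines(file1));
        Set<String> f2 = new HashSet<>(Files.readAllLines(file2));

        f1.removeAll(f2); // f1 now holds the lines of file1 that are not in file2
        System.out.println(f1);
    }
}
```

Note that this only tells you which whole lines differ; it can't distinguish an updated row from a deleted-plus-added pair, which is what the key-based approach below addresses.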
Update
Okay, so you have a PK field. I'll assume you know how to extract it from the line; use openCSV or a regex or whatever you like. Make an actual HashMap
instead of a HashSet
as above, with the PK as the key and the whole row as the value.
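For instance, if the PK happens to be the first comma-separated field, building that map could look like the sketch below. The `indexByPk` helper and the one-field-split assumption are mine; real CSV with quoted fields or embedded commas needs a proper parser such as openCSV:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PkIndex {
    // Assumption: the PK is the first comma-separated field of each row.
    static Map<String, String> indexByPk(List<String> lines) {
        Map<String, String> byPk = new HashMap<>();
        for (String line : lines) {
            // key = PK field, value = the whole row
            byPk.put(line.split(",", 2)[0], line);
        }
        return byPk;
    }

    public static void main(String[] args) {
        Map<String, String> m = indexByPk(List.of("1,alice", "2,bob"));
        System.out.println(m.get("2")); // prints "2,bob"
    }
}
```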
HashMap<String, String> f1 = new HashMap<String, String>();
HashMap<String, String> f2 = new HashMap<String, String>();
// read f1, f2; use PK field as the key

List<String> deleted = new ArrayList<String>();
List<String> updated = new ArrayList<String>();
for (Map.Entry<String, String> entry : f1.entrySet()) {
    if (!f2.containsKey(entry.getKey())) {
        // PK exists in f1 but not in f2: the row was deleted
        deleted.add(entry.getValue());
    } else if (!f2.get(entry.getKey()).equals(entry.getValue())) {
        // PK exists in both but the rows differ: the row was updated
        updated.add(entry.getValue());
    }
}
for (String key : f1.keySet()) {
    f2.remove(key);
}
// f2 now contains only "new" rows
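Putting the whole Update together, here's a runnable sketch. The sample rows and the first-field-as-PK convention are invented for the demo; substitute your own file reading and PK extraction:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CsvDiff {
    public static void main(String[] args) {
        // In-memory rows stand in for the two files read from disk.
        List<String> oldRows = List.of("1,alice", "2,bob", "3,carol");
        List<String> newRows = List.of("1,alice", "3,caroline", "4,dave");

        // Index each side by PK (assumed to be the first field).
        Map<String, String> f1 = new HashMap<>();
        Map<String, String> f2 = new HashMap<>();
        for (String row : oldRows) f1.put(row.split(",", 2)[0], row);
        for (String row : newRows) f2.put(row.split(",", 2)[0], row);

        List<String> deleted = new ArrayList<>();
        List<String> updated = new ArrayList<>();
        for (Map.Entry<String, String> entry : f1.entrySet()) {
            if (!f2.containsKey(entry.getKey())) {
                deleted.add(entry.getValue());           // PK gone from f2
            } else if (!f2.get(entry.getKey()).equals(entry.getValue())) {
                updated.add(entry.getValue());           // same PK, row changed
            }
        }
        f2.keySet().removeAll(f1.keySet()); // whatever is left in f2 is new

        System.out.println("deleted: " + deleted);     // [2,bob]
        System.out.println("updated: " + updated);     // [3,carol]
        System.out.println("new:     " + f2.values()); // [4,dave]
    }
}
```

One design note: `f2.keySet().removeAll(...)` works because a map's key set is a live view, so removing keys from it removes the corresponding entries from the map.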