Case class equality in Apache Spark

This is a known issue with the Spark REPL; see SPARK-2620 for details. It affects multiple operations in the Spark REPL, including most transformations on the PairwiseRDDs. For example:

    case class Foo(x: Int)

    val foos = Seq(Foo(1), Foo(1), Foo(2), Foo(2))
    foos.distinct.size
    // Int = 2

    val foosRdd = sc.parallelize(foos, 4)
    foosRdd.distinct.count
    // Long …

byte[] array pattern search

May I suggest something that doesn’t involve creating strings, copying arrays or unsafe code:

    using System;
    using System.Collections.Generic;

    static class ByteArrayRocks {

        static readonly int[] Empty = new int[0];

        public static int[] Locate (this byte[] self, byte[] candidate)
        {
            if (IsEmptyLocate(self, candidate))
                return Empty;

            var list = new List<int>();

            for (int i = 0; i …

Does PostgreSQL support “accent insensitive” collations?

Use the unaccent module for that, which is completely different from what you are linking to. unaccent is a text search dictionary that removes accents (diacritic signs) from lexemes. Install once per database with:

    CREATE EXTENSION unaccent;

If you get an error like:

    ERROR: could not open extension control file "/usr/share/postgresql/<version>/extension/unaccent.control": No such file …

How to select lines between two marker patterns which may occur multiple times with awk/sed

Use awk with a flag to trigger the print when necessary:

    $ awk '/abc/{flag=1;next}/mno/{flag=0}flag' file
    def1
    ghi1
    jkl1
    def2
    ghi2
    jkl2

How does this work?

/abc/ matches lines containing this text, as does /mno/.
/abc/{flag=1;next} sets the flag when the text abc is found. Then, it skips the line.
/mno/{flag=0} unsets the flag …
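The excerpt does not show the input file, so the following is only a minimal sketch for trying the flag approach end to end. The sample contents and the variable names `sample` and `selected` are assumptions made for illustration; the awk program is the one quoted above.

```shell
# Hypothetical sample input: two abc ... mno blocks (not from the original post).
sample=$(mktemp)
printf '%s\n' abc def1 ghi1 jkl1 mno abc def2 ghi2 jkl2 mno > "$sample"

# /abc/ sets the flag and skips the marker line via `next`;
# /mno/ clears the flag before the bare `flag` pattern is tested,
# so neither marker line is printed, only the lines in between.
selected=$(awk '/abc/{flag=1;next} /mno/{flag=0} flag' "$sample")
printf '%s\n' "$selected"

rm -f "$sample"
```

With this sample, only the lines strictly between each abc/mno pair are printed, which matches the output listed in the excerpt.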

How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?

Print lines between PAT1 and PAT2

    $ awk '/PAT1/,/PAT2/' file
    PAT1
    3 - first block
    4
    PAT2
    PAT1
    7 - second block
    PAT2
    PAT1
    10 - third block

Or, using variables:

    awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' file

How does this work?

/PAT1/ matches lines containing this text, as does /PAT2/.
/PAT1/{flag=1} sets the flag …
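As a rough check that the range form and the flag form select the same lines, including both marker lines, the sketch below runs them side by side. The sample file is an assumption (a shortened stand-in, since the excerpt shows only the output), and `by_range` and `by_flag` are hypothetical variable names.

```shell
# Hypothetical sample input: two complete PAT1 ... PAT2 blocks surrounded by other lines.
sample=$(mktemp)
printf '%s\n' 1 PAT1 3 4 PAT2 5 PAT1 7 PAT2 8 > "$sample"

# Range pattern: prints from a PAT1 line through the next PAT2 line, inclusive.
by_range=$(awk '/PAT1/,/PAT2/' "$sample")

# Flag form: set the flag on PAT1, print while it is set,
# and clear it only after the PAT2 line has already been printed.
by_flag=$(awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' "$sample")

printf '%s\n' "$by_range"

rm -f "$sample"
```

The order of the three rules in the flag form is what makes it inclusive: the print rule sits between the rule that sets the flag and the rule that clears it.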