How Do I Combine or Merge Small ORC Files into a Larger ORC File?

You do not need to re-invent the wheel.

Since Hive 0.14.0, ALTER TABLE table_name [PARTITION partition_spec] CONCATENATE can merge small ORC files into larger ones. For ORC, the merge happens at the stripe level, which avoids decompressing and decoding the data, so it runs fast. I'd suggest creating an external table partitioned by day (each partition is a directory), then concatenating each day's partition by naming it in the partition spec, e.g. PARTITION (day_column='<day>').
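A minimal HiveQL sketch of that approach follows. The table name `events`, its columns, the partition column `event_day`, the location `/data/events` and the day value `'2020-01-01'` are all hypothetical placeholders. It assumes the day directories already sit under the table location using Hive's `event_day=<value>` naming, so that MSCK REPAIR TABLE can register them as partitions.

```sql
-- Hypothetical external table over existing ORC files; all names and the path are placeholders.
CREATE EXTERNAL TABLE events (
  event_id BIGINT,
  payload  STRING
)
PARTITIONED BY (event_day STRING)
STORED AS ORC
LOCATION '/data/events';

-- Register partition directories laid out as /data/events/event_day=<value>.
MSCK REPAIR TABLE events;

-- Merge the small ORC files of one (hypothetical) day at the stripe level.
-- Repeat this statement for every day partition you want to compact.
ALTER TABLE events PARTITION (event_day='2020-01-01') CONCATENATE;
```

Because the statement works one partition at a time, you would typically generate one CONCATENATE per day, for example from a list produced by SHOW PARTITIONS events.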

See the Hive wiki page LanguageManual ORC.
