OOZIE: properties defined in file referenced in global job-xml not visible in workflow.xml

OK, you are making two big mistakes.

1. Let’s start with a quick exegesis of some parts of the Oozie documentation (V4.2):

Workflow Functional Specification

  • has a section 19 about Global Configuration
  • has sections 3.2.x about the core Action types, e.g. MapReduce, Pig, Java, etc.
  • its XML schema specification clearly shows the <global> element

Sqoop action Extension

  • does not make any mention of Global parameters
  • has its own XML schema specification, which evolves at its own pace and is not in sync with the Workflow schema

In other words, as far as the Oozie server is concerned, the Sqoop action is a plug-in. It does not support all of the “newer” features, including the <global> element that was introduced in Workflow schema V0.4.
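
One way to see this split is to look at the XML namespaces inside a workflow file. The minimal Python sketch below parses a hypothetical workflow (the app name, the Sqoop command, and the schema versions uri:oozie:workflow:0.4 and uri:oozie:sqoop-action:0.2 are illustrative assumptions) and reports which schema each interesting element belongs to: <global> sits in the Workflow namespace, while the <sqoop> element belongs to the Sqoop action's own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow: element names follow the Oozie docs, but the schema
# versions (workflow 0.4, sqoop-action 0.2) are illustrative assumptions.
WORKFLOW_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.4" name="demo-wf">
  <global>
    <job-xml>global-conf.xml</job-xml>
  </global>
  <start to="sqoop-node"/>
  <action name="sqoop-node">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --table demo</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>failed</message></kill>
  <end name="end"/>
</workflow-app>
"""


def namespace_of(tag: str) -> str:
    """Return the namespace URI from an ElementTree '{uri}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def schema_report(xml_text: str) -> dict[str, str]:
    """Map the root, <global> and <sqoop> elements to their schema namespace."""
    root = ET.fromstring(xml_text)
    report = {"workflow-app": namespace_of(root.tag)}
    for elem in root.iter():
        local = elem.tag.split("}", 1)[-1]
        if local in ("global", "sqoop"):
            report[local] = namespace_of(elem.tag)
    return report


if __name__ == "__main__":
    for element, ns in schema_report(WORKFLOW_XML).items():
        print(f"{element:13} -> {ns}")
```

The namespace on <sqoop> is what ties that element to the Sqoop action extension's schema, which is why features defined only in the Workflow schema cannot be assumed to apply inside it.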

2. You don’t understand the distinction between properties and parameters. I don’t blame you: the Oozie docs are confused and confusing.

Parameters are used by Oozie to run text substitutions in properties, in commands, etc. You define their values as literals, either at submission time with the -config argument, or in the <parameters> element at Workflow level. And by “literal” I mean that you cannot make reference to a parameter in another parameter. The value is just immutable text, used as-is.
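
To make the parameter side concrete, here is a minimal Python sketch of that kind of text substitution. It is a simplified model, not Oozie's EL engine: Python's string.Template stands in for ${...} resolution, and the parameter names (nameNode, sourceTable) and the Sqoop command line are hypothetical. Each parameter is a plain string, and every ${name} reference is replaced by that text as-is.

```python
from string import Template

# Hypothetical parameters, as they might arrive from a -config file or a
# <parameters> block: plain literal strings, nothing is evaluated.
PARAMS = {
    "nameNode": "hdfs://namenode:8020",
    "sourceTable": "customers",
}


def substitute(text: str, params: dict[str, str]) -> str:
    """Replace every ${name} in `text` with the literal value of params[name].

    string.Template is only a stand-in for Oozie's EL evaluator, which also
    supports functions and expressions. A missing name raises KeyError.
    """
    return Template(text).substitute(params)


if __name__ == "__main__":
    command = "import --table ${sourceTable} --target-dir ${nameNode}/staging/${sourceTable}"
    print(substitute(command, PARAMS))
```

Running it prints the command with every reference replaced: import --table customers --target-dir hdfs://namenode:8020/staging/customers.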

Properties are Java properties passed to the jobs that Oozie starts. You can set them in several places:

  • at submission time with the -config argument (yes, it’s a mess: the Oozie parser has to sort out which params have a well-known property name and which ones are just params)
  • in the <global> Workflow element, although they will not be propagated to all “extensions”, as you have discovered the hard way
  • in a <property> element of the Action
  • inside an XML file referenced by a <job-xml> element, either at global Workflow level or at local Action level

Two things to note:

  • when a property is defined several times with conflicting values, there has to be a precedence rule, but I’m not sure what it is
  • properties defined explicitly inside the Oozie workflow may have their values defined dynamically, using parameters and EL functions; but properties defined inside <job-xml> files must be literals, because Oozie does not have access to them (it just passes the file content to the Hadoop Configuration constructor at run-time)
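
The second bullet can be modelled in a few lines of Python as well. This is a sketch under stated assumptions, not Oozie's implementation: the job-xml content, the inline property, and the parameter values are hypothetical, string.Template again stands in for EL substitution, and example.target.dir is a made-up property name. Inline property values get their ${...} references resolved from the parameters, while the job-xml content is parsed and handed over untouched. It deliberately leaves out Hadoop's own Configuration variable expansion, which resolves ${...} against other configuration properties and system properties, not against Oozie parameters.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical job-xml file in the Hadoop configuration format.
# "example.target.dir" is a made-up property name.
JOB_XML = """\
<configuration>
  <property>
    <name>example.target.dir</name>
    <value>${nameNode}/staging/customers</value>
  </property>
</configuration>
"""

# Hypothetical inline property, as if set in the action's <configuration>.
INLINE_PROPERTIES = {"mapreduce.job.queuename": "${queueName}"}

# Hypothetical workflow parameters: plain literal strings.
PARAMS = {"nameNode": "hdfs://namenode:8020", "queueName": "etl"}


def resolve_inline(props: dict[str, str], params: dict[str, str]) -> dict[str, str]:
    """Inline properties are visible to Oozie, so ${...} references are substituted."""
    return {name: Template(value).substitute(params) for name, value in props.items()}


def load_job_xml(xml_text: str) -> dict[str, str]:
    """job-xml content is parsed as-is: no parameter substitution happens."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name", default=""): prop.findtext("value", default="")
        for prop in root.findall("property")
    }


if __name__ == "__main__":
    # Kept separate on purpose: no precedence between the two sources is assumed.
    print("inline :", resolve_inline(INLINE_PROPERTIES, PARAMS))
    print("job-xml:", load_job_xml(JOB_XML))
```

The inline value comes out as etl, while the job-xml value still reads ${nameNode}/staging/customers. The two results are printed separately rather than merged, since the precedence rule between sources is exactly the open question in the first bullet.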

What does this mean for you? Your workflow tells Oozie to pass “hidden” properties, through a <job-xml> file, to the JVM running the Sqoop job at run-time.
But you were expecting Oozie to read that file as a list of parameters and use them, when it resolves the workflow before the job runs, to define some properties. That won’t happen.
