What is the algorithm used by the ORA_HASH function?

another system that appears to use ORA_HASH

Well, if it “appears to use” then it makes sense to do a bit of reverse engineering and check what exactly is called and disassemble code of the function.

If you, however, want to dive into Oracle internals then following may help.

First of all, you have to figure out what internal C function is called.
To do that you can execute some long running code in one session.
I did run this

select avg(ora_hash(rownum)) id from
(select rownum from dual connect by rownum <= 1e4),
(select rownum from dual connect by rownum <= 1e4);

It can be PL/SQL code as well, you just need to make sure that you constantly call ora_hash.

While it’s running

I tested on Windows and looks like that ora_hash is …->evaopn2()->evahash()->…

Now let’s google for evahash. We got extremely lucky because there is a header file on official site https://oss.oracle.com/projects/ocfs-tools/src/branches/new-dir-format/libocfs/Linux/inc/ocfshash.h with link to evahash.

And finally there is page with actual C code http://burtleburtle.net/bob/hash/evahash.html

So far so good, we remember that we can use external C function in Oracle if we build it into library (DLL on Windows).

For example on my Win x64 if I change function signature to

extern "C" ub4 hash( ub1 *k, ub4 length, ub4 initval)

it can be successfully executed from Oracle.
But, as you see, signature a bit differs from ora_hash in Oracle. This function accepts value, its length and initval (may be seed) while signature in Oracle is ora_hash(expr, max_bucket, seed_value).

Let’s try to test
Oracle

SQL> select ora_hash(utl_raw.cast_to_raw('0'), power(2, 32) - 1, 0) oh1,
  2         ora_hash('0', power(2, 32) - 1, 0) oh2,
  3         ora_hash(0, power(2, 32) - 1, 0) oh3,
  4         ora_hash(chr(0), power(2, 32) - 1, 0) oh4
  5    from dual;

       OH1        OH2        OH3        OH4
---------- ---------- ---------- ----------
3517341953 3517341953 1475158189 4056412421

C

int main()
{
    ub1 ta[] = {0};
    ub1* t = ta;
    cout << hash(t, 1, 0) << endl;
    ub1 ta0[] = {'0'};
    ub1* t0 = ta0;
    cout << hash(t0, 1, 0) << endl;
    return 0;
}

1843378377
4052366646

None of the numbers matches.
So what is the problem?
ora_hash accepts parameters of almost any type (for example select ora_hash(sys.odcinumberlist(1,2,3)) from dual) while C function accepts value as array of bytes. This means that some conversion happens before function call.
Thus before using mentioned C hash function you have to figure out how actual value is transformed before passing to it.

You can proceed with reverse engineering of Oracle binaries using IDA PRO + hex rays but that may take days. Not to mention platform specific details.

So if you want to imitate ora_hash, the easiest option would be to install Oracle express edition and use it to call ora_hash.

I hope that was interesting. Good luck.

Update

ora_hash and dbms_utility.get_hash_value can be mapped to each other (see https://jonathanlewis.wordpress.com/2009/11/21/ora_hash-function/)

SQL> select dbms_utility.get_hash_value('0', 0 + 1, 1e6 + 1) ha1,
  2         ora_hash('0', 1e6, 0) + 1 ha2
  3    from dual;

       HA1        HA2
---------- ----------
    338437     338437

If we unwrap package body of dbms_utility we will see following declaration

  function get_hash_value(name varchar2, base number, hash_size number)
    return number is
  begin
    return(icd_hash(name, base, hash_size));
  end;

and

  function icd_hash(name      varchar2,
                    base      binary_integer,
                    hash_size binary_integer) return binary_integer;
  pragma interface(c, icd_hash);

Let’s google for icd_hash and we can find that it’s mapped to _psdhsh (https://yurichev.com/blog/50/). Now it’s time to disassemble oracle.exe and extract code for _psdhsh from it. Maybe I’ll spend some time on this next year.

Leave a Comment