The best way to construct a function with memory

Let’s assume our slow function has the signature f[x, y].

Pure In-memory Approach

If you are satisfied with an in-memory cache, the simplest thing to do would be to use memoization:

Clear@fmem
fmem[x_, y_] := fmem[x, y] = f[x, y]

This adds a definition to itself every time it is called with a combination of arguments that it has not seen before.

File-backed In-memory Approach

However, if you are running out of memory or suffering kernel crashes during the long computation, you will want to back this cache with some kind of persistence. The simplest thing would be to keep a running log file:

$runningLogFile = "/some/directory/runningLog.txt";

Clear@flog
flog[x_, y_] := flog[x, y] = f[x, y] /.
  v_ :> (PutAppend[Unevaluated[flog[x, y] = v;], $runningLogFile]; v)

If[FileExistsQ[$runningLogFile]
, Get[$runningLogFile]
, Export[$runningLogFile, "", "Text"];
]

flog is the same as fmem, except that it also writes an entry into the running log that can be used to restore the cached definition in a later session. The last expression reloads those definitions when it finds an existing log file (or creates the file if it does not exist).

The textual nature of the log file is convenient when manual intervention is required. Be aware that the textual representation of floating-point numbers introduces unavoidable round-off errors, so you may get slightly different results after reloading the values from the log file. If this is of great concern, you might consider using the binary DumpSave feature although I will leave the details of that approach to the reader as it is not quite as convenient for keeping an incremental log.

SQL Approach

If memory is really tight, and you want to avoid having a large in-memory cache to make room for the other computations, the previous strategy might not be appropriate. In that case, you might consider using Mathematica’s built-in SQL database to store the cache completely externally:

fsql[x_, y_] :=
  loadCachedValue[x, y] /. $Failed :> saveCachedValue[x, y, f[x, y]]

I define loadCachedValue and saveCachedValue below. The basic idea is to create an SQL table where each row holds an x, y, f triple. The SQL table is queried every time a value is needed. Note that this approach is substantially slower than the in-memory cache, so it makes the most sense when the computation of f takes much longer than the SQL access time. The SQL approach does not suffer from the round-off errors that afflicted the text log file approach.

The definitions of loadCachedValue and saveCachedValue now follow, along with some other useful helper functions:

Needs["DatabaseLink`"]

$cacheFile = "/some/directory/cache.hsqldb";

openCacheConnection[] :=
  $cache = OpenSQLConnection[JDBC["HSQL(Standalone)", $cacheFile]]

closeCacheConnection[] :=
  CloseSQLConnection[$cache]

createCache[] :=
  SQLExecute[$cache,
    "CREATE TABLE cached_values (x float, y float, f float)
     ALTER TABLE cached_values ADD CONSTRAINT pk_cached_values PRIMARY KEY (x, y)"
  ]

saveCachedValue[x_, y_, value_] :=
  ( SQLExecute[$cache,
      "INSERT INTO cached_values (x, y, f) VALUES (?, ?, ?)", {x, y, value}
    ]
  ; value
  )

loadCachedValue[x_, y_] :=
  SQLExecute[$cache,
    "SELECT f FROM cached_values WHERE x = ? AND y = ?", {x, y}
  ] /. {{{v_}} :> v, {} :> $Failed}

replaceCachedValue[x_, y_, value_] :=
  SQLExecute[$cache,
    "UPDATE cached_values SET f = ? WHERE x = ? AND y = ?", {value, x, y}
  ]

clearCache[] :=
  SQLExecute[$cache,
    "DELETE FROM cached_values"
  ]

showCache[minX_, maxX_, minY_, maxY_] :=
  SQLExecute[$cache,
    "SELECT *
     FROM cached_values
     WHERE x BETWEEN ? AND ?
     AND y BETWEEN ? AND ?
     ORDER BY x, y"
  , {minX, maxX, minY, maxY}
  , "ShowColumnHeadings" -> True
  ] // TableForm

This SQL code uses floating point values as primary keys. This is normally a questionable practice in SQL but works fine in the present context.

You must call openCacheConnection[] before attempting to use any of these functions. You should call closeCacheConnection[] after you have finished. One time only, you must call createCache[] to initialize the SQL database. replaceCachedValue, clearCache and showCache are provided for manual interventions.

Leave a Comment