Can PostgreSQL have a uniqueness constraint on array elements?

The righteous path

You might want to reconsider normalizing your schema. It is not necessary for everyone to “join for even the simplest query”. Create a VIEW for that.

Table could look like this:

CREATE TABLE hostname (
  hostname_id serial PRIMARY KEY
, host_id     int  REFERENCES host(host_id) ON UPDATE CASCADE ON DELETE CASCADE
, hostname    text UNIQUE
);

The surrogate primary key hostname_id is optional. I prefer to have one. In your case hostname could be the primary key. But many operations are faster with a simple, small integer key. Create a foreign key constraint to link to the table host.
Create a view like this:

CREATE VIEW v_host AS
SELECT h.*
     , array_agg(hn.hostname) AS hostnames
--   , string_agg(hn.hostname, ', ') AS hostnames  -- text instead of array
FROM   host h
JOIN   hostname hn USING (host_id)
GROUP  BY h.host_id;   -- works in v9.1+

Starting with pg 9.1, the primary key in the GROUP BY covers all columns of that table in the SELECT list. The release notes for version 9.1:

Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause

Queries can use the view like a table. Searching for a hostname will be much faster this way:

SELECT *
FROM   host h
JOIN   hostname hn USING (host_id)
WHERE  hn.hostname="foobar";

^{Provided you have an index on host(host_id), which should be the case as it should be the primary key. Plus, the UNIQUE constraint on hostname(hostname) implements the other needed index automatically.}

In Postgres 9.2+ a multicolumn index would be even better if you can get an index-only scan out of it:

CREATE INDEX hn_multi_idx ON hostname (hostname, host_id);

Starting with Postgres 9.3, you could use a MATERIALIZED VIEW, circumstances permitting. Especially if you read much more often than you write to the table.

The dark side (what you actually asked)

If I can’t convince you of the righteous path, here is some assistance for the dark side:

Here is a demo how to enforce uniqueness of hostnames. I use a table hostname to collect hostnames and a trigger on the table host to keep it up to date. Unique violations raise an exception and abort the operation.

CREATE TABLE host(hostnames text[]);
CREATE TABLE hostname(hostname text PRIMARY KEY);  --  pk enforces uniqueness

Trigger function:

CREATE OR REPLACE FUNCTION trg_host_insupdelbef()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- split UPDATE into DELETE & INSERT
   IF TG_OP = 'UPDATE' THEN
      IF OLD.hostnames IS DISTINCT FROM NEW.hostnames THEN -- keep going
      ELSE
         RETURN NEW;  -- exit, nothing to do
      END IF;
   END IF;

   IF TG_OP IN ('DELETE', 'UPDATE') THEN
      DELETE FROM hostname h
      USING  unnest(OLD.hostnames) d(x)
      WHERE  h.hostname = d.x;

      IF TG_OP = 'DELETE' THEN RETURN OLD;  -- exit, we are done
      END IF;
   END IF;

   -- control only reaches here for INSERT or UPDATE (with actual changes)
   INSERT INTO hostname(hostname)
   SELECT h
   FROM   unnest(NEW.hostnames) h;

   RETURN NEW;
END
$func$;

Trigger:

CREATE TRIGGER host_insupdelbef
BEFORE INSERT OR DELETE OR UPDATE OF hostnames ON host
FOR EACH ROW EXECUTE FUNCTION trg_host_insupdelbef();

SQL Fiddle with test run.

Use a GIN index on the array column host.hostnames and array operators to work with it:

The righteous path

The dark side (what you actually asked)

More Related Contents:

Leave a Comment Cancel reply