Multiple schemas versus enormous tables [closed]

I want to see which method is more efficient in terms of querying in the database.

In a multi-tenant database, querying is only part of the problem. Other parts of the problem are cost, data isolation and protection, maintenance, and disaster recovery. These are significant; you can’t consider only query efficiency in a multi-tenant database.

Multi-tenant solutions range from one database per tenant (shared nothing) to one row per tenant (shared everything).

“Shared nothing”, “separate database”, or one database per tenant

  • Most expensive per client. (Large numbers of clients imply large numbers of servers.)
  • Highest degree of data isolation.
  • Disaster recovery for a single tenant is simple and straightforward.
  • Maintenance is theoretically harder, because changes need to be carried out in every database. But your dbms might easily support running stored procedures in each database. (SQL Server has an undocumented system stored procedure, sp_msforeachdb, for example. You can probably write your own.)
  • “Shared nothing” is the most easily customizable, too, but that also raises more maintenance issues. For example, each tenant might have a different pattern of usage, which suggests each tenant might need some indexes that other tenants wouldn’t. That’s trivial to do with “shared nothing”; impossible with “shared everything” (below).
  • Lowest number of rows per table. Querying speed is near optimal.

“Shared everything”, or “shared schema”, or “one database per planet”

  • Least expensive per tenant.
  • Lowest degree of data isolation. Every table has a column that identifies which tenant a row belongs to. Since tenant rows are mixed in every table, it’s relatively simple to accidentally expose other tenant’s data.
  • Disaster recovery for a single tenant is relatively complicated; you have to restore individual rows in many tables. On the other hand, a single-tenant disaster is relatively unusual. Most disasters will probably affect all tenants.
  • Structural maintenance is simpler, given that all tenants share the tables. It increases the communication load, though, because you have to communicate and coordinate each change with every tenant. It’s not easily customizable.
  • Highest number of rows per table. Quick querying is harder, but it depends on how many tenants and how many rows. You could easily tip over into VLDB territory.

Between “shared nothing” and “shared everything” is “shared schema”.

“Shared schema”

  • Tenants share a database, but each tenant has it’s own named schema. Cost falls between “shared nothing” and “shared everything”; big systems typically need fewer servers than “shared nothing”, more servers than “shared everything”.
  • Much better isolation than “shared everything”. Not quite as much isolation as “shared nothing”. (You can GRANT and REVOKE permissions on schemas.)
  • Disaster recovery for a single tenant require restoring one of many schemas. This is either relatively easy or fairly hard, depending on your dbms.
  • Maintenance is easier than “shared nothing”; not as easy as “shared everything”. It’s relatively simple to write a stored procedure that will execute in each schema in a database. It’s easier to share common tables among tenants than with “shared nothing”.
  • Usually more active tenants per server than “shared nothing”, which means they share (degrade) more resources. But not as bad as “shared everything”.

Microsoft has a good article on multi-tenant architecture with more details. (The link is to just one page of a multi-page document. Microsoft has since removed that page; the link is now to a copy in archive.org.)

Leave a Comment