Permission Systems for Enterprise That Scale

68 points by eliocs 9 hours ago on hackernews | 27 comments

Many startups eventually gravitate towards enterprise customers for bigger tickets and long-term contracts. As enterprise customers start using your product, they soon demand advanced permission systems to manage their different user roles and access levels. A naive implementation of permission checks works perfectly fine at first, but as they use your platform more and more, the amount of data, users, and relationships will put that implementation to the test. Soon your biggest paying customer will be threatening to churn because your app is just too slow.

To illustrate this problem, I've set up a simple example. Imagine an app with folders and files where an admin can see all folders and files, but standard users can only see files and folders they've created or that have been shared with them.

The Naive Approach (read-time permission queries)

Your first intuition will be to query the database on every request to calculate the permissions and the data that the user can access. First, you check the role of the user: is the user an admin or a standard user? If the user is not an admin, then query for all the resources the user has created or that have been shared with them.

The first part is straightforward: if you are an admin, return all resources.

const user = await sqlQueryOne(`SELECT * FROM users WHERE id = ?`, [
  userId,
]);

// Admin: Fetch all resources
if (user.type === "admin") {
  return await sqlQuery(`SELECT * FROM resources`);
}

For standard users, you need to query for resources created by them. If the resource is a folder, you will have access to all of its descendants.

const accessibleResources = [];

const ownedResources = await sqlQuery(
  `SELECT * FROM resources WHERE owner_id = ?`,
  [userId],
);

// For each owned resource, fetch descendants recursively
for (const resource of ownedResources) {
  accessibleResources.push(resource);

  const descendants = await sqlQuery(
    `
    WITH RECURSIVE tree AS (
      SELECT id FROM resources WHERE parent_id = ?
      UNION ALL
      SELECT r.id FROM resources r JOIN tree t ON r.parent_id = t.id
    )
    SELECT r.* FROM resources r JOIN tree t ON r.id = t.id
  `,
    [resource.id],
  );
  accessibleResources.push(...descendants);
}

We can already see recursive queries appearing. As we own more resources with deeper nesting, the queries will get slower.

Finally, we include all shared resources, their descendants if a folder was shared, plus their ancestors to show the full path.

const sharedResources = await sqlQuery(
  `
  SELECT r.*
  FROM shares s
  JOIN resources r ON s.resource_id = r.id
  WHERE s.user_id = ?
`,
  [userId],
);

// For each shared resource, fetch ancestors and descendants
for (const resource of sharedResources) {
  accessibleResources.push(resource);

  // Query ancestors to show full path
  const ancestors = await sqlQuery(
    `
    WITH RECURSIVE ancestors AS (
      SELECT parent_id FROM resources WHERE id = ?
      UNION ALL
      SELECT r.parent_id FROM resources r JOIN ancestors a ON r.id = a.parent_id
      WHERE r.parent_id IS NOT NULL
    )
    SELECT r.* FROM resources r JOIN ancestors a ON r.id = a.parent_id
  `,
    [resource.id],
  );
  accessibleResources.push(...ancestors);

  // Another query for descendants of each shared resource
  const descendants = await sqlQuery(
    `
    WITH RECURSIVE tree AS (
      SELECT id FROM resources WHERE parent_id = ?
      UNION ALL
      SELECT r.id FROM resources r JOIN tree t ON r.parent_id = t.id
    )
    SELECT r.* FROM resources r JOIN tree t ON r.id = t.id
  `,
    [resource.id],
  );
  accessibleResources.push(...descendants);
}

The code is easy to follow, but the number of queries—especially recursive ones with nested folder structures—will quickly become a bottleneck as you scale.

RBAC (write-time permission queries)

Because we normally develop apps that read more than they write, it makes sense to optimize for reads. What does this look like? With Role Based Access Control we store the pre-computed permissions in the database, linking users with resources:

-- RBAC: Pre-computed permissions
-- access_type: 'owner' (full control), 'shared' (read only), 'path_only' (visible but no access)
CREATE TABLE permissions (
  user_id INTEGER NOT NULL,
  resource_id INTEGER NOT NULL,
  access_type TEXT NOT NULL,
  PRIMARY KEY (user_id, resource_id),
  FOREIGN KEY (user_id) REFERENCES users(id),
  FOREIGN KEY (resource_id) REFERENCES resources(id)
);

Now, when querying for resources, we can simply do:

const resources = await sqlQuery(
  `SELECT r.* FROM resources r
   JOIN permissions p ON r.id = p.resource_id
   WHERE p.user_id = ?`,
  [userId],
);

That's it! No recursive queries, no multiple queries, just a simple join. It's as fast as it gets, and easy to index too.

Let's see how to maintain the permissions table. On every new resource, we need to grant the owner full access:

await sqlRun(
  `INSERT OR IGNORE INTO permissions (user_id, resource_id, access_type) VALUES (?, ?, 'owner')`,
  [ownerId, resourceId],
);

That was easy, but some more complex logic is needed when sharing resources:

// Add the shared resource itself with 'shared' access
await sqlRun(
  `INSERT OR IGNORE INTO permissions (user_id, resource_id, access_type) VALUES (?, ?, 'shared')`,
  [userId, resourceId],
);

// Add ancestors for path visibility (path_only - no actual access)
const ancestorIds = await getAncestorIds(resourceId);
for (const ancestorId of ancestorIds) {
  await sqlRun(
    `INSERT OR IGNORE INTO permissions (user_id, resource_id, access_type) VALUES (?, ?, 'path_only')`,
    [userId, ancestorId],
  );
}

// Add all descendants with 'shared' access
const descendantIds = await getDescendantIds(resourceId);
for (const descId of descendantIds) {
  await sqlRun(
    `INSERT OR IGNORE INTO permissions (user_id, resource_id, access_type) VALUES (?, ?, 'shared')`,
    [userId, descId],
  );
}

You can clearly see the trade-off here: we moved the complexity from read-time to write-time. Every time a resource is created or shared, we need to update the permissions table accordingly. We added a point of failure, as the permissions table can get out of sync with the actual data.

Real-world example: PostHog

Although this is a simple example, while browsing the PostHog source code I found the exact same approach. They have an AccessControl model that stores pre-computed permissions:

# PostHog: ee/models/rbac/access_control.py
class AccessControl(models.Model):
    team = models.ForeignKey("posthog.Team", on_delete=models.CASCADE)
    access_level = models.CharField(max_length=32)  # 'none', 'viewer', 'editor'
    resource = models.CharField(max_length=32)      # 'dashboard', 'feature_flag', etc.
    resource_id = models.CharField(null=True)       # null = all, UUID = specific object

    # Can be scoped to a user or a role
    organization_member = models.ForeignKey(..., null=True)
    role = models.ForeignKey("Role", null=True)

When listing resources, they simply filter the queryset based on these pre-computed permissions:

# PostHog: posthog/rbac/user_access_control.py
def filter_queryset_by_access_level(self, queryset, ...):
    # Get pre-computed access controls from cache
    access_controls = self._get_access_controls(filters)

    # Build sets of allowed/blocked resource IDs
    for access_control in access_controls:
        if access_control.access_level != NO_ACCESS_LEVEL:
            allowed_resource_ids.add(access_control.resource_id)
        else:
            blocked_resource_ids.add(access_control.resource_id)

    # Filter queryset - no recursive queries needed!
    return queryset.filter(id__in=allowed_resource_ids)

They also cache the access controls during a request to avoid repeated database lookups:

# PostHog: posthog/rbac/user_access_control.py
def _get_access_controls(self, filters):
    key = json.dumps(filters, sort_keys=True)
    if key not in self._cache:
        self._cache[key] = list(AccessControl.objects.filter(...))
    return self._cache[key]

The pattern is clear: pre-compute permissions at write-time, store them in a dedicated table, and filter with simple lookups at read-time.

What about ABAC?

If you research ways to implement permission systems, you will often find Attribute-Based Access Control as a recommended approach. ABAC is great when you need to make complex decisions on whether a user can access a specific resource or not. Figma showed how they handle permissions, and this is an example of ABAC in action.

The main difference from our naive approach is that ABAC offers a declarative way of defining the rules and policies behind the permission checks. The engine will convert those policies into queries and code to execute them at read-time. Here is an example of how those rules could look for our example:

# Rule 1: Admins can access everything
allow if {
    input.user.type == "admin"
}

# Rule 2: Owners can access their resources
allow if {
    input.resource.owner_id == input.user.id
}

# Rule 3: Users can access resources shared with them
allow if {
    some share in data.shares
    share.resource_id == input.resource.id
    share.user_id == input.user.id
}

# ...

This approach is very clear and composable. It works great for single-resource access checks: "can user X access resource Y?" It struggles when listing resources, as we would need to execute the policies for each resource and can't directly query the resources table with simple filters.

Optimizing for hierarchical data

Both approaches still rely on recursive queries to traverse the folder hierarchy. In SaaS applications with deeply nested structures, these recursive queries can also become a bottleneck. There are two common patterns to eliminate them.

Materialized paths

A simple alternative is storing the full path as a string column on each resource.

ALTER TABLE resources ADD COLUMN path TEXT;

-- Example paths:
-- /1/           (root folder, id=1)
-- /1/5/         (child of root, id=5)
-- /1/5/12/      (grandchild, id=12)

Finding descendants becomes a prefix search.

// All descendants of resource with path '/1/5/'
const descendants = await sqlQuery(
  `SELECT * FROM resources WHERE path LIKE ?`,
  [resource.path + "%"],
);

// All ancestors by parsing the path
const ancestorIds = resource.path.split("/").filter(Boolean);
const ancestors = await sqlQuery(
  `SELECT * FROM resources WHERE id IN (${ancestorIds.join(",")})`,
);

Materialized paths are easier to implement but can be tricky when moving resources between folders, as you need to update the path of all descendants. Closure tables handle moves more gracefully.

Closure tables

A closure table pre-computes all ancestor-descendant relationships. Instead of traversing the tree at query time, you store every possible path—another trade-off between read and write times.

CREATE TABLE resource_closure (
  ancestor_id INTEGER NOT NULL,
  descendant_id INTEGER NOT NULL,
  depth INTEGER NOT NULL,
  PRIMARY KEY (ancestor_id, descendant_id),
  FOREIGN KEY (ancestor_id) REFERENCES resources(id),
  FOREIGN KEY (descendant_id) REFERENCES resources(id)
);

-- /1/           (root folder, id=1)
INSERT INTO resource_closure VALUES (1, 1, 0);

-- /1/5/         (child of root, id=5)
INSERT INTO resource_closure VALUES (5, 5, 0);
INSERT INTO resource_closure VALUES (1, 5, 1);

-- /1/5/12/      (grandchild, id=12)
INSERT INTO resource_closure VALUES (12, 12, 0);
INSERT INTO resource_closure VALUES (5, 12, 1);
INSERT INTO resource_closure VALUES (1, 12, 2);

With this table, finding all descendants becomes a simple join with no recursion.

// All descendants of a resource (instant lookup)
const descendants = await sqlQuery(
  `SELECT r.* FROM resources r
   JOIN resource_closure c ON r.id = c.descendant_id
   WHERE c.ancestor_id = ? AND c.depth > 0`,
  [resourceId],
);

// All ancestors of a resource (for showing the path)
const ancestors = await sqlQuery(
  `SELECT r.* FROM resources r
   JOIN resource_closure c ON r.id = c.ancestor_id
   WHERE c.descendant_id = ? AND c.depth > 0
   ORDER BY c.depth DESC`,
  [resourceId],
);

The trade-off is write complexity. When creating a resource, you must insert closure entries for all ancestors.

// When creating a new resource under parentId
await sqlRun(
  `INSERT INTO resource_closure (ancestor_id, descendant_id, depth)
   SELECT ancestor_id, ?, depth + 1
   FROM resource_closure
   WHERE descendant_id = ?
   UNION ALL
   SELECT ?, ?, 0`,
  [newResourceId, parentId, newResourceId, newResourceId],
);

Conclusion

The core trade-off is clear: you can pay the cost at read-time with recursive queries, or at write-time by maintaining a permissions index. Starting with the naive approach makes sense it's simpler to implement and debug.

But when your enterprise customers start experiencing slow load times, the RBAC approach becomes worth the added complexity. The performance difference is significant enough that it can eliminate the need for other workarounds like aggressive caching or complex pagination.

The main risk with pre-computed permissions is data getting out of sync. Plan for this by building a rebuild script from day one that can recompute all permissions from the source of truth. When bugs happen, and they will, you'll be glad you have it.