learnings, nerdisms, bicycles
securing web applications is hard. developers must consider a diverse set of risks when delivering systems that handle sensitive data. a cursory glance over the owasp top 10 reveals that securing a system is not a matter of "installing this" or "turning this feature on." rather, securing an application involves auditing and evaluating design patterns, which is often difficult due to lack of process formalities. row-level security (rls) is just one design pattern that can improve part of an application's data-security story. we'll explore how.
really easy.
we haven't made our technical lives much easier on some of these fronts. let's focus on just one of the OWASP risks, broken access control. two of the many possible side-effects of bad access control is exposure of data and undesired mutation of data. historically, software engineering teams have:
bummer. so what does bad access control look like in practice? here are some potential snippets:
router.route("/api/person?fields=birthday", handler)
# ^ whoops, can anyone query everyone's birthday?
# what about authentication?
@authenticated(roles="member")
router.route("/api/person?fields=birthday", handler)
# ^ whoops, this may limit who can access the
# data to memebers only, but did enforcing
# authentication fix the risk of over-exposure?
# what about authorization?
# handler.py - option 1
def handler():
return psycopg.query("select birthday from persons")
# ^ whoops, we're still exposing everyone's
# birthday, even if the user is an authenticated
# member. authorization needs a where clause
in just a few lines of code, we saw a variety of naive places were developers
may have easily over exposed data. it's worth noting that there are--more
commonly than not--layers in our applications, large call stacks separating our
entrypoints from our data resolvers. this gap often makes validation of the
resultant queries difficult. the @authenticated
decorator could feasibly give
a false positive to junior developers, implying that this data flow is "secure",
when--as discussed before--there is much more to securing a service than meets
the eye.
this example is purely hypothetical, and more simple than compelling for what comes next. make no mistake though, as it appears that these types of security leaks are rampant on the open web. we found a simiar leak at the office last month from a team of expert and respected engineers.
broken access control is challenging and real.
rls let's you protect your data records within the db. until recently,
database access models did not support access constraints at the record level.
access constraints were generally available on higher level primitives. this
means that you could set RBAC roles for a query, but if a user was querying say
the persons
tables, she'd have access to all persons, which is problematic
as demonstrated in our above snippets.
"so is row-level security is going to save the day?"
yup, pretty much. what i'm hear to say is that rls is a really good idea, and you ought at least consider it at design time if you own a data layer.
if i can get you to click this postgres rls link and expose you to the concept, that alone is a big win.
let's walk through an example. you can also complete this exercise whilst following along with the super-commerce codebase. feel free to clone it down, execute it, and tinker as we read through it below.
here's the plan. we will:
our app will have customers that can make purchases of items from a store.
create table items (
id serial unique,
name varchar
);
create table customers (
id serial unique,
first_name varchar
);
create table purchases (
id serial,
item_id int not null references items (id),
customer_id int not null references customers (id)
);
postgres=# \d purchases
Table "public.purchases"
Column | Type | Collation | Nullable | Default
-------------+---------+-----------+----------+---------------------------------------
id | integer | | not null | nextval('purchases_id_seq'::regclass)
item_id | integer | | not null |
customer_id | integer | | not null |
Foreign-key constraints:
"purchases_customer_id_fkey" FOREIGN KEY (customer_id) REFERENCES customers(id)
"purchases_item_id_fkey" FOREIGN KEY (item_id) REFERENCES items(id)
all looks correct!
it's ok for anyone to query all of the items
in the database. the more they
see the better! our store wants people to be able to buy everything! however,
thus, let us apply rls on those two tables:
-- we 1st need a non-super-user role for our app
create user commerce;
-- the following statements are the core of what
-- we need to have rock solid, high perf rls!
alter table customers enable row level security;
alter table purchases enable row level security;
create policy customers_crud on customers
using (id = get_commerce_user_id());
-- allow the customer to read and update her record.
-- in our demo, the commerce_user_id is passed into
-- the query context via postgres config.
-- more on that later!
create policy purchases_crud on purchases
using (customer_id = get_commerce_user_id() or is_admin())
with check (is_admin());
-- allow the customer to read his purchase records,
-- but only allow an admins to edit/create records.
-- @warn - `is_admin` is in the using clause as well because
-- `insert ... returns` sql statements get checked
-- with `with check` _and_ `using` conditions!
in a nutshell, one must:
select
query inclusion, the using(...)
conditional must yield true
update/insert
permissions, the with check(...)
conditional must
yield true
in truth, rls policies aren't quite as simple as that, but that's 80% of what you need to know!
you may have noticed that my conditionals referenced a couple of pgplsql
functions. the two functions, is_admin()
and get_commerce_user_id()
, may be
viewed in the appendix at the end of the post. omitted for brevity.
-- finally make sure our user can access our tables/fns/etc.
-- this is standard database bootstrapping stuff.
grant all on all tables in schema public to commerce;
grant all on all sequences in schema public to commerce;
grant execute on all functions in schema public to commerce;
revoke update (id) on table customers from public;
revoke update (id) on table items from public;
revoke update (id) on table purchases from public;
insert into items (name) values
('nintendo64'),
('pizza'),
('cacao-nibs'),
('fancy-water');
insert into customers (first_name) values
('c-bear'),
('jamal'),
('jessica'),
('chaz');
insert into purchases (customer_id, item_id) values
(1, 4),
(1, 2), -- now that's a meal, c-bear! pizza & la croix. #hipster
(2, 1), -- mario 64, great game. enjoy, jamal!
(3, 3), -- tasty nibs for jessica
(4, 3); -- tasty nibs for chaz
now that we have data loaded in, let's test if rls is really protecting our datas. let's log into the database with our non-super role:
$ psql --host localhost -p 5432 --username=commerce postgres
now, let's try and access all of the records in each table:
postgres=> select * from customers;
id | first_name
----+------------
(0 rows)
postgres=> select * from purchases;
id | item_id | customer_id
----+---------+-------------
(0 rows)
postgres=> select * from items;
id | name
----+-------------
1 | nintendo64
2 | pizza
3 | cacao-nibs
4 | fancy-water
(4 rows)
good! we see items, but because the transaction has no user_id
set, the rls
protected customers
& purchases
queries yielded no resuts.
what if we set a user_id
and run our queries again with the user c-bear
configured? i know c-bear
has id === 1
just by inference based on seed data
insert order. more commonly, a user id would be provided by an identity
provider, auth service, oauth thingy, jwt, or perhaps even provided by the same
database. for the purposes of demonstration, it doesn't matter where the
user_id
comes from--it just matters that all parts of the application/system
have a common way to reference users.
postgres=> select set_config('commerce.user_id', '1', false);
set_config
------------
1
(1 row)
postgres=> select * from purchases;
id | item_id | customer_id
----+---------+-------------
1 | 4 | 1 -- c-bear's fancy-water
2 | 2 | 1 -- c-bear's pizza
(2 rows)
postgres=> select * from customers;
id | first_name
----+------------
1 | c-bear
(1 row)
this is awesome! now we see records that belong just to c-bear
.
to summarize, by default, we saw that our queries yielded no data. safety first!
once we specified the user as c-bear, we saw only his data. we observed the
rls policies active in all queries. we saw no data leak through for jamal,
jessica, or chaz anywhere, anytime. what's also great about this approach is
that we don't have to write carefully crafted sql to get specific user datas. i
just wrote a basic select * from <table>
and i got a reduced, user-centric
set. we probably should write it for performance reasons (e.g. wire in
effective where
clauses), but we don't need to.
now, how do i feel about the design?
this access model is robust. application code simply needs to provide a user and the db handles the rest. there's no risky application control flows dictating resultant data sets--just flat conditionals at the record level. i concede that it is feasible to mess up configuring rls policies to begin with. i argue, however, that if culturally a team sets a precedence that data is secured at the database level, it's intuitive for engineers to see & expect policies in migrations and in table defintions. further, policies intrinsically tend to be much more robust than application code. policies are often type-checked, cross-referenced against database columns at migration time, and have very little logic. thus, engineers have some positive data points suggesting robustness is achieved.
auditing table policies is also much easier than auditing application code for security patterns. policies may be reviewed by simply scanning a table+policy list. auditing application code takes creativity and control flow traversals--sounds hard!
we did a bunch of great work in the database. does it pay dividends in application code? you know it does!
here's the plan. we will:
/graphql
and /graphiql
routes to the app
select
s) and mutations (i.e.
insert/update
s) ready to roll from the serverpgSettings
, which ultimately calls
postgres' set_config(...)
function for all of our rls key-values for
each query!the app is so small, here's the whole thing:
import Koa from "koa";
import { postgraphile as createPostgaphileMiddleware } from "postgraphile";
import { IncomingMessage } from "http";
const log = (...args: any[]) => console.log("[super-commerce]", ...args);
async function start() {
const app = new Koa();
app.use(
createPostgaphileMiddleware(
"postgres://commerce@localhost:5432/postgres",
"public",
{
graphiql: true,
pgSettings: async (req) => {
const { userId, isAdmin } = await getUser(req);
return {
"commerce.is_admin": isAdmin,
"commerce.user_id": userId,
};
},
},
),
);
app.listen(3000);
log("server listening on 3000");
log("visit http://localhost:3000/graphiql to play");
}
const getUser = async (req: IncomingMessage) => {
// in a real app, you'd probably have an auth middleware,
// which may do something like:
// const userId = await authClient.decodeAndValidatedUser({
// jwt: req.headers.jwt
// })
// @warning - *never actually do this!*
// for simplicty sake--let's pretend that the
// authorized user_id is in the cookie.
// 1 c-bear, 2 jamal, 3 jessica, 4 chaz
const cookie = req.headers.cookie || "";
const userId = cookie.match(/user_id=(\d+)/i)
? parseInt(cookie.match(/user_id=(\d+)/i)![1], 10)
: 0;
return {
isAdmin: !!cookie.match(/is_admin/i),
userId,
};
};
start();
50 LOC, not so bad! when the server runs, it also exposes a graphiql instance, so we can play with our data!
let's play around.
document.cookie='user_id=1'
to fake
authorize as c-bear{
allCustomers {
edges {
node {
id
firstName
}
}
}
allPurchases {
edges {
node {
itemByItemId {
name
}
customerByCustomerId {
firstName
}
}
}
}
allItems {
edges {
node {
id
name
}
}
}
}
as shown in the query results & screenshot above, it's clear that c-bear can only see his purchased pizza and fancy-water from the store.
how about trying as a different user?
user_id
cookie =2
, which corresponds to jamalin a real commerce setting, only the commerce point of sale (pos) should log a
transaction. a user otherwise could log a transaction without actually paying,
initiate a return, and thus steal money! let's declare that the pos system is an
admin
user. in practice, the pos system would probably have a reduced role.
document.cookie=''
to reset us to a clean
statemutation {
createPurchase(input: { purchase: { customerId: 1, itemId: 1 } }) {
purchase {
id
}
}
}
we expect this to fail due to a rls violation. to add a purchase, we needed to be an admin. sure enough, the query shows a "new row violates row-level security policy" error message:
document.cookie='is_admin=1'
{
"data": {
"createPurchase": {
"purchase": {
"id": 6
}
}
}
}
very little access control is present in our app, and that's a feature. we
need to make an authc/authz call to validate our user, get her id & roles (e.g.
is_admin
), and pump that result directly into our db calls. this is the only
flow where falible human programmers can mess up our in-database security,
perhaps by passing the wrong user. such a failure is unlikely, and likely easy
to rectify if a bug were to ever sprout.
try it out!
create or replace function is_admin() returns bool as $$
begin
if current_setting('commerce.is_admin', true) = 'true' then
return true;
end if;
return false;
end;
$$ language plpgsql stable security definer;
create function get_commerce_user_id() returns int as $$
declare
user_id_str varchar;
begin
user_id_str := current_setting('commerce.user_id', true);
if user_id_str is null or user_id_str = '' then
return 0; -- postgres serial ids start at 1
end if;
return user_id_str::int;
end;
$$ language plpgsql stable security definer;