Brian Bancroft

Using Postgres Views to Build Datasets for QGIS

04 September, 2018 | Tutorial

Recently, I wanted to show a demographic breakdown of key parts of a town on Vancouver Island to my wife, but was unsure of what to use to display, and quickly. I had this database containing all of the longform data from the 2016 short form census that I build using this handy repo, and in the same database, I had the shapefile import for the Dissemination Areas across Canada.

This tutorial shows you how to use Postgres Views to capture data views, how to use it specifcially with the output of the census database, how to take it one step further with Materialize Views, and how to use the results in QGIS.

Figuring it Out

A Postgres view is a queryable snapshot of a specific query. Another way to think about it is that each and every time you make a query in the database, you’re creating a temporary table. When you build a view, this table that results from a normal query persists either until you remove it, or until the end of the session (if the option to do so is selected). This is useful in prototyping data, but not so much in retrieval in a production environment as it carries out the entire base query each time the view or a subset of it is called.

Building it

The basics of view creation are as easy as prepending a standard SELECT-query with CREATE VIEW. Here’s an abstract example:

CREATE VIEW test_view(
col_1,
col_2,
col_3
) AS (
SELECT
--- SQL Query goes here
)

The big thing is that the output of the select-query needs to output three columns, ideally aliased AS col_1col_2, and col_3. This can modified in many ways. But we’re going to do it now with the census database built from the example in the start.

Setting up the Sample Query

In this query, what we want to do is build a view that contains the proportion of couples who have children in disssemination areas across Canada. For this, we’re going to need to build alias tables for couples_with_children, and couples_without_children, and we’re going to need to tee that up with the geographic data for Canada.

SELECTING COUPLES WITH CHILDREN

In the normalized census setup, we have a column called dim, which stands for the dimension, or line-item. If you look at any geographic census profile at the dissemintation area level, you’ll see a table. Each one of these line items corresponds to a dim at a given table.

In this case, dim=81 represents the total number of couples in a given area:

SELECT
geo_id AS dauid,
dim,
value_total AS total
FROM
short_form_2016_da
WHERE
dim = 81;

This took around ~12222ms 

sql-result-1

SELECTING COUPLES WITHOUT CHILDREN

Second verse, same as the first. Just instead of the dim ID being 81, it’s 83.

SELECT
geo_id AS dauid,
dim,
value_total AS total
FROM
short_form_2016_da
WHERE
dim = 83;

This also took around ~12222ms 

sql-results-2

Setting up the Join

We have two queries up now. Here, we want to join it with a geographic data table (which is also created in the census repo). We do that by building the following:

WITH couples_total AS(
SELECT
geo_id AS dauid,
dim,
value_total AS total
FROM
short_form_2016_da
WHERE
dim = 81
AND
value_total > 0
),
couples_children AS(
SELECT
geo_id AS dauid,
dim,
value_total AS total
FROM
short_form_2016_da
WHERE
dim = 83
AND
value_total > 0
)
SELECT
geo.ogc_fid AS id,
total.dauid AS dauid,
(100 * children.total / total.total) AS ratio,
geo.wkb_geometry AS geom
FROM
couples_total total
JOIN
couples_children children
ON
total.dauid = children.dauid
JOIN
canada_da_2016 geo
ON
total.dauid = geo.dauid
;

When I run this query, this is what I get:

sql-results-3

It takes awhile (~26s) to process. Keep in mind that if you try this at home, you’re dealing with the entirety of Canada!

The last thing is that we’re going to want to limit the results to where the total is greater than zero. How would you limit that? I’ll leave this as a thought exercise. The answer is down in the page!

We’re good now. We can set up the view…

Setting up the View

Okay. We have a query now. At this point, you’ll recall that creating a view as easy as wrapping the query around CREATE VIEW:

CREATE VIEW test_view(
col_1,
col_2,
col_3
) AS (
SELECT
--- SQL Query goes here
)

Here’s how I do this for this particular query:

CREATE VIEW ratio_childbearing_couples(
id,
dauid,
ratio,
geom
)
AS (
WITH couples_total AS(
SELECT
geo_id AS dauid,
dim,
value_total AS total
FROM
short_form_2016_da
WHERE
dim = 81
AND
value_total > 0
),
couples_children AS(
SELECT
geo_id AS dauid,
dim,
value_total AS total
FROM
short_form_2016_da
WHERE
dim = 83
AND
value_total > 0
)
SELECT
geo.ogc_fid AS id,
total.dauid AS dauid,
(100 * children.total / total.total) AS ratio,
geo.wkb_geometry AS geom
FROM
couples_total total
JOIN
couples_children children
ON
total.dauid = children.dauid
JOIN
canada_da_2016 geo
ON
total.dauid = geo.dauid
);

Once you do this, you’ll notice that it takes almost no time to build the view. In my case, it took 27ms.

Verifying the View

At this point, you can verify the data as if this is a table.

SELECT * FROM ratio_childbearing_couples;

You should receive the results, but it takes a whopping 26s! This is good for one-offs to determine whether a view is good when testing, but it’s no suitable way to subject people to production data…

Materialized View

This is a good time to bring up Materialized Views. This is similar to a view, but it saves the results of the query as the view, as opposed to executing the query each time the VIEWis called. If you’re building for production, this is an excellent tool for displaying data, and we’re going to do this now…

First, we’re going to drop the plain view:

DROP VIEW ratio_childbearing_couples;

Next, we’re going to build the materialized view. This is going to take awhile:

CREATE MATERIALIZED VIEW ratio_childbearing_couples(
id,
dauid,
ratio,
geom
)
AS (
WITH couples_total AS(
SELECT
geo_id AS dauid,
dim,
value_total AS total
FROM
short_form_2016_da
WHERE
dim = 81
AND
value_total > 0
),
couples_children AS(
SELECT
geo_id AS dauid,
dim,
value_total AS total
FROM
short_form_2016_da
WHERE
dim = 83
AND
value_total > 0
)
SELECT
geo.ogc_fid AS id,
total.dauid AS dauid,
(100 * children.total / total.total) AS ratio,
geo.wkb_geometry AS geom
FROM
couples_total total
JOIN
couples_children children
ON
total.dauid = children.dauid
JOIN
canada_da_2016 geo
ON
total.dauid = geo.dauid
);

What we are doing is creating a static table that contains not the query instructions to capture the data, but the content across all of Canada. If space isn’t a problem in your database, this is ideally what you want…

SELECT 54893 Time: 20863.456 ms (00:20.863)

Bam! you’ve created your first materialized view. Testing it should let you see how much faster this is at runtime:

SELECT * FROM ratio_childbearing_couples;

For me, it took only 2 seconds, which is a big improvement from 20!

Conclusion

In this post, we’ve done a lot!

  • We’ve built some queries,

  • We’ve also used them to make aliased tables,

  • We’ve used their results as a VIEW; and

  • We’ve taken this view, and made it into a MATERIALIZED VIEW

Congratulations if you’ve followed through this far. This is the sort of work you can carry out if you want to build and check out demographic data quickly, and share it across your team!

home
portfolio
posts
contact