Dataviews for Analytics

What are dataviews?

Dataviews allow us to maintain an Elentra "datamart" useful for analysts, data pipelines, and connecting third-party business intelligence tools.

  • Two sets of SQL views: vs_* and vc_*.

  • Simple views (v_s_*) are straight representations of the underlying tables, with soft deletes factored out.

  • Complex views (v_c_*) join together two or more tables to create a useful analytical table, for example a table representing individual A&E evaluation form responses.

We can also, optionally, use dataviews to provide a governance layer of security and privacy as well as to maintain institutionally-developed data dictionaries by enforcing column naming conventions.

Example: We can give one group of users or set of models access to evaluation data without also giving them access to assessment data, even though the data points are read out of the same underlying tables.

Example: We can choose to refer to "Curriculum Tracks" as "Regional Campuses" or some other descriptor that better aligns with our institutional semantics.

  1. Create a new elentra_analytics database on your primary and staging database servers.

You will need to log into your database servers with a root account that has privileges to create databases to perform this action.

CREATE DATABASE `elentra_analytics`;
SELECT USER, HOST FROM mysql.db WHERE USER = 'elentra' AND HOST = '%' AND DB = 'elentra\_analytics';
GRANT CREATE ROUTINE, CREATE VIEW, ALTER, SHOW VIEW, CREATE, ALTER ROUTINE, EVENT, INSERT, SELECT, DELETE, TRIGGER, REFERENCES, UPDATE, DROP, EXECUTE, LOCK TABLES, CREATE TEMPORARY TABLES, INDEX ON `elentra\_analytics`.* TO 'elentra'@'%';
REVOKE GRANT OPTION ON `elentra\_analytics`.* FROM 'elentra'@'%';
  1. Add the following configuration to config.inc.php for each environment you wish to support (e.g., config-staging.inc.php, config-production.inc.php).

'analytics_database' => 'elentra_analytics',

If you do not specify an analytics database, the dataviews will be housed in your elentra_me database. Best practice is to go head and setup a new database schema, as it will leave ad-hoc backups in a more portable state.

  1. Merge the new line from core settings.inc.php into your environment(s) settings files (e.g., settings-production.inc.php):

define("ANALYTICS_DATABASE", $config->database->analytics_database ?? $config->database->entrada_database); // The name of the database to use for dataviews. Falls back to Elentra ME if not defined in config.inc.php.

Execute dataviews

You can add (or remove) dataviews your environment manually by running:

php elentra dataviews --up
php elentra dataviews --down

You can also add this to your deployment (directly after the migrations task is a good place).

SQL views need to be refreshed when the underlying tables change, so you will want to add this if you move beyond the evaluation stage.

desc('Once: Refresh Dataviews (on first host in stage only)');
task('deploy:dataviews', function () {
    // If it's not the first-run, refresh the dataviews and show the tty output.
    if (!input()->getOption('first-run')) {
        run('php {{release_path}}/elentra dataviews --up', ['timeout' => null, 'tty' => true]);
    }
})->once();

Optional: Extend the "scrub.sh" script

The utils directory (see below) contains a replicate.sh script which is an extension of the Elentra scrub.sh script used to move production data to staging or other target environments.

You will want to configure replicate.sh and use it to replace your existing scrub.sh.

  • Support for replacing DEFINER permissions to make the .sql backups portable across servers with different credentials

  • Support for targeting a third "reporting" environment (production>staging PLUS either production>reporting OR staging>reporting)

  • NOTE: a reference line is included to exclude the statistics table from the reporting run.

  • It is commented out, but useful if you want to speed up mysqldump/restore for testing purposes.

Working with dataviews

  • The dataviews directory lives in www-root/core/library/Dataviews

  • The local directory is not maintained in the upstream Consortium Core repository, and is available for institutions to optionally wrap "core" views with their own column naming conventions or to develop institutional-specific views.

  • The utils directory contains a number of useful scripts for generating views, grant statements, etc.

dataviews/
    core/
        up/
            v_c_*_views.up.sql
            v_s_*_views.up.sql
        down/
            v_c_*_views.down.sql
            v_s_*_views.down.sql
    /local
        up/
            my_views.up.sql
        down/
            my_views.down.sql
    /utils
        columns.sql     (generate explicit column names for a given table)
        grants.sql      (generate statements to grant access to views matching a prefix pattern)
        readme.md       (this file!)
        replicate.sh    (extensions to scrub.sh)
        roles.sql       (create a set of db roles to match your governance framework)
        users.sql       (create db users and grant roles - useful in your reporting server for data devs or pipeline auth)
        views.sql       (generate view statements using a prefix for table names matching a pattern)

Note to query authors: The string AUTH_DATABASE in the .sql files is substituted with the Elentra AUTH_DATABASE constant before the queries are executed.

  • So you can write a query like SELECT firstname FROM AUTH_DATABASE.user_data and it will do the right thing.

When targeting tables in the elentra_me database no schema prefix is necessary.

  • So you can write a query like SELECT event_id FROM events and it will also do the right thing, regardless of which schema you are using to house your views.

However, CREATE VIEW should use the target schema, and this is done for you if you use the views.sql utility

  • For example: CREATE OR REPLACE SQL SECURITY DEFINER VIEW ANALYTICS_DATABASE.v_s_courses

Last updated