Dataviews for Analytics
Dataviews allow us to maintain an Elentra "datamart" useful for analysts, data pipelines, and connecting third-party business intelligence tools.
- Two sets of SQL views: vs_* and vc_*.
- Simple views (v_s_*) are straight representations of the underlying tables, with soft deletes factored out.
- Complex views (v_c_*) join together two or more tables to create a useful analytical table, for example a table representing individual A&E evaluation form responses.
We can also, optionally, use dataviews to provide a governance layer of security and privacy as well as to maintain institutionally-developed data dictionaries by enforcing column naming conventions.
Example: We can give one group of users or set of models access to evaluation data without also giving them access to assessment data, even though the data points are read out of the same underlying tables.
Example: We can choose to refer to "Curriculum Tracks" as "Regional Campuses" or some other descriptor that better aligns with our institutional semantics.
- 1.Create a new
elentra_analytics
database on your primary and staging database servers.
You will need to log into your database servers with a
root
account that has privileges to create databases to perform this action.CREATE DATABASE `elentra_analytics`;
SELECT USER, HOST FROM mysql.db WHERE USER = 'elentra' AND HOST = '%' AND DB = 'elentra\_analytics';
GRANT CREATE ROUTINE, CREATE VIEW, ALTER, SHOW VIEW, CREATE, ALTER ROUTINE, EVENT, INSERT, SELECT, DELETE, TRIGGER, REFERENCES, UPDATE, DROP, EXECUTE, LOCK TABLES, CREATE TEMPORARY TABLES, INDEX ON `elentra\_analytics`.* TO 'elentra'@'%';
REVOKE GRANT OPTION ON `elentra\_analytics`.* FROM 'elentra'@'%';
- 2.Add the following configuration to config.inc.php for each environment you wish to support (e.g., config-staging.inc.php, config-production.inc.php).
'analytics_database' => 'elentra_analytics',
If you do not specify an analytics database, the dataviews will be housed in your
elentra_me
database. Best practice is to go head and setup a new database schema, as it will leave ad-hoc backups in a more portable state.- 3.Merge the new line from core settings.inc.php into your environment(s) settings files (e.g., settings-production.inc.php):
define("ANALYTICS_DATABASE", $config->database->analytics_database ?? $config->database->entrada_database); // The name of the database to use for dataviews. Falls back to Elentra ME if not defined in config.inc.php.
You can add (or remove) dataviews your environment manually by running:
php elentra dataviews --up
php elentra dataviews --down
You can also add this to your deployment (directly after the migrations task is a good place).
SQL views need to be refreshed when the underlying tables change, so you will want to add this if you move beyond the evaluation stage.
desc('Once: Refresh Dataviews (on first host in stage only)');
task('deploy:dataviews', function () {
// If it's not the first-run, refresh the dataviews and show the tty output.
if (!input()->getOption('first-run')) {
run('php {{release_path}}/elentra dataviews --up', ['timeout' => null, 'tty' => true]);
}
})->once();
The
utils
directory (see below) contains a replicate.sh
script which is an extension of the Elentra scrub.sh
script used to move production data to staging or other target environments.You will want to configure
replicate.sh
and use it to replace your existing scrub.sh
.- Support for replacing DEFINER permissions to make the .sql backups portable across servers with different credentials
- Support for targeting a third "reporting" environment (production>staging PLUS either production>reporting OR staging>reporting)
- NOTE: a reference line is included to exclude the statistics table from the reporting run.
- It is commented out, but useful if you want to speed up mysqldump/restore for testing purposes.
- The
dataviews
directory lives inwww-root/core/library/Dataviews
- The
local
directory is not maintained in the upstream Consortium Core repository, and is available for institutions to optionally wrap "core" views with their own column naming conventions or to develop institutional-specific views. - The
utils
directory contains a number of useful scripts for generating views, grant statements, etc.
dataviews/
core/
up/
v_c_*_views.up.sql
v_s_*_views.up.sql
down/
v_c_*_views.down.sql
v_s_*_views.down.sql
/local
up/
my_views.up.sql
down/
my_views.down.sql
/utils
columns.sql (generate explicit column names for a given table)
grants.sql (generate statements to grant access to views matching a prefix pattern)
readme.md (this file!)
replicate.sh (extensions to scrub.sh)
roles.sql (create a set of db roles to match your governance framework)
users.sql (create db users and grant roles - useful in your reporting server for data devs or pipeline auth)
views.sql (generate view statements using a prefix for table names matching a pattern)
Note to query authors: The string
AUTH_DATABASE
in the .sql files is substituted with the Elentra AUTH_DATABASE constant before the queries are executed.- So you can write a query like
SELECT firstname FROM AUTH_DATABASE.user_data
and it will do the right thing.
When targeting tables in the
elentra_me
database no schema prefix is necessary.- So you can write a query like
SELECT event_id FROM events
and it will also do the right thing, regardless of which schema you are using to house your views.
However,
CREATE VIEW
should use the target schema, and this is done for you if you use the views.sql
utility- For example:
CREATE OR REPLACE SQL SECURITY DEFINER VIEW
ANALYTICS_DATABASE.v_s_courses
Last modified 1mo ago