For Developers

Documentation of the development of the ACEP Data Catalog.

For more information and guides, see the official CKAN documentation.

Data Sources Overview

Developing the Data Catalog

The ACEP Data Catalog runs on a VM hosted by RCS. Extensions can be updated by pushing to the acepportal-ckan GitHub repository. After a push, changes take about 30 minutes to appear on the main site.

Basic Docker Commands

List all containers, including stopped ones (omit -a to list only running containers):

  • docker ps -a

Five containers run the data catalog:

  • acep-ckan-cont
  • acep-db-cont
  • acep-redis-cont
  • acep-solr-cont
  • acep-datapusher-cont
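
As a quick sanity check, the list above can be compared against what Docker reports. This is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes the docker CLI is on your PATH and that the container names match the list exactly.

```python
import subprocess

# The five container names listed above.
EXPECTED_CONTAINERS = [
    "acep-ckan-cont",
    "acep-db-cont",
    "acep-redis-cont",
    "acep-solr-cont",
    "acep-datapusher-cont",
]


def missing_containers(running_names):
    """Return the expected containers that are absent from running_names."""
    running = set(running_names)
    return [name for name in EXPECTED_CONTAINERS if name not in running]


def running_container_names():
    """Ask Docker for the names of running containers (needs the docker CLI)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


# Example with a hand-written list, so no Docker daemon is required:
print(missing_containers(["acep-ckan-cont", "acep-db-cont"]))
# ['acep-redis-cont', 'acep-solr-cont', 'acep-datapusher-cont']
```

To check a live system, call missing_containers(running_container_names()); an empty list means all five containers are up.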

Spin up Application

  • docker compose up

This will turn the terminal into an output stream for the docker containers.

TIP: I recommend keeping two terminals open: one for the output stream so you can see errors, and another to run other commands in.

Rebuild and spin up containers:

  • docker compose up -d --build

Run this command after installing a new extension and adding it to the .env file.

Go into container:

  • docker exec -it [container_name] /bin/bash

Or if bash is not installed in the container:

  • docker exec -it [container_name] /bin/sh

Or if in a bash terminal:

  • docker exec -it [container_name] bash

Restart a container:

  • docker restart [container_name]

Restart the acep-ckan-cont container after making changes to non-HTML files. Changes to HTML files can be seen by refreshing the webpage.

Take Down Application

  • docker compose down

Clean up Project

  • docker compose down --rmi all -v --remove-orphans

This removes all containers, images, and volumes associated with a project. Only do this if you want to clean up your environment and reset the containers.

Creating a Local Instance

A local version of the data catalog is useful for developing and testing new features.

  1. Install Docker: https://www.docker.com/get-started/

  2. Clone the ACEP CKAN repository from GitHub: https://github.com/UAF-RCS/acepportal-ckan.git

  3. Create the .env file inside the main acepportal-ckan folder. Copy the contents from the .env.example file.

  4. Specify the location of the source files, storage files, backups, etc. in the .env file. You will move those files to these locations in the next steps. For example:

    # CKAN Mounts Directory
    CKAN_EXTENSIONS_MOUNT=./ckan-extension
    SRC_EXTENSIONS_PATH=/srv/app/src_extensions
    CKAN_SOURCE_MOUNT=./ckan-src/src
    CKAN_STORAGE_MOUNT=./ckan-src/storage
    CKAN_INI_MOUNT=./ckan-src/ckan.ini
  5. To create a replica of the current main Data Catalog, copy the source files, storage files, ckan.ini file, and database backups from the VM. The backup archives are located on the VM inside /opt/ckan/backups. Use scp to copy the files onto your machine. Backups are created every day; replace [date] with the most recent date in the format yyyymmdd.

    Inside acepportal-ckan/ckan-src, run the following:

    • scp user@portal.lab.acep.uaf.edu:/opt/ckan/backups/app_[date].tar.bz2 .
    • scp user@portal.lab.acep.uaf.edu:/opt/ckan/backups/app_storage_[date].tar.bz2 .
    • scp user@portal.lab.acep.uaf.edu:/opt/ckan/acepportal-ckan/ckan-src/ckan.ini .
  6. Use tar to decompress the source and storage tar files

    • tar -jxvf app_[date].tar.bz2
    • tar -jxvf app_storage_[date].tar.bz2

    Decompressing the app_storage tar file should create a folder called ckan containing the folders resources, storage, and webassets. Rename the ckan folder to storage.
    This should result in the directory structure specified in ckan-src/README.txt

  7. Create a backups folder alongside the acepportal-ckan repository on your machine. Specify the name in the BACKUP_TO setting in the .env file.

    # Backups 
    BACKUP_TO=../../[backups folder name]
  8. Run the following commands inside the backups folder to copy over the database and datastore.

    • scp user@portal.lab.acep.uaf.edu:/opt/ckan/backups/ckandb_[date].tar .
    • scp user@portal.lab.acep.uaf.edu:/opt/ckan/backups/datastore_[date].tar .
  9. In the ckan.ini file, set ckan.site_url to the localhost URL:

    ckan.site_url = http://127.0.0.1:5000
  10. Build and start the containers:

    • docker compose up
  11. Once the containers are up, use the import_database.sh bash script to import the database.

    • bash import_database.sh
  12. Rebuild the CKAN search index.

    • docker exec -it acep-ckan-cont /bin/bash
    • cd /srv/app
    • ckan search-index rebuild
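
The [date] placeholder in steps 5 and 8 is easy to mistype, so here is a small sketch that prints the four scp commands for a given day. The helper name and the example date are hypothetical; the host, directory, and file name patterns are taken from the steps above. Run the first two commands inside acepportal-ckan/ckan-src and the last two inside your backups folder.

```python
from datetime import date

REMOTE = "user@portal.lab.acep.uaf.edu"  # placeholder login, as in the steps above
BACKUP_DIR = "/opt/ckan/backups"


def backup_sources(day):
    """Return the remote scp sources for the backups created on `day`."""
    stamp = day.strftime("%Y%m%d")  # yyyymmdd
    names = [
        f"app_{stamp}.tar.bz2",          # source files (step 5)
        f"app_storage_{stamp}.tar.bz2",  # storage files (step 5)
        f"ckandb_{stamp}.tar",           # database (step 8)
        f"datastore_{stamp}.tar",        # datastore (step 8)
    ]
    return [f"{REMOTE}:{BACKUP_DIR}/{name}" for name in names]


# Illustrative date only; use the most recent backup date on the VM.
for source in backup_sources(date(2024, 1, 31)):
    print(f"scp {source} .")
```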

Create a New Extension

  1. Enter the acep-ckan-cont Docker container:

    • docker exec -it acep-ckan-cont /bin/bash

    Then run the following command:

    • ckan generate extension -o /srv/app/src/ckan-extension

    This will create an extension in the ckan-extension folder, which can be edited outside of the container.
  2. Add the extension name to the CKAN_PLUGINS list in the .env file.
  3. Run docker compose up -d --build ckan

Install an Extension

  1. Ensure that the extension supports CKAN 2.10.4 and Python 3.10. Clone the extension repository into the ckan-extension folder.
  2. Ensure that all dependencies for the extension are listed in requirements.txt or a similar file.
  3. Add the extension name to the CKAN_PLUGINS list in the .env file.
  4. Run docker compose up -d --build ckan
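
Adding a name to CKAN_PLUGINS is normally a manual edit, but the step can be sketched as a small text transformation. This assumes the setting is a single line of space-separated plugin names, optionally wrapped in double quotes; check your own .env before relying on it. The function name and the plugin names in the example are illustrative only.

```python
def add_plugin(env_text, plugin, key="CKAN_PLUGINS"):
    """Return env_text with `plugin` appended to the `key` line if not listed."""
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith(key + "="):
            continue
        value = line[len(key) + 1:].strip()
        # Remember whether the original value was quoted so we can keep that style.
        quoted = len(value) >= 2 and value[0] == value[-1] == '"'
        names = value.strip('"').split()
        if plugin not in names:
            names.append(plugin)
        joined = " ".join(names)
        lines[i] = f'{key}="{joined}"' if quoted else f"{key}={joined}"
        break
    return "\n".join(lines) + "\n"


# Example values are hypothetical, not the catalog's real plugin list.
env = 'CKAN_PLUGINS="envvars image_view"\n'
print(add_plugin(env, "faqpage"), end="")
# CKAN_PLUGINS="envvars image_view faqpage"
```

Calling the function again with the same plugin leaves the line unchanged, so it is safe to rerun before docker compose up -d --build ckan.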

Updating the Main Site

To add a feature from your local instance to the main Data Catalog:

  1. Push the files to the acepportal-ckan GitHub repository.
  2. Wait about 30 minutes for the changes to be pulled to the VM.
  3. If you have added a new extension, SSH into the VM and add the extension name to the .env file.

    • ssh user@portal.lab.acep.uaf.edu
    • cd /opt/ckan/acepportal-ckan
    • vi .env
  4. After installing new extensions or making other changes, you may need to restart the acep-ckan-cont container for them to take effect. On the VM, run

    • docker restart acep-ckan-cont

Extensions

Currently Installed

ckanext-customtheme

Author: Jenae Matson
Purpose: Add custom theming and features for the CKAN instance, including:

  • ACEP logos, colors, and fonts
  • Home page layout, images, and featured dataset
  • Changed font weight of Register button
  • Added tags to search page display
  • HTML file for About page text
  • Removed social media links from dataset/resources pages
  • Added support contact info to dataset sidebar
  • Added default blank option to add-to-group dropdown menu

ckanext-dcat

Link: https://github.com/ckan/ckanext-dcat
Purpose: Rework metadata to conform to the DCAT standard.
Modifications:

  • The file schemas/acep_dcat_fields.yaml was created to define the metadata fields for the catalog.
  • The file templates/scheming/form_snippets/publisher.html was created to define the dynamic dropdown menu in the Publisher metadata field.

ckanext-faqpage

Author: Jenae Matson
Purpose: Create an FAQ page linked in the masthead with collapsible boxes for questions and answers.

ckanext-geoview

Link: https://github.com/ckan/ckanext-geoview
Purpose: Create resource views for GeoJSON and other geospatial file types. We have implemented the OpenLayers Viewer.

ckanext-githubrepopreview

Link: https://github.com/DataShades/ckanext-githubrepopreview
Purpose: Provide a view for GitHub repository resources.
Modifications: This extension was created for an older version of CKAN, so the following changes were made to make it work with version 2.10:

  • In the file plugin.py, replace the line from lib import parse with the following
from urllib.parse import urlparse

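def parse(input_url, some_flag):
    """Extract the domain, owner, and repository name from a GitHub URL.

    some_flag is accepted but not used.
    """
    parsed_info = {}
    parsed_url = urlparse(input_url)
    domain = parsed_url.netloc
    # The path has the form /owner/repo; split it into its components
    path_parts = parsed_url.path.strip('/').split('/')

    parsed_info['domain'] = domain
    parsed_info['owner'] = path_parts[0] if len(path_parts) > 0 else None
    parsed_info['repo'] = path_parts[1] if len(path_parts) > 1 else None

    return parsed_info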
  • In the file templates/githubrepo.html, delete the following lines
{%- block styles %}
    {% resource g.main_css[6:] %}
{% endblock %}
{%- block scripts %}
    {% resource 'base/main' %}
    {% resource 'base/ckan' %}
    {% if g.tracking_enabled %}
        {% resource 'base/tracking.js' %}
    {% endif %}
{% endblock -%}
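
To show what the replacement parse function in plugin.py returns, here is a standalone usage sketch. The function body follows the one given above; the example URLs and the None passed for the unused second argument are illustrative choices, not values taken from the extension.

```python
from urllib.parse import urlparse


def parse(input_url, some_flag):
    # Same logic as the replacement function above; some_flag is unused.
    parsed_url = urlparse(input_url)
    path_parts = parsed_url.path.strip('/').split('/')
    return {
        'domain': parsed_url.netloc,
        'owner': path_parts[0] if len(path_parts) > 0 else None,
        'repo': path_parts[1] if len(path_parts) > 1 else None,
    }


print(parse("https://github.com/ckan/ckanext-dcat", None))
# {'domain': 'github.com', 'owner': 'ckan', 'repo': 'ckanext-dcat'}

# A URL with no path yields an empty-string owner, not None, because
# ''.split('/') returns [''] (a list of length 1).
print(parse("https://github.com/", None))
# {'domain': 'github.com', 'owner': '', 'repo': None}
```

The second call shows an edge case worth knowing about: callers that test for a missing owner should check for an empty string as well as None.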

ckanext-package-group-permissions

Link: https://github.com/salsadigitalauorg/ckanext-package-group-permissions
Purpose: Allows all editors and admins to add datasets to any group, without having to be added as members to each group.
Modifications: This extension was written for CKAN 2.9. This instance runs version 2.10, so the extension requires some small modifications to work. The following changes were made to the original extension:

  • In the file plugin.py, change the member_create function to the following
def member_create(self, next_auth, context, data_dict):
    """
    This code is largely borrowed from /src/ckan/ckan/logic/auth/create.py
    With a modification to allow users to add datasets to any group
    :param context:
    :param data_dict:
    :return:
    """
    group = logic_auth.get_group_object(context, data_dict)

    authorized = False
    if not group.is_organization and data_dict.get('object_type') == 'package':
        authorized = helpers.user_has_admin_access(include_editor_access=True)

    if not authorized:
        # Fallback to the default CKAN behaviour
        return next_auth(context, data_dict)
    else:
        return {'success': True}
  • In the file templates/package/group_list.html, add the line {{ h.csrf_input() }} to the beginning of the two post forms, as follows
{% if groups %}
<form class="add-to-group" method="post">
    {{ h.csrf_input() }}
    ...
</form>
{% endif %}
{% if c.pkg_dict.groups %}
<form method="post">
    {{ h.csrf_input() }}
    ...
</form>
{% endif %}
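
The branch structure of the modified member_create function can be read as a small decision table. The sketch below mirrors that logic without importing CKAN, so it is only an illustration: the function name is hypothetical, and the is_editor_or_admin argument stands in for the result of helpers.user_has_admin_access(include_editor_access=True).

```python
def member_create_decision(is_organization, object_type, is_editor_or_admin):
    """Mirror the branches of member_create without importing CKAN.

    Returns "allow" when the extension grants access itself, or "fallback"
    when it defers to CKAN's default auth function (next_auth).
    """
    authorized = False
    # Only non-organization groups receiving a dataset are handled here.
    if not is_organization and object_type == "package":
        authorized = is_editor_or_admin
    return "allow" if authorized else "fallback"


print(member_create_decision(False, "package", True))   # allow
print(member_create_decision(True, "package", True))    # fallback
print(member_create_decision(False, "user", True))      # fallback
print(member_create_decision(False, "package", False))  # fallback
```

Only the first case is granted directly: an editor or admin adding a dataset to a regular group. Every other combination is handed back to CKAN's default behaviour.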

ckanext-pdfview

Link: https://github.com/ckan/ckanext-pdfview
Purpose: Provide a view for PDF resources.

ckanext-restrictpublish

Author: Jenae Matson
Purpose: Restrict the ability to change the visibility of a dataset to admins only. Datasets posted by editors default to private.

ckanext-scheming

Link: https://github.com/ckan/ckanext-scheming
Purpose: Allows for the creation of alternate metadata templates (schemas) defined by .yaml or .json files.
Modifications: Some of the automatically calculated resource fields were manually re-added to be displayed. In the file templates/scheming/package/resource_read.html, below {%- block resource_license -%} add the following

{%- block resource_size -%}
<tr>
    <th scope="row">{{ _('Size') }}</th>
    <td>{{ res.size or _('unknown') }} bytes</td>
</tr>
{%- endblock -%}
{%- block resource_datastore -%}
<tr>
    <th scope="row">{{ _('Datastore active') }}</th>
    <td>{{ res.datastore_active or _('unknown') }}</td>
</tr>
{%- endblock -%}

Adding Alternate Schemas with ckanext-scheming

  1. Create a .yaml or .json file in the folder ckanext-scheming/ckanext/scheming to define the metadata schema. See the extension documentation for more information and examples.
  2. In ckan.ini, add your schema(s) to the scheming.dataset_schemas config option. For example:

    scheming.dataset_schemas = ckanext.scheming:arctic_dataset.json ckanext.scheming:geo_dataset.json
  3. The new dataset creation form is located at a URL defined by the schema type name. For example, the creation form for datasets of type arctic-dataset is located at /arctic-dataset/new. You can define a new Add Dataset button using this URL.
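
The two naming conventions in the steps above (the config value and the form URL) can be sketched as follows. Both helper names are hypothetical; the module prefix, file names, and type name come from the example in the steps.

```python
def dataset_schemas_value(filenames, module="ckanext.scheming"):
    """Build the space-separated value for scheming.dataset_schemas."""
    return " ".join(f"{module}:{name}" for name in filenames)


def new_dataset_url(dataset_type):
    """Return the path of the creation form for a given schema type name."""
    return f"/{dataset_type}/new"


print("scheming.dataset_schemas =",
      dataset_schemas_value(["arctic_dataset.json", "geo_dataset.json"]))
# scheming.dataset_schemas = ckanext.scheming:arctic_dataset.json ckanext.scheming:geo_dataset.json

print(new_dataset_url("arctic-dataset"))
# /arctic-dataset/new
```

Note that the URL uses the schema's type name (arctic-dataset), which is defined inside the schema file and need not match the file name (arctic_dataset.json).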

Attempted Extensions

ckanext-spatial

Link: https://github.com/ckan/ckanext-spatial
Purpose: This extension adds the ability to search for datasets on a map widget, as well as a dataset extent map widget on the dataset page, provided correct geospatial metadata.
Problems: This extension is not currently installed for the following reasons:

  • Configuring map tiles for ckanext-spatial caused the map tiles for ckanext-geoview to disappear.
  • Datasets with the required spatial metadata were not searchable on the map search widget, although the dataset extent widget worked correctly.

ckanext-oidc-pkce

Link: https://github.com/DataShades/ckanext-oidc-pkce/tree/master
Purpose: This extension allows users to be authenticated through an external application when they log in.
Problems: Ideally, users of the ACEP Data Catalog would be able to log in with their UA credentials through Google Authentication. This extension installs correctly, but does not seem to support Google Authentication.