Setting up Lochness

Follow the steps below to set up the Lochness environment semi-automatically. The instructions are specific to the AMP-SCZ project, but they should also work for other projects whose main subject database is maintained in REDCap or RPMS.

If you would like to set up your environment from scratch, please see the pages for

1. Create a template directory

lochness_create_template.py creates a starting point for your Lochness setup: config.yml, the PHOENIX directory, and a template keyring file, as well as two bash scripts for encrypting the keyring file and running the sync.

For Pronet network

lochness_create_template.py \
    --outdir /data/pronet/data_sync_pronet \
    --studies PronetLA PronetOR PronetBI PronetNL PronetNC PronetSD \
              PronetCA PronetYA PronetSF PronetPA PronetSI PronetPI \
              PronetNN PronetIR PronetTE PronetGA PronetWU PronetHA \
              PronetMT PronetKC PronetPV PronetMA PronetCM PronetMU \
              PronetSH PronetSL \
    --sources redcap upenn box xnat mindlamp \
    --email kevincho@bwh.harvard.edu \
    --poll_interval 86400 \
    --ssh_host 123.456.789 \
    --ssh_user kc244 \
    --lochness_sync_send \
    --s3 \
    --s3_selective_sync surveys mri phone eeg actigraphy

For Prescient network

lochness_create_template.py \
    --outdir /data/prescient/data_sync_prescient \
    --studies PrescientME PrescientSG PrescientAD PrescientAM PrescientBM \
              PrescientCL PrescientCP PrescientHC PrescientJE PrescientGW \
              PrescientLS \
    --sources rpms upenn mediaflux mindlamp \
    --email kevincho@bwh.harvard.edu \
    --poll_interval 86400 \
    --ssh_host 123.456.789 \
    --ssh_user kc244 \
    --lochness_sync_send \
    --s3 \
    --s3_selective_sync surveys mri phone eeg actigraphy
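The long study list is easy to mistype, so it can help to assemble the command programmatically. The following is a minimal sketch, not part of Lochness: the helper name build_template_command is hypothetical, and only flags that appear in the commands above are used.

```python
import shlex


def build_template_command(outdir, studies, sources, email, extra_flags=()):
    """Return the argument list for lochness_create_template.py.

    Hypothetical helper: it only strings together flags shown in this guide.
    """
    argv = ["lochness_create_template.py", "--outdir", outdir]
    argv += ["--studies", *studies]
    argv += ["--sources", *sources]
    argv += ["--email", email]
    argv += list(extra_flags)
    return argv


# A subset of the Pronet site codes, expanded into study names.
site_codes = ["LA", "OR", "BI"]
argv = build_template_command(
    outdir="/data/pronet/data_sync_pronet",
    studies=[f"Pronet{code}" for code in site_codes],
    sources=["redcap", "upenn", "box", "xnat", "mindlamp"],
    email="kevincho@bwh.harvard.edu",
    extra_flags=["--s3"],
)

# Print a shell-quoted command line that can be reviewed before running it.
print(shlex.join(argv))
```

The printed line can be pasted into a terminal, or the list can be passed to subprocess.run once it has been reviewed.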

Note

Add the --enter_password option if you want lochness_create_template.py to add your credentials to the template keyring file.

Running the command above creates a directory that looks like this

/data/pronet/data_sync_pronet
├── 1_encrypt_command.sh
├── 2_sync_command.sh
├── PHOENIX
│   ├── GENERAL
│   │   ├── PronetAB
│   │   │   └── PronetAB_metadata.csv
│   │   ├── ...
│   │   └── PronetYA
│   │       └── PronetYA_metadata.csv
│   └── PROTECTED
│       ├── PronetAB
│       ├── ...
│       └── PronetYA
├── config.yml
├── lochness.json
└── pii_convert.csv
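To confirm that the template was created as expected, the layout can be checked with a few lines of Python. This is a sketch under the assumption that the entries in the tree above are the ones that matter; the function name missing_template_items is hypothetical, and the demo runs against a temporary directory rather than a real template.

```python
import tempfile
from pathlib import Path

# Entries taken from the directory tree shown above.
EXPECTED = [
    "1_encrypt_command.sh",
    "2_sync_command.sh",
    "PHOENIX/GENERAL",
    "PHOENIX/PROTECTED",
    "config.yml",
    "lochness.json",
    "pii_convert.csv",
]


def missing_template_items(root):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# Demo: build an incomplete template in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "PHOENIX" / "GENERAL").mkdir(parents=True)
    (root / "PHOENIX" / "PROTECTED").mkdir(parents=True)
    for name in ["1_encrypt_command.sh", "2_sync_command.sh",
                 "config.yml", "lochness.json"]:
        (root / name).touch()
    missing = missing_template_items(root)

print(missing)
```

For a real check, pass the --outdir path used when creating the template, for example missing_template_items("/data/pronet/data_sync_pronet").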

Note

To see all options of lochness_create_template.py, run

lochness_create_template.py -h

Step 1 completed.

2. Edit credentials in the template keyring file

Connecting to external data sources (REDCap, XNAT, Box, etc.) requires many connection details, e.g. URLs, usernames, passwords, and API tokens. Lochness will only read these pieces of information from an encrypted JSON file that we refer to as the keyring.

This information needs to be added to the lochness.json template

cd /data/pronet/data_sync_pronet  # the template directory created above
vim lochness.json

The lochness.json file looks like the example below. Add credentials to the fields marked with *****

{
  "lochness": {
    "REDCAP": {
      "PronetLA": {
        "redcap.Pronet": [
          "Pronet"
        ],
        "redcap.UPENN": [
          "UPENN"
        ]
      },
      ...,
    },
    "SECRETS": {
      "PronetLA": "LOCHNESS_SECRETS",
      ...,
    }
  },
  "redcap.UPENN": {
    "URL": "*****",
    "API_TOKEN": {
      "UPENN": "*****"
    }
  },
  "redcap.Pronet": {
    "URL": "*****",
    "API_TOKEN": {
      "Pronet": "*****"
    }
  },
  "xnat.PronetLA": {
    "URL": "*****",
    "USERNAME": "*****",
    "PASSWORD": "*****"
  },
  ...,
  "box.PronetLA": {
    "CLIENT_ID": "*****",
    "CLIENT_SECRET": "*****",
    "ENTERPRISE_ID": "*****"
  },
  ...,
  "mindlamp.PronetLA": {
    "URL": "*****",
    "ACCESS_KEY": "*****",
    "SECRET_KEY": "*****"
  },
  ...,
}

Note

If you used the --enter_password option when creating the template files, just check that your credentials were entered correctly in the lochness.json file.
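Before encrypting, it is worth checking that no ***** placeholder is left in the template. The sketch below walks a parsed keyring and reports every value that is still a placeholder. It is not part of Lochness; find_unfilled is a hypothetical helper and the inline keyring is a shortened, made-up example.

```python
import json

PLACEHOLDER = "*****"


def find_unfilled(node, path=""):
    """Yield slash-separated paths of values still equal to the placeholder."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_unfilled(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_unfilled(value, f"{path}[{index}]")
    elif node == PLACEHOLDER:
        yield path


# Shortened, made-up keyring with some fields still unfilled.
template = json.loads("""
{
  "redcap.Pronet": {
    "URL": "https://redcap.example.org",
    "API_TOKEN": {"Pronet": "*****"}
  },
  "xnat.PronetLA": {
    "URL": "*****",
    "USERNAME": "user",
    "PASSWORD": "*****"
  }
}
""")

unfilled = list(find_unfilled(template))
print(unfilled)
```

For the real file, replace the inline string with json.load on an open handle to lochness.json. Paths use / as the separator because the keyring keys themselves contain dots.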

Example of completed lochness.json

{
  "lochness": {
    "REDCAP": {
      "PronetLA": {
        "redcap.Pronet": [
          "Pronet"
        ],
        "redcap.UPENN": [
          "UPENN"
        ]
      },
      ...,
    },
    "SECRETS": {
      "PronetLA": "LOCHNESS_SECRETS",
      ...,
    }
  },
  "redcap.UPENN": {
    "URL": "https://redcap.med.upenn.edu",
    "API_TOKEN": {
      "UPENN": "BC6BEF2D2369BC8FE1233CAAAB20378D"
    }
  },
  "redcap.Pronet": {
    "URL": "https://redcapynh-p11.ynhh.org",
    "API_TOKEN": {
      "Pronet": "AFBDCCD55934EE947A388541EED6A216"
    }
  },
  "xnat.PronetLA": {
    "URL": "https://xnat.med.yale.edu",
    "USERNAME": "kcho",
    "PASSWORD": "whrkddlr8*90"
  },
  ...,
  "box.PronetLA": {
    "CLIENT_ID": "e19fltqp9f9ftv4dydqjius4w20072cr",
    "CLIENT_SECRET": "LrkDwYZvA49Q4dXVGv3g4aaSy4SQRobz",
    "ENTERPRISE_ID": "756591"
  },
  ...,
  "mindlamp.PronetLA": {
    "URL": "mindlamp.orygen.org.au",
    "ACCESS_KEY": "kcho",
    "SECRET_KEY": "0c5b0a5af972b2a1b2d6cd299dc37703c22e8ddd5dfd15f0d83ca7a1cb8bcce7"
  },
  ...,
}
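A missing or trailing comma is enough to make the keyring unreadable, so a quick syntax check before encryption can save a failed sync. The following sketch uses only the standard json module; check_json_syntax is a hypothetical helper name and the two inline strings are made-up examples.

```python
import json


def check_json_syntax(text):
    """Return None if text parses as JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Made-up examples: the second one lacks the comma after the URL value.
good = '{"redcap.Pronet": {"URL": "https://redcap.example.org", "API_TOKEN": {"Pronet": "token"}}}'
bad = '{"redcap.Pronet": {"URL": "https://redcap.example.org" "API_TOKEN": {"Pronet": "token"}}}'

print(check_json_syntax(good))
print(check_json_syntax(bad))
```

To check the real file, pass the contents of lochness.json, for example check_json_syntax(open("lochness.json").read()).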

Note

If you’re using Google SMTP to send out emails, you need to add "email_sender_pw": "PasswordForYourGoogleAccount"

For example,

"SECRETS": {
  "PronetLA": "LOCHNESS_SECRETS",
  ...,
},
"email_sender_pw": "aaoiweytyEfhag189e7"

3. Encrypt lochness.json to make a keyring file

Once required credentials are added to the template lochness.json keyring file, it must be encrypted using a passphrase. At the moment, Lochness only supports encrypting and decrypting files (including the keyring) using the cryptease library. This library should be installed automatically when you install Lochness, but you can install it separately on another machine as well.

Encrypt the temporary keyring file by running

crypt.py --encrypt lochness.json -o .lochness.enc

Note

Or you could run 1_encrypt_command.sh, which contains the same command

bash 1_encrypt_command.sh

Attention

I’ll leave it up to you to decide on which device you want to encrypt this file. I will only recommend discarding the decrypted version as soon as possible.
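One way to discard the decrypted file safely is to delete it only after confirming that the encrypted keyring exists and is not empty. The sketch below illustrates that order of operations. discard_plaintext is a hypothetical helper, not a Lochness or cryptease function, and a plain unlink does not securely wipe the data from disk.

```python
import tempfile
from pathlib import Path


def discard_plaintext(plain, encrypted):
    """Delete plain only if encrypted exists and is non-empty.

    Returns True when the deletion was allowed, False when it was refused.
    """
    plain, encrypted = Path(plain), Path(encrypted)
    if not encrypted.is_file() or encrypted.stat().st_size == 0:
        return False
    plain.unlink(missing_ok=True)
    return True


# Demo in a temporary directory with stand-in file contents.
with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "lochness.json"
    encrypted = Path(tmp) / ".lochness.enc"
    plain.write_text("{}")

    refused = discard_plaintext(plain, encrypted)  # no encrypted file yet
    encrypted.write_bytes(b"stand-in for encrypted content")
    removed = discard_plaintext(plain, encrypted)
    still_there = plain.exists()

print(refused, removed, still_there)
```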

4. Edit configuration file

The config.yml file contains the options used by Lochness.

vim config.yml

Project name

Name of the project. This string will be included in the daily email summary.

project_name: ProNET
or
project_name: Prescient

REDCap or RPMS database column names

Update the names of the REDCap or RPMS columns that contain the unique subject ID and consent date of each subject.

For REDCap

redcap_id_colname: chric_record_id
redcap_consent_colname: chric_consent_date

For RPMS

RPMS_PATH: /mnt/prescient/RPMS_incoming
RPMS_id_colname: subjectkey
RPMS_consent_colname: Consent

Note

RPMS_PATH is the directory where RPMS exports the database as multiple CSV files.
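To illustrate what the two column names are for, the sketch below pulls the subject ID and consent date out of a CSV export using configurable column names. It is an illustration only, not Lochness's own parsing code; the function name, the subject IDs, the dates, and the extra site column are all made up.

```python
import csv
import io


def subject_consent_dates(csv_text, id_col, consent_col):
    """Map each subject ID to its consent date, skipping rows without an ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_col]: row[consent_col] for row in reader if row.get(id_col)}


# Made-up export using the RPMS column names from the configuration above.
export = (
    "subjectkey,Consent,site\n"
    "ME00001,2021-10-01,ME\n"
    "ME00002,2021-10-05,ME\n"
)

dates = subject_consent_dates(export, id_col="subjectkey", consent_col="Consent")
print(dates)
```

The same function works for a REDCap export by passing chric_record_id and chric_consent_date as the column names.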

If your REDCap server limits how much data you can download in a given time, please see data entry trigger.

Amazon Web Services S3 bucket

Set the AWS S3 bucket name and the root directory within the bucket to match your own S3 bucket

AWS_BUCKET_NAME: pronet-test
AWS_BUCKET_ROOT: TEST_PHOENIX_ROOT_PRONET

Remove old & already s3-transferred files

Lochness will read the following information from the configuration file to remove already transferred files from the local PHOENIX directory.

days_to_keep: 100
removed_df_loc: /mnt/prescient/Prescient_data/PHOENIX/removed_files.csv
removed_phoenix_root: prescient/Prescient_data/track_removed_files_PHOENIX
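The age cutoff implied by days_to_keep can be sketched as follows. This is an assumption-laden illustration: it uses each file's modification time, while the timestamp Lochness actually uses, and its check that a file has already been transferred to S3, are not described here. files_older_than is a hypothetical helper.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def files_older_than(root, days, now=None):
    """Return files under root whose modification time is older than days."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


# Demo: one file backdated by 200 days, one created just now.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old.csv"
    new_file = Path(tmp) / "new.csv"
    old_file.write_text("old")
    new_file.write_text("new")
    backdated = time.time() - 200 * SECONDS_PER_DAY
    os.utime(old_file, (backdated, backdated))

    stale = [path.name for path in files_older_than(tmp, days=100)]

print(stale)
```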

Box

See here for how to configure the Box source. Then, the configuration file should have a box section that states which file patterns to look for in each study.

base is the root of the data directory for the study under Box. If your data for PronetAB is saved under ProNET/PronetAB under the root of Box source, the base for this study should be ProNET/PronetAB.

delete_on_success controls whether the source files are removed from Box once Lochness has successfully downloaded them. Set it to True or False.

file_patterns takes a list of the datatypes to capture from Box. The data_dir of each datatype is the name of the root directory that contains the subject directories for this datatype. Each datatype can have more than one product of files to look for. For example, for the interviews datatype, the open, psychs, and transcripts products are searched for each individual. out_dir can be specified if the files need to be saved under a specific subdirectory for a product.

box:
    PronetAB:
        base: ProNET/PronetAB
        delete_on_success: False
        file_patterns:
            actigraphy:
                   - vendor: Activinsights
                     product: GENEActiv
                     data_dir: PronetAB_Actigraphy
                     pattern: '*.*'
            eeg:
                   - product: eeg
                     data_dir: PronetAB_EEG
                     pattern: '*.*'
            interviews:
                   - product: open
                     data_dir: PronetAB_Interviews/OPEN
                     out_dir: open
                     pattern: '*.*'
                   - product: psychs
                     data_dir: PronetAB_Interviews/PSYCHS
                     out_dir: psychs
                     pattern: '*.*'
                   - product: transcripts
                     data_dir: PronetAB_Interviews/transcripts/Approved
                     out_dir: transcripts
                     pattern: '*.*'

See here for an example of output PHOENIX structure from this configuration.
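The way base, data_dir, pattern, and out_dir combine can be sketched in a few lines. This illustration assumes shell-style matching as implemented by Python's fnmatch, which may differ from what Lochness does internally. The select_files helper, the subject ID AB00001, and the file names are made up.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def select_files(base, entry, subject, file_names):
    """Return (source path, output subdirectory) pairs for matching files."""
    # Subject directories sit under data_dir, which sits under base.
    source_dir = PurePosixPath(base) / entry["data_dir"] / subject
    out_dir = entry.get("out_dir", "")
    return [
        (str(source_dir / name), out_dir)
        for name in file_names
        if fnmatch(name, entry["pattern"])
    ]


# One product entry copied from the interviews datatype above.
entry = {
    "product": "open",
    "data_dir": "PronetAB_Interviews/OPEN",
    "out_dir": "open",
    "pattern": "*.*",
}

matches = select_files(
    base="ProNET/PronetAB",
    entry=entry,
    subject="AB00001",  # made-up subject ID
    file_names=["interview.mp4", "notes", "audio.wav"],
)

for source, out_dir in matches:
    print(source, "->", out_dir)
```

Under fnmatch rules the pattern '*.*' only matches names that contain a dot, so the file named notes is skipped in this sketch.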

Mediaflux

See here for how to configure Mediaflux. Then, the configuration file should have a mediaflux section that states which file patterns to look for in each study.

mediaflux:
    PrescientAB:
        base: ProNET/PrescientAB
        delete_on_success: False
        file_patterns:
            actigraphy:
                   - vendor: Activinsights
                     product: GENEActiv
                     data_dir: PrescientAB_Actigraphy
                     pattern: '*.*'
            eeg:
                   - product: eeg
                     data_dir: PrescientAB_EEG
                     pattern: '*.*'
            mri:
                   - product: mri
                     data_dir: PrescientAB_MRI
                     pattern: '*.*'
            interviews:
                   - product: open
                     data_dir: PrescientAB_Interviews/OPEN
                     out_dir: open
                     pattern: '*.*'
                   - product: psychs
                     data_dir: PrescientAB_Interviews/PSYCHS
                     out_dir: psychs
                     pattern: '*.*'
                   - product: transcripts
                     data_dir: PrescientAB_Interviews/transcripts/Approved
                     out_dir: transcripts
                     pattern: '*.*'

See here for an example of output PHOENIX structure from this configuration.

Email function

Update the sender and notify fields. sender should be the Google email account configured for sending emails, with its credentials in the keyring file. List the addresses that Lochness should email under the __global__ field, one per line, each prefixed with -.

sender: kevincho.lochness@gmail.com
notify:
    __global__:
        - kevincho@bwh.harvard.edu
        - another.person.to.receive.email.1@u24.com
        - another.person.to.receive.email.2@u24.com
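To show how the sender and notify values map onto an email, the sketch below builds a message with Python's standard email package. It only constructs the message and does not send it. The subject line, the body text, and the build_summary_email helper are hypothetical and do not reflect the format of the actual Lochness summary email.

```python
from email.message import EmailMessage


def build_summary_email(sender, recipients, project_name, body):
    """Build, but do not send, a summary email for the given recipients."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"{project_name} Lochness summary"  # hypothetical subject
    message.set_content(body)
    return message


# Values mirror the sender and notify fields of config.yml.
config = {
    "sender": "kevincho.lochness@gmail.com",
    "notify": {
        "__global__": [
            "kevincho@bwh.harvard.edu",
            "another.person.to.receive.email.1@u24.com",
        ]
    },
}

message = build_summary_email(
    sender=config["sender"],
    recipients=config["notify"]["__global__"],
    project_name="ProNET",
    body="Placeholder body text.",
)
print(message["To"])
```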

Now, your Lochness configuration is complete and ready to run!