Quick start
Lochness provides a single command line tool (daemon) to periodically poll and download data from various web services into a local directory. Out of the box there is support for pulling data from a multitude of data sources including REDCap, XNAT, Dropbox, Box, Mediaflux, RPMS, external hard drives, and more.
Installation
Just use pip
$ pip install ampscz-lochness
For the most recent AMP-SCZ lochness install and debugging, see Installation.
The Amazon Web Services (AWS) command line tool also needs to be installed
$ sudo apt-get install awscli
Then configure the AWS CLI with Pronet or Prescient credentials.
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json
Setup from a template
Follow the steps below to set up a Lochness environment semi-automatically.
Run lochness_create_template.py to create a template.
For Pronet network
$ lochness_create_template.py \
--outdir /data/pronet/data_sync_pronet \
--studies PronetLA PronetOR PronetBI PronetNL PronetNC PronetSD \
PronetCA PronetYA PronetSF PronetPA PronetSI PronetPI \
PronetNN PronetIR PronetTE PronetGA PronetWU PronetHA \
PronetMT PronetKC PronetPV PronetMA PronetCM PronetMU \
PronetSH PronetSL \
--sources redcap upenn box xnat mindlamp \
--email kevincho@bwh.harvard.edu \
--poll_interval 86400 \
--ssh_host 123.456.789 \
--ssh_user kc244 \
--lochness_sync_send \
--s3 \
--s3_selective_sync surveys mri phone eeg actigraphy
For Prescient network
$ lochness_create_template.py \
--outdir /data/prescient/data/sync_prescient \
--studies PrescientLA PrescientSL PrescientLA \
--sources rpms upenn mediaflux mindlamp \
--email kevincho@bwh.harvard.edu \
--poll_interval 86400 \
--ssh_host 123.456.789 \
--ssh_user kc244 \
--lochness_sync_send \
--s3 \
--s3_selective_sync surveys mri phone eeg actigraphy
Running lochness_create_template.py creates a directory that looks like this
/data/pronet/data_sync_pronet
├── 1_encrypt_command.sh
├── 2_sync_command.sh
├── PHOENIX
│   ├── GENERAL
│   │   ├── PronetAB
│   │   │   └── PronetAB_metadata.csv
│   │   ├── ...
│   │   └── PronetYA
│   │       └── PronetYA_metadata.csv
│   └── PROTECTED
│       ├── PronetAB
│       ├── ...
│       └── PronetYA
├── config.yml
├── lochness.json
└── pii_convert.csv
Edit credentials in lochness.json
$ cd /data/pronet/data_sync_pronet
$ vim lochness.json
The lochness.json file looks like the example below. Fill in the fields marked with *****
{
    "lochness": {
        "REDCAP": {
            "PronetLA": {
                "redcap.Pronet": [
                    "Pronet"
                ],
                "redcap.UPENN": [
                    "UPENN"
                ]
            },
            ...,
            "PronetSL": {
                "redcap.Pronet": [
                    "Pronet"
                ],
                "redcap.UPENN": [
                    "UPENN"
                ]
            }
        },
        "SECRETS": {
            "PronetLA": "LOCHNESS_SECRETS",
            ...,
        }
    },
    "redcap.UPENN": {
        "URL": "*****",
        "API_TOKEN": {
            "UPENN": "*****"
        }
    },
    "redcap.Pronet": {
        "URL": "*****",
        "API_TOKEN": {
            "Pronet": "*****"
        }
    },
    "xnat.PronetLA": {
        "URL": "*****",
        "USERNAME": "*****",
        "PASSWORD": "*****"
    },
    ...,
    "box.PronetLA": {
        "CLIENT_ID": "*****",
        "CLIENT_SECRET": "*****",
        "ENTERPRISE_ID": "*****"
    },
    ...,
    "mindlamp.PronetLA": {
        "URL": "*****",
        "ACCESS_KEY": "*****",
        "SECRET_KEY": "*****"
    },
    ...,
}
Example of a completed lochness.json
{
    "lochness": {
        "REDCAP": {
            "PronetLA": {
                "redcap.Pronet": [
                    "Pronet"
                ],
                "redcap.UPENN": [
                    "UPENN"
                ]
            },
            ...,
            "PronetSL": {
                "redcap.Pronet": [
                    "Pronet"
                ],
                "redcap.UPENN": [
                    "UPENN"
                ]
            }
        },
        "SECRETS": {
            "PronetLA": "LOCHNESS_SECRETS",
            ...,
        }
    },
    "redcap.UPENN": {
        "URL": "https://redcap.med.upenn.edu",
        "API_TOKEN": {
            "UPENN": "BC6BEF2D2369BC8FE1233CAAAB20378D"
        }
    },
    "redcap.Pronet": {
        "URL": "https://redcapynh-p11.ynhh.org",
        "API_TOKEN": {
            "Pronet": "AFBDCCD55934EE947A388541EED6A216"
        }
    },
    "xnat.PronetLA": {
        "URL": "https://xnat.med.yale.edu",
        "USERNAME": "kcho",
        "PASSWORD": "whrkddlr8*90"
    },
    ...,
    "box.PronetLA": {
        "CLIENT_ID": "e19fltqp9f9ftv4dydqjius4w20072cr",
        "CLIENT_SECRET": "LrkDwYZvA49Q4dXVGv3g4aaSy4SQRobz",
        "ENTERPRISE_ID": "756591"
    },
    ...,
    "mindlamp.PronetLA": {
        "URL": "mindlamp.orygen.org.au",
        "ACCESS_KEY": "kcho",
        "SECRET_KEY": "0c5b0a5af972b2a1b2d6cd299dc37703c22e8ddd5dfd15f0d83ca7a1cb8bcce7"
    },
    ...,
}
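Before encrypting, it can help to confirm that no ***** placeholders remain in lochness.json. The following is a minimal Python sketch of such a check, not part of Lochness itself; the helper names find_placeholders and check_keyring are hypothetical. It assumes the real file is valid JSON (the "..." elisions above are only documentation shorthand), so json.loads will also flag syntax slips such as a missing comma.

```python
import json

PLACEHOLDER = "*****"  # marker used in the generated template


def find_placeholders(node, path=""):
    """Return slash-separated paths of all values still equal to the placeholder."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_placeholders(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_placeholders(value, f"{path}/{index}"))
    elif node == PLACEHOLDER:
        hits.append(path)
    return hits


def check_keyring(text):
    """Parse keyring JSON text and list the fields that still need editing."""
    return find_placeholders(json.loads(text))


if __name__ == "__main__":
    # Hypothetical, partially completed keyring fragment.
    sample = """{
        "redcap.Pronet": {
            "URL": "https://redcap.example.org",
            "API_TOKEN": {"Pronet": "*****"}
        }
    }"""
    for field in check_keyring(sample):
        print("unfilled:", field)
```

Slashes are used as path separators because the keyring keys themselves contain dots (for example redcap.Pronet).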
Encrypt lochness.json by running
$ bash 1_encrypt_command.sh
Then remove lochness.json for security
$ rm lochness.json
Edit config.yml:
$ vim config.yml
Set the AWS S3 bucket name and root directory
AWS_BUCKET_NAME: pronet-test
AWS_BUCKET_ROOT: TEST_PHOENIX_ROOT_PRONET
Edit the base field to match the Box folder structure
box:
    PronetLA:
        base: ProNET/PronetLA
        delete_on_success: False
        file_patterns:
            actigraphy:
                - vendor: Activinsights
                  product: GENEActiv
                  data_dir: PronetLA_Actigraphy
                  pattern: '*.*'
            eeg:
                - product: eeg
                  data_dir: PronetLA_EEG
                  pattern: '*.*'
            interviews:
                - product: open
                  data_dir: PronetLA_Interviews/OPEN
                  out_dir: open
                  pattern: '*.*'
                - product: psychs
                  data_dir: PronetLA_Interviews/PSYCHS
                  out_dir: psychs
                  pattern: '*.*'
                - product: transcripts
                  data_dir: PronetLA_Interviews/transcripts/Approved
                  out_dir: transcripts
                  pattern: '*.*'
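To get a feel for how these fields fit together, here is a small Python sketch. It rests on two assumptions that this guide does not confirm: that each Box folder is located at base/data_dir, and that pattern is a shell-style glob applied to file names. The functions box_folders and matching_files are hypothetical helpers, not Lochness APIs, and box_config is a plain-dict copy of part of the example above.

```python
from fnmatch import fnmatch
from posixpath import join

# Plain-dict mirror of part of the config.yml excerpt (PronetLA entry).
box_config = {
    "base": "ProNET/PronetLA",
    "file_patterns": {
        "actigraphy": [{"data_dir": "PronetLA_Actigraphy", "pattern": "*.*"}],
        "eeg": [{"data_dir": "PronetLA_EEG", "pattern": "*.*"}],
    },
}


def box_folders(config):
    """Map each datatype to its Box folders (assumed layout: base/data_dir)."""
    return {
        datatype: [join(config["base"], entry["data_dir"]) for entry in entries]
        for datatype, entries in config["file_patterns"].items()
    }


def matching_files(names, pattern):
    """Keep the file names that match a shell-style pattern."""
    return [name for name in names if fnmatch(name, pattern)]


if __name__ == "__main__":
    for datatype, folders in box_folders(box_config).items():
        print(datatype, folders)
    # Under the glob assumption, '*.*' only matches names that contain a dot.
    print(matching_files(["AB00001_eeg_20220101.zip", "README"], "*.*"))
```

If the site's Box folders are named differently, base and data_dir are the two fields to adjust.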
Run sync.py
Run the sync.py script to have Lochness sync data continuously
sync.py -c /data/pronet/data_sync_pronet/config.yml \
--studies PronetLA PronetOR PronetBI PronetNL PronetNC PronetSD \
PronetCA PronetYA PronetSF PronetPA PronetSI PronetPI \
PronetNN PronetIR PronetTE PronetGA PronetWU PronetHA \
PronetMT PronetKC PronetPV PronetMA PronetCM PronetMU \
PronetSH PronetSL \
--source redcap upenn box xnat mindlamp \
--lochness_sync_send --s3 \
--debug --continuous
This runs the Lochness sync function for every site (--studies) and every
data source (--source). After each sweep through all sources, newly
downloaded data is uploaded to the S3 bucket. sync.py then repeats the
sweep after the poll_interval set in config.yml.
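The polling behaviour described here can be sketched in Python as follows. This is an illustration of the control flow only, not the actual sync.py code: run_continuously, sync_one and upload are hypothetical names, and the sleep and max_sweeps parameters exist only so the loop can be exercised without waiting.

```python
import time


def run_continuously(studies, sources, sync_one, upload, poll_interval,
                     sleep=time.sleep, max_sweeps=None):
    """Sweep every study/source pair, upload, wait poll_interval, and repeat."""
    sweeps = 0
    while max_sweeps is None or sweeps < max_sweeps:
        for study in studies:
            for source in sources:
                sync_one(study, source)  # hypothetical per-source download
        upload()                         # hypothetical S3 upload after the sweep
        sweeps += 1
        sleep(poll_interval)             # wait before the next sweep
    return sweeps


if __name__ == "__main__":
    log = []
    run_continuously(
        ["PronetLA", "PronetOR"], ["redcap", "box"],
        sync_one=lambda study, source: log.append((study, source)),
        upload=lambda: log.append("upload"),
        poll_interval=86400,
        sleep=lambda seconds: None,  # skip the real wait in this demo
        max_sweeps=1,
    )
    print(log)
```

With max_sweeps left as None the loop never returns, which mirrors the --continuous mode used above.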
lochness_create_template.py also creates a template bash script,
2_sync_command.sh, that can be used to start the sync.
bash 2_sync_command.sh
Example PHOENIX-BIDS structure
U24 uses the PHOENIX-BIDS structure, which differs slightly from the
PHOENIX structure. PHOENIX-BIDS was designed to resemble the BIDS
structure more closely while keeping PHOENIX's protected vs. general and
raw vs. processed distinctions.
Summary of the structure
<protected>/<study>/<processed>/<subject>/<datatypes>
PHOENIX/
├── PROTECTED
│   └── PronetAB
│       ├── raw
│       │   ├── AB00001
│       │   │   ├── surveys
│       │   │   │   └── AB00001.Pronet.json
│       │   │   └── ...
│       │   └── ...
│       └── processed
│           └── ...
└── GENERAL
    └── ...
Different levels of the structure
Level 1: General or Protected:
PHOENIX/
├── GENERAL
└── PROTECTED
Level 2: Sites (studies)
PHOENIX/
├── GENERAL
│   ├── PronetAB
│   ├── ...
│   └── PronetYA
└── PROTECTED
    ├── PronetAB
    ├── ...
    └── PronetYA
Level 3: Raw or Processed
PHOENIX/
├── GENERAL
│   ├── PronetAB
│   │   ├── PronetAB_metadata.csv
│   │   ├── raw
│   │   └── processed
│   ├── ...
│   └── PronetYA
│       ├── PronetYA_metadata.csv
│       ├── raw
│       └── processed
└── PROTECTED
    ├── PronetAB
    │   ├── raw
    │   └── processed
    ├── ...
    └── PronetYA
        ├── raw
        └── processed
Level 4: Subject
PHOENIX/
├── GENERAL
│   └── PronetAB
│       ├── raw
│       │   ├── AB00001
│       │   ├── AB00002
│       │   └── AB00003
│       └── processed
│           ├── AB00001
│           ├── AB00002
│           └── AB00003
└── PROTECTED
    └── ...
Level 5: Data types
PHOENIX/
├── PROTECTED
│   └── PronetAB
│       ├── raw
│       │   ├── AB00001
│       │   │   ├── surveys
│       │   │   │   └── AB00001.Pronet.json
│       │   │   ├── mri
│       │   │   │   ├── AB00001.Pronet.Run_sheet_mri.csv
│       │   │   │   └── AB00001_MR_2022_01_01_1
│       │   │   ├── eeg
│       │   │   │   ├── AB00001.Pronet.Run_sheet_eeg.csv
│       │   │   │   └── AB00001_eeg_20220101.zip
│       │   │   ├── interviews
│       │   │   │   ├── open
│       │   │   │   ├── psychs
│       │   │   │   └── transcripts
│       │   │   └── actigraphy
│       │   └── ...
│       └── processed
│           └── ...
└── GENERAL
    └── ...
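The five levels can be combined into a path programmatically. The Python sketch below follows the <protected>/<study>/<processed>/<subject>/<datatypes> summary; phoenix_path is a hypothetical helper written for illustration and is not part of Lochness.

```python
from pathlib import PurePosixPath


def phoenix_path(root, study, subject, datatype, protected=True, processed=False):
    """Build a PHOENIX-BIDS directory path from its five levels."""
    level1 = "PROTECTED" if protected else "GENERAL"  # Level 1
    level3 = "processed" if processed else "raw"      # Level 3
    # Level 2 is the study (site), Level 4 the subject, Level 5 the data type.
    return PurePosixPath(root) / level1 / study / level3 / subject / datatype


if __name__ == "__main__":
    print(phoenix_path("PHOENIX", "PronetAB", "AB00001", "surveys"))
    # prints PHOENIX/PROTECTED/PronetAB/raw/AB00001/surveys
```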
Manual Setup
Connecting to various external data sources (Beiwe, XNAT, Dropbox, etc.) often requires a myriad of connection details e.g., URLs, usernames, passwords, API tokens, etc. Lochness will only read these pieces of information from an encrypted JSON file that we refer to as the keyring. Here’s an example of a decrypted keyring file
{
    "lochness": {
        "REDCAP": {
            "example": {
                "redcap.example": [
                    "example"
                ]
            }
        },
        "SECRETS": {
            "example": "quick brown fox jumped over lazy dog"
        }
    },
    "redcap.example": {
        "URL": "https://redcap.partners.org/redcap",
        "API_TOKEN": {
            "example": "681BBE7CCA0C879EE5**********"
        }
    },
    "beiwe.example": {
        "URL": "https://beiwe.example.org",
        "ACCESS_KEY": "...",
        "SECRET_KEY": "..."
    },
    "xnat.example": {
        "URL": "https://chpe-xnat.example.harvard.edu",
        "USERNAME": "...",
        "PASSWORD": "..."
    },
    "box.example": {
        "CLIENT_ID": "...",
        "CLIENT_SECRET": "...",
        "API_TOKEN": "..."
    },
    "mediaflux.example": {
        "HOST": "mediaflux.researchsoftware.unimelb.edu.au",
        "PORT": "443",
        "TRANSPORT": "https",
        "TOKEN": "...",
        "DOMAIN": "...",
        "USER": "...",
        "PASSWORD": "..."
    },
    "mindlamp.example": {
        "URL": "...",
        "ACCESS_KEY": "...",
        "SECRET_KEY": "..."
    },
    "daris.example": {
        "URL": "...",
        "TOKEN": "...",
        "PROJECT_CID": "..."
    },
    "rpms.example": {
        "RPMS_PATH": "..."
    }
}
This file must be encrypted using a passphrase. At the moment, Lochness only
supports encrypting and decrypting files (including the keyring) using the
cryptease library. This library
should be installed automatically when you install Lochness, but you can
install it separately on another machine as well. Here is how you would use
cryptease to encrypt the keyring file
crypt.py --encrypt ~/.lochness.json --output-file ~/.lochness.enc
Attention
I’ll leave it up to you to decide on which device you want to encrypt this file. I will only recommend discarding the decrypted version as soon as possible.
PHOENIX
Lochness will download your data into a directory structure informally known as
PHOENIX. For a detailed overview of PHOENIX, please read through the
PHOENIX documentation. You need to initialize the directory structure
manually, or by using the provided phoenix-generator.py command line tool that will
be installed with Lochness. To use the command line tool, simply provide a study name
using the -s|--study argument and a base filesystem location
phoenix-generator.py --study example ./PHOENIX
The above command will generate the following directory tree
PHOENIX/
├── GENERAL
│   └── example
│       └── example_metadata.csv
└── PROTECTED
    └── example
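If you prefer to initialize the structure manually, the same tree can be created with a few lines of Python. This is a sketch of the manual route, not the implementation of phoenix-generator.py; make_phoenix is a hypothetical helper, and it creates an empty metadata file, whereas the real tool may write a header row.

```python
from pathlib import Path
import tempfile


def make_phoenix(base, study):
    """Create GENERAL/<study> and PROTECTED/<study> under base, plus a metadata file."""
    base = Path(base)
    general = base / "GENERAL" / study
    protected = base / "PROTECTED" / study
    general.mkdir(parents=True, exist_ok=True)
    protected.mkdir(parents=True, exist_ok=True)
    metadata = general / f"{study}_metadata.csv"
    metadata.touch(exist_ok=True)  # empty placeholder file (assumption, see above)
    return metadata


if __name__ == "__main__":
    # Build the tree in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        make_phoenix(Path(tmp) / "PHOENIX", "example")
        for path in sorted(Path(tmp).rglob("*")):
            print(path.relative_to(tmp))
```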
Basic usage
The primary command line utility for Lochness is sync.py. When you invoke this
tool, you will be prompted for the passphrase that you used to encrypt your
keyring. To sidestep the password prompt, you can supply the passphrase
through the environment variable NRG_KEYRING_PASS.
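For unattended runs, sync.py can be launched from a wrapper script that sets this variable for the child process only. The Python sketch below assumes that NRG_KEYRING_PASS holds the keyring passphrase itself and uses only the -c and --source flags shown in this guide; sync_command, sync_environment and run_sync are hypothetical helper names.

```python
import os
import subprocess


def sync_command(config, sources=()):
    """Build the sync.py argument list using only flags shown in this guide."""
    command = ["sync.py", "-c", config]
    if sources:
        command += ["--source", *sources]
    return command


def sync_environment(passphrase, base=None):
    """Copy the environment and add the keyring passphrase variable."""
    env = dict(os.environ if base is None else base)
    env["NRG_KEYRING_PASS"] = passphrase
    return env


def run_sync(config, passphrase, sources=()):
    """Start sync.py without an interactive passphrase prompt."""
    return subprocess.run(sync_command(config, sources),
                          env=sync_environment(passphrase), check=True)


if __name__ == "__main__":
    # Show the command that would be run; nothing is executed here.
    print(sync_command("/path/to/config.yaml", ["xnat", "box"]))
```

Passing the variable through env keeps the passphrase out of the parent shell's environment; how the passphrase reaches the wrapper (for example from a secrets manager) is left open.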
metadata files
The sync.py tool is driven largely off the PHOENIX metadata files. For an
in-depth look at these metadata files, please read the
metadata files section from the PHOENIX documentation.
configuration file
Before you can successfully run sync.py, you need to provide the location
of a configuration file using -c|--config
sync.py -c /path/to/config.yaml
There is an example configuration file within the Lochness repository under
etc/config.yaml. To learn more about what each configuration option
means, please read the configuration file documentation.
data sources
By default, Lochness will download data from all supported data sources. If
you want to restrict Lochness to only download specific data sources, you can
provide the --source argument
sync.py -c config.yml --source beiwe
sync.py -c config.yml --source xnat box
additional help
To see all of the command line arguments available, use the --help argument
sync.py --help