Lochness sync.py function in detail
sync.py is the main commandline shell script, which executes the data
sync pipelines of Lochness. This page goes through what the sync.py does
in more detail, so user can have a deeper understanding of the mechanims.
Loads configuration file, then keyring file
Location of a configuration yaml file is one of the required input to the
sync.py. As briefly explained in the
Setting up lochness, unique information for
the server, various data sources, and etc. are included in the yaml file and
this file is loaded first, when sync.py is executed.
Please see the configuration file section for more information.
The encrypted keyring file, location of should have been included in
the configuration file, is also loaded by sync.py.
Please see the keyring file section for more information.
Now Lochness is ready to pull the files.
Creates and updates metadata for each site
Lochness first needs to pull information from REDCap or RPMS to
get the list of subject IDs registered for each site (study). Using the
information loaded from the configuration and keyring files, Lochness looks
for unique subject IDs, consent date, and their unique mindlamp ID registered
in the REDCap or RPMS database. This step will create a
{site}_metadata.csv file under each GENERAL site directory.
PHOENIX
└── GENERAL
├── PronetAB
│ └── PronetAB_metadata.csv
└── PronetCD
└── PronetCD_metadata.csv
Here is an example of the metadata.csv, created by Lochness.
eg) PronetAB_metadata.csv
Active |
Consent |
Subject ID |
REDCap |
Box |
XNAT |
Mindlamp |
|---|---|---|---|---|---|---|
1 |
1900-01-01 |
AB00001 |
redcap.Pronet:AB00001;redcap.UPENN:AB00001 |
box.PronetAB:AB00001 |
xnat.PronetAB:*:AB00001 |
mindlamp.PronetAB:108230 |
1 |
1900-01-01 |
AB00002 |
redcap.Pronet:AB00002;redcap.UPENN:AB00002 |
box.PronetAB:AB00002 |
xnat.PronetAB:*:AB00002 |
mindlamp.PronetAB:801230 |
1 |
1900-01-01 |
AB00003 |
redcap.Pronet:AB00003;redcap.UPENN:AB00003 |
box.PronetAB:AB00003 |
xnat.PronetAB:*:AB00003 |
mindlamp.PronetAB:208103 |
The columns for each data source get populated with the unique strings, which
are the combination of site (study) and subject ID, in the format that is
readable by Lochness. And this metadata.csv files are updated at every
sync circulation, therefore any new subjects added to the REDCap or
RPMS will be populated into the metadata.csv.
Note
These metadata files are automatically created and updated by lochness,
so users should not manually update them.
Pull data for each subject in metadata.csv
Then, Lochness goes over the list of data sources given to the sync.py
through --sources argument, checking for any available data that matches
unique subject ID patterns in the metadata.csv file.
List of data sources focused in AMP-SCZ
For Pronet
REDCap
UPENN REDCap
XNAT
Box
Mindlamp
For Prescient
RPMS
UPENN REDCap
Mediaflux
Mindlamp
Warning
Since this step depends on the metadata.csv file, any data from subjects
who are not included in the metadata.csv will not be downloaded by
Lochness. In another words, any data that belong to a subject who are
missing from REDCap or RPMS will not be downloaded by Lochness.
Transfer selected data to s3 bucket
With --s3 option, Lochness can also transfer file to AWS s3 bucket
using AWS CLI rsync function. With this argument, at the end of every sync
citculation, the files data under PHOENIX/GENERAL directory will be
transferred to the s3 bucket.
If any raw data types is okay to be be transferred, as they were downloaded
from their data source, --selective_sync option can be used to select these
data types. Then all the data under both GENERAL and PROTECTED will be
transferred to s3 bucket.