Thorough Readme Files

Warning

The sections below are dynamically linked to standalone readme markdown files. A documentation rebuild will capture changes in any of the files.
Consequently, some dynamic links to functions or other documents will not work when displayed here, even though they work when the standalone readmes are rendered directly, because relative paths cannot be kept consistent between the two.

MAIN README

CLEDB Coronal Magnetic Field Database Inversion

Repository for CLEDB - the Coronal Line Emission DataBase inversion distribution.

Authors: Alin Paraschiv & Philip Judge. High Altitude Observatory & National Solar Observatory

Contact: arparaschiv “at” ucar.edu; paraschiv.alinrazvan+cledb “at” gmail.com

Main aim:

Invert coronal vector magnetic field products from observations of polarized light. The algorithm takes arrays of one or two sets of spectro-polarimetric Stokes IQUV observations to derive line-of-sight and/or full vector magnetic field products.

Applications:

Inverting magnetic field information from spectro-polarimetric solar coronal observations taken by instruments such as DKIST Cryo-NIRSP and DL-NIRSP, and MLSO CoMP/UCoMP.

Documentation

Extensive documentation, including installation instructions, dependencies, algorithm schematics, and much more, is available on CLEDB.READTHEDOCS.IO. A PDF build is also provided with the git distribution.
In-depth documentation for the Bash & Fortran parallel database generation module is provided in README-RUNDB.md.
Installation and usage on RC systems is described in README-SLURM.md.
This is a beta-level release. Not all functionality is implemented. TODO.md documents updates, current issues, and functions to be implemented in the near future.

System platform compatibility

Debian+derivatives Linux x64 – all inversion modules are fully working.
RC system CentOS Linux x64 – all inversion modules are fully working. An additional binary executable is provided; local compiling may be required.
OSX (Darwin x64) Catalina and Big Sur – all inversion modules are fully working. One additional Homebrew package is required; see README-CODEDOC.pdf.
Windows platform – not tested.

Examples

Install the CLEDB distribution, generate databases, and update the database save location in the ctrlparams.py class, as described in README-CODEDOC. Afterwards, both the 1-line and 2-line implementations of CLEDB can be tested with synthetic data using the two provided Jupyter notebook examples:

test_1line.ipynb
test_2line_IQUV.ipynb

The test data are hosted separately. These are called by enabling the corresponding 1.a-1.e cells in the test notebooks and scripts. See the documentation for extended details regarding the included datafiles.

1.a synthetic CLE 3 dipole data.
1.b synthetic CLE current-sheet data will be available soon.
1.c Only for internal testing.
1.d CoMP observation data.
1.e CoMP Doppler analysis results for the 1.d datacube.

For terminal-only compute systems, the test data can be downloaded through the shell as follows:

i. Load the following gdrive wrapper script directly into your bash session, or add it to your .bash_aliases setup.

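```
  function gdrive_download () {
    # Request the download page (saving session cookies) and extract the "confirm" token
    CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
    # Download the file using the token and saved cookies; $2 is the local output path
    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O "$2"
    # Remove the temporary cookie file
    rm -f /tmp/cookies.txt
  }
```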

ii. Download the file using the gdrive FILE_ID from its download link (1.a test data FILE_ID = 1beyDfZbm6epMne92bqlKXcgPjYI2oGRR):
```
  gdrive_download FILE_ID local_path/local_name
```
The command sometimes needs to be run twice for the cookies to be set correctly.

Note: The script versions of the tests, test_1line.py and test_2line.py, together with test_cledb_slurm.sh, are Slurm enabled for headless RC system runs. They offer the same functionality as the notebooks from which they are directly generated. See the dedicated README-SLURM for additional information.

Both test examples are expected to fully execute with parallel job spawning via Numba/JIT in a correct installation.

Works that underpin and support the CLEDB inversion

Paraschiv & Judge, SolPhys, 2022 covered the scientific justification of the algorithm, and the setup of the CLEDB inversion.
Judge, Casini, & Paraschiv, ApJ, 2021 discussed the importance of scattering geometry when solving for coronal magnetic fields.
Ali, Paraschiv, Reardon, & Judge, ApJ, 2022 performed a spectroscopic exploration of the infrared regions of emission lines available for inversion with CLEDB.
Dima & Schad, ApJ, 2020 discussed potential degeneracies in using certain line combinations. The one-line CLEDB inversion utilizes the methods and results described in this work.
Schiffmann, Brage, Judge, Paraschiv & Wang, ApJ, 2021 performed large-scale Landé g-factor calculations for ions of interest and discussed degeneracies in the context of their results.
Casini & Judge, ApJ, 1999 and Judge & Casini, ASP proc., 2001 described the theoretical line formation process implemented in CLE, the coronal forward-synthesis code that is currently utilized by CLEDB.

README-RUNDB

CLEDB Parallel Database Generator

README for running CLE database calculations on multiple CPU threads.

Contact: Alin Paraschiv (arparaschiv at ucar edu)

History for BUILD module:

ARP: 20210617 - initial release.
ARP: 20210827 - Added a Slurm enabled version of the script for batch jobs on RC systems.
ARP: 20210915 - Rewrote the thread scaling to allocate tasks uniformly across threads. Both the interactive and batch scripts can now utilize RC Slurm capabilities. The interactive version can only use Slurm-allocated resources inside interactive jobs. The dedicated batch version can utilize scratch directories; it copies final outputs to a user’s project directory after finishing its tasks.
ARP: 20221222 - Updated both scripts to fix an error with calculating the optimal heights that are scaled across available nodes.

SCOPE:

This is a simple bash script implementation that launches separate parallel processes for building Stokes IQUV databases as part of the CLEDB_BUILD module. Two versions are provisioned:

rundb_1line.sh (For local interactive runs; can be utilized inside slurm interactive environments too.)
rundb_1line_slurm.sh (For batch and/or headless runs.)

INSTALL and USAGE:

Make sure the scripts are executable:
```
  chmod u+x rundb_1line.sh
  chmod u+x rundb_1line_slurm.sh
```

(Only on OSX) Install gnu-sed (See notes below):
```
  brew install gnu-sed
```
(Optional if needed) OSX might have issues with running executables (“cannot execute binary file”). To fix try:
```
  xattr -d com.apple.quarantine /path/to/file
```

(Optional) For interactive jobs on RC systems, the correct modules may need to be preloaded for the scripts to execute. The gcc module is loaded automatically in the batch version of the script.
```
  module load slurm/blanca
  module load gcc/10.2.0
```

Run interactive jobs (after starting an interactive node; see README-SLURM) with:
```
  ./rundb_1line.sh
```
Run batch/headless jobs with:
```
  sbatch rundb_1line_slurm.sh
```

NOTES:

The interactive rundb_1line.sh script requires two manual keyboard user inputs.

i. select how many CPU threads to use;
```
  Hi
  You have xx CPU threads available.
  How many to use?
```
ii. which ion/line to compute. Each ion/line will create its own subfolder in the directory structure to store computations.
```
  Please indicate the line to generate. Options are:
  1:    FE XIII 1074.7nm
  2:    FE XIII 1079.8nm
  3:    Si X    1430.1nm
  4:    Si IX   3934.3nm
```
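A minimal sketch of how the first of these prompts can be implemented and validated in bash, assuming `nproc` for counting threads on Linux and `sysctl -n hw.ncpu` on OSX. The function names available_threads and ask_threads are hypothetical and do not come from rundb_1line.sh; the sketch only illustrates the prompt-and-validate idea.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an interactive thread-count prompt; NOT the code in rundb_1line.sh.

# Count available CPU threads: nproc (GNU coreutils) on Linux, sysctl on OSX,
# where nproc may not be installed.
available_threads () {
  if [ "$(uname -s)" = "Darwin" ]; then
    sysctl -n hw.ncpu
  else
    nproc
  fi
}

# Read a thread count from stdin; echo it back only if it is valid.
# Prompts go to stderr so stdout carries just the chosen number.
ask_threads () {
  local avail=$1 n
  echo "You have $avail CPU threads available." >&2
  echo "How many to use?" >&2
  read -r n
  # Accept a positive integer without leading zeros, not above the available count
  if [[ $n =~ ^[1-9][0-9]*$ ]] && (( n <= avail )); then
    echo "$n"
  else
    echo "Invalid thread count: $n" >&2
    return 1
  fi
}

# Example: answer the prompt through a pipe instead of the keyboard
echo 1 | ask_threads "$(available_threads)"
```

Rejecting out-of-range answers up front avoids scheduling more parallel tasks than the machine (or the Slurm allocation) can run.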
The batch rundb_1line_slurm.sh script has no keyboard inputs, but has manually defined variables that control the ions to generate and system paths.
Most directory and file pointers are dynamically linked to the CLEDB distribution directory, so local runs should work without further changes. Some directory and system variables are defined to be compatible with the CURC system (scratch, project, etc. directories); these may need to be updated for other systems.
**NEWLY COMPLETED RUNS WILL DELETE/OVERWRITE PREVIOUSLY COMPUTED CALCULATIONS AND LOGS IN THE CORRESPONDING SUBFOLDER.**
The scripts are configured to produce one-line database outputs. All atomic data for the four ions of interest, along with the configuration files, are available in the config directory. This setup selects the relevant inputs automatically.
Apart from the two bash scripts, the only user-editable file is config/DB.INPUT, which configures the number of database calculations (parameter resolution).
Database outputs, headers, and logs are written to the corresponding ion subdirectory. Intermediary folders and files are deleted upon completion. The logs are written dynamically, and calculation status can be checked at any time with tail, e.g.:
```
  tail BASHJOB_0.LOG
```
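To check every thread at once, a small helper can print the last line of each log. This is a hypothetical sketch, assuming the logs follow the BASHJOB_<thread>.LOG naming of the example above; log_status is not part of CLEDB.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last line of every BASHJOB_*.LOG in a directory.
log_status () {
  local dir=${1:-.} f
  for f in "$dir"/BASHJOB_*.LOG; do
    [ -e "$f" ] || continue   # glob matched nothing: no logs written yet
    printf '%s: %s\n' "$(basename "$f")" "$(tail -n 1 "$f")"
  done
}

# Example: summarize the logs in the current ion subdirectory
log_status .
```

Because tail accepts several files, `tail -f BASHJOB_*.LOG` can also follow all logs live.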
The ./rundb scripts wait for all thread tasks to finish before exiting. Due to limitations in process ID (PID) tracking, the user is notified in the order in which threads were scheduled, not the order in which they finish: e.g. if thread 2 finishes before thread 0, the user will find out only after threads 0 and 1 finish. A bug might manifest if a new, unrelated task is scheduled with the same PID as one of the runs, but this should not occur under normal circumstances. If it does occur, tailing the logs will confirm that everything went well, and the scripts can be exited manually.
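The ordering behaviour can be reproduced with a short sketch. It assumes the threads are background child processes of a single bash shell and uses the wait builtin on one specific PID at a time; how the rundb scripts actually track PIDs is not shown in this readme, and run_threads is a hypothetical name. Each stand-in task only sleeps, with later threads finishing first.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: launch N background tasks, then report them in scheduling order.
run_threads () {
  local n=$1 t
  local pids=()
  for (( t = 0; t < n; t++ )); do
    # Stand-in for one database thread: later threads sleep less and finish first
    sleep "0.$(( n - t ))" &
    pids+=( "$!" )
  done
  for (( t = 0; t < n; t++ )); do
    # wait blocks on this specific PID, so thread 0 is reported first
    # even though threads 1 and 2 finished earlier
    wait "${pids[$t]}"
    echo "thread $t finished"
  done
}

run_threads 3
```

The wait builtin only accepts children of the current shell, and bash keeps the exit status of children that have already finished, so this pattern is generally less exposed to PID reuse than polling arbitrary PIDs; whether it matches the rundb implementation is an assumption.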
The number of Y-heights to calculate between the ymin and ymax limits is not always a multiple of the number of CPU threads. The scripts scale the tasks efficiently across the available threads. If you request fewer tasks (via DB.INPUT) than threads (via keyboard or sbatch), the script will not utilize all pre-allocated resources.
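One common way to spread N heights uniformly over T threads is to give every thread floor(N/T) heights and hand out the remainder one at a time to the first threads. The sketch below assumes this scheme; split_tasks is a hypothetical helper, and the actual rundb scaling (which also rewrites ymin/ymax in DB.INPUT) may differ in detail.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: split NHEIGHTS tasks as evenly as possible over NTHREADS.
# The first (NHEIGHTS % NTHREADS) threads get one extra task; when
# NHEIGHTS < NTHREADS, the surplus threads get zero tasks and stay idle.
split_tasks () {
  local nheights=$1 nthreads=$2
  (( nthreads > 0 )) || { echo "need at least one thread" >&2; return 1; }
  local base=$(( nheights / nthreads ))
  local extra=$(( nheights % nthreads ))
  local t counts=()
  for (( t = 0; t < nthreads; t++ )); do
    if (( t < extra )); then
      counts+=( $(( base + 1 )) )
    else
      counts+=( "$base" )
    fi
  done
  echo "${counts[*]}"
}

split_tasks 10 4   # prints: 3 3 2 2
split_tasks 3 8    # prints: 1 1 1 0 0 0 0 0
```

The second call shows the situation described above: with fewer heights than threads, five of the eight threads receive no work.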
The script relies heavily on the sed command. sed has different implementations on Linux (GNU) and macOS (BSD), so commands are not directly interchangeable. A function wrapper, SEDI, that disentangles GNU vs BSD syntax is provided in the scripts. OSX users need to install the GNU implementation of sed (gnu-sed), which provides the gsed command, for the script to be portable between systems:
```
  brew install gnu-sed
```
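A minimal sketch of such a wrapper, assuming it only needs to perform in-place edits: on Darwin it calls gsed from gnu-sed, and elsewhere it assumes the system sed is GNU sed. The name sedi and this body are illustrative, not the code shipped in the rundb scripts.

```bash
#!/usr/bin/env bash
# Hypothetical in-place sed wrapper: GNU sed everywhere, via gsed on OSX.
sedi () {
  if [ "$(uname -s)" = "Darwin" ]; then
    gsed -i "$@"   # gnu-sed from Homebrew
  else
    sed -i "$@"    # assumed to be GNU sed on Linux
  fi
}

# Example on a throwaway file
tmp=$(mktemp)
printf 'value 0.500\n' > "$tmp"
sedi 's/0\.500/1.500/' "$tmp"
cat "$tmp"   # prints: value 1.500
rm -f "$tmp"
```

Routing OSX through gsed sidesteps the main in-place difference: BSD sed requires an explicit (possibly empty) backup suffix after -i, while GNU sed does not.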
The script cuts and appends text mid-line in the DB.INPUT file to set the ymin and ymax ranges for each CPU thread. The number of decimals for all variables, and the three spaces between them, must be preserved in the configuration file to avoid introducing bugs.
Executables (dbxxx) need to be built (from CLE) for the current architecture: ELF (Linux) or Mach-O (OSX). If incorrect executables are called, a “cannot execute binary file” error is produced. The architecture of an executable can be checked with the file command. The configuration detects the OS in use and selects the proper dbxxx executable in each case, as both Darwin and Linux executables are provided. A Linux executable cross-compiled on CURC with gcc/10.2.0 is also provided for use on RC systems.
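The OS-based selection can be sketched as a case statement on the output of `uname -s`. This is an illustration under assumptions: select_dbexe is a hypothetical function, and dbxxx_darwin/dbxxx_linux are placeholder names, not the actual CLEDB executable names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose a database executable from the OS name.
# Executable names are placeholders, not the real CLEDB file names.
select_dbexe () {
  case "$1" in
    Darwin) echo "dbxxx_darwin" ;;   # Mach-O build for OSX
    Linux)  echo "dbxxx_linux" ;;    # ELF build for Linux
    *)      echo "Unsupported OS: $1" >&2; return 1 ;;
  esac
}

# uname -s reports "Darwin" on OSX and "Linux" on Linux
exe=$(select_dbexe "$(uname -s)") && echo "Selected executable: $exe"
```

On a real installation, running `file` on the selected executable should report an ELF binary on Linux and a Mach-O binary on OSX, which is the same check described above.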

README-SLURM

CLEDB Research Computing Runs

Contact: arparaschiv “at” ucar.edu; paraschiv.alinrazvan+cledb “at” gmail.com

SLURM ENABLED RESEARCH COMPUTING INTERACTIVE OR HEADLESS RUNS

Detailed instructions for setting up and running the CLEDB inversion distribution on research computing (RC) systems.

1. Slurm enabled test scripts

test_cledb_slurm.sh
test_1line.py
test_2line.py

Note: the test_1line.py and test_2line.py scripts are plain script versions of the test notebooks. These are directly exported from the Jupyter .ipynb notebooks. All changes to the notebooks should be exported to the scripts.

2. Installation and run instructions for RC systems

These instructions follow the CURC system guidelines, and the scripts are provisioned to be compatible with the blanca-nso compute nodes.

Activate the slurm/blanca module with:
```
  module load slurm/blanca
```

2.a Interactive runs

Start an interactive job:
```
  sinteractive --partition=blanca-nso --time=01:00:00 --ntasks=2 --nodes=1 --mem=12G
```

Install CLEDB via git clone in the /projects/$USER/ directory, following the instructions in README-codedoc.PDF.
Create or update a .condarc file with the following contents, so that Anaconda environments and packages install to your /projects/$USER/ directory instead of your /home/$USER/ directory, which lacks storage space.
```
  pkgs_dirs:
  - /projects/$USER/.conda_pkgs
  envs_dirs:
  - /projects/$USER/software/anaconda/envs
```
Enable Anaconda. This step must be repeated at every sinteractive login:
```
  source /curc/sw/anaconda3/latest
```
Install the CLEDBenv anaconda environment using the CLEDBenv.yml file. Detailed instructions in README-codedoc.PDF.
Note: Install inside the sinteractive run or a compile node following the CURC guidelines. Don’t perform the installation from the login node.
Activate your new environment:
```
  conda activate CLEDBenv
```

Generate a database:
```
  module load gcc/10.2.0
  ./CLEDB_BUILD/rundb_1line.sh
```

Note: A Fortran executable cross-compiled on the CURC system with gcc/10.2.0 is provided and will be used automatically by the script. If libraries are missing and runs are not executing, please contact us for the CLE source code distribution. The most current CLE distribution is not yet publicly hosted but is available upon request.
Update the database save location in the ctrlparams.py class, and then run either of the two .py test scripts:
```
  python3 test_1line.py
  python3 test_2line.py
```
Everything should work (remember to download the test data to the main CLEDB root directory), with the exception of remotely connecting to a Jupyter notebook server spawned inside an sinteractive session, which on CURC refuses to connect. CURC offers dedicated Jupyter notebook/lab compute nodes, but be aware that their low resource allocation (usually one thread) might interact negatively with the Numba/JIT parallel-enabled functions.

2.b Batch/headless runs

The database generation scripts in the CLEDB_BUILD directory include a dedicated headless run script, rundb_1line_slurm.sh, which has Slurm headers and in which all user inputs are disabled. RC resources are requested via the sbatch directives in the script header. The ion for which to generate the database, along with some path variables, must be edited manually in the script before running. This version of the database generation script performs disk I/O on $SCRATCH partitions rather than in local directories. Databases are moved back to the /projects/$USER/ directories after the computations finish.
After editing the ion and path variables, submit the script with sbatch (multiple sbatch jobs, e.g. one per ion, can run concurrently if resources are available):
```
  sbatch rundb_1line_slurm.sh
```
The bash wrapper script test_cledb_slurm.sh is a starting point for running test/production headless runs via the sbatch command. It provisionally calls one of the two above-mentioned .py scripts based on a decision tree.
The script will be updated/finalized when production runs are ready and the data and header ingestion procedures are known.
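As a rough starting point for such a wrapper, the sketch below combines a Slurm header with a simple selector. Everything in it is an assumption: the #SBATCH resource values are illustrative, the choice between scripts is driven by a hypothetical NLINES variable (1 or 2), and the actual decision tree in test_cledb_slurm.sh is not documented here.

```bash
#!/usr/bin/env bash
#SBATCH --job-name=cledb_test
#SBATCH --partition=blanca-nso
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --time=01:00:00
#SBATCH --mem=12G
# Hypothetical sketch of a Slurm test wrapper; NOT the actual test_cledb_slurm.sh.
# The #SBATCH values above are illustrative resource requests.

# Hypothetical selector: NLINES (1 or 2) is an assumed input, not a documented one.
pick_test_script () {
  case "$1" in
    1) echo "test_1line.py" ;;
    2) echo "test_2line.py" ;;
    *) echo "Expected 1 or 2 lines, got: $1" >&2; return 1 ;;
  esac
}

script=$(pick_test_script "${NLINES:-1}") || exit 1
echo "Selected test script: $script"
# A real wrapper would then launch it, e.g.: python3 "$script"
```

When submitted with sbatch, the #SBATCH lines request resources; when the file is run directly with bash they are ordinary comments, so the selection logic can be checked locally first (e.g. `NLINES=2 bash wrapper.sh`, where wrapper.sh is a hypothetical file name).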