Thorough Readme Files
Warning
The sections below are dynamically linked to standalone readme markdown files. A documentation rebuild will capture changes in any of the files.
Consequently, some dynamic links to functions or other documents will not work here as they would when the readmes are rendered directly. This is because relative paths cannot be kept consistent.
MAIN README
CLEDB Coronal Magnetic Field Database Inversion
Repository for CLEDB - the Coronal Line Emission DataBase inversion.
Authors: Alin Paraschiv and Philip Judge. National Solar Observatory & High Altitude Observatory
Contact: arparaschiv “at” nso.edu
Main aim:
Invert coronal vector magnetic field products from observations of polarized light. The algorithm takes arrays of one or two sets of spectro-polarimetric Stokes IQUV observations to derive line of sight and/or full vector magnetic field products.
Applications:
Inverting magnetic field information from spectro-polarimetric observations of the solar corona made with instruments such as DKIST Cryo-NIRSP and DL-NIRSP, and MLSO CoMP/UCoMP.
Documentation
Extensive documentation, including installation instructions, dependencies, algorithm schematics, and much more, is available on CLEDB.READTHEDOCS.IO. A git distribution PDF build is also provided.
In-depth documentation for the Bash & Fortran parallel database generation module is provided in README-RUNDB.md.
Installation and usage on RC systems is described in README-SLURM.md.
This is a beta-level release. Not all functionality is implemented. TODO.md documents updates, current issues, and functions to be implemented in the near future.
System platform compatibility
Debian+derivatives Linux x64 – all inversion modules are fully working.
RC system CentOS Linux x64 – all inversion modules are fully working. An additional binary executable is provided, but it may require local compiling.
OSX (Darwin x64) Catalina and Big Sur – all inversion modules are fully working; one additional Homebrew package is required. See README-RUNDB.
Windows platform – not tested.
Examples
Install the CLEDB distribution, generate databases, and, if needed, update the database save location in the ctrlparams.py class, as described on CLEDB.READTHEDOCS.IO.
The new PyCELP database generation tool is recommended. It is more precise, but its calculations require more computational resources. A default PyCELP-generated database can be found here to help get started (33 GB download). Just extract the two database folders into the CLEDB_BUILD directory, and you should be set to run the examples.
Afterward, both the 1-line and 2-line implementations of CLEDB can be tested with synthetic data using the two provided Jupyter notebook examples.
The test data are hosted separately. They are loaded by enabling the corresponding cells 1.-4. in the test notebooks and scripts. See the documentation for details regarding the included data files.
For terminal-only compute systems, the test data can be downloaded via the shell with the following method:
i. Load the following gdrive wrapper function into your bash session directly, or add it to your .bash_alias setup.
function gdrive_download () {
  CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate "https://docs.google.com/uc?export=download&id=$1" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')
  wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=$1" -O "$2"
  rm -f /tmp/cookies.txt
}
ii. Download the file using its gdrive FILE_ID from the download link (1.a test data FILE_ID = 1beyDfZbm6epMne92bqlKXcgPjYI2oGRR):
gdrive_download FILE_ID local_path/local_name
(This sometimes needs to be run twice to set the cookies correctly.)
Note: The script versions of the tests, test_1line.py and test_2line.py, together with test_cledb_slurm.sh, are Slurm-enabled for headless RC system runs. They offer the same functionality as the notebooks, from which they are directly generated. See the dedicated README-SLURM for additional information.
Both test examples are expected to fully execute with parallel job spawning via Numba/JIT in a correct installation.
Contributions
We welcome contribution ideas, and even implementations of new functionalities and optimizations, to be included in CLEDB. This can be done through a pull/merge request or by contacting the developers directly to discuss your plans and ideas. The developers will strive to accept and implement contributions as long as they fit within the scope of the software and adhere to the code of conduct.
Acknowledgement: Works that underpin and support the CLEDB methodology
Paraschiv & Judge, SolPhys, 2022 covered the scientific justification of the algorithm, and the setup of the CLEDB inversion.
Judge, Casini, & Paraschiv, ApJ, 2021 discussed the importance of scattering geometry when solving for coronal magnetic fields.
Ali, Paraschiv, Reardon, & Judge, ApJ, 2022 performed a spectroscopic exploration of the infrared regions of emission lines available for inversion with CLEDB.
Dima & Schad, ApJ, 2020 discussed potential degeneracies in using certain line combinations. The one-line CLEDB inversion utilizes the methods and results described in this work.
Schiffmann, Brage, Judge, Paraschiv & Wang, ApJ, 2021 performed large-scale Landé g-factor calculations for ions of interest and discussed degeneracies in the context of their results.
Casini & Judge, ApJ, 1999 and Judge & Casini, ASP proc., 2001 described the theoretical line formation process implemented in CLE, the coronal forward-synthesis code that is currently utilized by CLEDB.
README-RUNDB
CLEDB Parallel Database Generator
README for performing PyCELP or CLE database calculations on multiple CPU threads.
Contact: Alin Paraschiv (arparaschiv at nso edu)
History for the BUILD module:
ARP: 20210617 - initial release.
ARP: 20210827 - Added a Slurm enabled version of the script for batch jobs on RC systems.
ARP: 20210915 - Rewrote the thread scaling to allocate tasks uniformly across threads; both interactive and batch scripts can now utilize RC Slurm capabilities. The interactive version can only use Slurm-allocated resources inside interactive jobs. The batch-dedicated version can utilize scratch directories; it copies final outputs to a user's project directory after finalizing tasks.
ARP: 20221222 - Updated both scripts to fix an error with calculating the optimal heights that are scaled across available nodes.
ARP: 20240730 - Overhaul of the database building functionality. A new implementation of the CLEDB BUILD module using PyCELP is now provided.
SCOPE:
This is a script implementation that launches separate parallel processes for building Stokes IQUV databases as part of the CLEDB_BUILD module. Three versions are provisioned:
rundb_1line_with_PyCELP.py – PyCELP: For local interactive and batch runs.
rundb_1line_with_CLE.sh – DEPRECATED – CLE: For local interactive runs (Slurm interactive session compatible).
rundb_1line_with_CLE_batch.sh – DEPRECATED – CLE: For batch and/or headless runs.
INSTALL and USAGE:
PyCELP database building requires three components:
a. The CLEDBenv conda Python environment must be installed. Instructions: CLEDB.READTHEDOCS.IO
b. PyCELP must be installed separately in the CLEDBenv environment, following the instructions on GitHub (in this case, do not create the separate environment recommended there). PyCELP cannot currently be included automatically in the CLEDBenv environment.
c. The latest available version of the CHIANTI database must be downloaded from the CHIANTI website. The default CHIANTI download folder in CLEDB is ./CLEDB_BUILD/config/PyCELP/. Otherwise, the XUVTOP path in the rundb_1line_with_PyCELP.py script needs to be updated to point to where you installed CHIANTI.
Make sure the scripts you plan to use are executable:
chmod u+x rundb_1line_with_XXXX.yy
With PyCELP, run any type of job via (see the notes below about options):
conda activate CLEDBenv
python rundb_1line_with_PyCELP.py in1 in2(optional) in3(optional)
or
nohup python rundb_1line_with_PyCELP.py in1 in2 in3 &
(The nohup variant frees the terminal and appends all output to a text file called nohup.out in the directory from which the script is run.)
With CLE (deprecated), run interactive jobs (after starting the interactive node; see README-SLURM) or batch/headless jobs with, respectively:
./rundb_1line_with_CLE.sh
sbatch rundb_1line_with_CLE_batch.sh
(Optional for OSX) Install gnu-sed (see the notes below). OSX might also have issues with running executables ("cannot execute binary file"); in that case, try removing the quarantine attribute with xattr:
brew install gnu-sed
xattr -d com.apple.quarantine /path/to/file
(Optional) For interactive jobs on RC systems, the correct modules may need to be preloaded for the scripts to execute:
module load slurm/blanca
module load gcc/10.2.0
(gcc is preloaded automatically in the batch version of the script.)
NOTES:
NEWLY COMPLETED RUNS WILL DELETE/OVERWRITE PREVIOUSLY COMPUTED CALCULATIONS AND LOGS IN THE CORRESPONDING SUBFOLDER.
The scripts are configured to produce one line database outputs. All atomic data for the four ions of interest along with the configuration files are available in the config directory. This setup selects the relevant inputs automatically.
The database configuration file CLEDB_BUILD/DB.INPUT sets the number of database calculations (parameter resolution) and is read by both the PyCELP and the CLE tools.
Production databases generated via PyCELP calculations:
The rundb_1line_with_PyCELP.py script accepts up to three direct parameter inputs: in1, in2, and in3.
in1 is mandatory, while in2 and in3 are optional and fall back to default values when omitted.
in1 – The desired line to calculate. Options 1-4 correspond to: 1: Fe XIII 1074.7nm, 2: Fe XIII 1079.8nm, 3: Si X 1430.1nm, 4: Si IX 3934.3nm.
in2 – (optional) The number of CPU threads to use. Valid options are 1 to n - 4 threads, where n is the number of available system threads. If the requested number exceeds this, the script uses n - 4 threads to leave room for CPU task overhead. By default, the script runs calculations on n - 4 parallel threads.
in3 – (optional) The number of atomic levels to include in calculations. By default, the script is internally configured to run with 25 atomic levels. Although this ensures fast execution for the default DB.INPUT configuration, the resulting databases are not accurate enough for precision inversion calculations. A quantitative-analysis-level database requires a minimum of about 80 levels, which is rather computationally demanding. A precomputed 80-level database can be downloaded from the link provided in the main CLEDB readme.
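The input handling described above can be sketched as follows. This is a minimal illustration, not the code inside rundb_1line_with_PyCELP.py: the helper name resolve_inputs is hypothetical, and clamping in2 into the 1 to n - 4 range is an assumption based on the description.

```python
import os

# Line options accepted by in1, as listed above.
LINES = {
    1: "Fe XIII 1074.7nm",
    2: "Fe XIII 1079.8nm",
    3: "Si X 1430.1nm",
    4: "Si IX 3934.3nm",
}
DEFAULT_LEVELS = 25  # default number of atomic levels when in3 is omitted


def resolve_inputs(argv, n_available=None):
    """Hypothetical helper: map in1/in2/in3 to (line, threads, levels)."""
    if not argv:
        raise ValueError("in1 (line selection 1-4) is mandatory")
    line = int(argv[0])
    if line not in LINES:
        raise ValueError(f"in1 must be one of {sorted(LINES)}")
    n = n_available or os.cpu_count() or 1
    max_threads = max(1, n - 4)  # assumption: keep 4 threads free for overhead
    threads = int(argv[1]) if len(argv) > 1 else max_threads
    threads = min(max(1, threads), max_threads)  # clamp into [1, n - 4]
    levels = int(argv[2]) if len(argv) > 2 else DEFAULT_LEVELS
    return LINES[line], threads, levels
```

For example, on a 16-thread machine, resolve_inputs(["1", "64", "80"], n_available=16) returns ("Fe XIII 1074.7nm", 12, 80): the 64 requested threads are reduced to n - 4 = 12.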
Alternative databases generated via CLE calculations –Deprecated–
The rundb_1line_with_CLE.sh script requires two manual keyboard user inputs.
i. select how many CPU threads to use;
Hi You have xx CPU threads available. How many to use?
ii. which ion/line to compute. Each ion/line will create its own subfolder in the directory structure to store computations.
Please indicate the line to generate. Options are: 1: FE XIII 1074.7nm 2: FE XIII 1079.8nm 3: Si X 1430.1nm 4: Si IX 3934.3nm
The batch rundb_1line_with_CLE_batch.sh script has no keyboard inputs; instead, it has manually defined variables that a user can edit to select the ions to generate.
Most directory and file pointers are dynamically linked to the CLEDB distribution directory. Local runs should run without interference. Some directory/system variables are defined to be compatible with the CURC system (scratch, project, etc.). These may need to be updated for different systems.
The CLE databases are physically more compact and require fewer resources to generate. However, the parameter space and accuracy of these calculations (due to the very low number of atomic levels that can be included) are significantly lower than those of the PyCELP calculations. Great caution is required when interpreting matching solutions. Also, due to a different implementation, using these databases leads to longer inversion computation times.
Additional CLE specific notes - DEPRECATED functionality
The ./rundb_1line_with_CLE_XXX.sh scripts implement an external parallelization allocation and wait for all thread tasks to finish before exiting. Due to limitations in process ID (PID) tracking, the user is notified not in the order in which threads finish, but in the order in which they were scheduled; e.g. if thread 2 finishes before thread 0, the user will find out only after threads 0 and 1 finish. A bug might manifest if a new, unrelated task is scheduled with the same PID as one of the runs, but this should not occur under normal circumstances. If it does occur, tailing the logs will verify that everything went well, and the scripts can then be exited manually.
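The difference between being notified in scheduling order and in completion order can be illustrated with a Python analogy. This is not how the bash scripts are implemented: the function names are hypothetical, and the sleeps merely stand in for database calculations.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_thread(thread_id, seconds):
    """Hypothetical stand-in for one database calculation thread."""
    time.sleep(seconds)
    return thread_id


def notify_in_scheduled_order(durations):
    # Waiting on each task in turn, analogous to calling `wait $PID` in a loop.
    with ThreadPoolExecutor(max_workers=len(durations)) as pool:
        futures = [pool.submit(fake_thread, i, d) for i, d in enumerate(durations)]
        return [f.result() for f in futures]


def notify_in_completion_order(durations):
    # Reporting each task as soon as it finishes.
    with ThreadPoolExecutor(max_workers=len(durations)) as pool:
        futures = [pool.submit(fake_thread, i, d) for i, d in enumerate(durations)]
        return [f.result() for f in as_completed(futures)]
```

With durations [0.6, 0.0, 0.3], the scheduled-order variant reports [0, 1, 2] even though thread 1 finishes first, while the completion-order variant typically reports [1, 2, 0].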
The number of Y-heights to calculate between the ymin and ymax limits is not always a multiple of the number of CPU threads. The scripts distribute the tasks uniformly across the available threads. If you request fewer tasks (via DB.INPUT) than threads (via keyboard or sbatch), the script will not use all of the pre-allocated resources.
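A minimal sketch of spreading the Y-heights uniformly across threads is shown below. It assumes ny heights spaced evenly and inclusively between ymin and ymax; the function name and that spacing convention are assumptions, not taken from the scripts.

```python
def thread_height_ranges(ymin, ymax, ny, n_threads):
    """Hypothetical helper: split ny heights over n_threads as evenly as possible.

    Returns one (y_start, y_end, count) tuple per thread that receives work.
    Assumes heights are spaced evenly and inclusively from ymin to ymax.
    """
    step = (ymax - ymin) / (ny - 1) if ny > 1 else 0.0
    base, extra = divmod(ny, n_threads)  # the first `extra` threads get one more height
    ranges, first = [], 0
    for t in range(n_threads):
        count = base + (1 if t < extra else 0)
        if count == 0:
            break  # fewer heights than threads: the remaining threads stay idle
        last = first + count - 1
        ranges.append((ymin + first * step, ymin + last * step, count))
        first = last + 1
    return ranges
```

For example, 6 heights on 4 threads are split as 2, 2, 1, 1, while 3 heights on 8 threads occupy only 3 threads, which is the under-utilization case mentioned above.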
The scripts rely heavily on sed. sed has different implementations on Linux (GNU) and mac (BSD), so their commands are not directly interchangeable. A function wrapper, SEDI, that disentangles the GNU and BSD syntax is provided in the scripts. OSX users need to install a GNU implementation of sed (gnu-sed) for the script to be portable between systems (via the gsed command).
brew install gnu-sed
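The idea behind the SEDI wrapper can be sketched in Python as below. This is an illustration under assumptions, not the wrapper from the scripts (which is a bash function): sed_inplace_cmd is a hypothetical name, and it simply selects gsed on Darwin and sed elsewhere. Both are GNU sed and accept -i without a backup suffix, whereas BSD sed would need -i ''.

```python
import platform


def sed_inplace_cmd(expression, path, system=None):
    """Hypothetical helper: build an in-place GNU sed command for Linux or OSX.

    On OSX (Darwin), GNU sed installed via `brew install gnu-sed` is called `gsed`.
    """
    system = system or platform.system()
    sed = "gsed" if system == "Darwin" else "sed"
    return [sed, "-i", expression, path]
```

The returned list can be passed to subprocess.run(..., check=True) to apply the edit.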
The script cuts and appends text mid-line in the DB.INPUT file to set the ymin and ymax ranges for each CPU thread. The number of decimals for all variables, and the 3 spaces between them, must be kept in the configuration file to avoid introducing bugs.
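Why the field layout matters can be shown with a small Python sketch that rewrites values in place while keeping their decimal count and the three-space separators. The line layout used here ("1.05   1.50   6") and the field positions are hypothetical; check the actual DB.INPUT file for its real structure.

```python
def replace_fields(line, updates, sep="   "):
    """Overwrite selected fields of a fixed-format line, keeping each field's decimals.

    `updates` maps a field index to its new numeric value. Fields are assumed to be
    separated by exactly `sep` (three spaces); the layout is hypothetical.
    """
    fields = line.split(sep)
    for index, value in updates.items():
        old = fields[index]
        decimals = len(old.split(".", 1)[1]) if "." in old else 0
        fields[index] = f"{value:.{decimals}f}"  # same number of decimals as before
    return sep.join(fields)
```

For example, replace_fields("1.05   1.50   6", {0: 1.2, 1: 1.3}) returns "1.20   1.30   6", so the field widths and separators stay unchanged.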
Executables (dbxxx) need to be built (from CLE) for the current architecture: ELF (Linux) or Mach-O (OSX). If incorrect executables are called, a "cannot execute binary file" error is produced. The architecture can be checked with the file command. The configuration deduces the OS in use and selects the proper dbxxx executable, as both Darwin and Linux executables are provided. The Linux set also includes a CURC cross-compiled executable, built with gcc/10.2.0, for use on RC systems.
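The format check that the file command performs can be approximated by reading an executable's magic bytes. The sketch below is an illustration only, not the selection logic used by the scripts: the function names are hypothetical, and it checks only the file format (ELF vs Mach-O), not the CPU architecture.

```python
import platform

ELF_MAGIC = b"\x7fELF"
# Mach-O magic numbers (32/64-bit, either byte order) plus the universal "fat" header.
# Note: the fat header bytes (CA FE BA BE) are shared with Java class files.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",
    b"\xca\xfe\xba\xbe",
}


def binary_format(header: bytes) -> str:
    """Classify an executable from its first four bytes."""
    magic = header[:4]
    if magic == ELF_MAGIC:
        return "ELF"
    if magic in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"


def expected_format(system=None) -> str:
    """Executable format native to the host OS (Linux -> ELF, Darwin -> Mach-O)."""
    system = system or platform.system()
    return {"Linux": "ELF", "Darwin": "Mach-O"}.get(system, "unknown")


def matches_host(path, system=None) -> bool:
    """True if the file at `path` has the host's native executable format."""
    with open(path, "rb") as handle:
        return binary_format(handle.read(4)) == expected_format(system)
```

Running matches_host on a dbxxx executable before launching it would flag the wrong-format case ahead of the "cannot execute binary file" error.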
Database output, headers, and logs are written to the corresponding ion sub-directory. Intermediary folders and files are deleted upon completion. The logs are written dynamically, and the calculation status can be checked at any time with tail, e.g.:
tail BASHJOB_0.LOG
README-SLURM
CLEDB Research Computing Runs
Contact: arparaschiv “at” ucar.edu; paraschiv.alinrazvan+cledb “at” gmail.com
SLURM ENABLED RESEARCH COMPUTING INTERACTIVE OR HEADLESS RUNS
Detailed instructions for setting up and running the CLEDB inversion distribution on research computing (RC) systems.
1. Slurm enabled test scripts
Note: the test_1line.py and test_2line.py scripts are plain script versions of the test notebooks. These are directly exported from the Jupyter .ipynb notebooks. All changes to the notebooks should be exported to the scripts.
2. Installation and run instructions for RC systems
These instructions follow the CURC system guidelines, and the scripts are provisioned to be compatible with the blanca-nso compute nodes.
Activate the slurm/blanca module with:
module load slurm/blanca
2.a Interactive runs
Start an interactive job:
sinteractive --partition=blanca-nso --time=01:00:00 --ntasks=2 --nodes=1 --mem=12gb
Install CLEDB via git clone in the /projects/$USER/ directory, following the instructions in README-codedoc.PDF.
Create or update a .condarc file with the following contents so that Anaconda environments and packages install to your /projects/$USER/ directory instead of your /home/$USER/ directory, which lacks storage space.
pkgs_dirs:
  - /projects/$USER/.conda_pkgs
envs_dirs:
  - /projects/$USER/software/anaconda/envs
Enable Anaconda. This step needs to be run at each sinteractive login:
source /curc/sw/anaconda3/latest
Install the CLEDBenv anaconda environment using the CLEDBenv.yml file. Detailed instructions in README-codedoc.PDF.
Note: Install inside the sinteractive session or on a compile node, following the CURC guidelines. Don't perform the installation from the login node. Activate your new environment:
conda activate CLEDBenv
Generate a database:
module load gcc/10.2.0
./CLEDB_BUILD/rundb_1line.sh
Note: A Fortran executable cross compiled on the CURC system with gcc/10.2.0 is provided and will be automatically used by the script. If libraries are missing, and runs are not executing, please contact us for the CLE source code distribution. The most current CLE distribution is not yet publicly hosted, but available upon request.
Update the database save location in the ctrlparams.py class, and then run either of the .py test scripts:
python3 test_1line.py
python3 test_2line.py
Everything should work (remember to download the test data to the main CLEDB root dir) with the exception of remotely connecting to a Jupyter notebook server spawned inside an sinteractive session (which on CURC refuses to connect). CURC offers dedicated Jupyter notebook/lab compute nodes, but beware of how the low resource allocation (usually 1 thread) might interact negatively with the Numba/JIT parallel enabled functions.
2.b Batch/headless runs
The database generating scripts in the CLEDB_BUILD directory include a dedicated headless run script, rundb_1line_slurm.sh, which has Slurm headers and in which all user inputs are disabled. RC resources are requested via the sbatch directives in the script header. The ion for which to generate the database, along with some path variables, needs to be edited manually in the script before running. This version of the database generation script performs disk I/O on $SCRATCH partitions rather than in local directories. Databases are moved back to the /projects/$USER/ directories after the computations finish.
After editing the ion and paths, submit it with sbatch, once per ion (multiple sbatch jobs can run concurrently if resources are available):
sbatch rundb_1line_slurm.sh
The bash test_cledb_slurm.sh wrapper script is a starting point for running test/production headless runs via the sbatch command. It provisionally calls one of the two above mentioned .py scripts based on a decision tree.
The script is to be updated/finalized when production runs are ready and data and header ingestion procedures are known.