Setting run parameters

Our workflows are executed using defaults that specify parameters for setting requirements for memory, threads, environment, e.c.t. Each of these parameters can be modified within the pipeline.

Modifiable run parameters

job_memory: Number of slots (threads/cores/CPU) to use for the task. Default: “4G”
job_total_memory: Total memory to use for a job.
to_cluster: Send the job to the cluster. Default: True
without_cluster: When this is set to True the job is ran locally. Default: False
cluster_memory_ulimit: Restrict virtual memory. Default: False
job_condaenv: Name of the conda environment to use for each job. Default: will use the one specified in bashrc
job_array: If set True, run statement as an array job. Job_array should be tuple with start, end, and increment. Default: False

Specifying parameters to job

Parameters can be set within a pipeline task as follows:

@transform( '*.unsorted', suffix('.unsorted'), '.sorted')
def sortFile( infile, outfile ):

  statement = '''sort -t %(tmpdir)s %(infile)s > %(outfile)s'''

  P.run(statement,
        job_condaenv="sort_environment",
        job_memory=30G,
        job_threads=2,
        without_cluster = False,
        job_total_memory = 50G)