LPC

Access LPC

To gain access, send an email to PMACS-SYS-SCI@lists.upenn.edu asking the admins to give you permission to use the queue. Include in the email that the queue is "barash" and the PMACS username that needs access. Yoseph will need to confirm the authorization, so make sure to CC him on this email.

Log Into Submit Host

Once access has been given, ssh into scisub.pmacs.upenn.edu. This host is publicly reachable and doesn't require a VPN connection.
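For example, replacing <pmacs_username> with your PMACS username:

ssh <pmacs_username>@scisub.pmacs.upenn.edu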

Example MAJIQ Job

There is a simple MAJIQ job example attached to this wiki. It uses the data from the workshop example. It only runs a build, but this should be enough to get you started.

  1. Download the "majiq_test.sh" script attached to this wiki and move it to scisub.

  2. Prepare the data needed to complete the job, as described at the beginning of the script.

  3. Submit the job by running the following command: bsub < majiq_test.sh

To check the status of the job, run bjobs. Once the job has finished, you should find the majiq_build directory along with an "out" and an "error" file.
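If you later want to write your own job script, the general shape is a shell script with #BSUB directives at the top. The sketch below is only illustrative: the annotation, settings, and output paths are placeholders (not the contents of majiq_test.sh), and the majiq build flags can differ between MAJIQ versions, so check majiq build --help before relying on them.

#!/bin/bash
#BSUB -q barash                  # lab batch queue
#BSUB -n 4                       # request 4 cores
#BSUB -R "span[ptile=4]"         # keep the cores on one machine
#BSUB -o out                     # stdout file
#BSUB -e error                   # stderr file

module add python/3.6.1

# Placeholder paths: point these at your own annotation, settings file, and output directory.
~/.local/bin/majiq build /path/to/annotation.gff3 -c /path/to/settings.ini -j 4 -o majiq_build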

Non-Release MAJIQ

Install

  1. ssh into sciget.pmacs.upenn.edu

  2. run: module add git python/3.6.1

  3. run: pip3 install --user cython numpy pysam -U

  4. run: pip3 install --user git+https://bitbucket.org/biociphers/majiq.git#majiq -U

Running

  1. run: module add python/3.6.1

  2. run: ~/.local/bin/majiq -v

Scratch Directory

Each execute node has a scratch directory (/scratch) that is available for large jobs. We've found that it is more efficient to move your files to the scratch directory, work from there, and then remove them when done. It's also a good idea to write your output files to this directory and then move them to their final location once the job has finished.
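A rough sketch of that pattern inside a job script (the /scratch/$USER/$LSB_JOBID layout below is an assumption, not a documented convention, and the input/output paths are placeholders):

# Stage inputs onto the node-local scratch disk
WORKDIR=/scratch/$USER/$LSB_JOBID
mkdir -p "$WORKDIR"
cp /project/barash_hdr1/<your_dir>/input.bam "$WORKDIR"/

# Work from scratch
cd "$WORKDIR"
# ... run your tool here, writing output into $WORKDIR ...

# Copy results back to project storage and clean up
cp -r "$WORKDIR"/output /project/barash_hdr1/<your_dir>/
rm -rf "$WORKDIR"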

Transferring Files
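A minimal example, assuming scp is permitted through the submit host (check with the admins if your transfers fail), with <pmacs_username> and the destination directory as placeholders:

scp my_reads.fastq.gz <pmacs_username>@scisub.pmacs.upenn.edu:/project/barash_hdr1/<your_dir>/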

Common Queue Commands

Check the LSF documentation for further information.

  • bjobs - Check job statuses

  • ibash - Log onto a worker node

Questions or Concerns?

Submit a ticket at https://helpdesk.pmacs.upenn.edu under the "Systems" queue or email PMACS-SYS-SCI@lists.upenn.edu with all your LPC questions.

Storage

A Warning

There is limited space. It may seem like a lot, but our tape isn't infinite. Know how much space is available in the directory you're working in.
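For example, standard Linux commands work for this (the directory names are placeholders):

df -h /project/barash_hdr1                      # free space on that filesystem
du -sh /project/barash_hdr1/<your_directory>    # how much space you're using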

Where to Store Data (AKA The Capitalistic model)

  • Home Directory - /home/<username> or ~/

    • For small or sensitive jobs, default to using your home directory. This will be the only space where you have control over the files.

  • Unbacked Up Directories (15TB each) - /project/barash_hdr{1,2,3}/

    • General Use.

  • Backed Up Directories (7TB each) - /project/barash_hdb{1,2}/

    • For files we don't want to lose because they can't easily be recovered (like FASTQ files from GEO), files that are shared, and files that are important for reproducing papers.

Finding Data

If you can't remember where you put your work, here's a helpful command to find your way back.

find /project/barash_hd* -name <file_name>

Available Barash Lab Hosts

Run the following to see which hosts are available for the Barash Lab:

bhosts barash_exe

As of Oct 22, 2019, available machines include:

bilbringi, chimaera, eradicator, mamut

Useful Commands

To help mitigate issues where individual users monopolize entire nodes, we'll have to include the following arguments in every job submission:

bsub -n 7 -R "span[ptile=7]" < job.sh

For example, if you are running STAR with 7 threads, the -n 7 in the bsub command ensures that you get 7 cores for the job, and -R "span[ptile=7]" ensures that you get those 7 cores on the same machine. In most cases, for example when running STAR or MAJIQ, you do need the cores on the same machine. The general rule is that the number of threads required should equal the number of cores requested. Each machine on LPC has 15 cores, so don't request more than 7 cores unless you absolutely need to.

To increase memory you can use the -M option. For example, the following option in your bsub command asks LSF for 10 GB of RAM for the job:

-M 10240
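Putting the pieces together, a submission to the lab queue with 7 cores on one machine and 10 GB of RAM might look like the following (job.sh is a placeholder for your own script):

bsub -q barash -n 7 -R "span[ptile=7]" -M 10240 -o out -e error < job.sh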

Interactive mode on a specific machine:

bsub -Is -m "mamut" -q barash_interactive 'bash'

Don't use the interactive bash session if you need to run computationally heavy jobs; use the normal bsub command instead. The LSF system doesn't keep track of the resources used by ibash, so you can cause the machine to crash or cause other jobs to fail.

Check hosts statuses:

bhosts barash_exe

Additional Reading

https://wiki.pmacs.upenn.edu/pub/LPC

https://www.ibm.com/support/knowledgecenter/en/SSETD4_9.1.3/lsf_kc_cmd_ref.html