UBC Research Data

LIST-S2: pre-computed deleteriousness of all possible mutations in human (OX=9606) protein sequences
Nawar Malhis

Description

LIST-S2 predicts the deleteriousness of amino acid mutations in protein sequences. Here we provide precomputed predictions for human protein sequences. Scores are in the range [0 .. 1], where lower scores indicate more benign mutations and higher scores indicate more deleterious ones.

LIST-S2 scores for all possible mutations of a specific protein sequence, identified by its UniProt accession number, can also be visualized or downloaded at https://precomputed.list-s2.msl.ubc.ca/

LIST-S2 2019_10 tabix files: Precomputed predictions of ~200,000 human protein sequences, release 2019_10, identified by their UniParc protein ID (UPI).
Columns:
1. UniParc: UniParc protein ID (UPI).
2. Position: Sequence position, starting at 1.
3. rAA: Reference amino acid.
4. AAA: Allele amino acid.
5. LIST-S2: LIST-S2 score.
Two files:
1. LIST-S2_Human_UniParc_2019_10.tsv.gz.tbi
2. LIST-S2_Human_UniParc_2019_10.tsv.gz, divided into 4 parts:
   LIST-S2_Human_UniParc_2019_10_part1
   LIST-S2_Human_UniParc_2019_10_part2
   LIST-S2_Human_UniParc_2019_10_part3
   LIST-S2_Human_UniParc_2019_10_part4
To reassemble LIST-S2_Human_UniParc_2019_10.tsv.gz:
   mv LIST-S2_Human_UniParc_2019_10_part1 LIST-S2_Human_UniParc_2019_10.tsv.gz
   cat LIST-S2_Human_UniParc_2019_10_part2 >> LIST-S2_Human_UniParc_2019_10.tsv.gz
   cat LIST-S2_Human_UniParc_2019_10_part3 >> LIST-S2_Human_UniParc_2019_10.tsv.gz
   cat LIST-S2_Human_UniParc_2019_10_part4 >> LIST-S2_Human_UniParc_2019_10.tsv.gz

LIST-S2_OX9606_2019_10: Precomputed predictions of ~200,000 human protein sequences, release 2019_10, identified by their UniProt accession.
Columns:
1. AC: Sequence id from the fasta header.
2. Pos: Amino acid position.
3. Ref: The reference amino acid.
4. Conservation: The average LIST-S2 deleteriousness score of all possible mutations at that position.
5. 20 columns, one for each amino acid: The LIST-S2 deleteriousness score for mutating the reference amino acid to that column's amino acid.
The data is divided into two files:
   LIST-S2_OX9606_2019_10_part1
   LIST-S2_OX9606_2019_10_part2
To reassemble:
   mv LIST-S2_OX9606_2019_10_part1 OX9606.tar.gz
   cat LIST-S2_OX9606_2019_10_part2 >> OX9606.tar.gz

LIST-S2_Genomic_Human_2019_10.tsv.gz: Precomputed predictions of ~60,000 human protein sequences, release 2019_10, identified by their genomic positions.
Columns:
1. Chromosome: Chromosome id.
2. position_g: Chromosome position.
3. ref_n: Reference nucleotide.
4. allele_n: Allele nucleotide.
5. AC: UniProt accession.
6. UPI: UniParc protein ID.
7. position_aa: Amino acid position.
8. ref_aa: Reference amino acid.
9. allele_aa: Allele amino acid.
10. LIST-S2: LIST-S2 score.

LIST-SI_OX=9606 (2019-07-14): Precomputed predictions of ~115,000 human protein sequences, identified by their UniProt accession followed by the Ensembl ENST id.
Columns:
1. AC: Sequence id from the fasta header.
2. Pos: Amino acid position.
3. Ref: The reference amino acid.
4. Conservation: The average LIST-S2 deleteriousness score of all possible mutations at that position.
5. 20 columns, one for each amino acid: The LIST-S2 deleteriousness score for mutating the reference amino acid to that column's amino acid.
The data is divided into two files:
   LIST-SI_OX=9606_P1.tar.gz
   LIST-SI_OX=9606_P2.tar.gz
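
The reassembly commands given above rename the first part in place, so the downloaded parts are altered. As a minimal sketch, the same concatenation can be done without touching the parts, followed by an integrity check. The function name reassemble is hypothetical and not part of the dataset; the sketch assumes a shell with cat and gzip available, and that the parts are passed in their numeric order.

```shell
# reassemble OUT PART1 [PART2 ...]
# Hypothetical helper: concatenates the parts, in the order given, into OUT
# without renaming or modifying any part, then tests OUT as a gzip stream.
reassemble() {
    if [ "$#" -lt 2 ]; then
        echo "usage: reassemble OUT PART1 [PART2 ...]" >&2
        return 2
    fi
    local out=$1
    shift
    local part
    # Refuse to start if any part is missing, so OUT is never half-built.
    for part in "$@"; do
        if [ ! -f "$part" ]; then
            echo "missing part: $part" >&2
            return 1
        fi
    done
    cat -- "$@" > "$out" || return 1
    # gzip -t reads the whole stream and fails on truncated or corrupt data.
    gzip -t < "$out"
}

# Example calls, using the file names listed in the description:
#   reassemble LIST-S2_Human_UniParc_2019_10.tsv.gz \
#       LIST-S2_Human_UniParc_2019_10_part1 LIST-S2_Human_UniParc_2019_10_part2 \
#       LIST-S2_Human_UniParc_2019_10_part3 LIST-S2_Human_UniParc_2019_10_part4
#   reassemble OX9606.tar.gz \
#       LIST-S2_OX9606_2019_10_part1 LIST-S2_OX9606_2019_10_part2
```

A non-zero exit status means a part was missing, the concatenation failed, or the result did not pass the gzip test, in which case the parts should be downloaded again.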

Licence

CC0 1.0