Multi-Ancestry Analysis

Online Multi-Ancestry PRS Training

PennPRS multiple-ancestry data analysis pipeline.

Quick-Start Working Example

In this section, we provide a working example of the PennPRS multi-ancestry data analysis pipeline. More detailed instructions can be found in the section below.

Suppose a user wants to train PRS models for human height in four populations: AFR, AMR, EAS, and EUR. The user can navigate to our GWAS Queryable Database and search for 'Height':

Searching for "Height" on the PennPRS GWAS Queryable Database.

Four studies can be used for this multi-ancestry PRS training: GCST90095033 (AMR, 59,771 subjects), GCST90018739 (EAS, 165,056 subjects), GCST90029008 (EUR, 673,878 subjects), and GCST90013468 (AFR, 25,369 subjects). Users can view more details about each study by clicking its 'View Study' link, which redirects to the corresponding GWAS Catalog page. Here is an example for GCST90013468.

GWAS Catalog page for study "GCST90013468".
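Before entering these studies on the website, it can help to keep the accession IDs, ancestry labels, and sample sizes together in one place. The following is a minimal sketch for personal bookkeeping only; the `Study` class and the helper functions are hypothetical names and are not part of PennPRS. The values are copied from the four studies listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """One GWAS Catalog study used in the quick-start example."""
    accession: str   # GWAS Catalog accession ID
    ancestry: str    # ancestry label as used in this guide
    n_subjects: int  # sample size reported in this guide


# Values copied from the quick-start example above.
HEIGHT_STUDIES = [
    Study("GCST90095033", "AMR", 59_771),
    Study("GCST90018739", "EAS", 165_056),
    Study("GCST90029008", "EUR", 673_878),
    Study("GCST90013468", "AFR", 25_369),
]


def total_subjects(studies):
    """Sum the sample sizes across all studies."""
    return sum(s.n_subjects for s in studies)


def by_ancestry(studies):
    """Index the studies by ancestry label (one study per ancestry)."""
    return {s.ancestry: s for s in studies}


if __name__ == "__main__":
    # Print the studies from largest to smallest sample size.
    for s in sorted(HEIGHT_STUDIES, key=lambda s: s.n_subjects, reverse=True):
        print(f"{s.ancestry}  {s.accession}  {s.n_subjects:>9,}")
    print(f"Total subjects: {total_subjects(HEIGHT_STUDIES):,}")
```

Indexing by ancestry makes it easy to look up the accession ID needed for each ancestry when filling in the form.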

To begin PRS training using these datasets from the GWAS Catalog, the user can navigate to our multi-ancestry analysis page.

PennPRS Multi-Ancestry Analysis page.

For the question asking for the number of ancestries, enter '4'.

PennPRS Multi-Ancestry Analysis Step 1.

Next, enter the relevant information for each of the four ancestries one by one; this information can be obtained from the corresponding GWAS Catalog page. Then click 'Save & Continue'.

PennPRS Multi-Ancestry Analysis Step 1.

Next, the user selects the specific PRS training method. PennPRS currently supports the PROSPER-pseudo method. We strongly recommend using the default settings for this method, although users have the option to modify them if needed.

PennPRS Multi-Ancestry Analysis Step 2.

Next, the user names the job. We recommend enabling email notifications and double-checking the job details before submission. Then click 'Submit'.

The user will then be directed to a page confirming successful job submission. If email notifications are enabled, the user will also receive updates on the job status.

PennPRS Multi-Ancestry Analysis Email Notification.

A typical job takes approximately 5 to 20 hours to complete, depending on server load as well as the nature of the datasets selected. The user will receive an email notification once the job is finished. If no errors occur, the user can then download the trained PRS models (for four ancestries in this case) and the log file in a zip file from the 'Job Center'.

The job log file indicates that the job has been successfully completed without errors.
The eight PennPRS models (two models for each of the four ancestries) generated by this job.
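Once the zip file has been downloaded from the 'Job Center', its contents can be inspected locally. The sketch below uses only Python's standard `zipfile` module. The file name `pennprs_results.zip` is a placeholder, and because the internal layout of the archive is not described here, the code simply lists and extracts whatever files it finds rather than assuming specific file names.

```python
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return (name, size_in_bytes) for every file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


def extract_archive(zip_path, dest_dir):
    """Extract the whole archive into dest_dir and return that path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


# Example usage (placeholder file name; replace with the downloaded file):
#   for name, size in list_archive("pennprs_results.zip"):
#       print(f"{size:>12,}  {name}")
#   extract_archive("pennprs_results.zip", "pennprs_results")
```

In the quick-start example, a successful job would be expected to contain the eight model files and the log file mentioned above.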

If the job fails, the user can check the returned log files (available from the "Download Error Log"), browse the FAQ Section, or contact the PennPRS team directly for support.

Detailed Steps of Job Submission

Our cloud-based, end-to-end multi-ancestry PRS model training job consists of four steps:

Step 1. Upload or query one GWAS summary-level data file for each ancestry population.

Step 2. Select the multi-ancestry PRS method and specify the model parameter setting.

Step 3. Configure and submit the job.

Step 4. Monitor job status and download results.

Multi-ancestry PRS model training on PennPRS.

Below we provide details for each step.

Step 1. Build input GWAS summary data files from multiple ancestries.

As in single-ancestry analysis, users can either upload their local GWAS summary data files or query summary data from our public GWAS summary database, which is built from over 27,000 harmonized datasets from the GWAS Catalog.

The required data format is the same as that for single-ancestry analysis, and users are allowed to upload their local data for a subset of ancestries and query data for the remaining ancestries. Please also make sure to upload one file at a time (maximum file size allowed: 800MB). For detailed instructions, please refer to Step 1 on single-ancestry analysis.

Note: each multi-ancestry analysis job accepts two to five GWAS summary data files (uploaded or queried), and each file must come from a different ancestry.
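The input constraints described in this step (two to five files, one per ancestry, and at most 800MB per uploaded file) can be checked locally before starting a submission. The following is a minimal sketch under stated assumptions: `GwasInput` and `validate_inputs` are hypothetical names, not part of PennPRS, and "800MB" is interpreted here as 800 × 1024 × 1024 bytes, which may differ from how the website measures file size.

```python
from dataclasses import dataclass

MIN_FILES = 2
MAX_FILES = 5
# Assumption: "800MB" is read as 800 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 800 * 1024 * 1024


@dataclass(frozen=True)
class GwasInput:
    """One GWAS summary data file planned for a multi-ancestry job."""
    ancestry: str        # e.g. "EUR"
    source: str          # "upload" (local file) or "query" (database)
    size_bytes: int = 0  # only checked for uploads


def validate_inputs(inputs):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []

    # Check 1: two to five files per job.
    if not MIN_FILES <= len(inputs) <= MAX_FILES:
        problems.append(
            f"expected {MIN_FILES} to {MAX_FILES} files, got {len(inputs)}"
        )

    # Check 2: every file comes from a different ancestry.
    labels = [item.ancestry for item in inputs]
    duplicates = sorted({a for a in labels if labels.count(a) > 1})
    if duplicates:
        problems.append(f"duplicate ancestries: {', '.join(duplicates)}")

    # Check 3: uploaded files respect the size limit.
    for item in inputs:
        if item.source == "upload" and item.size_bytes > MAX_UPLOAD_BYTES:
            problems.append(f"{item.ancestry} upload exceeds the size limit")

    return problems
```

For example, `validate_inputs([GwasInput("EUR", "query"), GwasInput("EAS", "upload", 10_000_000)])` returns an empty list, while passing a single file or two files with the same ancestry returns a description of each problem found.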

Specify the number of ancestries in multi-ancestry analysis and input GWAS summary data.

Update April 06, 2025: PennPRS now supports direct querying of GWAS summary statistics for over 2400 disease phenotypes from the FinnGen database (R12) (https://pennprs.org/data).

Step 2. Select the multi-ancestry PRS method and specify the model parameter setting.

We currently support one multi-ancestry method, PROSPER, using the pseudo-training version developed and tested by the PennPRS team. The user can either use the default setting (highly recommended) or customize the settings.

Select the multi-ancestry PRS method and specify the model parameter setting.

Step 3. Configure and submit the job.

In Step 3, the user can provide a job name and enable email notifications. The user will then review the input data and method information before submitting the job.

Input job name and enable email notifications.
Job configuration, review, and submission.

Step 4. Monitor job status and download results.

Once a job is successfully submitted, the user will see the following page and can monitor the job status by clicking "View Job Status".

Job successfully submitted.

If email notifications are enabled in Step 3, the user will receive a separate status update from nonreply.pennprs@gmail.com at each stage:

(i) when the job is successfully submitted

(ii) when the job starts running

(iii) when the job finishes (either completed successfully or failed)

The user can view the job status and download the results from the "Job Center".

If the job is completed successfully, the user will be able to obtain the PRS weights by clicking "Download Results".

If the job fails, the user can check the returned log files (available from the "Download Error Log"), browse the FAQ Section, or contact the PennPRS team directly for support.

Monitor job status and download results.
