PCA Analysis Script – README

📌 Overview

This repository contains an R script (pca.r) that performs Principal Component Analysis (PCA) on datasets with:

- Reference samples (ref) → used as a baseline group
- Unknown samples (unknown) → compared against the baseline

The script:

- Dynamically detects numeric analyte columns (names can change between runs)
- Handles missing values in reference samples by imputing the reference-group mean for that analyte
- Optionally imputes missing values in unknown samples (if enabled in the script)
- Computes PCA scores and loadings
- Ranks reference samples by distance to each unknown in PCA space

Outputs:

- pca_plots.pdf – multi‑page PDF:
  1. PCA score plot (refs = grey points, unknowns = red points with labels)
  2. Scree plot (variance explained per PC, in numeric PC order)
  3. PCA loadings plot (PC1 vs PC2)
- unknown_<SampleID>.csv – per‑unknown CSV containing:
  - The unknown sample’s analyte values
  - The N closest reference samples (20 by default), sorted ascending by Euclidean distance in PCA space

📊 Statistical Background

1. Principal Component Analysis (PCA)

PCA transforms the original set of correlated variables (analytes) into a set of uncorrelated variables called principal components (PCs):

- PC1 explains the largest variance in the dataset
- PC2 explains the next largest variance, and so on
- Data is centered and scaled before PCA

Mathematically:

Z = (X − X̄) / σ

PCA: Z · W = T

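Where:

- X = original analyte data (rows = samples, columns = analytes)
- X̄, σ = per‑analyte column mean and standard deviation
- Z = centered and scaled data
- W = eigenvectors (loadings)
- T = scores (coordinates in PC space)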
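
The relationship Z · W = T can be checked with a minimal sketch in base R. This is not taken from pca.r; the matrix X and its values are made up for illustration, and T_scores is a hypothetical name (chosen because T is an alias for TRUE in R).

```r
# Illustrative data: 6 samples x 3 analytes (values are made up)
X <- matrix(c(1.2, 0.5, 0.3,
              1.3, 0.4, 0.4,
              0.9, 0.6, 0.5,
              1.1, 0.7, 0.2,
              1.4, 0.3, 0.6,
              1.0, 0.5, 0.4),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("as", "ba", "cd")))

# Z = (X - column mean) / column standard deviation
Z <- scale(X, center = TRUE, scale = TRUE)

# PCA with centering and scaling
pca <- prcomp(X, center = TRUE, scale. = TRUE)

W <- pca$rotation   # loadings (eigenvectors)
T_scores <- pca$x   # scores (coordinates in PC space)

# Z %*% W should reproduce the scores up to floating-point error
max_diff <- max(abs(Z %*% W - T_scores))

# Proportion of variance explained per PC (the quantity a scree plot shows)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
```

Here max_diff should be close to zero, and var_explained sums to 1 with its entries in decreasing order.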

2. Distance Metric

Distances are computed as Euclidean distance in PCA score space:

d(a, b) = √[ Σᵢ₌₁ᵏ (PCᵢ(a) − PCᵢ(b))² ]

where PCᵢ(a) is sample a’s score on the i‑th PC and k is the number of PCs considered (all by default).

For each unknown:

- Distances to all refs are calculated
- Refs are ranked by ascending distance and the closest ones are saved
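
The ranking step can be sketched as follows. This is an illustration under stated assumptions, not the script’s actual code: the score matrix is made up (in practice it would come from prcomp()$x), and n_closest is a hypothetical parameter name.

```r
# Illustrative scores: 4 reference samples and 1 unknown on 2 PCs
scores <- rbind(
  REF001 = c(PC1 =  1.0, PC2 =  0.0),
  REF002 = c(PC1 = -2.0, PC2 =  1.0),
  REF003 = c(PC1 =  0.5, PC2 =  0.5),
  REF004 = c(PC1 =  3.0, PC2 = -1.0),
  UNK001 = c(PC1 =  0.0, PC2 =  0.0)
)
type <- c("ref", "ref", "ref", "ref", "unknown")

ref_scores <- scores[type == "ref", , drop = FALSE]
unk <- scores["UNK001", ]

# Euclidean distance from the unknown to every reference, over all PCs
d <- sqrt(rowSums(sweep(ref_scores, 2, unk)^2))

# Sort ascending and keep the n closest references
n_closest <- 3
ranked <- head(sort(d), n_closest)
print(round(ranked, 3))
```

With these values the closest references are REF003, REF001, and REF002, in that order.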

3. PCA Loadings

PCA loadings indicate how strongly each analyte influences a PC:

- Large magnitude = greater influence on that PC
- Sign indicates the direction of the relationship
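
As a hedged illustration of reading loadings, the sketch below runs prcomp() on simulated data and orders the analytes by the magnitude of their PC1 loading. The data frame X and the analyte names are assumptions made for the example, not output from pca.r.

```r
set.seed(42)  # simulated analyte data, for illustration only
n <- 30
base <- rnorm(n)
X <- data.frame(
  as = base + rnorm(n, sd = 0.2),  # built from a shared signal
  ba = base + rnorm(n, sd = 0.2),  # built from the same shared signal
  cd = rnorm(n)                    # generated independently
)

pca <- prcomp(X, center = TRUE, scale. = TRUE)
loadings <- pca$rotation  # rows = analytes, columns = PCs

# Analytes ordered by the magnitude of their PC1 loading
pc1 <- loadings[, "PC1"]
pc1_ranked <- pc1[order(abs(pc1), decreasing = TRUE)]
print(round(pc1_ranked, 3))

# Each PC's loading vector has unit length
unit_length <- colSums(loadings^2)
```

Note that the overall sign of a PC is arbitrary: flipping every loading and score on one PC gives an equivalent solution, so only the relative signs of analytes on the same PC are meaningful.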

🖥 Code Workflow

1. Data Input
   - Reads pca_data.csv from the script’s folder
   - Converts headers to lowercase
   - Confirms sample and type columns exist
2. Pre‑processing
   - Detects analyte columns automatically
   - Imputes missing reference analyte values with reference means
     (optional: also imputes unknown values if enabled in the script)
3. PCA Calculation
   - Uses prcomp() with scaling and centering
   - Extracts scores and loadings
4. Output Generation
   - Per unknown: saves unknown_<SampleID>.csv containing that unknown and its N closest refs (default 20)
   - Generates PDF with:
     1. Score plot (refs = grey points, unknowns = red points with labels)
     2. Scree plot (% variance explained)
     3. Loadings plot (PC1 vs PC2)
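
The pre‑processing step above could look roughly like the following sketch. It is one plausible way to do it in base R, not the code in pca.r; the data frame and its values are made up, and variable names such as analytes and is_ref are hypothetical.

```r
# Hypothetical input mirroring the required layout (values are made up)
dat <- data.frame(
  Sample = c("REF001", "REF002", "REF003", "UNK001"),
  Type   = c("ref", "ref", "ref", "unknown"),
  As     = c(1.2, NA, 1.4, 0.9),
  Ba     = c(0.5, 0.4, NA, 0.6),
  stringsAsFactors = FALSE
)

# Lowercase headers and confirm the required columns exist
names(dat) <- tolower(names(dat))
stopifnot(all(c("sample", "type") %in% names(dat)))

# Analyte columns: every numeric column other than the ID columns
candidates <- setdiff(names(dat), c("sample", "type"))
analytes <- candidates[vapply(dat[candidates], is.numeric, logical(1))]

# Impute missing reference values with the reference-group mean
is_ref <- dat$type == "ref"
for (a in analytes) {
  ref_mean <- mean(dat[[a]][is_ref], na.rm = TRUE)
  missing_ref <- is_ref & is.na(dat[[a]])
  dat[[a]][missing_ref] <- ref_mean
}

print(dat)
```

In this example the missing As value for REF002 becomes 1.3 (the mean of 1.2 and 1.4) and the missing Ba value for REF003 becomes 0.45. Unknown rows are left untouched here.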

⚙️ Usage Instructions

1. Prepare your data

Save your dataset as pca_data.csv in the same directory as pca.r.

Required columns:

- Sample → unique sample ID
- Type → "ref" or "unknown"
- One or more numeric analyte columns

Example:

Sample,Type,As,Ba,Cd,Co,Cr
REF001,ref,1.2,0.5,0.3,0.6,0.8
REF002,ref,1.3,0.4,0.4,0.5,0.7
UNK001,unknown,0.9,0.6,0.5,0.7,0.9
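
Before running the script, a file in this layout can be sanity‑checked with a short sketch like the one below. It reads the example rows from an inline string rather than from pca_data.csv, and the checks and variable names (required, ids_unique, types_ok) are illustrative assumptions rather than the script’s own validation.

```r
csv_text <- "Sample,Type,As,Ba,Cd,Co,Cr
REF001,ref,1.2,0.5,0.3,0.6,0.8
REF002,ref,1.3,0.4,0.4,0.5,0.7
UNK001,unknown,0.9,0.6,0.5,0.7,0.9"

# For a real file, read.csv("pca_data.csv") would be used instead
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)
names(dat) <- tolower(names(dat))

# Required ID columns
required <- c("sample", "type")
missing_cols <- setdiff(required, names(dat))
if (length(missing_cols) > 0) {
  stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
}

# Sample IDs must be unique; Type must be "ref" or "unknown"
ids_unique <- !anyDuplicated(dat$sample)
types_ok <- all(dat$type %in% c("ref", "unknown"))

# Count the numeric analyte columns
analyte_cols <- setdiff(names(dat), required)
n_analytes <- sum(vapply(dat[analyte_cols], is.numeric, logical(1)))

print(c(ids_unique = ids_unique, types_ok = types_ok, n_analytes = n_analytes))
```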

2. Run the script

Rscript pca.r

3. Check the outputs

- pca_plots.pdf → open to see the PCA score plot, scree plot, and loadings plot
- unknown_<SampleID>.csv → inspect analyte values and ranked closest refs for each unknown
