Feature/DIMS refactor GenerateViolinPlots #82

ALuesink · 2025-08-15T14:13:58Z

The refactor of GenerateViolinPlots:

Moved code into functions
Added unit tests

rernst

First of all, a lot of work has been done, good job! I feel there is room for improvement; some general thoughts:

Many parameters have names like metab_interest_sorted. In the context of a function, it’s not relevant whether the input is “of interest” or “sorted.” Use neutral, descriptive names that reflect the data type or role.
Several functions are named after their use case rather than their functionality. Name functions based on what they do, not where they are used.
When breaking function calls across lines, maintain a consistent style. Preferred format:

function1(
  function_2(param_a),
  param_b,
  param_c
)

There is no error catching for missing files or invalid paths. Currently the code simply crashes, which makes debugging difficult.
There seem to be a lot of ad-hoc data transformations. It feels like the DIMS application is missing a standardized data format for saving and reusing data between steps.
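To illustrate the point about error catching, here is a minimal sketch in base R. The names check_path_exists and read_tab_delimited are hypothetical and do not exist in the PR; the read.table arguments are copied from the hunks under review. The idea is only to fail early with a message that names the missing path.

```r
# Hypothetical helper (not part of the PR): stop early with a clear message
# when an input path is missing or invalid.
check_path_exists <- function(path, description = "input") {
  if (length(path) != 1 || is.na(path) || !nzchar(path)) {
    stop(sprintf("No valid path given for %s.", description), call. = FALSE)
  }
  if (!file.exists(path)) {
    stop(sprintf("Cannot find %s: '%s' does not exist.", description, path), call. = FALSE)
  }
  invisible(path)
}

# Hypothetical wrapper: validate the path before reading a tab-delimited file,
# so the error mentions the file instead of a generic connection failure.
read_tab_delimited <- function(file_path) {
  check_path_exists(file_path, "tab-delimited file")
  read.table(file_path, sep = "\t", header = TRUE, quote = "")
}
```

Whether to stop, warn, or skip the file is a design choice for the pipeline; stopping is shown here only because it gives the most direct debugging signal.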

rernst · 2025-11-18T15:20:20Z

DIMS/export/generate_violin_plots_functions.R

+#' @param intensity_cols: names of the columns that contain the intensities (string)
+#'
+#' @returns fraction_side_intensity: a vector of intensities (vector of integers)
+get_intentities_for_ratios <- function(ratios_metabs_df, row_index, intensities_zscore_df, fraction_side, intensity_cols) {


The functionality of this function would be clearer with some more descriptive comments, for example before each if/else block. Secondly, the name get_intentities_for_ratios implies that we get multiple intensities for multiple ratios; however, the return object fraction_side_intensity implies only one value.

rernst · 2025-11-18T15:20:50Z

DIMS/export/generate_violin_plots_functions.R

+#' @param intensity_cols: names of the columns that contain the intensities (string)
+#'
+#' @returns fraction_side_intensity: a vector of intensities (vector of integers)
+get_intentities_for_ratios <- function(ratios_metabs_df, row_index, intensities_zscore_df, fraction_side, intensity_cols) {


The function name contains a typo: intentities -> intensities.

rernst · 2025-11-18T15:23:42Z

DIMS/export/generate_violin_plots_functions.R

+get_zscore_columns <- function(colnames_zscore, intensity_cols) {
+  sample_intersect <- intersect(paste0(intensity_cols, "_Zscore"), grep("_Zscore", colnames_zscore, value = TRUE))
+  return(sample_intersect)
+}


The function name get_zscore_columns implies that we get columns (data or index?) with z-scores. The description says we get sample_ids.

A better name would be something like get_sample_ids_with_zscore.

rernst · 2025-11-18T15:28:44Z

DIMS/export/generate_violin_plots_functions.R

+get_list_metabolites <- function(metab_group_dir) {
+  # get a list of all metabolite files
+  metabolite_files <- list.files(metab_group_dir, pattern = "*.txt", full.names = FALSE, recursive = FALSE)
+  # put all metabolites into one list
+  metab_list_all <- lapply(paste(metab_group_dir, metabolite_files, sep = "/"),
+                           read.table, sep = "\t", header = TRUE, quote = "")
+  names(metab_list_all) <- gsub(".txt", "", metabolite_files)
+
+  return(metab_list_all)
+}


Use the same word for metabolite throughout, not metab.

You named the function after its use, not its functionality. I think it just creates a bunch of dataframes from a directory containing .txt files. So a better, reusable name would be something like get_dataframes_from_dir.

rernst · 2025-11-18T15:30:10Z

DIMS/export/generate_violin_plots_functions.R

+  # get a list of all metabolite files
+  metabolite_files <- list.files(metab_group_dir, pattern = "*.txt", full.names = FALSE, recursive = FALSE)
+  # put all metabolites into one list
+  metab_list_all <- lapply(paste(metab_group_dir, metabolite_files, sep = "/"),


Set full.names to TRUE to get rid of the 'paste' on line 48.
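A possible shape for this suggestion, as a sketch only: it uses the name get_dataframes_from_dir proposed in the review and keeps the read.table arguments from the hunk. The parameter names dir_path and pattern, and the switch to tools::file_path_sans_ext, are assumptions. Because list.files treats pattern as a regular expression, the sketch also uses the escaped, anchored "\\.txt$" instead of "*.txt"; that is a change relative to the PR code.

```r
# Sketch: read every tab-delimited .txt file in a directory into a named list
# of data frames. Name taken from the review comment; arguments are assumptions.
get_dataframes_from_dir <- function(dir_path, pattern = "\\.txt$") {
  # full.names = TRUE returns complete paths, so no paste() is needed
  file_paths <- list.files(dir_path, pattern = pattern, full.names = TRUE, recursive = FALSE)
  dataframes <- lapply(file_paths, read.table, sep = "\t", header = TRUE, quote = "")
  # name each data frame after its file, without directory or extension
  names(dataframes) <- tools::file_path_sans_ext(basename(file_paths))
  dataframes
}
```

Called as get_dataframes_from_dir(metab_group_dir), this should return the same list as the original function for a directory of .txt files.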

rernst · 2025-11-19T08:30:03Z

DIMS/GenerateViolinPlots.R

+# Remove columns, move HMDB_code & HMDB_name column to the front, change intensity columns to numeric
+intensities_zscore_df <- intensities_zscore_df %>%
+  select(-c(plots, HMDB_name_all, HMDB_ID_all, sec_HMDB_ID, HMDB_key, sec_HMBD_ID_rlvnc, name,
+            relevance, descr, origin, fluids, tissue, disease, pathway, nr_ctrls)) %>%
+  relocate(c(HMDB_code, HMDB_name)) %>%
+  rename(mean_controls = avg_ctrls, sd_controls = sd_ctrls) %>%
+  mutate(across(!c(HMDB_name, HMDB_code), as.numeric))
+
+# Get the controls and patient IDs, select the intensity columns
+controls <- colnames(intensities_zscore_df)[grepl("^C", colnames(intensities_zscore_df)) &
+                                              !grepl("_Zscore$", colnames(intensities_zscore_df))]
+control_intensities_cols_index <- which(colnames(intensities_zscore_df) %in% controls)
+nr_of_controls <- length(controls)
+
+patients <- colnames(intensities_zscore_df)[grepl("^P", colnames(intensities_zscore_df)) &
+                                              !grepl("_Zscore$", colnames(intensities_zscore_df))]
+patient_intensities_cols_index <- which(colnames(intensities_zscore_df) %in% patients)
+nr_of_patients <- length(patients)
+
+intensity_cols_index <- c(control_intensities_cols_index, patient_intensities_cols_index)
+intensity_cols <- colnames(intensities_zscore_df)[intensity_cols_index]


This could be one (or more) 'prepare_data' functions.
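As a rough sketch of what such a function could look like for the column-selection part of this hunk: the names get_sample_columns and get_intensity_columns are hypothetical, and the "C"/"P" prefixes and "_Zscore" suffix are taken from the code above. It uses base R only and leaves the dplyr clean-up step out.

```r
# Hypothetical helper: sample columns start with the given prefix and do not
# end in "_Zscore" (same rule as the two grepl() calls in the hunk).
get_sample_columns <- function(column_names, prefix) {
  is_sample <- startsWith(column_names, prefix) & !grepl("_Zscore$", column_names)
  column_names[is_sample]
}

# Hypothetical 'prepare_data'-style function: collect control and patient
# intensity column names in one place, controls first, then patients.
get_intensity_columns <- function(intensities_zscore_df) {
  column_names <- colnames(intensities_zscore_df)
  controls <- get_sample_columns(column_names, "C")
  patients <- get_sample_columns(column_names, "P")
  list(
    controls = controls,
    patients = patients,
    intensity_cols = c(controls, patients)
  )
}
```

The counts nr_of_controls and nr_of_patients then follow from length() of the returned vectors, so they do not need to be stored separately.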

rernst · 2025-11-19T08:31:27Z

DIMS/GenerateViolinPlots.R

+intensity_cols_index <- c(control_intensities_cols_index, patient_intensities_cols_index)
+intensity_cols <- colnames(intensities_zscore_df)[intensity_cols_index]
+
+#### Calculate ratios of intensities for metabolites ####


Parts of this block can be 'calculate' functions.

rernst · 2025-11-19T08:31:48Z

DIMS/GenerateViolinPlots.R

+zscore_patients_df <- intensities_zscore_ratios_df %>% select(HMDB_code, HMDB_name, any_of(paste0(patients, "_Zscore")))
+zscore_controls_df <- intensities_zscore_ratios_df %>% select(HMDB_code, HMDB_name, any_of(paste0(controls, "_Zscore")))
+
+#### Make violin plots #####


And this could be a 'create violin plot PDF' function.

rernst · 2025-11-19T08:33:04Z

DIMS/GenerateViolinPlots.R

+save_prob_scores_to_excel(diem_probability_score, output_dir, run_name)
+
+
+#### Generate dIEM plots #########


This could also be a function.

rernst · 2025-11-19T08:33:28Z

DIMS/GenerateViolinPlots.R

+    # metabs_iems <- lapply(top_iems, function(iem) {
+    #   iem_probablity <- patient_top_iems_probs %>% filter(Disease == iem) %>% pull(!!sym(patient_id))
+    #   metabs_iems_names <- c(metabs_iems_names, paste0(iem, ", probability score ", iem_probablity))
+    #   metab_iem <- expected_biomarkers_df %>% filter(Disease == iem) %>% select(HMDB_code, HMDB_name)
+    #   return(metab_iem)
+    # })
+    # names(metabs_iems) <- metabs_iems_names


Remove this old (?) commented-out code.

ALuesink added 10 commits August 7, 2025 17:00

Refactored code GenerateViolinPlots

e8d0a7f

Fixed errors

9ffd606

Fixed linting

aede6ee

Added new package for unit testing

8094aed

Added unit tests and associated files for GenerateViolinPlots

62160b2

Fixed snapshot issues GenerateViolinPlots

cf9f349

Changes to snapshots

bd128dd

Fixed snapshot issues, second try

624d64a

Fixing snapshot issue, third try

9a56d10

Fixed snapshot issue, fourth try

936f292

ALuesink marked this pull request as ready for review August 21, 2025 12:52

ALuesink mentioned this pull request Sep 2, 2025

Refactor GenerateViolinPlots UMCUGenetics/DIMS#115

Open

ALuesink added 7 commits September 5, 2025 13:32

Merge branch 'develop' into feature/DIMS_refactor_GenerateViolinPlots

a02016a

Fixed issue if P1001 is present but no Z-scores

926ad6d

print statement for testing

b1a3858

Removed duplicated line & print statement

ba47db5

Merge branch 'origin/develop' into feature/DIMS_refactor_GenerateViolinPlots

4079ecc

Merge remote-tracking branch 'origin/develop' into feature/DIMS_refactor_GenerateViolinPlots

7439b26

Fix for error dIEM plots

3875aaf

ALuesink changed the base branch from main to develop November 3, 2025 08:24

rernst self-requested a review November 18, 2025 14:54

rernst requested changes Nov 19, 2025
