Random forest classifier example in R #1

Open
AnushreeChopde wants to merge 4 commits into llsc-supercloud:main from AnushreeChopde:master

@AnushreeChopde

This example demonstrates a random forest classifier and shows how class prediction is done on the cardiotocography dataset (CTG.csv). We show both serial and parallel implementations of the random forest model in R and compare their execution times. The example shows that, as the number of cores increases, the parallel version takes much less time than the serial one.

The example has two folders under random_forest, named Serial and Parallel, each containing its R code, a README.md, and a submit.sh.

@lmilechin
Contributor

lmilechin commented Dec 18, 2019

A few things:

  • Add package dependencies to the Serial/README.md file (randomForest, caret, e1071)
  • Submission scripts (and possibly R scripts) have Windows line endings, which prevent job submission. This can be fixed by cloning onto supercloud (or another Linux machine), running dos2unix on each file, and then committing and pushing those changes here
  • Typo in Parallel/README.md: doparallel should be doParallel
  • Typo in Parallel/README.md: Caret should be caret
  • Typo in Parallel/README.md: Foreach should be foreach
  • Remove commented lines in submission scripts
  • Don't redirect output in the submission script; let it be written to the job's output file (just have Rscript rf_serial.R)
  • No cores are being allocated for the Parallel example unless it is submitted with the LLsub -s option, which should be documented. Since you are going up to 16 cores, you should allocate 16 cores to the job (-s option for LLsub, -c option for sbatch).
  • Got the following error when running the parallel version:
 Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'
Calls: plot ... matplot -> ncol -> as.matrix -> as.matrix.default -> array
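
For the line-ending item above, a minimal sketch of the check and conversion could look like the following. The function name fix_line_endings is hypothetical, and the tr fallback is an assumption for machines where dos2unix is not installed; it strips every carriage return in the file, not only those at line ends.

```shell
#!/bin/sh
# Sketch: convert files with Windows (CRLF) line endings to Unix (LF).
# fix_line_endings is a hypothetical helper name, not part of this PR.

fix_line_endings() {
    cr=$(printf '\r')                  # a literal carriage return character
    for f in "$@"; do
        # Only touch files that actually contain a carriage return
        if grep -q "$cr" "$f"; then
            if command -v dos2unix >/dev/null 2>&1; then
                dos2unix "$f"
            else
                # Fallback (assumption): delete all CR characters with tr,
                # then write back in place so the file mode is preserved
                tr -d '\r' < "$f" > "$f.tmp" && cat "$f.tmp" > "$f" && rm -f "$f.tmp"
            fi
            echo "converted: $f"
        fi
    done
}
```

Run from the random_forest folder, this might be called as `fix_line_endings Serial/submit.sh Parallel/submit.sh`, followed by committing and pushing the converted files.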