R package for data cleaning, preliminary data analysis and modeling assessing with visualisation.
Data cleaning have 2 functions at the moment:
-
DetMiss: Detecting missing value in a given data frame, data.table or vector. -
PopMiss: Imputing missing values via: deleting, replace, or populating with mean or mode.
-
DataSummary: Enhanced data summary of a dataset given min, max, missing, unique count and variable type info. -
CramersV: Calculate the Cramers' V statistics on character or factor variables in a given dataset. -
dataPlot: Plot response on an one-way basis by explanatory variable.
library(MASS)
data("Insurance")
dataPlot(Insurance$Age,Insurance$Claims,exposure = Insurance$Holders,
by=Insurance$District,xname="Age",byname="District")
dataComp: Compare two dataset to check whether there is a profile change.
-
compPlot: Compare model prediction with actual on one-way basis. -
liftPlot: Visualising and comparing model accuracy by lift curves. -
resiPlot: Assessing the residual using contour and AvsE chart.
Pred = Insurance$Claims + runif(nrow(Insurance),min=0,max=10)
resiPlot(Insurance$Claims,Pred)
-
rocPlot: Comparing model predictions under roc curve (AUC). -
interPlot: Visualising data feature or model predictions by 2 factors at the same time using 3D plot.
interPlot(Insurance$Age,Insurance$District,Insurance$Claims,xname="Age",yname="District")
-
modelMetric: Gives simple model metrics calculation functions. -
tree2data: function used to collect information fromgbmorrandomForestmodel object to create data forsankeyNetworkplot innetworkD3package. Example:
library(networkD3)
data(iris)
iris.mod <- gbm(Species ~ ., distribution="multinomial", data=iris, n.trees=2000, shrinkage=0.01, cv.folds=5, verbose=FALSE, n.cores=1)
tree_data <- tree2data(iris.mod,1)
sankeyNetwork(tree_data[[1]],tree_data[[2]],Source="src",Target="tar",Value="value",NodeID="name")
You can install DataMan from GitHub as follows:
devtools::install_github('SixiangHu/DataMan')This package is free and open source software, licensed under GPL 2 or later.


