Delete columns/rows with more than x% missing

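To remove columns (or rows) with more than a given fraction of NA values, you can use colMeans(is.na(...)) (or rowMeans(is.na(...))), which gives the fraction of missing values in each column (or row).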
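A minimal illustration on a small made-up data frame (the name df and its values are hypothetical, chosen only for the example). is.na() returns a logical matrix, and colMeans()/rowMeans() count TRUE as 1 and FALSE as 0, so each mean is the share of missing cells:

```r
## Toy data frame: 4 rows, 3 columns with different amounts of NA
df <- data.frame(
  a = c(1, NA, 3, NA),   # 2 of 4 missing
  b = c(NA, NA, NA, 4),  # 3 of 4 missing
  c = 1:4                # none missing
)

## Fraction of NA per column
na_by_col <- colMeans(is.na(df))
na_by_col  # a = 0.50, b = 0.75, c = 0.00

## Fraction of NA per row (3 columns, so steps of 1/3)
na_by_row <- rowMeans(is.na(df))
na_by_row  # row 2 is 2/3 missing, the other rows 1/3
```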

## Some sample data
set.seed(0)
dat <- matrix(1:100, 10, 10)
dat[sample(1:100, 50)] <- NA
dat <- data.frame(dat)

## Remove columns with more than 50% NA
dat[, colMeans(is.na(dat)) <= 0.5, drop = FALSE]
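Where the cut-off sits matters for columns that are exactly at the threshold. Keeping columns with colMeans(is.na(x)) <= 0.5 keeps a column that is exactly half missing, while keeping those with colMeans(!is.na(x)) > 0.5 drops it. A small check on hypothetical columns:

```r
## Two-row toy data: one column exactly 50% NA, one complete, one fully NA
df <- data.frame(half = c(1, NA), none = c(1, 2), all = c(NA, NA))

## Keep columns with at most 50% NA (drops only those with MORE than 50%)
kept_le <- names(df)[colMeans(is.na(df)) <= 0.5]
kept_le       # "half" "none"

## Requiring more than 50% non-missing also drops the exactly-50% column
kept_present <- names(df)[colMeans(!is.na(df)) > 0.5]
kept_present  # "none"
```

Pick whichever comparison matches how you read "more than x% missing"; drop = FALSE in the indexing keeps the result a data frame even when only one column survives.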

## Remove rows with more than 50% NA
dat[rowMeans(is.na(dat)) <= 0.5, , drop = FALSE]
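As a sanity check on the row version: a row threshold of 0 keeps only rows with no missing values, which is the same set of rows that base R's complete.cases() flags (and that na.omit() keeps for a data frame). A sketch on a hypothetical toy data frame:

```r
## Toy data: row 1 is complete, rows 2 and 3 each have one NA
df <- data.frame(x = c(1, NA, 3), y = c(4, 5, NA), z = c(7, 8, 9))

## NA fraction of exactly 0 means the row is complete
no_na_rows <- rowMeans(is.na(df)) == 0

## Same rows as complete.cases(); names are dropped because rowMeans() keeps row names
identical(unname(no_na_rows), complete.cases(df))  # TRUE

complete_only <- df[no_na_rows, , drop = FALSE]
complete_only  # only row 1
```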

## Remove columns and rows with more than 50% NA
## (both fractions are computed on the full data)
dat[rowMeans(is.na(dat)) <= 0.5, colMeans(is.na(dat)) <= 0.5, drop = FALSE]
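To use an arbitrary x% instead of a hard-coded 50%, the one-liners can be wrapped in a small function. This is only a sketch: drop_sparse, max_na and margin are hypothetical names, and max_na is a fraction (0.3 means "more than 30% missing"). Note that filtering rows and columns one after the other can give a different result than filtering both at once, because the row fractions change once columns are removed:

```r
## Hypothetical helper: drop rows and/or columns whose share of NA exceeds max_na
drop_sparse <- function(df, max_na = 0.5, margin = c("both", "rows", "cols")) {
  margin <- match.arg(margin)
  stopifnot(is.data.frame(df), is.numeric(max_na), max_na >= 0, max_na <= 1)

  na <- is.na(df)
  ## Both fractions are computed on the data as passed in
  keep_rows <- if (margin %in% c("both", "rows")) rowMeans(na) <= max_na else rep(TRUE, nrow(df))
  keep_cols <- if (margin %in% c("both", "cols")) colMeans(na) <= max_na else rep(TRUE, ncol(df))

  ## drop = FALSE keeps a data frame even if a single row or column remains
  df[keep_rows, keep_cols, drop = FALSE]
}

df <- data.frame(a = c(1, NA, 3, NA), b = c(NA, NA, NA, 4), c = 1:4)

## Simultaneous: row 2 (2/3 NA) and column b (3/4 NA) are dropped
res_both <- drop_sparse(df, max_na = 0.5)
res_both  # rows 1, 3, 4; columns a and c

## Sequential: once b is gone, row 2 is only 50% NA and survives
res_seq <- drop_sparse(drop_sparse(df, 0.5, "cols"), 0.5, "rows")
res_seq   # all 4 rows; columns a and c
```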
