Random forest prediction

11/22/2023

# Fitting and evaluating a model without interactions

The goal of spatialRF is to help fit explanatory spatial regression models, where the target is to understand how a set of predictors and the spatial structure of the data influence the response variable. Therefore, the spatial analyses implemented in the package can be applied to any spatial dataset, regular or irregular, with a sample size between ~100 and ~5000 cases (the upper end depends on the RAM available), a quantitative or binary (values 0 and 1) response variable, and a more or less large set of predictive variables.

All functions but rf_spatial() work with non-spatial data as well if the arguments distance.matrix and distance.thresholds are not provided. In such a case, the number of training cases is no longer limited by the size of the distance matrix, and models can be trained with hundreds of thousands of rows. In such a case, the spatial autocorrelation of the model's residuals is not assessed.

However, when the focus is on fitting spatial models, and due to the nature of the spatial predictors used to represent the spatial structure of the training data, there are many things this package cannot do:

- Predict a model result over another region with a different spatial structure.
- Work with “big data”, whatever that means.
- Impute or extrapolate (it can be done, but models based on spatial predictors are hardly transferable).
- Take temporal autocorrelation into account (but this is something that might be implemented later on).

If after considering these limitations you are still interested, follow me, I will show you how it works.

The data required to fit random forest models with spatialRF must fulfill several conditions:

- At the moment, tibbles are not fully supported.
- The number of rows must be somewhere between 100 and ~5000, at least if your target is fitting spatial models. This limitation comes from the fact that the distance matrix grows very fast with an increasing number of training records, so for large datasets, there might not be enough RAM in your machine.
- The number of predictors should be larger than 3. Fitting a Random Forest model is moot otherwise.
- Factors in the response or the predictors are not explicitly supported in the package. They may work, or they won't, but in any case, I designed this package for quantitative data alone. However, binary responses with values 0 and 1 are partially supported.
- There must be no NA records. You can check if there are NA records with sum(apply(df, 2, is.na)). If the result is larger than 0, then just execute df <- na.omit(df) to remove rows with empty cells.
- Columns must not have zero variance. This condition can be checked with apply(df, 2, var) == 0.
- Columns must not yield NaN or Inf when scaled. You can check each condition with sum(apply(scale(df), 2, is.nan)) and sum(apply(scale(df), 2, is.infinite)). If either result is higher than 0, you can find which columns are giving issues with sapply(as.data.frame(scale(df)), function(x) any(is.nan(x))) and sapply(as.data.frame(scale(df)), function(x) any(is.infinite(x))). Any column yielding TRUE will generate issues while trying to fit models with spatialRF.

The code below maps the response variable of the example data frame plant_richness_df, the richness of vascular plant species (column richness_species_vascular), at the coordinates x and y of each ecoregion.

    # Country polygons used as the map background
    world <- rnaturalearth::ne_countries(
      scale = "medium",
      returnclass = "sf"
    )

    # One point per ecoregion, colored by plant richness
    ggplot2::ggplot() +
      ggplot2::geom_sf(
        data = world,
        fill = "white"
      ) +
      ggplot2::geom_point(
        data = plant_richness_df,
        ggplot2::aes(
          x = x,
          y = y,
          color = richness_species_vascular
        ),
        size = 2.5
      ) +
      ggplot2::scale_color_viridis_c(
        direction = -1,
        option = "F"
      ) +
      ggplot2::theme_bw() +
      ggplot2::labs(color = "Plant richness") +
      # Restrict the map extent to the Americas
      ggplot2::scale_x_continuous(limits = c(-170, -30)) +
      ggplot2::scale_y_continuous(limits = c(-58, 80)) +
      ggplot2::ggtitle("Plant richness of the American ecoregions") +
      ggplot2::xlab("Longitude") +
      ggplot2::ylab("Latitude")

The predictors (columns 5 to 21) represent diverse factors that may influence plant richness such as sampling bias, the area of the ecoregion, climatic variables, human presence and impact, topography, geographical fragmentation, and features of the neighbors of each ecoregion. The figure below shows the scatterplots of the response variable (y axis) against each predictor (x axis).

Note: Every plotting function in the package now allows changing the colors of its main features via dedicated color arguments.
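The data conditions described in this post (no NA records, no zero-variance columns, no NaN or Inf after scaling) can be checked in a single pass. The sketch below is a minimal base R illustration, assuming a data frame with numeric columns only. The helper check_training_df() and the toy data example_df are hypothetical names made up for this example and are not part of spatialRF.

```r
# Hypothetical helper, not part of spatialRF: runs the checks described
# in the text on a data frame with numeric columns only.
check_training_df <- function(df) {
  df <- as.data.frame(df)  # converts tibbles to plain data frames
  scaled <- as.data.frame(scale(df))
  list(
    # total number of NA cells
    n_na = sum(apply(df, 2, is.na)),
    # names of the columns with zero variance
    zero_variance = names(which(apply(df, 2, var, na.rm = TRUE) == 0)),
    # names of the columns yielding NaN or Inf once scaled
    nan_when_scaled = names(which(sapply(scaled, function(x) any(is.nan(x))))),
    inf_when_scaled = names(which(sapply(scaled, function(x) any(is.infinite(x)))))
  )
}

# Toy data (an assumption for illustration): one constant column
# and one column with a missing value.
set.seed(1)
example_df <- data.frame(
  response = rnorm(120),
  a = rnorm(120),
  b = rep(1, 120),
  c = c(NA, rnorm(119))
)

report <- check_training_df(example_df)

# Drop the problematic columns first, then the rows with empty cells.
keep <- setdiff(names(example_df), report$zero_variance)
df_clean <- na.omit(example_df[, keep])
```

In this toy case the constant column b is flagged both as zero-variance and as yielding NaN when scaled, since scaling divides by a standard deviation of zero. Removing zero-variance columns before running the model functions therefore takes care of both conditions at once.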
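To get a feel for why the distance matrix limits the number of training records, it helps to estimate its size. The sketch below assumes a dense square matrix of double-precision values (8 bytes per cell), which is how base R stores a numeric matrix. It only counts the matrix itself, not the working copies or the spatial predictors a model may derive from it, so real memory use is likely higher. The function distance_matrix_mb() is a hypothetical name used for illustration.

```r
# Hypothetical helper: approximate size in megabytes of a dense
# n x n numeric (double) matrix, ignoring the small object header.
distance_matrix_mb <- function(n_rows) {
  n_rows^2 * 8 / 1024^2
}

# Memory grows with the square of the number of rows.
sizes <- data.frame(n_rows = c(100, 1000, 5000, 20000))
sizes$megabytes <- round(distance_matrix_mb(sizes$n_rows), 1)
print(sizes)
```

Doubling the number of rows multiplies the size of the matrix by four, which is why a few thousand rows is a practical ceiling on many machines.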
