---
title: "REMA basics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{REMA basics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
rema_path <- find.package("rema")
knitr::opts_knit$set(root.dir = file.path(rema_path, "example_data"))
knitr::opts_chunk$set(warning = FALSE, message = FALSE)

# devtools::build_rmd(files = 'vignettes/ex1_basics.Rmd')
```

```{r setup, warning = FALSE, message = FALSE}
library(rema)
library(ggplot2)
library(dplyr)
library(knitr)

ggplot2::theme_set(cowplot::theme_cowplot(font_size = 14) +
                     cowplot::background_grid() +
                     cowplot::panel_border())
```

In this series of vignettes we walk through several examples of how to prepare and fit a suite of random effects (RE) models for biomass estimation and apportionment using the `rema` package.

## The `rema` workflow:

1.  Load `rema` and data. The user can read biomass or other abundance index data from file (e.g. .csv file), or they can use the `rwout.rep` report file from the ADMB version of the RE model using `read_admb_re()`.

2.  Specify model structure and assumptions using `prepare_rema_input()`. This function allows users to quickly transition from a single to two survey model, specify alternative process error structures, add likelihood penalties or priors on parameters, and evaluate alternative assumptions about zero biomass observations.

3.  Fit the specified REMA model using `fit_rema()` and determine whether the model has met basic convergence criteria (e.g., Hessian is positive definite, a maximum gradient component approximately equal to zero).

4.  Extract `rema` model output into clean, consistently formatted data frames using `tidy_rema()`. The user can visualize this model output using `plot_rema()`, or quickly format it into tables for a report.

5.  Compare alternative REMA models and conduct model selection using `compare_rema_models()`. Output from this function includes a table of Akaike Information Criteria (AIC) when appropriate, figures, and tidied data frames. This function also accepts model output from the ADMB version of the RE model for easy comparison to past models.

Taken together, these functions allow R users to quickly fit and interrogate a suite of simple statistical models in TMB without needing software-specific training or expertise.

## Example 1: Univariate random effects (RE) model with a single survey and stratum 

This example uses Aleutian Islands shortraker (`aisr_rwout.rep`) and fits to NMFS bottom trawl survey estimates. 

1.  Read in the `rwout.rep`, a custom report file generated by the ADMB version of the random effects model.

```{r read-admb, echo=TRUE, message=FALSE, warning=FALSE}
# ?read_admb_re
admb_re <- read_admb_re(filename = 'aisr_rwout.rep',
                        # optional label for the single biomass survey stratum
                        biomass_strata_names = 'Aleutians Islands',
                        model_name = 'ADMB: AI shortraker')
names(admb_re)
kable(admb_re$biomass_dat)

```

2.  Prepare REMA model inputs using the `admb_re` data object.

```{r prepare-input, echo=TRUE, message=FALSE, warning=FALSE}
# ?prepare_rema_input 
input <- prepare_rema_input(model_name = 'TMB: AI shortraker', admb_re = admb_re)
names(input) 
```

The `input$data`, `input$par`, `input$map`, and `input$random` are used by `TMB` to fit the REMA model. The other objects are used by other functions to process model results or conduct residual analyses.

Data could alternatively be entered into `prepare_rema_input()` using the `biomass_dat` argument:
```{r prepare-input-alt, eval=FALSE, message=FALSE, warning=FALSE}
# (not run)
input <- prepare_rema_input(model_name = 'tmb_rema_aisr', biomass_dat = admb_re$biomass_dat)
```

3.  Fit REMA model

```{r fit-rema, echo=TRUE, message=FALSE, warning=FALSE}
# ?fit_rema
m <- fit_rema(input)
```

4.  Get tidied model output and plot results

Note that the 95% confidence intervals of the observations (i.e., `obs_lci` and `obs_uci` in `output$biomass_by_strata` are based on the assumption of normality in log-space; therefore, they are asymmetric on the arithmetic scale.
```{r rema-output, echo=TRUE, message=FALSE, warning=FALSE}
# ?tidy_rema
output <- tidy_rema(rema_model = m)
kable(output$parameter_estimates) # estimated fixed effects parameters
kable(head(output$biomass_by_strata, 5)) # data.frame of predicted and observed biomass by stratum
```

```{r plot-rema, echo=TRUE, fig.height=4, fig.width=7, message=FALSE, warning=FALSE}
# ?plot_rema
plots <- plot_rema(tidy_rema = output, biomass_ylab = 'Biomass (t)') # optional y-axis label
plots$biomass_by_strata
```

5.  Compare with ADMB RE model results

```{r compare, echo=TRUE, message=FALSE, fig.height=4, fig.width=8.5, warning=FALSE}
# ?compare_rema_models
compare <- compare_rema_models(rema_models = list(m),
                               admb_re = admb_re,
                               biomass_ylab = 'Biomass (t)')
compare$plots$biomass_by_strata
```

## Example 2: Multivariate random effects model (REM) with a single survey and multiple strata

This example uses Bering Sea and Aleutian Islands shortspine thornyhead (`bsaisst_rwout.rep`) and fits to the NMFS EBS slope and AI bottom trawl survey estimates. The original ADMB model has a single, shared process error (PE) over the three strata. This example demonstrates how to define alternative PE structures, and how to u to perform model selection using AIC. 

```{r ex2, echo=TRUE, message=FALSE, warning=FALSE}

admb_re <- read_admb_re(filename = 'bsaisst_rwout.rep',
                      biomass_strata_names = c('AI survey', 'EBS slope survey', 'S. Bering Sea (AI survey)'),
                      model_name = 'ADMB: single PE')

# the original ADMB model shared PE across all strata
input1 <- prepare_rema_input(model_name = 'TMB: single PE',
                            admb_re = admb_re,
                            PE_options = list(pointer_PE_biomass = c(1, 1, 1)))
m1 <- fit_rema(input1)
output <- tidy_rema(rema_model = m1)
kable(output$parameter_estimates) # estimated fixed effects parameters
kable(head(output$biomass_by_strata, 5)) # data.frame of predicted and observed biomass by stratum
kable(head(output$total_predicted_biomass, 5)) # combined/total predicted biomass
```

We can use `ggplot2` functions to modify formatting of plots:
```{r plot-rema2, echo=TRUE, message=FALSE, fig.height=9, fig.width=8.5, warning=FALSE}

plots <- plot_rema(tidy_rema = output, biomass_ylab = 'Biomass (t)')
plots$biomass_by_strata + facet_wrap(~strata, ncol = 1, scales = 'free_y')
```

One of the primary uses for the RE model is apportioning catch by management area. The `tidy_rema()` function output includes a table with proportions of predicted biomass by strata:

```{r appo, echo=TRUE, message=FALSE, warning=FALSE}
kable(tail(output$proportion_biomass_by_strata, 3)) 
# figure:
# plots$proportion_biomass_by_strata
```

We can compare the TMB model results with the original ADMB model. Note different confidence intervals in the ADMB and TMB versions. The ADMB code uses the Marlow method to sum the variances of biomass in log-space, whereas the TMB version applies the standard Delta method.

```{r compare2, echo=TRUE, message=FALSE, fig.height=9, fig.width=8.5, warning=FALSE}
compare <- compare_rema_models(rema_models = list(m1),
                               admb_re = admb_re,
                               biomass_ylab = 'Biomass (t)')

# Note different confidence intervals between the ADMB version (Marlow method) and the TMB version (Delta method)
compare$plots$biomass_by_strata + facet_wrap(~strata, ncol = 1, scales = 'free_y')
```

```{r compare3, echo=TRUE, message=FALSE, fig.height=4, fig.width=8.5, warning=FALSE}
compare$plots$total_predicted_biomass + ggplot2::ggtitle('BSAI Shortspine thornyhead') 
```

We can easily fit an alternative REMA model with strata-specific PE parameters, and compare the two models using AIC. In this case we see the single PE model has the lowest AIC value. Note that ADMB models cannot currently be compared with REMA models using AIC, therefore we omit the `admb_re = admb_re` argument in the `compare_rema_models()` function call below:

```{r compare5, echo=TRUE, message=FALSE, fig.height=9, fig.width=8.5, warning=FALSE}

# REMA defaults to strata-specific parameters, which could be explicitly defined
# as: PE_options = list(pointer_PE_biomass = c(1, 2, 3))
input2 <- prepare_rema_input(model_name = 'TMB: strata-specific PE',
                            admb_re = admb_re) 

m2 <- fit_rema(input2)

compare <- compare_rema_models(rema_models = list(m1, m2),
                               biomass_ylab = 'Biomass (t)')

compare$plots$biomass_by_strata + facet_wrap(~strata, ncol = 1, scales = 'free_y')
kable(compare$aic)
kable(compare$output$parameter_estimates)

```