--- title: "Data Imputation" author: "Bill Denney" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Data Imputation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` Imputation may be required for noncompartmental analysis (NCA) calculations. Typical imputations may require setting the concentration before the first dose to zero or shifting actual time predose concentrations to the beginning of the dosing interval. PKNCA supports imputation either for the full analysis dataset or per calculation interval. The current list of imputation methods built into PKNCA can be found by looking at `?PKNCA_impute_method`: ```{r results='markup'} library(PKNCA) cat(paste( "*", ls("package:PKNCA", pattern = "^PKNCA_impute_method") ), sep = "\n") ``` ## How does imputation occur? (You can skip this section if you don't desire the details of the methods of imputation.) Imputation occurs just before calculations are performed within PKNCA. Imputation occurs only on a single interval definition at a time, so the same group (usually meaning the same subject with the same analyte) at the same time range can have different imputations for different parameter calculations. The reason that this is done is to ensure that there are no unintentional modifications to the data. As an example, if an AUC~0-24~ were calculated on Day 1 and Day 2 of a study with actual times, the nominal 24 hour sample may be collected at 23.5 hours. It may be preferable to keep the 23.5 hour sample at 23.5 hours for the Day 1 calculation, and at the same time, it may be preferred to shift the same 23.5 hr sample to 24 hours (time 0 on Day 2) for the Day 2 calculation. ## How to select imputation methods to use The selection of imputation methods uses a string of text with commas or spaces (or both) separating the imputation methods to use. No imputation will be performed if the imputation method is requested as `NA` or `""`. * To select no imputation (the default), indicate the imputation by `NA` or `""`. * To set imputation on the full dataset, use the `impute` argument to `PKNCAdata()` to specify the methods to use. * To set imputation by interval, use the `impute` argument to `PKNCAdata()` to specify the column in the intervals dataset to use for imputation. * You cannot specify imputation for both the full dataset and by interval at the same time. And, if a column name in the dataset matches the `impute` argument to `PKNCAdata()`, that will be used. Imputation method functions are named `PKNCA_impute_method_[method name]`. For example, the method to impute a concentration of 0 at time 0 is named `PKNCA_impute_method_start_conc0`. When specifying the imputation method to use, give the `[method name]` part of the function name. So for the example above, use `"start_conc0"`. To specify more than one, give all the methods in order with a comma or space separating them. For example, to first move a predose concentration up to the time of dosing and then set time 0 to concentration 0, use `"start_predose,start_conc0"`, and the two methods will be applied in order. ## Imputation for the full dataset If an imputation applies to the full dataset, it can be provided in the `impute` argument to `PKNCAdata()`: ```{r impute-full-data} library(PKNCA) # Remove time 0 to illustrate that imputation works d_conc <- as.data.frame(datasets::Theoph)[!datasets::Theoph$Time == 0, ] conc_obj <- PKNCAconc(d_conc, conc~Time|Subject) d_dose <- unique(datasets::Theoph[datasets::Theoph$Time == 0, c("Dose", "Time", "Subject")]) dose_obj <- PKNCAdose(d_dose, Dose~Time|Subject) data_obj <- PKNCAdata(conc_obj, dose_obj, impute = "start_predose,start_conc0") nca_obj <- pk.nca(data_obj) summary(nca_obj) ``` ## Imputation by calculation interval If an imputation applies to specific intervals, the column in the interval data.frame can be provided in the `impute` argument to `PKNCAdata()`: ```{r impute-by-interval} library(PKNCA) # Remove time 0 to illustrate that imputation works d_conc <- as.data.frame(datasets::Theoph)[!datasets::Theoph$Time == 0, ] conc_obj <- PKNCAconc(d_conc, conc~Time|Subject) d_dose <- unique(datasets::Theoph[datasets::Theoph$Time == 0, c("Dose", "Time", "Subject")]) dose_obj <- PKNCAdose(d_dose, Dose~Time|Subject) d_intervals <- data.frame( start=0, end=c(24, 24.1), auclast=TRUE, impute=c(NA, "start_conc0") ) data_obj <- PKNCAdata(conc_obj, dose_obj, intervals = d_intervals, impute = "impute") nca_obj <- pk.nca(data_obj) # PKNCA does not impute time 0 by default, so AUClast in the 0-24 interval is # not calculated summary(nca_obj) ``` ## Advanced: Writing your own imputation functions Writing your own imputation function is intended to be a simple process. To create an imputation function requires the following steps: 1. Write a function where the name starts with `PKNCA_impute_method_` and the remainder of the function name is a brief description of the method. (Such as `PKNCA_impute_method_start_conc0`.) 2. The function should have 4 arguments: `conc`, `time`, `...`, and `options`. 3. The function should return a single data.frame with two columns named `conc` and `time`. The rows in the data.frame must be sorted by `time`. In addition to the above, the function may take named arguments of: * `start` and `end` to indicate the start and end time of the interval, and * `conc.group` and `time.group` to indicate the concentrations and times that have not been filtered for the interval.