Update writing_templates_and_data_guides.Rmd

douwe · douwe · commit 7ce4cd836814 · 2026-03-03T17:50:20.000+01:00
diff --git a/README.Rmd b/README.Rmd
@@ -56,54 +56,6 @@ Once you have entered the data and metadata in a template you can use the `read_
 
 Details about the data guide format and how to write one as well as about how to design a template can be found in the package vignettes.
 
-### Data guide
-
-The *data guide* is a human readable and editable file in [YAML](https://yaml.org/spec/1.2.2/) format that specifies the structure and location of the data in the Excel file. It contains a list of data types, each of which is defined by a name and a set of parameters. As the name suggests, the *data guide* is used by the **excelDataGuide** package as a guide to extract all indexed data from the Excel file and convert it into proper R objects. Part of the *data guide* from the example in the package, *i.e.* `system.file("extdata", "example_guide.yml", package = "excelDataGuide")` is shown below:
-
-``` yaml
-guide.version: '1.0'
-template.name: competition
-template.min.version: '9.3'
-template.max.version: ~
-plate.format: 96
-locations:
-  - sheet: description
-    type: cells
-    varname: .template
-    translate: false
-    variables:
-      - name: version
-        cell: B2
-  - sheet: description
-    type: keyvalue
-    translate: true
-    atomicclass:
-      - character
-      - character
-      - character
-      - character
-      - character
-      - date
-      - character
-      - numeric
-      - character
-      - numeric
-      - character
-      - numeric
-      - character
-      - character
-    varname: metadata
-    ranges:
-      - A10:B21
-      - A24:B25
-```
-
-We provide a cue schema for the data guide, allowing you to check the validity of 
-guides that you wrote. The schema is available in the package as 
-`system.file("extdata", "excelguide_schema.cue", package = "excelDataGuide")`. To 
-check its validity against the schema you can use the [CUE](https://cuelang.org/docs/) validator. 
-More details can be found in the vignette (to be done, see below).
-
 ## Future work
 
 - Complete the vignette ([issue](https://github.com/SystemsBioinformatics/excelDataGuide/issues/2))
diff --git a/README.md b/README.md
@@ -58,85 +58,23 @@ data <- read_data(datafile, guidefile)
 The output of the `read_data()` function is a list object the format of
 which is determined for a large part by the design of the data guide.
 
-## Details
-
-### How it works
-
-When you design a template Excel file for data reporting and analysis
-you also create a *data guide* file that specifies the structure and
-location of the data in the template. If you design the template
-carefully you can use the same data guide for several versions of the
-template. That is, as long as the location of the indexed data does not
-change, you can use the same data guide for different versions of the
-template. You can specify the compatible version of the templates in the
-*data guide*. The package will check compatibility. Clearly, you should
-use versioned data templates, and hence, a required field in a template
-is its version number. An example of a template with data is provided in
-the package
-(`system.file("extdata", "example_data.xlsx", package = "excelDataGuide")`).
+## How it works
+
+When you design a template spreadsheet file for data reporting and
+analysis you also create a *data guide* file that specifies the
+structure and location of the data in the template. Examples of a
+template with data and of a data guide are provided in the package
+(`system.file("extdata", "example_data.xlsx", package = "excelDataGuide")`
+and
+`system.file("extdata", "example_guide.yml", package = "excelDataGuide")`).
 
 Once you have entered the data and metadata in a template you can use
-the package to extract the data into R. The package will check and
-coerce the data types to the required formats.
-
-### Data guide
-
-The *data guide* is a human readable and editable file in
-[YAML](https://yaml.org/spec/1.2.2/) format that specifies the structure
-and location of the data in the Excel file. It contains a list of data
-types, each of which is defined by a name and a set of parameters. As
-the name suggests, the *data guide* is used by the **excelDataGuide**
-package as a guide to extract all indexed data from the Excel file and
-convert it into proper R objects. Part of the *data guide* from the
-example in the package, *i.e.*
-`system.file("extdata", "example_guide.yml", package = "excelDataGuide")`
-is shown below:
-
-``` yaml
-guide.version: '1.0'
-template.name: competition
-template.min.version: '9.3'
-template.max.version: ~
-plate.format: 96
-locations:
-  - sheet: description
-    type: cells
-    varname: .template
-    translate: false
-    variables:
-      - name: version
-        cell: B2
-  - sheet: description
-    type: keyvalue
-    translate: true
-    atomicclass:
-      - character
-      - character
-      - character
-      - character
-      - character
-      - date
-      - character
-      - numeric
-      - character
-      - numeric
-      - character
-      - numeric
-      - character
-      - character
-    varname: metadata
-    ranges:
-      - A10:B21
-      - A24:B25
-```
+the `read_data()` function in the package to extract the data into R
+with a single command. The package will check and coerce the data types
+to the required formats.
 
-We provide a cue schema for the data guide, allowing you to check the
-validity of guides that you wrote. The schema is available in the
-package as
-`system.file("extdata", "excelguide_schema.cue", package = "excelDataGuide")`.
-To check its validity against the schema you can use the
-[CUE](https://cuelang.org/docs/) validator. More details can be found in
-the vignette (to be done, see below).
+Details about the data guide format and how to write one as well as
+about how to design a template can be found in the package vignettes.
 
 ## Future work
 
diff --git a/vignettes/writing_templates_and_data_guides.Rmd b/vignettes/writing_templates_and_data_guides.Rmd
@@ -162,7 +162,7 @@ Fixed parameters needed for calculations, for example for acceptance criteria or
 (parameters that are usually described in a SOP) are best entered on a separate sheet, and referred to by absolute
 references or by named references in calculations. This mechanism prevents you from having to search through the 
 entire template for formulas using these parameters if you need to change them, and it prevents you from accidentally 
-using wrong values in calculations. In the case of the example we have a separate hidden sheet called **_parameters** 
+using wrong values in calculations. In the case of the example we have a separate hidden sheet called **\_parameters** 
 for this purpose. This prevents users from accidentally modifying them, and keeps the template clean and organized. 
 The information in this sheet can be indexed in the data guide, and will then be available to script-based analyses as well.
 
@@ -174,7 +174,13 @@ knitr::include_graphics("images/data.png")
 
 ### Missing values in spreadsheets
 
-TODO
+We urge you to use the `NA()` function to represent missing values in your templates, in particular in calculations. The advantage of `NA()` is that formulas referring to a cell containing `NA()` generally return `#N/A` themselves, so missing values propagate through subsequent calculations instead of silently producing misleading results. The disadvantage is that detecting and handling missing values in formulas requires special care. A particularly tricky pitfall is that you cannot detect missing values generically by checking whether a cell contains the string "#N/A", even though this "solution" is often presented on internet forums and even in official documentation. The reason is that Excel uses different string representations for missing values depending on its language settings. Instead, consistently use the `ISNA()` function to detect `NA()` values throughout your entire template.
+
+### Labeled values (bad values)
+
+You may have obtained raw measurements that you do not want to include in your analysis. You should not delete these measurements from the spreadsheet, however, because labeling a measurement as "bad" is, to some degree, a subjective decision with which another user or your future self may disagree. Instead, label them as "bad". An easy way to do this is to add a star before or after the value, *e.g.* `1000*` or `*1000`. You should also document why each value is considered bad, for example in a table with columns for cell addresses and remarks, placed at a logical position in the same sheet. Because a starred value is stored as text rather than as a number, you can detect it in calculations with `ISNUMBER()` or `ISNONTEXT()`, or with `ISERROR()` applied to an arithmetic expression involving the value, in the condition of an `IF()` formula, and return `NA()` for such cells. For example, `=IF(NOT(ISNUMBER(A1)), NA(), A1)` returns `NA()` if the value in cell A1 is not a number, and the value itself otherwise. As an additional visual aid, you can use conditional formatting to give cells with starred values a different font color or background.
+
+In the excelDataGuide package we provide the functions `has_star()` and `star_to_number()` to detect starred values and convert them back to numbers, while labeling them as "bad" in a separate column of the template output.
 
 ### What else?