
Commit 7ce4cd8

Update writing_templates_and_data_guides.Rmd
1 parent aa616c7 commit 7ce4cd8

3 files changed

Lines changed: 22 additions & 126 deletions


README.Rmd

Lines changed: 0 additions & 48 deletions

````diff
@@ -56,54 +56,6 @@ Once you have entered the data and metadata in a template you can use the `read_
 
 Details about the data guide format and how to write one as well as about how to design a template can be found in the package vignettes.
 
-### Data guide
-
-The *data guide* is a human readable and editable file in [YAML](https://yaml.org/spec/1.2.2/) format that specifies the structure and location of the data in the Excel file. It contains a list of data types, each of which is defined by a name and a set of parameters. As the name suggests, the *data guide* is used by the **excelDataGuide** package as a guide to extract all indexed data from the Excel file and convert it into proper R objects. Part of the *data guide* from the example in the package, *i.e.* `system.file("extdata", "example_guide.yml", package = "excelDataGuide")` is shown below:
-
-``` yaml
-guide.version: '1.0'
-template.name: competition
-template.min.version: '9.3'
-template.max.version: ~
-plate.format: 96
-locations:
-  - sheet: description
-    type: cells
-    varname: .template
-    translate: false
-    variables:
-      - name: version
-        cell: B2
-  - sheet: description
-    type: keyvalue
-    translate: true
-    atomicclass:
-      - character
-      - character
-      - character
-      - character
-      - character
-      - date
-      - character
-      - numeric
-      - character
-      - numeric
-      - character
-      - numeric
-      - character
-      - character
-    varname: metadata
-    ranges:
-      - A10:B21
-      - A24:B25
-```
-
-We provide a cue schema for the data guide, allowing you to check the validity of
-guides that you wrote. The schema is available in the package as
-`system.file("extdata", "excelguide_schema.cue", package = "excelDataGuide")`. To
-check its validity against the schema you can use the [CUE](https://cuelang.org/docs/) validator.
-More details can be found in the vignette (to be done, see below).
-
 ## Future work
 
 - Complete the vignette ([issue](https://github.com/SystemsBioinformatics/excelDataGuide/issues/2))
````
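The data guide in the deleted README section locates its `keyvalue` block with A1-style ranges (`A10:B21`, `A24:B25`) and a single cell (`B2`). The base-R sketch below shows one possible way to turn such references into row and column indices. It is an illustration under our own assumptions, not the **excelDataGuide** implementation: `col_to_number()` and `parse_range()` are hypothetical names, and sheet-qualified or absolute references (`$A$10`) are not handled.

```r
# Hypothetical helpers (not the excelDataGuide implementation) that turn
# A1-style references from a data guide into row and column indices.

# Convert a column label such as "A", "Z" or "AB" into a 1-based column
# number. Labels are bijective base-26 numbers with digits A = 1 ... Z = 26.
col_to_number <- function(label) {
  digits <- utf8ToInt(toupper(label)) - utf8ToInt("A") + 1L
  Reduce(function(acc, d) acc * 26L + d, digits, 0L)
}

# Split a range such as "A10:B21" (or a single cell such as "B2") into the
# rows and columns it covers.
parse_range <- function(range) {
  corners <- strsplit(range, ":", fixed = TRUE)[[1]]
  if (length(corners) == 1L) corners <- rep(corners, 2L)  # single cell
  if (length(corners) != 2L) stop("invalid range: ", range)
  parts <- regmatches(corners, regexec("^([A-Za-z]+)([0-9]+)$", corners))
  if (any(lengths(parts) != 3L)) stop("invalid range: ", range)
  cols <- vapply(parts, function(p) col_to_number(p[[2]]), integer(1))
  rows <- vapply(parts, function(p) as.integer(p[[3]]), integer(1))
  list(rows = seq(rows[[1]], rows[[2]]), cols = seq(cols[[1]], cols[[2]]))
}

# Number of key-value rows covered by the two ranges in the example guide.
rows_covered <- sum(vapply(c("A10:B21", "A24:B25"),
                           function(r) length(parse_range(r)$rows),
                           integer(1)))
rows_covered  # 14
```

With these helpers the two ranges cover 12 + 2 = 14 rows, which appears to match the 14 entries of the guide's `atomicclass` list, one class per key-value row.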

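The same guide bounds the compatible template versions with `template.min.version: '9.3'` and `template.max.version: ~` (YAML null, i.e. no upper bound), and the deleted README text says the package checks this compatibility. A minimal base-R sketch of such a check follows. `template_compatible()` is a hypothetical helper, not the package's API. It compares versions with `numeric_version()` rather than as strings, so that `9.10` correctly counts as newer than `9.3`.

```r
# Hypothetical helper (not the excelDataGuide API): is a template version
# within the bounds declared in a data guide? A NULL bound (YAML `~`)
# means "no limit on that side".
template_compatible <- function(version, min_version = NULL, max_version = NULL) {
  v <- numeric_version(version)
  ok_min <- is.null(min_version) || v >= numeric_version(min_version)
  ok_max <- is.null(max_version) || v <= numeric_version(max_version)
  ok_min && ok_max
}

template_compatible("9.10", min_version = "9.3")  # TRUE
template_compatible("9.2", min_version = "9.3")   # FALSE
```

A plain string comparison would get the first case wrong, because `"9.10" < "9.3"` when compared character by character.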
README.md

Lines changed: 14 additions & 76 deletions

````diff
@@ -58,85 +58,23 @@ data <- read_data(datafile, guidefile)
 The output of the `read_data()` function is a list object the format of
 which is determined for a large part by the design of the data guide.
 
-## Details
-
-### How it works
-
-When you design a template Excel file for data reporting and analysis
-you also create a *data guide* file that specifies the structure and
-location of the data in the template. If you design the template
-carefully you can use the same data guide for several versions of the
-template. That is, as long as the location of the indexed data does not
-change, you can use the same data guide for different versions of the
-template. You can specify the compatible version of the templates in the
-*data guide*. The package will check compatibility. Clearly, you should
-use versioned data templates, and hence, a required field in a template
-is its version number. An example of a template with data is provided in
-the package
-(`system.file("extdata", "example_data.xlsx", package = "excelDataGuide")`).
+## How it works
+
+When you design a template spreadsheet file for data reporting and
+analysis you also create a *data guide* file that specifies the
+structure and location of the data in the template. Examples of a
+template with data and of a data guide are provided in the package
+(`system.file("extdata", "example_data.xlsx", package = "excelDataGuide")`
+and
+`system.file("extdata", "example_guide.yml", package = "excelDataGuide")`).
 
 Once you have entered the data and metadata in a template you can use
-the package to extract the data into R. The package will check and
-coerce the data types to the required formats.
-
-### Data guide
-
-The *data guide* is a human readable and editable file in
-[YAML](https://yaml.org/spec/1.2.2/) format that specifies the structure
-and location of the data in the Excel file. It contains a list of data
-types, each of which is defined by a name and a set of parameters. As
-the name suggests, the *data guide* is used by the **excelDataGuide**
-package as a guide to extract all indexed data from the Excel file and
-convert it into proper R objects. Part of the *data guide* from the
-example in the package, *i.e.*
-`system.file("extdata", "example_guide.yml", package = "excelDataGuide")`
-is shown below:
-
-``` yaml
-guide.version: '1.0'
-template.name: competition
-template.min.version: '9.3'
-template.max.version: ~
-plate.format: 96
-locations:
-  - sheet: description
-    type: cells
-    varname: .template
-    translate: false
-    variables:
-      - name: version
-        cell: B2
-  - sheet: description
-    type: keyvalue
-    translate: true
-    atomicclass:
-      - character
-      - character
-      - character
-      - character
-      - character
-      - date
-      - character
-      - numeric
-      - character
-      - numeric
-      - character
-      - numeric
-      - character
-      - character
-    varname: metadata
-    ranges:
-      - A10:B21
-      - A24:B25
-```
+the `read_data()` function in the package to extract the data into R
+with a single command. The package will check and coerce the data types
+to the required formats.
 
-We provide a cue schema for the data guide, allowing you to check the
-validity of guides that you wrote. The schema is available in the
-package as
-`system.file("extdata", "excelguide_schema.cue", package = "excelDataGuide")`.
-To check its validity against the schema you can use the
-[CUE](https://cuelang.org/docs/) validator. More details can be found in
-the vignette (to be done, see below).
+Details about the data guide format and how to write one as well as
+about how to design a template can be found in the package vignettes.
 
 ## Future work
 
````
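The updated README states that the package checks and coerces the data types to the required formats. Purely as an illustration, the sketch below coerces key-value entries that were read as text, following a class vector like the `atomicclass` list in the example guide. `coerce_values()` is a hypothetical helper, not the package's implementation. It assumes dates are stored as ISO 8601 text (`YYYY-MM-DD`). Dates read from a real spreadsheet may instead arrive as serial numbers or date-times.

```r
# Hypothetical helper (not the excelDataGuide implementation): coerce values
# that were read as text according to a vector of class names, in the spirit
# of the `atomicclass` list of a data guide.
coerce_values <- function(values, classes) {
  stopifnot(length(values) == length(classes))
  converters <- list(
    character = as.character,
    numeric   = as.numeric,
    # Assumes ISO 8601 text such as "2024-01-15"; spreadsheet dates may
    # arrive as serial numbers and would need a different conversion.
    date      = as.Date
  )
  out <- vector("list", length(values))
  for (i in seq_along(values)) {
    convert <- converters[[classes[[i]]]]
    if (is.null(convert)) stop("unsupported class: ", classes[[i]])
    out[[i]] <- convert(values[[i]])
  }
  names(out) <- names(values)
  out
}

metadata <- coerce_values(
  c(project = "example", date = "2024-01-15", plates = "3"),
  c("character", "date", "numeric")
)
str(metadata)
```

Returning a named list rather than a data frame keeps each value in its own class, which a single character column could not do.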
vignettes/writing_templates_and_data_guides.Rmd

Lines changed: 8 additions & 2 deletions

````diff
@@ -162,7 +162,7 @@ Fixed parameters needed for calculations, for example for acceptance criteria or
 (parameters that are usually described in a SOP) are best entered on a separate sheet, and referred to by absolute
 references or by named references in calculations. This mechanism prevents you from having to search through the
 entire template for formulas using these parameters if you need to change them, and it prevents you from accidentally
-using wrong values in calculations. In the case of the example we have a separate hidden sheet called **_parameters**
+using wrong values in calculations. In the case of the example we have a separate hidden sheet called **\_parameters**
 for this purpose. This prevents users from accidentally modifying them, and keeps the template clean and organized.
 The information in this sheet can be indexed in the data guide, and will then be available to script-based analyses as well.
 
@@ -174,7 +174,13 @@ knitr::include_graphics("images/data.png")
 
 ### Missing values in spreadsheets
 
-TODO
+We urge you to use the `NA()` function to represent missing values in your templates, in particular in calculations. The advantage of using `NA()` is that calculations in the sheets will automatically handle `NA()` and pass it on to subsequent calculations, avoiding errors and producing sensible results. A disadvantage of using `NA()` is that it requires special care to detect and handle missing values in formulas. One particularly weird problem is that you cannot reliably detect missing values in formulas by testing a cell for the string "#N/A", even though this "solution" is often presented on internet fora, and even in official documentation. The reason is that Excel uses different string representations for missing values under different language settings. You have to consistently use the `ISNA()` function to detect `NA()` values throughout your entire template.
+
+### Labeled values (bad values)
+
+You may have obtained raw measurements that you do not want to include in your analysis. Clearly, you should not delete these measurements from the spreadsheet, because labelling a value as a "bad" measurement is, to some degree, a subjective decision with which another user or your future self may disagree. Instead, you can label them as "bad". An easy way to do this is by adding a star before or after the value, *e.g.* `1000*` or `*1000`. You should also add a note explaining why the value is bad, in a table with columns of cell addresses and remarks at a logical position in the same sheet. You can detect such "starred" values in Excel by using, for example, the `ISERROR()`, `ISNUMBER()` or `ISNONTEXT()` functions in the formulas that use these values, and set a calculated cell to `NA()` based on the result. For example, `=IF(NOT(ISNUMBER(A1)), NA(), A1)` will set the cell with this formula to `NA()` if the value is not a number. An additional visual aid to detect "starred" values is to give such cells a different font color or cell background using conditional formatting.
+
+In the excelDataGuide package we provide the functions `has_star()` and `star_to_number()` to detect "starred" values and convert them back to numbers, while labelling them as "bad" in a separate column in the template output.
 
 ### What else?
 
````
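The new vignette text suggests marking bad measurements with a leading or trailing star, and replacing non-numeric values by `NA()` in calculations. The base-R sketch below mirrors that logic on the R side. It flags starred values, recovers the number, and masks flagged values with `NA` so that, as with `NA()` in a sheet, they propagate through later arithmetic. The helpers `is_starred()` and `unstar()` are hypothetical. They are not the package's `has_star()` or `star_to_number()`, whose exact signatures and output format are not shown in this commit.

```r
# Hypothetical helpers (not excelDataGuide's has_star()/star_to_number()).
# A value counts as "starred" when it starts or ends with "*", e.g. "1000*"
# or "*1000", ignoring surrounding whitespace.
star_pattern <- "^\\s*\\*|\\*\\s*$"

is_starred <- function(x) {
  grepl(star_pattern, as.character(x))
}

# Return the numeric value, a flag for starred ("bad") values, and a column
# in which bad values are masked with NA for downstream calculations.
unstar <- function(x) {
  x <- as.character(x)
  bad <- is_starred(x)
  cleaned <- trimws(gsub(star_pattern, "", x))
  # Text that is not a number becomes NA, like =IF(NOT(ISNUMBER(A1)), NA(), A1).
  value <- suppressWarnings(as.numeric(cleaned))
  data.frame(value = value, bad = bad, usable = ifelse(bad, NA_real_, value))
}

res <- unstar(c("1000*", "*1000", "950", "n.d."))
mean(res$usable)                # NA: missing values propagate
mean(res$usable, na.rm = TRUE)  # 950
```

Keeping the recovered `value` next to the `bad` flag, rather than overwriting it, preserves the original measurement so that another analyst can reverse the judgement later.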