Commit a196cc5

auc evaluation
1 parent 9da9859 commit a196cc5

2 files changed

Lines changed: 152 additions & 121 deletions

R_REVIEW.md

Lines changed: 7 additions & 0 deletions
@@ -7,6 +7,13 @@
77
(length(na.omit(diffs))*n/60)
88
```
99

10+
## AUC
11+
12+
```
13+
day = rep(data_ip[[2]], 1440/dt0),
14+
```
15+
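`rep(data_ip[[2]], 1440/dt0)` repeats the whole sequence of days 1440/dt0 times, whereas each day has to be repeated 1440/dt0 times and only then followed by the next day (in R this corresponds to `rep(data_ip[[2]], each = 1440/dt0)`).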
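
The difference between the two repetition orders can be sketched in Python. This is only an illustration of the semantics of R's `rep(x, times = n)` versus `rep(x, each = n)`; the helper names `rep_times` and `rep_each` and the value `per_day = 3` are hypothetical stand-ins, not part of iglu.

```python
def rep_times(values, n):
    """Mimic R's rep(x, times = n): repeat the whole sequence n times."""
    return list(values) * n


def rep_each(values, n):
    """Mimic R's rep(x, each = n): repeat every element n times in turn."""
    return [v for v in values for _ in range(n)]


days = ["day1", "day2"]
per_day = 3  # hypothetical stand-in for 1440/dt0

print(rep_times(days, per_day))  # day labels cycle: day1, day2, day1, ...
print(rep_each(days, per_day))   # all day1 labels first, then all day2 labels
```

With the first form, consecutive samples are labelled with alternating days; with the second, every sample of one day is labelled before the next day starts.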
16+
1017
## CGMS2DayByDay
1118

1219
[ndays = ceiling(as.double(difftime(max(tr), min(tr), units = "days")) + 1)](https://github.com/irinagain/iglu/blob/82e4d1a39901847881d5402d1ac61b3e678d2a5e/R/utils.R#L208) has to be `ndays = ceiling(as.double(difftime(max(tr), min(tr), units = "days")))`
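
A minimal Python sketch of the two `ndays` formulas, assuming one hour of readings taken every 5 minutes and starting at midnight. The function names `ndays_current` and `ndays_proposed` are hypothetical; the code only mirrors the arithmetic of the R expressions above and is not the iglu implementation.

```python
import math
from datetime import datetime, timedelta


def ndays_current(times):
    """ceil(span in days + 1), as in the linked line of utils.R."""
    span_days = (max(times) - min(times)) / timedelta(days=1)
    return math.ceil(span_days + 1)


def ndays_proposed(times):
    """ceil(span in days), the variant suggested in this review."""
    span_days = (max(times) - min(times)) / timedelta(days=1)
    return math.ceil(span_days)


# One hour of readings every 5 minutes, all on the same calendar day
start = datetime(2020, 1, 1)
times = [start + timedelta(minutes=5 * i) for i in range(12)]

print(ndays_current(times))   # 2
print(ndays_proposed(times))  # 1
```

For this input the current formula reports two days although all readings fall on one date. The sketch covers only this case and makes no claim about other inputs.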

notebooks/auc_evaluation.ipynb

Lines changed: 145 additions & 121 deletions
@@ -18,7 +18,7 @@
1818
},
1919
{
2020
"cell_type": "code",
21-
"execution_count": 2,
21+
"execution_count": 1,
2222
"metadata": {},
2323
"outputs": [],
2424
"source": [
@@ -51,7 +51,7 @@
5151
},
5252
{
5353
"cell_type": "code",
54-
"execution_count": 3,
54+
"execution_count": 2,
5555
"metadata": {},
5656
"outputs": [
5757
{
@@ -153,7 +153,7 @@
153153
},
154154
{
155155
"cell_type": "code",
156-
"execution_count": 4,
156+
"execution_count": null,
157157
"metadata": {},
158158
"outputs": [],
159159
"source": [
@@ -167,7 +167,7 @@
167167
},
168168
{
169169
"cell_type": "code",
170-
"execution_count": 5,
170+
"execution_count": 4,
171171
"metadata": {},
172172
"outputs": [
173173
{
@@ -194,6 +194,104 @@
194194
"print(f\"rpy2 version: {version('rpy2')}\")"
195195
]
196196
},
197+
{
198+
"cell_type": "markdown",
199+
"metadata": {},
200+
"source": [
201+
"## Test on synthetic data\n",
202+
"\n",
203+
"- Samples - every 5 min\n",
204+
"- duration - 1h\n",
205+
"- values [80,120] repeated for sampling duration\n",
206+
"\n",
207+
"Expected hourly AUC = 100 mg.h/dL"
208+
]
209+
},
210+
{
211+
"cell_type": "code",
212+
"execution_count": 5,
213+
"metadata": {},
214+
"outputs": [
215+
{
216+
"data": {
217+
"text/html": [
218+
"<div>\n",
219+
"<style scoped>\n",
220+
" .dataframe tbody tr th:only-of-type {\n",
221+
" vertical-align: middle;\n",
222+
" }\n",
223+
"\n",
224+
" .dataframe tbody tr th {\n",
225+
" vertical-align: top;\n",
226+
" }\n",
227+
"\n",
228+
" .dataframe thead th {\n",
229+
" text-align: right;\n",
230+
" }\n",
231+
"</style>\n",
232+
"<table border=\"1\" class=\"dataframe\">\n",
233+
" <thead>\n",
234+
" <tr style=\"text-align: right;\">\n",
235+
" <th></th>\n",
236+
" <th>id</th>\n",
237+
" <th>hourly_auc</th>\n",
238+
" </tr>\n",
239+
" </thead>\n",
240+
" <tbody>\n",
241+
" <tr>\n",
242+
" <th>1</th>\n",
243+
" <td>subject1</td>\n",
244+
" <td>102.222222</td>\n",
245+
" </tr>\n",
246+
" </tbody>\n",
247+
"</table>\n",
248+
"</div>"
249+
],
250+
"text/plain": [
251+
" id hourly_auc\n",
252+
"1 subject1 102.222222"
253+
]
254+
},
255+
"execution_count": 5,
256+
"metadata": {},
257+
"output_type": "execute_result"
258+
}
259+
],
260+
"source": [
261+
"hours = 1\n",
262+
"dt0 = 5\n",
263+
"samples = int(hours*60/dt0)\n",
264+
"times = pd.date_range('2020-01-01', periods=samples, freq=f\"{dt0}min\")\n",
265+
"glucose_values = [80,120]* int(samples/2)\n",
266+
"\n",
267+
"syntheticdata = pd.DataFrame({\n",
268+
" 'id': ['subject1'] * samples,\n",
269+
" 'time': times,\n",
270+
" 'gl': glucose_values\n",
271+
"})\n",
272+
"\n",
273+
"synthetic_iglu_auc_results = iglu_py.auc(syntheticdata)\n",
274+
"synthetic_iglu_auc_results"
275+
]
276+
},
277+
{
278+
"cell_type": "markdown",
279+
"metadata": {},
280+
"source": [
281+
"**Note:** The incorrect AUC (102.22 instead of the expected 100) is a result of bugs in the CGMS2DayByDay function:\n",
282+
"- a one-sample shift in interpolation, which results in 11 samples instead of 12\n",
283+
"- actual_dates returns 2 dates instead of one\n",
284+
"\n",
285+
"Another suspicious line is in AUC itself: `day = rep(data_ip[[2]], 1440/dt0),` - IMHO it assigns consecutive gl samples to different days in turn, instead of assigning all samples of one day before moving on to the next\n"
286+
]
287+
},
288+
{
289+
"cell_type": "markdown",
290+
"metadata": {},
291+
"source": [
292+
"## Test on example data "
293+
]
294+
},
197295
{
198296
"cell_type": "code",
199297
"execution_count": 6,
@@ -280,6 +378,7 @@
280378
}
281379
],
282380
"source": [
381+
"test_data = \"../tests/data/example_data_5_subject.csv\"\n",
283382
"# load test data into DF\n",
284383
"df = pd.read_csv(test_data, index_col=0)\n",
285384
"\n",
@@ -298,12 +397,41 @@
298397
"cell_type": "markdown",
299398
"metadata": {},
300399
"source": [
301-
"Lets try to run AUC on simulated data with easily calculatable AUC"
400+
"## Conclusions \n",
401+
"IGLU AUC calculations differ substantially from the expected ranges suggested by ChatGPT\n"
402+
]
403+
},
404+
{
405+
"cell_type": "markdown",
406+
"metadata": {},
407+
"source": [
408+
"# IGLU_PYTHON results"
409+
]
410+
},
411+
{
412+
"cell_type": "code",
413+
"execution_count": 7,
414+
"metadata": {},
415+
"outputs": [],
416+
"source": [
417+
"# Add project directory to PYTHONPATH\n",
418+
"import os\n",
419+
"import sys\n",
420+
"import pandas as pd\n",
421+
"sys.path.append(os.path.abspath('..'))\n",
422+
"import iglu_python\n"
423+
]
424+
},
425+
{
426+
"cell_type": "markdown",
427+
"metadata": {},
428+
"source": [
429+
"## Test on synthetic data"
302430
]
303431
},
304432
{
305433
"cell_type": "code",
306-
"execution_count": 18,
434+
"execution_count": 8,
307435
"metadata": {},
308436
"outputs": [
309437
{
@@ -333,72 +461,46 @@
333461
" </thead>\n",
334462
" <tbody>\n",
335463
" <tr>\n",
336-
" <th>1</th>\n",
464+
" <th>0</th>\n",
337465
" <td>subject1</td>\n",
338-
" <td>102.222222</td>\n",
466+
" <td>100.0</td>\n",
339467
" </tr>\n",
340468
" </tbody>\n",
341469
"</table>\n",
342470
"</div>"
343471
],
344472
"text/plain": [
345473
" id hourly_auc\n",
346-
"1 subject1 102.222222"
474+
"0 subject1 100.0"
347475
]
348476
},
349-
"execution_count": 18,
477+
"execution_count": 8,
350478
"metadata": {},
351479
"output_type": "execute_result"
352480
}
353481
],
354482
"source": [
355-
"hours = 1\n",
356-
"dt0 = 5\n",
357-
"samples = int(hours*60/dt0)\n",
358-
"times = pd.date_range('2020-01-01', periods=samples, freq=f\"{dt0}min\")\n",
359-
"glucose_values = [80,120]* int(samples/2)\n",
360-
"\n",
361-
"data = pd.DataFrame({\n",
362-
" 'id': ['subject1'] * samples,\n",
363-
" 'time': times,\n",
364-
" 'gl': glucose_values\n",
365-
"})\n",
366-
"\n",
367-
"iglu_auc_results = iglu_py.auc(data)\n",
368-
"iglu_auc_results"
483+
"synthetic_iglu_auc_results = iglu_python.auc(syntheticdata)\n",
484+
"synthetic_iglu_auc_results"
369485
]
370486
},
371487
{
372488
"cell_type": "markdown",
373489
"metadata": {},
374490
"source": [
375-
"## Conclusions \n",
376-
"IGLU AUC calculations are substantially differ from expected ranges suggested by ChatGPT\n"
491+
"**Note:** The result matches the expected value"
377492
]
378493
},
379494
{
380495
"cell_type": "markdown",
381496
"metadata": {},
382497
"source": [
383-
"# IGLU_PYTHON results"
498+
"## Test on Example data"
384499
]
385500
},
386501
{
387502
"cell_type": "code",
388-
"execution_count": 7,
389-
"metadata": {},
390-
"outputs": [],
391-
"source": [
392-
"# Add project directory to PYTHONPATH\n",
393-
"import os\n",
394-
"import sys\n",
395-
"\n",
396-
"sys.path.append(os.path.abspath('..'))"
397-
]
398-
},
399-
{
400-
"cell_type": "code",
401-
"execution_count": 12,
503+
"execution_count": 9,
402504
"metadata": {},
403505
"outputs": [
404506
{
@@ -501,14 +603,9 @@
501603
}
502604
],
503605
"source": [
504-
"import pandas as pd\n",
505-
"\n",
506-
"import iglu_python\n",
507-
"\n",
508606
"# load test data into DF\n",
509607
"df = pd.read_csv(test_data, index_col=0)\n",
510608
"\n",
511-
"iglu_python.IGLU_R_COMPATIBLE = False\n",
512609
"iglu_python_auc_results = iglu_python.auc(df)\n",
513610
"iglu_python_auc_results = iglu_python_auc_results.round(0)\n",
514611
"\n",
@@ -518,88 +615,15 @@
518615
"iglu_python_auc_results['Difference to IGLU(%)'] = ((iglu_python_auc_results['IGLU PYTHON AUC (mg*h/dL)'] - iglu_python_auc_results['IGLU AUC (mg*h/dL)']) / iglu_python_auc_results['IGLU AUC (mg*h/dL)'] * 100).round(1)\n",
519616
"iglu_python_auc_results['Difference to ChatGPt(%)'] = ((iglu_python_auc_results['IGLU PYTHON AUC (mg*h/dL)'] - iglu_python_auc_results['ChatGPT AUC (mg*h/dL)']) / iglu_python_auc_results['ChatGPT AUC (mg*h/dL)'] * 100).round(1)\n",
520617
"\n",
521-
"\n",
522-
"\n",
523-
"display(iglu_python_auc_results)\n",
524-
"\n",
525-
"\n",
526-
"\n"
527-
]
528-
},
529-
{
530-
"cell_type": "code",
531-
"execution_count": 21,
532-
"metadata": {},
533-
"outputs": [
534-
{
535-
"data": {
536-
"text/html": [
537-
"<div>\n",
538-
"<style scoped>\n",
539-
" .dataframe tbody tr th:only-of-type {\n",
540-
" vertical-align: middle;\n",
541-
" }\n",
542-
"\n",
543-
" .dataframe tbody tr th {\n",
544-
" vertical-align: top;\n",
545-
" }\n",
546-
"\n",
547-
" .dataframe thead th {\n",
548-
" text-align: right;\n",
549-
" }\n",
550-
"</style>\n",
551-
"<table border=\"1\" class=\"dataframe\">\n",
552-
" <thead>\n",
553-
" <tr style=\"text-align: right;\">\n",
554-
" <th></th>\n",
555-
" <th>id</th>\n",
556-
" <th>hourly_auc</th>\n",
557-
" </tr>\n",
558-
" </thead>\n",
559-
" <tbody>\n",
560-
" <tr>\n",
561-
" <th>0</th>\n",
562-
" <td>subject1</td>\n",
563-
" <td>100.0</td>\n",
564-
" </tr>\n",
565-
" </tbody>\n",
566-
"</table>\n",
567-
"</div>"
568-
],
569-
"text/plain": [
570-
" id hourly_auc\n",
571-
"0 subject1 100.0"
572-
]
573-
},
574-
"execution_count": 21,
575-
"metadata": {},
576-
"output_type": "execute_result"
577-
}
578-
],
579-
"source": [
580-
"hours = 1\n",
581-
"dt0 = 5\n",
582-
"samples = int(hours*60/dt0)\n",
583-
"times = pd.date_range('2020-01-01', periods=samples, freq=f\"{dt0}min\")\n",
584-
"glucose_values = [80,120]* int(samples/2)\n",
585-
"\n",
586-
"data = pd.DataFrame({\n",
587-
" 'id': ['subject1'] * samples,\n",
588-
" 'time': times,\n",
589-
" 'gl': glucose_values\n",
590-
"})\n",
591-
"\n",
592-
"iglu_python.IGLU_R_COMPATIBLE = True\n",
593-
"iglu_python_auc_results = iglu_python.auc(data)\n",
594-
"iglu_python_auc_results"
618+
"display(iglu_python_auc_results)\n"
595619
]
596620
},
597621
{
598622
"cell_type": "markdown",
599623
"metadata": {},
600624
"source": [
601625
"## Conclusions \n",
602-
"IGLU_PYTHON AUC calculations are close to IGLU calculations (-5%), and closer to suggested by ChatGPT\n",
626+
"IGLU_PYTHON AUC calculations are close to IGLU calculations (-0.5%)\n",
603627
"\n"
604628
]
605629
}
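
The expected value of 100 for the synthetic test can be checked by hand with a plain trapezoidal rule. The sketch below is an independent check under that assumption, not the algorithm used by iglu or iglu_python; the function name `hourly_auc` is hypothetical.

```python
def hourly_auc(values, dt_minutes):
    """Trapezoidal area under the curve divided by the covered duration in hours."""
    dt_hours = dt_minutes / 60
    # Area of each trapezoid between two consecutive samples
    area = sum((a + b) / 2 * dt_hours for a, b in zip(values, values[1:]))
    duration_hours = dt_hours * (len(values) - 1)
    return area / duration_hours


# Same synthetic series as in the notebook: 12 samples, 5 min apart, [80, 120] repeated
glucose_values = [80, 120] * 6

print(round(hourly_auc(glucose_values, 5), 6))  # 100.0
```

Every pair of neighbouring samples averages to 100, so the time-normalised area is 100 regardless of how many intervals are included.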
