Commit 3bf0c93

reorganise documentation to an access timeseries data focus (#131)
1 parent 7a83e9c commit 3bf0c93

9 files changed

Lines changed: 128 additions & 134 deletions


docs/support.rst

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 Support
 =======
 For bug/issue reports, please submit an issue on `GitHub`_. For questions and feedback, contact
-us at feedback@4insight.io
+us at support@4subsea.com
 
 .. _GitHub: https://github.com/4subsea/drio-python
 
docs/user_guide/access_data.rst

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+.. py:currentmodule:: datareservoirio
+
+
+Access time series
+==================
+
+Access existing data
+--------------------
+
+You can access any data for which you have the ``TimeSeriesId`` (and authorization). It is possible to
+query aggregated data directly, e.g. you can query 1-minute average values for a specified period. The finest available aggregation period
+is *"tick"* (100 nanoseconds).
+
+
+.. code-block:: python
+
+    # set up
+    import datareservoirio as drio
+    import numpy as np
+    import pandas as pd
+
+    auth = drio.Authenticator()
+    # Follow instructions to authenticate
+
+    client = drio.Client(auth)
+
+    # Get timeseries data resampled to 15-minute averages for the selected period
+    timeseries = client.get_samples_aggregate(series_id,
+                                              start='2024-01-01', end='2024-01-02',
+                                              aggregation_period='15m',
+                                              aggregation_function='mean')
+
+    # Get all data for the selected time period
+    timeseries = client.get_samples_aggregate(series_id,
+                                              start='2024-01-01', end='2024-01-02',
+                                              aggregation_period='tick',
+                                              aggregation_function='mean')
+
+.. note::
+
+    :py:meth:`Client.get_samples_aggregate` returns a :py:class:`pandas.Series`. The ``start``, ``end``, ``aggregation_period`` and ``aggregation_function`` parameters are required.
+
+.. important::
+
+    Time series data is archived 90 days after upload. To access archived data directly, you can use
+    the :py:meth:`Client.get` method, but the data can also be restored by contacting :ref:`support <support>`.
+
+
+Access archived data
+--------------------
+You can access time series data using the :py:meth:`Client.get` method, as long as you have
+the ``TimeSeriesId`` (and authorization). Note that this method only returns the raw data and
+does not support aggregation. Below is an example demonstrating how to access archived time
+series data. We strongly recommend using
+:py:meth:`Client.get_samples_aggregate` as long as the data was uploaded within the last 90 days,
+or contacting support to restore archived data.
+
+.. code-block:: python
+
+    # Get entire timeseries
+    timeseries = client.get(series_id)
+
+    # Get a slice of time series
+    timeseries = client.get(series_id, start='2018-01-01 12:00:00',
+                            end='2018-01-02 06:00:00')
+
+
+.. warning::
+
+    The time resolution of aggregated data is in ticks (1 tick = 100 nanoseconds), while the time resolution of non-aggregated data is in nanoseconds. This may lead to discrepancies when comparing the two, and datapoints may be lost when accessing data through aggregation if multiple datapoints fall within the same 100-nanosecond range.
+
+
+.. tip::
+    When handling high-frequency data and/or extended timespans, it is crucial to consider memory usage.
+    Accessing an excessive amount of data at once can cause your script to fail. The following is a recommended approach for accessing data in smaller chunks:
+
+    .. code-block:: python
+
+        # Make a date iterator
+        start_end = pd.date_range(start="2020-01-01 00:00", end="2020-02-01 00:00", freq="1H")
+        start_end_iter = zip(start_end[:-1], start_end[1:])
+
+        series_id = <your time series ID>
+
+
+        # Get timeseries in chunks
+        for start, end in start_end_iter:
+            timeseries = client.get(series_id, start=start, end=end)
+
+.. _DataReservoir.io: https://www.datareservoir.io/
+.. _Pandas: https://pandas.pydata.org/
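The chunked-access tip above can be sketched end to end. The sketch below is runnable offline: since a real :py:meth:`Client.get` call needs authentication, it is replaced here by a hypothetical `fake_get` stub returning synthetic 1-minute data, and the `series_id` is a placeholder. Only the chunk-then-aggregate pattern is the point.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for drio.Client.get so the sketch runs offline;
# a real call would be client.get(series_id, start=..., end=...).
def fake_get(series_id, start, end):
    index = pd.date_range(start=start, end=end, freq="1min", inclusive="left")
    return pd.Series(np.random.default_rng(0).random(len(index)), index=index)

series_id = "00000000-0000-0000-0000-000000000000"  # placeholder, not a real TimeSeriesId

# Split one day into 6-hour windows and aggregate each chunk before
# fetching the next, so only one chunk is held in memory at a time.
edges = pd.date_range("2020-01-01", "2020-01-02", freq="6h")
hourly_means = []
for start, end in zip(edges[:-1], edges[1:]):
    chunk = fake_get(series_id, start=start, end=end)
    hourly_means.append(chunk.resample("1h").mean())

result = pd.concat(hourly_means)  # 24 hourly values covering the full day
```

Swapping `fake_get` for the real `client.get` keeps the loop identical; only the data source changes.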

docs/user_guide/advanced_config.rst

Lines changed: 2 additions & 2 deletions
@@ -129,8 +129,8 @@ The following log names can be used to fine-tune the desired log output:
 * datareservoirio: top level module including configuration, authentication and client.
 * datareservoirio.storage: storage module, including cache and data download.
 
-If you need a more comprehensive logging solution that captures every interaction with the :py:mod:`requests`, :py:mod:`oauthlib`, and :py:mod:`requests-oauthlib` modules, as well as logging related to :py:mod:`datareservoirio`, you can utilize the
-provided code. If you require logging for only one of the specific packages, you may use the pre-existing loggers integrated within :py:mod:`requests`, :py:mod:`oauthlib`, and :py:mod:`requests-oauthlib`.
+If you need a more comprehensive logging solution that captures every interaction with the :py:mod:`requests`, :py:mod:`oauthlib`, and :py:mod:`requests-oauthlib` modules, as well as logging related to :py:mod:`datareservoirio`, you can use the code below.
+If you require logging for only one of the specific packages, you may use the pre-existing loggers integrated within :py:mod:`requests`, :py:mod:`oauthlib`, and :py:mod:`requests-oauthlib`.
 
 .. code-block:: python
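The per-logger tuning described in this hunk can be sketched with the standard `logging` module. The logger names come from the list in the section; the handler and format choices below are illustrative, not the package's own configuration.

```python
import logging

# One handler at the root; raise verbosity only for the loggers of interest.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s: %(message)s")

# Names from the list above; assumed to match the package's logger hierarchy.
for name in ("datareservoirio", "datareservoirio.storage"):
    logging.getLogger(name).setLevel(logging.DEBUG)

# Third-party packages expose their own loggers and can be tuned the same way.
logging.getLogger("requests").setLevel(logging.WARNING)
```

Because loggers form a dot-separated hierarchy, setting `datareservoirio` to DEBUG also affects child loggers that do not override their level.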

docs/user_guide/cookbook.rst

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ It is really easy to visualize data with `Matplotlib`_:
 
 Save data to file
 -----------------
-Sometimes you may want to dump data to file (Don't worry, we won't judge you):
+Sometimes you may want to dump data to file:
 
 .. code-block:: python
 
@@ -90,7 +90,7 @@ Work with large amount of data
 ------------------------------
 When working with large data sizes (long time spans and/or high sampling frequency),
 it is often useful to download data in chunks and resample so that you don't have
-all the data in memory at the same time. Let's see how you can download 6 months of
+all the data in memory at the same time. Here's how you can download 6 months of
 data and get the 1-hour standard deviation:
 
 .. code-block:: python
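The chunk-and-resample recipe behind the 1-hour standard deviation can be illustrated with plain pandas on synthetic data. In real use each chunk would come from a separate `client.get` call; here a generated 1 Hz signal stands in for downloaded data, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic 1 Hz signal standing in for six hours of downloaded data.
index = pd.date_range("2020-01-01", periods=6 * 3600, freq="1s")
signal = pd.Series(
    np.random.default_rng(1).standard_normal(len(index)), index=index
)

# Process one hour-sized chunk at a time, keeping only the aggregate,
# so the full-resolution data never has to sit in memory at once.
chunks = (signal.iloc[i : i + 3600] for i in range(0, len(signal), 3600))
hourly_std = pd.concat(chunk.resample("1h").std() for chunk in chunks)
```

Because each chunk boundary is aligned to a whole hour, per-chunk `resample` produces the same result as resampling the full series in one pass.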

docs/user_guide/dos_donts.rst

Lines changed: 0 additions & 49 deletions
This file was deleted.

docs/user_guide/index.rst

Lines changed: 11 additions & 13 deletions
@@ -6,10 +6,10 @@ Python API:
 
 * Manage series (time series or sequences):
 
-  * Access existing series
-  * Create and upload new series
-  * Edit and append to existing series
-  * Delete series
+  * Access existing time series
+  * Create and upload new time series
+  * Edit and append to existing time series
+  * Delete time series
 
 * Manage metadata
 
@@ -45,7 +45,7 @@ enriched with :ref:`metadata <metadata>`.
 
 For time series the index is interpreted as
 nanoseconds since epoch (1970-01-01 00:00:00+00:00), i.e. support for
-nanosencond resolution.
+nanosecond resolution.
 
 :py:class:`pandas.Series` objects map perfectly to this paradigm.
 :py:mod:`datareservoirio` is designed around :py:class:`pandas.Series` as it
@@ -114,13 +114,10 @@ One simple yet effective way of creating a hierarchy and taxonomy is to use
 We found that this approach is rather easy to visualize and maps well to the
 physical world.
 
-What is it **NOT** for
+What is it NOT for
 ______________________
-Despite its flexibility, `DataReservoir.io`_ has its limitations when it comes
-to metadata; it is **NOT** a general purpose database that you can dump
-anything in to and it is not designed to keep track of complex hierarchical
-information. The query capabilities are also kept simple and efficient by
-design.
+Despite its flexibility, `DataReservoir.io`_ has limitations regarding metadata. It is **NOT** a general-purpose database for storing arbitrary data,
+nor is it designed to manage complex hierarchical information. Additionally, its query capabilities are intentionally kept simple and efficient.
 
 For very advanced use cases, it may be advisable to employ a purpose-built
 database solution (that complements `DataReservoir.io`_ for your application).
@@ -133,9 +130,10 @@ database solution (that complements `DataReservoir.io`_ for your application).
    :maxdepth: 2
    :hidden:
 
-   manage_series
+
+   access_data
+   update_edit_data
    manage_metadata
    browse_search
-   dos_donts
    advanced_config
    cookbook

docs/user_guide/manage_metadata.rst

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Manage metadata
 ================
-Similar to series, you can add, update, and delete metadata. In addition, you
+Similar to time series, you can add, update, and delete metadata. In addition, you
 can assign a metadata entry to one or more series. For a more comprehensive
 description of the metadata feature, see `docs.4insight.io`_
 

Lines changed: 19 additions & 65 deletions
@@ -1,10 +1,12 @@
 .. py:currentmodule:: datareservoirio
 
-Manage series
-=============
+Create, edit and delete time series
+===================================
 
-Store series data
------------------
+Create time series
+------------------
+
+Below is an example of how to create a simple time series.
 
 .. code-block:: python
 
@@ -59,14 +61,15 @@ Store sequence:
     series = pd.Series(np.random.rand(10), index=np.arange(10))
     response = client.create(series)
 
-
-Edit and append data
---------------------
-
-You can always append new data to an existing time series (and sequence).
+Edit time series
+----------------
+You can append new data to an existing time series (and sequence).
 However, any overlapping indices will result in overwrite/edit of existing
 data:
 
+.. _DataReservoir.io: https://www.datareservoir.io/
+.. _Pandas: https://pandas.pydata.org/
+
 .. code-block:: python
 
     dt_index = pd.date_range('2018-01-02 00:00:00', periods=10, freq='6H')
@@ -105,67 +108,18 @@ be made available on the series.
     should not pass the server-side validation the data will be ignored.
 
 
-Access existing data
---------------------
-
-You can access any data you have ``TimeSeriesId`` (and authorization) for:
-
-.. code-block:: python
-
-    # Get entire timeseries
-    timeseries = client.get(series_id)
-
-    # Get a slice of time series
-    timeseries = client.get(series_id, start='2018-01-01 12:00:00',
-                            end='2018-01-02 06:00:00')
-
-    # Get a sequence
-    sequence = client.get(series_id, convert_date=False)
-
-.. note::
-
-    :py:meth:`Client.get` returns :py:class:`pandas.Series`.
-
-
-Access existing data with aggregation
--------------------------------------
-
-You can also access any data you have ``TimeSeriesId`` (and authorization) for with applied aggregation using:
-
-.. code-block:: python
-
-    # Get entire timeseries
-    timeseries = client.get_samples_aggregate(series_id, start='2024-01-01',
-                                              end='2024-01-02', aggregation_period='15m',
-                                              aggregation_function='mean')
-
-.. note::
-
-    :py:meth:`Client.get_samples_aggregate` also returns :py:class:`pandas.Series`. The :py:mod:`start`, :py:mod:`end`, :py:mod:`aggregation_period` and :py:mod:`aggregation_function` parameters are required.
-
-.. important::
-
-    Retrieving aggregated data is available only for the last 90 days.
-
-.. warning::
-
-    The time resolution of aggregated data is in ticks (1tick = 100 nanoseconds), while the time resolution of non-aggregated data is in nanoseconds. This may lead to discrepancies in data when comparing the two, and some datapoints might get lost when using aggregation to access data, in cases when there are multiple datapoints within the same 100 nanosecond range.
-
 Delete data
 -----------
 
-Note that deleting data is permanent and all references to ``TimeSerieId``
-is removed from the `DataReservoir.io`_ inventory:
-
-.. code-block:: python
-
-    client.delete(series_id)
+It is only possible to delete an entire time series. Deleting a single datapoint
+is not supported.
 
+.. danger::
 
+    Note that deleting data is permanent and all references to ``TimeSeriesId``
+    are removed from the `DataReservoir.io`_ inventory.
 
-.. _DataReservoir.io: https://www.datareservoir.io/
-.. _Pandas: https://pandas.pydata.org/
-
-
+.. code-block:: python
 
+    client.delete(series_id)
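The edit/append rule stated in this file (overlapping indices overwrite existing values, new indices are appended) can be demonstrated in plain pandas. This is only a sketch of the semantics, not the `datareservoirio` server-side implementation; `existing`, `new`, and `merged` are illustrative names.

```python
import pandas as pd

existing = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
new = pd.Series([20.0, 30.0, 40.0], index=[1, 2, 3])

# Overlapping indices (1 and 2) take the new values; index 3 is appended.
merged = new.combine_first(existing).sort_index()
```

`combine_first` gives priority to the caller (`new`) wherever indices collide, which mirrors the documented overwrite-on-append behaviour.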

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ dependencies = [
     "requests-oauthlib",
     "importlib_resources",
     "opencensus-ext-azure",
-    "tenacity",
+    "tenacity<8.5",
     "urllib3 > 2",
     "tqdm",
     "numpy < 2"
