Commit 3bf0c93

reorganise documentation to an access timeseries data focus (#131)
1 parent 7a83e9c commit 3bf0c93

9 files changed

Lines changed: 128 additions & 134 deletions


docs/support.rst

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 Support
 =======
 For bug/issue reports, please submit an issue on `GitHub`_. For questions and feedback, contact
-us at feedback@4insight.io
+us at support@4subsea.com
 
 .. _GitHub: https://github.com/4subsea/drio-python
 
docs/user_guide/access_data.rst

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+.. py:currentmodule:: datareservoirio
+
+
+Access time series
+==================
+
+Access existing data
+--------------------
+
+You can access any data for which you have the ``TimeSeriesId`` (and authorization). It is possible to
+query aggregated data directly, e.g. you can query 1-minute average values for a specified period. The finest available aggregation period
+is *"tick"* (100 nanoseconds).
+
+
+.. code-block:: python
+
+    # set up
+    import datareservoirio as drio
+    import numpy as np
+    import pandas as pd
+
+    auth = drio.Authenticator()
+    # Follow instructions to authenticate
+
+    client = drio.Client(auth)
+
+    # Get timeseries data resampled to 15-minute averages for the selected period
+    timeseries = client.get_samples_aggregate(series_id,
+                                              start='2024-01-01', end='2024-01-02',
+                                              aggregation_period='15m',
+                                              aggregation_function='mean')
+
+    # Get all data for the selected time period
+    timeseries = client.get_samples_aggregate(series_id,
+                                              start='2024-01-01', end='2024-01-02',
+                                              aggregation_period='tick',
+                                              aggregation_function='mean')
+
+.. note::
+
+    :py:meth:`Client.get_samples_aggregate` returns a :py:class:`pandas.Series`. The ``start``, ``end``, ``aggregation_period`` and ``aggregation_function`` parameters are required.
+
+.. important::
+
+    Time series data is archived 90 days after upload. To access archived data directly, you can use
+    the :py:meth:`Client.get` method, but the data can also be restored by contacting :ref:`support <support>`.
+
+
+Access archived data
+--------------------
+You can access time series data using the :py:meth:`Client.get` method, as long as you have
+the ``TimeSeriesId`` (and authorization). Note that this method only returns the raw data and
+does not support aggregation. Below is an example demonstrating how to access archived time
+series data. We strongly recommend using
+:py:meth:`Client.get_samples_aggregate` as long as the data was uploaded within the last 90 days,
+or contacting support to restore archived data.
+
+.. code-block:: python
+
+    # Get entire timeseries
+    timeseries = client.get(series_id)
+
+    # Get a slice of time series
+    timeseries = client.get(series_id, start='2018-01-01 12:00:00',
+                            end='2018-01-02 06:00:00')
+
+
+.. warning::
+
+    The time resolution of aggregated data is in ticks (1 tick = 100 nanoseconds), while the time resolution of non-aggregated data is in nanoseconds. This may lead to discrepancies when comparing the two, and datapoints may be lost when accessing data through aggregation if multiple datapoints fall within the same 100-nanosecond range.
+
+
+.. tip::
+    When handling high-frequency data and/or extended timespans, it is crucial to consider memory usage.
+    Accessing an excessive amount of data at once can cause your script to fail. The following is a recommended approach for accessing data in smaller chunks:
+
+    .. code-block:: python
+
+        # Make a date iterator
+        start_end = pd.date_range(start="2020-01-01 00:00", end="2020-02-01 00:00", freq="1H")
+        start_end_iter = zip(start_end[:-1], start_end[1:])
+
+        series_id = <your time series ID>
+
+
+        # Get timeseries in chunks
+        for start, end in start_end_iter:
+            timeseries = client.get(series_id, start=start, end=end)
+
+.. _DataReservoir.io: https://www.datareservoir.io/
+.. _Pandas: https://pandas.pydata.org/
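The chunked-access tip above can be sketched end to end. The sketch below is runnable offline: since a real :py:meth:`Client.get` call needs authentication, it is replaced here by a hypothetical `fake_get` stub returning synthetic 1-minute data, and the `series_id` is a placeholder. Only the chunk-then-aggregate pattern is the point.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for drio.Client.get so the sketch runs offline;
# a real call would be client.get(series_id, start=..., end=...).
def fake_get(series_id, start, end):
    index = pd.date_range(start=start, end=end, freq="1min", inclusive="left")
    return pd.Series(np.random.default_rng(0).random(len(index)), index=index)

series_id = "00000000-0000-0000-0000-000000000000"  # placeholder, not a real TimeSeriesId

# Split one day into 6-hour windows and aggregate each chunk before
# fetching the next, so only one chunk is held in memory at a time.
edges = pd.date_range("2020-01-01", "2020-01-02", freq="6h")
hourly_means = []
for start, end in zip(edges[:-1], edges[1:]):
    chunk = fake_get(series_id, start=start, end=end)
    hourly_means.append(chunk.resample("1h").mean())

result = pd.concat(hourly_means)  # 24 hourly values covering the full day
```

Swapping `fake_get` for the real `client.get` keeps the loop identical; only the data source changes.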

docs/user_guide/advanced_config.rst

Lines changed: 2 additions & 2 deletions
@@ -129,8 +129,8 @@ The following log names can be used to fine-tune the desired log output:
 * datareservoirio: top level module including configuration, authentication and client.
 * datareservoirio.storage: storage module, including cache and data download.
 
-If you need a more comprehensive logging solution that captures every interaction with the :py:mod:`requests`, :py:mod:`oauthlib`, and :py:mod:`requests-oauthlib` modules, as well as logging related to :py:mod:`datareservoirio`, you can utilize the
-provided code. If you require logging for only one of the specific packages, you may use the pre-existing loggers integrated within :py:mod:`requests`, :py:mod:`oauthlib`, and :py:mod:`requests-oauthlib`.
+If you need a more comprehensive logging solution that captures every interaction with the :py:mod:`requests`, :py:mod:`oauthlib`, and :py:mod:`requests-oauthlib` modules, as well as logging related to :py:mod:`datareservoirio`, you can use the code below.
+If you require logging for only one of the specific packages, you may use the pre-existing loggers integrated within :py:mod:`requests`, :py:mod:`oauthlib`, and :py:mod:`requests-oauthlib`.
 
 .. code-block:: python
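The per-logger tuning described in this hunk can be sketched with the standard `logging` module. The logger names come from the list in the section; the handler and format choices below are illustrative, not the package's own configuration.

```python
import logging

# One handler at the root; raise verbosity only for the loggers of interest.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s: %(message)s")

# Names from the list above; assumed to match the package's logger hierarchy.
for name in ("datareservoirio", "datareservoirio.storage"):
    logging.getLogger(name).setLevel(logging.DEBUG)

# Third-party packages expose their own loggers and can be tuned the same way.
logging.getLogger("requests").setLevel(logging.WARNING)
```

Because loggers form a dot-separated hierarchy, setting `datareservoirio` to DEBUG also affects child loggers that do not override their level.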

docs/user_guide/cookbook.rst

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ It is really easy to visualize data with `Matplotlib`_:
 
 Save data to file
 -----------------
-Sometimes you may want to dump data to file (Don't worry, we won't judge you):
+Sometimes you may want to dump data to file:
 
 .. code-block:: python
 
@@ -90,7 +90,7 @@ Work with large amount of data
 ------------------------------
 When working with large data sizes (long time spans and/or high sampling frequency),
 it is often useful to download data in chunks and resample so that you don't have
-all the data in memory at the same time. Let's see how you can download 6 months of
+all the data in memory at the same time. Here's how you can download 6 months of
 data and get the 1-hour standard deviation:
 
 .. code-block:: python
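The chunk-and-resample recipe behind the 1-hour standard deviation can be illustrated with plain pandas on synthetic data. In real use each chunk would come from a separate `client.get` call; here a generated 1 Hz signal stands in for downloaded data, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic 1 Hz signal standing in for six hours of downloaded data.
index = pd.date_range("2020-01-01", periods=6 * 3600, freq="1s")
signal = pd.Series(
    np.random.default_rng(1).standard_normal(len(index)), index=index
)

# Process one hour-sized chunk at a time, keeping only the aggregate,
# so the full-resolution data never has to sit in memory at once.
chunks = (signal.iloc[i : i + 3600] for i in range(0, len(signal), 3600))
hourly_std = pd.concat(chunk.resample("1h").std() for chunk in chunks)
```

Because each chunk boundary is aligned to a whole hour, per-chunk `resample` produces the same result as resampling the full series in one pass.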

docs/user_guide/dos_donts.rst

Lines changed: 0 additions & 49 deletions
This file was deleted.

docs/user_guide/index.rst

Lines changed: 11 additions & 13 deletions
@@ -6,10 +6,10 @@ Python API:
 
 * Manage series (time series or sequences):
 
-  * Access existing series
-  * Create and upload new series
-  * Edit and append to existing series
-  * Delete series
+  * Access existing time series
+  * Create and upload new time series
+  * Edit and append to existing time series
+  * Delete time series
 
 * Manage metadata
 
@@ -45,7 +45,7 @@ enriched with :ref:`metadata <metadata>`.
 
 For time series the index is interpreted as
 nanoseconds since epoch (1970-01-01 00:00:00+00:00), i.e. support for
-nanosencond resolution.
+nanosecond resolution.
 
 :py:class:`pandas.Series` objects map perfectly to this paradigm.
 :py:mod:`datareservoirio` is designed around :py:class:`pandas.Series` as it
@@ -114,13 +114,10 @@ One simple yet effective way of creating a hierarchy and taxonomy is to use
 We found that this approach is rather easy to visualize and maps well to the
 physical world.
 
-What is it **NOT** for
+What is it NOT for
 ______________________
-Despite its flexibility, `DataReservoir.io`_ has its limitations when it comes
-to metadata; it is **NOT** a general purpose database that you can dump
-anything in to and it is not designed to keep track of complex hierarchical
-information. The query capabilities are also kept simple and efficient by
-design.
+Despite its flexibility, `DataReservoir.io`_ has limitations regarding metadata. It is **NOT** a general-purpose database for storing arbitrary data,
+nor is it designed to manage complex hierarchical information. Additionally, its query capabilities are intentionally kept simple and efficient.
 
 For very advanced use cases, it may be advisable to employ a purpose-built
 database solution (that complements `DataReservoir.io`_ for your application).
@@ -133,9 +130,10 @@ database solution (that complements `DataReservoir.io`_ for your application).
    :maxdepth: 2
    :hidden:
 
-   manage_series
+
+   access_data
+   update_edit_data
    manage_metadata
    browse_search
-   dos_donts
    advanced_config
    cookbook

docs/user_guide/manage_metadata.rst

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Manage metadata
 ================
-Similar to series, you can add, update, and delete metadata. In addition, you
+Similar to time series, you can add, update, and delete metadata. In addition, you
 can assign a metadata entry to one or more series. For a more comprehensive
 description of the metadata feature, see `docs.4insight.io`_
 

Lines changed: 19 additions & 65 deletions
@@ -1,10 +1,12 @@
 .. py:currentmodule:: datareservoirio
 
-Manage series
-=============
+Create, edit and delete time series
+===================================
 
-Store series data
------------------
+Create time series
+------------------
+
+Below is an example of how to create a simple time series.
 
 .. code-block:: python
 
@@ -59,14 +61,15 @@ Store sequence:
     series = pd.Series(np.random.rand(10), index=np.arange(10))
     response = client.create(series)
 
-
-Edit and append data
---------------------
-
-You can always append new data to an existing time series (and sequence).
+Edit time series
+----------------
+You can append new data to an existing time series (and sequence).
 However, any overlapping indices will result in overwrite/edit of existing
 data:
 
+.. _DataReservoir.io: https://www.datareservoir.io/
+.. _Pandas: https://pandas.pydata.org/
+
 .. code-block:: python
 
     dt_index = pd.date_range('2018-01-02 00:00:00', periods=10, freq='6H')
@@ -105,67 +108,18 @@ be made available on the series.
     should not pass the server-side validation the data will be ignored.
 
 
-Access existing data
---------------------
-
-You can access any data you have ``TimeSeriesId`` (and authorization) for:
-
-.. code-block:: python
-
-    # Get entire timeseries
-    timeseries = client.get(series_id)
-
-    # Get a slice of time series
-    timeseries = client.get(series_id, start='2018-01-01 12:00:00',
-                            end='2018-01-02 06:00:00')
-
-    # Get a sequence
-    sequence = client.get(series_id, convert_date=False)
-
-.. note::
-
-    :py:meth:`Client.get` returns :py:class:`pandas.Series`.
-
-
-Access existing data with aggregation
--------------------------------------
-
-You can also access any data you have ``TimeSeriesId`` (and authorization) for with applied aggregation using:
-
-.. code-block:: python
-
-    # Get entire timeseries
-    timeseries = client.get_samples_aggregate(series_id, start='2024-01-01',
-                                              end='2024-01-02', aggregation_period='15m',
-                                              aggregation_function='mean')
-
-.. note::
-
-    :py:meth:`Client.get_samples_aggregate` also returns :py:class:`pandas.Series`. The :py:mod:`start`, :py:mod:`end`, :py:mod:`aggregation_period` and :py:mod:`aggregation_function` parameters are required.
-
-.. important::
-
-    Retrieving aggregated data is available only for the last 90 days.
-
-.. warning::
-
-    The time resolution of aggregated data is in ticks (1tick = 100 nanoseconds), while the time resolution of non-aggregated data is in nanoseconds. This may lead to discrepancies in data when comparing the two, and some datapoints might get lost when using aggregation to access data, in cases when there are multiple datapoints within the same 100 nanosecond range.
-
 Delete data
 -----------
 
-Note that deleting data is permanent and all references to ``TimeSerieId``
-is removed from the `DataReservoir.io`_ inventory:
-
-.. code-block:: python
-
-    client.delete(series_id)
+It is only possible to delete an entire time series. Deleting a single datapoint
+is not supported.
 
+.. danger::
 
+    Note that deleting data is permanent and all references to ``TimeSeriesId``
+    are removed from the `DataReservoir.io`_ inventory.
 
-.. _DataReservoir.io: https://www.datareservoir.io/
-.. _Pandas: https://pandas.pydata.org/
-
-
+.. code-block:: python
 
+    client.delete(series_id)
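The edit/append rule stated in this file (overlapping indices overwrite existing values, new indices are appended) can be demonstrated in plain pandas. This is only a sketch of the semantics, not the `datareservoirio` server-side implementation; `existing`, `new`, and `merged` are illustrative names.

```python
import pandas as pd

existing = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
new = pd.Series([20.0, 30.0, 40.0], index=[1, 2, 3])

# Overlapping indices (1 and 2) take the new values; index 3 is appended.
merged = new.combine_first(existing).sort_index()
```

`combine_first` gives priority to the caller (`new`) wherever indices collide, which mirrors the documented overwrite-on-append behaviour.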

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ dependencies = [
     "requests-oauthlib",
     "importlib_resources",
     "opencensus-ext-azure",
-    "tenacity",
+    "tenacity<8.5",
     "urllib3 > 2",
     "tqdm",
     "numpy < 2"
