.. SPDX-FileCopyrightText: 2024 Jiri Vlasak
..
.. SPDX-License-Identifier: CC-BY-SA-4.0

API: Endpoints
==============

This document discusses the interface to the database for clients. We
discuss the *procedure names* and `data transfer objects (DTO)`_. We
also discuss the endpoints, i.e., the locations where the procedures are
called.

.. _data transfer objects (DTO): https://en.wikipedia.org/wiki/Data_transfer_object

- repository: https://git.sr.ht/~qeef/dd-hot-tm
- documentation: https://dd-hot-tm.tojemoje.site/

.. contents::
   :local:

.. include:: problem.rst


The purpose of the endpoints
----------------------------

The endpoints are the places *where* the communication between the
server and its clients happens. *How* the communication happens is given
by the interface, i.e., the procedure names and DTOs related to the
endpoints.

By clients we mean whatever mappers use to send requests to the
server -- a web page, the JOSM editor, or a mapper's script.

We aim at communication between clients and the server via `HTTP`_,
using `JSON`_ to encode the values of DTOs. This is the de facto
standard for client-server communication in the web environment, and
leveraging other (or more) technologies would unnecessarily increase
technical debt.

.. _HTTP: https://en.wikipedia.org/wiki/HTTP
.. _JSON: https://en.wikipedia.org/wiki/JSON

Well-defined constraints on *where* and *how* the communication happens
simplify the implementation of the clients.


The interface to endpoints (how)
--------------------------------

The interface consists of procedure names and DTOs. The procedure names
are given by `HTTP`_: HTTP uses ``GET`` to retrieve information,
``POST`` to send new information, ``PUT`` to update existing
information, and ``DELETE`` to destroy existing information available at
the endpoint.

The DTOs make the request concrete by providing additional information,
such as an identifier or a reason for the change.


The endpoints (where)
---------------------

The endpoints specify the places, resources, or objects that can be
retrieved (``GET``), created (``POST``), changed (``PUT``), and/or
deleted (``DELETE``).

The endpoints do not necessarily need to reflect the database -- they
serve a different purpose: being friendly to the clients.

So, in the API design, we are interested in the endpoints and their
interface, aiming at simple communication about the `mapping workflow`_
between the server and clients.


Non-goals
---------

This document *does not* cover all endpoints HOT TM uses. It covers only
the endpoints related to tasks and the transitions between the task
states. We expect that similar documents cover the endpoints related to
other parts of HOT TM, such as groups or campaigns, and that another
document puts all these parts together and introduces the overall
endpoint schema.

This document *does not* cover task issues or annotations. It focuses
solely on the main function of tasks within HOT TM and on keeping the
task history.

This document *does not* introduce production-ready endpoints or a
corresponding production-ready interface.

Authentication and authorization are out of the scope of this document.


Balance between endpoints and DTOs
----------------------------------

Because we limit ourselves to HTTP, one half of the interface is given
(``GET``, ``POST``, ``PUT``, and ``DELETE``). What is left are the
endpoints (*where* the communication happens) and the DTOs (*how* the
communication happens). We try to find a balance between the two.


Endpoints extreme
^^^^^^^^^^^^^^^^^

When there is an endpoint for everything, we call it *endpoints
extreme*. The `HOT TM API`_ comes close to it. Paraphrasing the HOT TM
API, there are tasks-related endpoints accepting ``POST``:

``/project/{pid}/task/{tid}/map``
    Expect the task ``tid`` of the project ``pid`` to be in *unlocked to
    map* state, changing the task into the *locked for mapping* state.

``/project/{pid}/task/{tid}/finish``
    Expect the task ``tid`` of the project ``pid`` to be in *locked for
    mapping* state, changing the task into the *unlocked to check*
    state.

``/project/{pid}/task/{tid}/check``
    Expect the task ``tid`` of the project ``pid`` to be in *unlocked to
    check* state, changing the task into the *locked for checking*
    state.

``/project/{pid}/task/{tid}/good``
    Expect the task ``tid`` of the project ``pid`` to be in *locked for
    checking* state, changing the task into the *unlocked done* state.

``/project/{pid}/task/{tid}/bad``
    Expect the task ``tid`` of the project ``pid`` to be in *locked for
    checking* state, changing the task into the *unlocked to map* state.

and ``GET``:

``/project/{pid}/tasks/states``
    Retrieve the state of all tasks of the project ``pid``.

``/project/{pid}/task/{tid}/state``
    Retrieve the state of the task ``tid`` of the project ``pid``.

Here, ``pid`` is the project identifier and ``tid`` is the task
identifier. In such a case, little to no information is transferred in
DTOs.
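
The five ``POST`` endpoints listed above can be read as a small state
machine: each one expects a state and produces a new one. A minimal
sketch of that reading follows. The ``TRANSITIONS`` table and the
``apply_action`` helper are hypothetical names used only for
illustration; they are not part of the HOT TM API or of this repository.

.. code-block:: python

    # Hypothetical illustration; these names are assumptions.
    # Each action maps to (state the task must be in, state it moves to).
    TRANSITIONS = {
        "map": ("unlocked to map", "locked for mapping"),
        "finish": ("locked for mapping", "unlocked to check"),
        "check": ("unlocked to check", "locked for checking"),
        "good": ("locked for checking", "unlocked done"),
        "bad": ("locked for checking", "unlocked to map"),
    }


    def apply_action(state: str, action: str) -> str:
        """Return the new task state, or raise ValueError if not allowed."""
        if action not in TRANSITIONS:
            raise ValueError(f"unknown action: {action!r}")
        expected, new_state = TRANSITIONS[action]
        if state != expected:
            raise ValueError(
                f"action {action!r} expects state {expected!r}, got {state!r}"
            )
        return new_state

In the endpoints extreme, every key of such a table corresponds to its
own endpoint, which is why the DTOs stay almost empty.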

.. _HOT TM API: https://tasks.hotosm.org/api-docs


DTOs extreme
^^^^^^^^^^^^

When there is a single endpoint for everything, we call it *DTOs
extreme*, because all the information is encoded in DTOs:

``/whatever``
    Accepts ``POST`` and ``GET``.

    For ``POST``, the DTOs must always contain ``pid`` -- the project
    identifier, ``tid`` -- the task identifier, and ``action``, where
    ``action`` can be *map*, *finish*, *check*, *good*, or *bad*.

    For ``GET``, the DTO must always contain ``pid`` -- the project
    identifier. However, a ``GET`` request does not have a body as
    ``POST`` has, so there is no place to put the values of the DTO. To
    keep the "DTOs extreme" approach, we need to encode the DTO's values
    in the URL of the endpoint, i.e., ``/whatever?pid={pid}``.

    (Note that ``/whatever?pid={pid}`` is indeed different from
    ``/whatever/{pid}``: by the `URL syntax`_, the former is understood
    as the ``/whatever`` path with the ``pid={pid}`` query, whereas the
    latter is only the ``/whatever/{pid}`` path.)

    When there is no ``tid`` -- the task identifier -- in the DTO (i.e.,
    in the query part of the URL), it is expected that the client
    requests the state of all the tasks of the project ``pid``. If
    ``tid`` is specified within the DTO, the state of the task ``tid``
    of the project ``pid`` is sent back.
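
The difference between the path and the query, and the optional ``tid``,
can be illustrated with the Python standard library. This is only a
sketch: ``decode_get`` is a hypothetical helper, and returning the path
together with the DTO values is an assumption made for the illustration.

.. code-block:: python

    from urllib.parse import parse_qs, urlsplit


    def decode_get(url: str) -> dict:
        """Decode the DTO of a GET request from the query part of url."""
        parts = urlsplit(url)
        query = parse_qs(parts.query)
        if "pid" not in query:
            raise ValueError("the DTO must always contain pid")
        dto = {"path": parts.path, "pid": int(query["pid"][0])}
        # tid is optional: without it, all tasks of the project are meant.
        if "tid" in query:
            dto["tid"] = int(query["tid"][0])
        return dto

For ``/whatever/1`` the query part is empty and the identifier is only a
segment of the path, so ``decode_get`` would reject it.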

.. _URL syntax: https://en.wikipedia.org/wiki/URL#Syntax


Finding the balance
^^^^^^^^^^^^^^^^^^^

We can see that both extreme approaches suffer from scalability issues:

- An example for *Endpoints extreme* is extending the workflow with a
  *reset* action that changes the state of the task ``tid`` of the
  project ``pid`` from *unlocked done* to *unlocked to map*: a new
  endpoint needs to be introduced.

- An example for *DTOs extreme* is any extension of the ``GET`` request,
  which is already cumbersome enough in the *DTOs extreme* example.
  (Encoding DTO values in the query part of the URL makes sense for a
  small number of parameters, as when using pagination. It does not
  scale.)

When finding the balance between endpoints and DTOs, we aim at the
simplicity of the implementation in the clients. Having the right
balance between endpoints and DTOs improves the scalability and overall
maintainability of the code base.

Our endpoints draft is based on the terminology used at the beginning of
this design document -- *Project*, *Task*, and *Action*.

The *endpoint path*, where path refers to the path of the `URL syntax`_,
consists of *endpoint parts*. The convention for an *endpoint part* is
to use the plural, like ``.../projects``, for endpoints representing a
list of objects, and the singular followed by an identifier, like
``.../project/{pid}``, for endpoints representing a particular object.
Here, ``...`` may be zero or more *endpoint parts*, as in
``.../project/{pid}/tasks`` or ``.../project/{pid}/task/{tid}``.

The values between ``{`` and ``}`` are the identifiers of the objects.
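
The convention can be sketched as a function that splits an endpoint
path into its endpoint parts. The helper ``split_endpoint_path`` and the
``COLLECTIONS`` set of plural names are hypothetical and serve only to
illustrate the convention.

.. code-block:: python

    # Assumption: plural endpoint parts are known in advance.
    COLLECTIONS = {"projects", "tasks", "actions"}


    def split_endpoint_path(path: str) -> list:
        """Split a path into (name, identifier) endpoint parts.

        Plural parts stand alone and get None as the identifier;
        singular parts must be followed by an identifier.
        """
        segments = [segment for segment in path.split("/") if segment]
        parts = []
        i = 0
        while i < len(segments):
            name = segments[i]
            if name in COLLECTIONS:
                parts.append((name, None))
                i += 1
            else:
                if i + 1 >= len(segments):
                    raise ValueError(f"missing identifier after {name!r}")
                parts.append((name, segments[i + 1]))
                i += 2
        return parts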

Considering `mapping workflow`_ and targeting load testing, our API
proposal consists of the following endpoints and DTOs:

``/project/{pid}/tasks``
    Accepts ``GET``, returns the list of tasks and their corresponding
    states.

    An example of returned DTO is::

        [
            {
                "pid": 1,
                "tid": 1,
                "state": "unlocked to map"
            },
            {
                "pid": 1,
                "tid": 2,
                "state": "locked for mapping"
            },
            ...
            {
                "pid": 1,
                "tid": 1000,
                "state": "unlocked to map",
            }
        ]

``/project/{pid}/task/{tid}``
    Accepts ``GET``, returns the task and its corresponding state.

    An example of returned DTO is::

        {
            "pid": 1,
            "tid": 1,
            "state": "unlocked to map"
        }

``/project/{pid}/actions``
    Accepts ``POST``, returns the task and its boundary.

    An example of the DTO from a client to the backend that requests
    mapping of a random task::

        {
            "what": "map"
        }

    An example of the DTO with the reply from the backend to the
    client::

        {
            "pid": 1,
            "tid": 22,
            "geometry": {"some": "geom"}
        }

    An example of the DTO from a client to the backend that requests
    finishing the task, where ``tid`` must be specified::

        {
            "what": "finish",
            "tid": 22
        }
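
From the client's point of view, a request to the actions endpoint could
be built as in the following sketch. ``BASE_URL`` and ``action_request``
are hypothetical; the server address in particular is an assumption, as
this document does not specify one. The request is only constructed, not
sent.

.. code-block:: python

    import json
    from typing import Optional
    from urllib.request import Request

    BASE_URL = "http://localhost:8000"  # assumption: not given here


    def action_request(
        pid: int, what: str, tid: Optional[int] = None
    ) -> Request:
        """Build (but do not send) a POST to /project/{pid}/actions."""
        dto = {"what": what}
        # tid is left out when a random task is requested, e.g. for "map".
        if tid is not None:
            dto["tid"] = tid
        return Request(
            f"{BASE_URL}/project/{pid}/actions",
            data=json.dumps(dto).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )

Passing the returned object to ``urllib.request.urlopen`` would send the
request to a running server.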

Addressing the scalability issues of the `Endpoints extreme`_, extending
the workflow with the *reset* action does not require a new endpoint. It
only means adding support for the following DTO in the
``/project/{pid}/actions`` endpoint::

    {
        "what": "reset",
        "tid": 22
    }
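
On the backend side, such an extension could amount to one more entry in
a table of supported actions. The sketch below is hypothetical:
``ACTIONS`` and ``validate_action`` are illustrative names, and the rule
that only *map* may omit ``tid`` is an assumption drawn from the
examples in this document.

.. code-block:: python

    # Hypothetical table: action name -> whether tid must be specified.
    ACTIONS = {
        "map": False,  # a random task is selected by the backend
        "finish": True,
        "reset": True,  # new entry; no new endpoint is introduced
    }


    def validate_action(dto: dict) -> dict:
        """Check a DTO posted to /project/{pid}/actions and return it."""
        what = dto.get("what")
        if what not in ACTIONS:
            raise ValueError(f"unsupported action: {what!r}")
        if ACTIONS[what] and "tid" not in dto:
            raise ValueError(f"action {what!r} requires tid")
        return dto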

Addressing the scalability issues of the `DTOs extreme`_, the hierarchy
of the exposed objects is leveraged, e.g., having a *project* with
*tasks* and *actions* leads to the ``/project/{pid}/tasks`` and
``/project/{pid}/actions`` endpoints.


Load testing
------------

The ``load_test.py`` module is in the repository root. Its documentation
follows.

.. automodule:: load_test

----

We compare how the ``balanced`` API works with the :ref:`almost tm
admin` and :ref:`actions history` database schemas. We ran the "average"
load test for one hour and the "extreme" load test for ten minutes
against each of them.

:ref:`almost tm admin`

- for average load test (wait 30 to 60 seconds between requests):

    - average response time is ``21.47`` ms
    - 100%ile response time is ``170`` ms
    - #requests ``8664``
    - see `full report <_static/average-1h-almost_tm_admin.html>`_

- for extreme load test (wait 1 to 2.5 seconds between requests):

    - average response time is ``27.89`` ms (v2 ``23.27`` ms)
    - 100%ile response time is ``3100`` ms (v2 ``2000`` ms)
    - #requests ``35711`` (v2 ``34657``)
    - see `full report <_static/extreme-10min-almost_tm_admin.html>`_
    - see `full report v2 <_static/extreme-10min-almost_tm_admin-v2.html>`_

:ref:`actions history`

- for average load test (wait 30 to 60 seconds between requests):

    - average response time is ``26.2`` ms
    - 100%ile response time is ``180`` ms
    - #requests ``8370``
    - see `full report <_static/average-1h-actions_history.html>`_

- for extreme load test (wait 1 to 2.5 seconds between requests):

    - average response time is ``22.45`` ms (v2 ``22.24`` ms)
    - 100%ile response time is ``170`` ms (v2 ``320`` ms)
    - #requests ``35578`` (v2 ``36405``)
    - see `full report <_static/extreme-10min-actions_history.html>`_
    - see `full report v2 <_static/extreme-10min-actions_history-v2.html>`_
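
To compare the runs on a common scale, the request counts above can be
converted to an approximate throughput. The sketch below assumes that
each run lasted exactly its nominal duration (one hour or ten minutes);
the ``throughput`` helper and the ``RUNS`` table are only illustrative.

.. code-block:: python

    def throughput(requests: int, duration_s: int) -> float:
        """Return the average number of requests per second."""
        return requests / duration_s


    # (schema, load test) -> (number of requests, nominal duration in s)
    RUNS = {
        ("almost tm admin", "average"): (8664, 3600),
        ("almost tm admin", "extreme"): (35711, 600),
        ("almost tm admin", "extreme v2"): (34657, 600),
        ("actions history", "average"): (8370, 3600),
        ("actions history", "extreme"): (35578, 600),
        ("actions history", "extreme v2"): (36405, 600),
    }

    for (schema, test), (requests, duration_s) in RUNS.items():
        rate = throughput(requests, duration_s)
        print(f"{schema:16} {test:11} {rate:6.2f} requests/s")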


Conclusion
----------

We conducted load testing experiments in order to stress different
database designs in an environment of many (simulated) mappers.
Recalling the :ref:`backend diagram`, our code base consists solely of
the *API* endpoints and the *Database*. In a real application, the three
dots ``...`` from the :ref:`backend diagram` would be the layers
connecting the API with the database. We can afford this simplification
because a lot of functionality is currently out of the scope of these
documents.

When we study the results of the load testing, there is not much
difference between :ref:`almost tm admin` and :ref:`actions history`.
This is interesting in relation to the database experiments described in
:ref:`database experiments`.

The only difference observed is that *actions history* looks more
stable. The "extreme" load test had to be conducted to find this.

However, we need to be aware that we tested limited functionality. For
example, retrieving the history of the actions per project was not
tested. Functionality like this should be described in the load test,
implemented, and the performance of the implementation measured.