.. SPDX-FileCopyrightText: 2024 Jiri Vlasak
..
.. SPDX-License-Identifier: CC-BY-SA-4.0

API: Endpoints
==============

This document discusses the interface to the database for clients. We discuss
the *procedure names* and `data transfer objects (DTO)`_. We also discuss the
endpoints, i.e., the locations where the procedures are called.

.. _data transfer objects (DTO): https://en.wikipedia.org/wiki/Data_transfer_object

- repository: https://git.sr.ht/~qeef/dd-hot-tm
- documentation: https://dd-hot-tm.tojemoje.site/

.. contents::
   :local:

.. include:: problem.rst

The purpose of the endpoints
----------------------------

The endpoints provide clients with the places *where* the communication
happens. *How* the communication happens is given by the interface, i.e., the
procedure names and DTOs related to the endpoints.

We understand the clients as whatever is used by mappers to send requests to
the server -- a web page, the JOSM editor, or a mapper's script.

We aim at communication between clients and the server via `HTTP`_, using
`JSON`_ to encode the values of DTOs. This is the de facto standard for
client-server communication in the web environment, and leveraging other (or
more) technologies unnecessarily increases technical debt.

.. _HTTP: https://en.wikipedia.org/wiki/HTTP
.. _JSON: https://en.wikipedia.org/wiki/JSON

Well-defined constraints on *where* and *how* the communication happens
simplify the implementation of the clients.

The interface to endpoints (how)
--------------------------------

The interface consists of procedure names and DTOs. The procedure names are
given by `HTTP`_: HTTP uses ``GET`` to retrieve information, ``POST`` to send
new information, ``PUT`` to update the existing information, and ``DELETE`` to
destroy the existing information available at the endpoint. The DTOs make the
request concrete by providing additional information, such as an identifier or
a reason for the change.

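
As a minimal sketch of this interface, the following Python snippet builds (but
does not send) a request in which the procedure name is the HTTP method and the
DTO is JSON-encoded in the request body. The server address, the ``/example/1``
path, the ``build_request`` helper, and the DTO fields are hypothetical and
serve only as an illustration; they are not part of the proposed API.

```python
import json
from urllib.request import Request

# Hypothetical server address, used for illustration only.
BASE_URL = "http://localhost:8000"


def build_request(method, path, dto=None):
    """Build (but do not send) an HTTP request carrying a JSON-encoded DTO.

    ``method`` is the procedure name (GET, POST, PUT, or DELETE), ``path`` is
    the endpoint, and ``dto`` is an optional dictionary with the DTO values.
    """
    data = None
    headers = {}
    if dto is not None:
        # The DTO travels in the request body, encoded as JSON.
        data = json.dumps(dto).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


# A hypothetical update: the DTO carries an identifier and a reason of change.
request = build_request("PUT", "/example/1", {"id": 1, "reason": "typo fix"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request (for example with ``urllib.request.urlopen``) is left out on
purpose, so the sketch runs without a server.
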

The endpoints (where)
---------------------

The endpoints specify the places, resources, or objects that can be retrieved
(``GET``), created (``POST``), changed (``PUT``), and/or deleted (``DELETE``).
The endpoints do not necessarily need to reflect the database -- they serve the
different purpose of looking friendly to the clients.

So, in the end, the API design is concerned with the endpoints and their
interface, aiming at simple communication about the `mapping workflow`_ between
the server and clients.

Non-goals
---------

This document *does not* cover all endpoints HOT TM uses. This document covers
only the endpoints related to tasks and the transitions between the task
states. It is expected that there are similar documents covering the endpoints
related to other parts of HOT TM, such as groups or campaigns, and that there
is another document that puts all these parts together and introduces the
overall endpoint schema.

This document *does not* cover task issues or annotations. The document focuses
solely on the main function of tasks within HOT TM and on keeping the task
history.

This document *does not* introduce production-ready endpoints and the
corresponding interface. Authentication and authorization are out of the scope
of this document.

Balance between endpoints and DTOs
----------------------------------

Because we limit ourselves to HTTP, one half of the interface is given
(``GET``, ``POST``, ``PUT``, and ``DELETE``). What is left are the endpoints
(*where* the communication happens) and the DTOs (*how* the communication
happens). We try to find a balance between the two.

Endpoints extreme
^^^^^^^^^^^^^^^^^

When there is an endpoint for everything, we call it *endpoints extreme*. The
`HOT TM API`_ is close to it. Paraphrasing the HOT TM API, there are
task-related endpoints accepting ``POST``:

``/project/{pid}/task/{tid}/map``
    Expects the task ``tid`` of the project ``pid`` to be in the *unlocked to
    map* state and changes the task into the *locked for mapping* state.
``/project/{pid}/task/{tid}/finish``
    Expects the task ``tid`` of the project ``pid`` to be in the *locked for
    mapping* state and changes the task into the *unlocked to check* state.

``/project/{pid}/task/{tid}/check``
    Expects the task ``tid`` of the project ``pid`` to be in the *unlocked to
    check* state and changes the task into the *locked for checking* state.

``/project/{pid}/task/{tid}/good``
    Expects the task ``tid`` of the project ``pid`` to be in the *locked for
    checking* state and changes the task into the *unlocked done* state.

``/project/{pid}/task/{tid}/bad``
    Expects the task ``tid`` of the project ``pid`` to be in the *locked for
    checking* state and changes the task into the *unlocked to map* state.

and ``GET``:

``/project/{pid}/tasks/states``
    Retrieves the state of all tasks of the project ``pid``.

``/project/{pid}/task/{tid}/state``
    Retrieves the state of the task ``tid`` of the project ``pid``.

Here, ``pid`` is the project identifier and ``tid`` is the task identifier. In
such a case, there is little to no information transferred in DTOs.

.. _HOT TM API: https://tasks.hotosm.org/api-docs

DTOs extreme
^^^^^^^^^^^^

When there is a single endpoint for everything, we call it *DTOs extreme*,
because all the information is encoded in DTOs:

``/whatever``
    Accepts ``POST`` and ``GET``.

    For ``POST``, the DTO must always contain ``pid`` -- the project
    identifier, ``tid`` -- the task identifier, and ``action``, where
    ``action`` can be *map*, *finish*, *check*, *good*, or *bad*.

    For ``GET``, the DTO must always contain ``pid`` -- the project identifier.
    However, ``GET`` does not have a body as ``POST`` has, so there is no place
    to put the values of the DTO. To keep the "DTOs extreme" approach, we need
    to encode the DTO's values in the URL of the endpoint, i.e.,
    ``/whatever?pid={pid}``.

    (Please note that ``/whatever?pid={pid}`` is indeed different from
    ``/whatever/{pid}``: by the `URL syntax`_, the former is understood as the
    ``/whatever`` path with the ``pid={pid}`` query, whereas the latter is only
    the ``/whatever/{pid}`` path.)

    When there is no ``tid`` -- the task identifier -- in the DTO (i.e., in the
    query part of the URL), it is expected that the client requests the state
    of all the tasks of the project ``pid``. If ``tid`` is specified within the
    DTO, the state of the task ``tid`` of the project ``pid`` is sent back.

.. _URL syntax: https://en.wikipedia.org/wiki/URL#Syntax

Finding the balance
^^^^^^^^^^^^^^^^^^^

We can see that both extreme approaches suffer from scalability issues:

- An example for *Endpoints extreme* is extending the workflow with the *reset*
  action that changes the state of the task ``tid`` of the project ``pid`` from
  *unlocked done* to *unlocked to map* -- a new endpoint needs to be
  introduced.

- An example for *DTOs extreme* is any extension of the ``GET`` request, which
  is already cumbersome enough in the *DTOs extreme* example. (Encoding DTO
  values in the query part of the URL makes sense for a small number of
  parameters, such as when using pagination. It does not scale.)

When finding the balance between endpoints and DTOs, we aim at the simplicity
of the implementation in the clients. Having the right balance between
endpoints and DTOs improves the scalability and overall maintainability of the
code base.

Our endpoints draft is based on the terminology used at the beginning of this
design document -- *Project*, *Task*, and *Action*. The *endpoint path*, where
path reflects the path of the `URL syntax`_, consists of the *endpoint parts*.
The convention for an *endpoint part* is to use the plural, like
``.../projects``, for endpoints representing a list of objects, and the
singular followed by an identifier, like ``.../project/{pid}``, for endpoints
representing a particular object. Here, ``...`` may be zero or more *endpoint
parts*, as in ``.../project/{pid}/tasks`` or ``.../project/{pid}/task/{tid}``.
The values between ``{`` and ``}`` are the identifiers of the objects.

Considering the `mapping workflow`_ and targeting load testing, our API
proposal consists of the following endpoints and DTOs:

``/project/{pid}/tasks``
    Accepts ``GET``, returns the list of tasks and their corresponding states.
    An example of the returned DTO is::

        [
            {
                "pid": 1,
                "tid": 1,
                "state": "unlocked to map"
            },
            {
                "pid": 1,
                "tid": 2,
                "state": "locked for mapping"
            },
            ...
            {
                "pid": 1,
                "tid": 1000,
                "state": "unlocked to map"
            }
        ]

``/project/{pid}/task/{tid}``
    Accepts ``GET``, returns the task and its corresponding state. An example
    of the returned DTO is::

        {
            "pid": 1,
            "tid": 1,
            "state": "unlocked to map"
        }

``/project/{pid}/actions``
    Accepts ``POST``, returns the task and its boundary. An example of the DTO
    from a client to the backend that requests mapping of a random task::

        {
            "what": "map"
        }

    An example of the DTO with the reply from the backend to the client::

        {
            "pid": 1,
            "tid": 22,
            "geometry": {"some": "geom"}
        }

    An example of the DTO from a client to the backend that requests finishing
    the task, where ``tid`` must be specified::

        {
            "what": "finish",
            "tid": 22
        }

Addressing the scalability issues of the `Endpoints extreme`_, extending the
workflow means adding the support for the::

    {
        "what": "reset",
        "tid": 22
    }

DTO in the ``/project/{pid}/actions`` endpoint.

Addressing the scalability issues of the `DTOs extreme`_, the hierarchy of the
exposed objects is leveraged, e.g., having a *project* with *tasks* and
*actions* leads to the ``/project/{pid}/tasks`` and ``/project/{pid}/actions``
endpoints.

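
How a single ``/project/{pid}/actions`` endpoint can absorb workflow extensions
may be sketched with a transition table. The following Python snippet is an
illustration only, not the implementation from the repository: the
``TRANSITIONS`` and ``apply_action`` names, the in-memory task dictionary, and
the shape of the reply (which omits ``geometry``) are assumptions. It also
assumes that only ``map`` may omit ``tid``; the text above states this
requirement just for ``finish``.

```python
import random

# Action name -> (state the task must be in, state the task changes into).
TRANSITIONS = {
    "map": ("unlocked to map", "locked for mapping"),
    "finish": ("locked for mapping", "unlocked to check"),
    "check": ("unlocked to check", "locked for checking"),
    "good": ("locked for checking", "unlocked done"),
    "bad": ("locked for checking", "unlocked to map"),
    # Extending the workflow is one more row here, not a new endpoint.
    "reset": ("unlocked done", "unlocked to map"),
}


def apply_action(pid, tasks, dto):
    """Apply an action DTO to ``tasks``, a {tid: state} dict of project ``pid``.

    Returns a reply DTO with the affected task and its new state.
    """
    what = dto["what"]
    if what not in TRANSITIONS:
        raise ValueError(f"unknown action: {what}")
    required, result = TRANSITIONS[what]
    tid = dto.get("tid")
    if tid is None:
        if what != "map":
            # Assumption: only "map" may leave the choice of task to the server.
            raise ValueError(f"tid must be specified for action: {what}")
        candidates = [t for t, state in tasks.items() if state == required]
        if not candidates:
            raise LookupError("no task available to map")
        tid = random.choice(candidates)
    elif tasks[tid] != required:
        raise ValueError(f"task {tid} is not in the '{required}' state")
    tasks[tid] = result
    return {"pid": pid, "tid": tid, "state": result}


# Usage: one task ready to map, one task already done.
project_tasks = {1: "unlocked to map", 2: "unlocked done"}
print(apply_action(1, project_tasks, {"what": "map"}))
print(apply_action(1, project_tasks, {"what": "finish", "tid": 1}))
print(apply_action(1, project_tasks, {"what": "reset", "tid": 2}))
```

With this design, the client code stays the same when the workflow grows: it
still sends a ``POST`` with a ``what`` value to the same endpoint.
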

Load testing
------------

The ``load_test.py`` module is in the repository root. Its documentation
follows.

.. automodule:: load_test

----

We compare how the ``balanced`` API works for the :ref:`almost tm admin` and
:ref:`actions history` database schemas. We ran the "average" load test for one
hour for both of them. We also stressed the server with the "extreme" load test
for ten minutes for both of them.

:ref:`almost tm admin`
    - for the average load test (wait 30 to 60 seconds between requests):

      - average response time is ``21.47`` ms
      - 100%ile response time is ``170`` ms
      - #requests ``8664``
      - see `full report <_static/average-1h-almost_tm_admin.html>`_

    - for the extreme load test (wait 1 to 2.5 seconds between requests):

      - average response time is ``27.89`` ms (v2 ``23.27`` ms)
      - 100%ile response time is ``3100`` ms (v2 ``2000`` ms)
      - #requests ``35711`` (v2 ``34657``)
      - see `full report <_static/extreme-10min-almost_tm_admin.html>`_
      - see `full report v2 <_static/extreme-10min-almost_tm_admin-v2.html>`_

:ref:`actions history`
    - for the average load test (wait 30 to 60 seconds between requests):

      - average response time is ``26.2`` ms
      - 100%ile response time is ``180`` ms
      - #requests ``8370``
      - see `full report <_static/average-1h-actions_history.html>`_

    - for the extreme load test (wait 1 to 2.5 seconds between requests):

      - average response time is ``22.45`` ms (v2 ``22.24`` ms)
      - 100%ile response time is ``170`` ms (v2 ``320`` ms)
      - #requests ``35578`` (v2 ``36405``)
      - see `full report <_static/extreme-10min-actions_history.html>`_
      - see `full report v2 <_static/extreme-10min-actions_history-v2.html>`_

Conclusion
----------

We conducted load testing experiments in order to stress different database
designs in an environment of many (simulated) mappers.

Recalling the :ref:`backend diagram`, our code base consists solely of the
*API* endpoints and the *Database*. In a real application, the three dots
``...`` from the :ref:`backend diagram` would be the layers connecting the API
with the database.
We can afford this simplification because a lot of functionality is currently
out of the scope of these documents.

When we study the results of load testing, there is not much difference between
the :ref:`almost tm admin` and :ref:`actions history` schemas. This is
interesting with respect to the database experiments described in
:ref:`database experiments`. The only difference observed is that *actions
history* looks more stable; the "extreme" load test had to be conducted to find
this.

However, we need to be aware that we tested only limited functionality. For
example, retrieving the history of the actions per project was not tested.
Functionality like this should be described in the load test, implemented, and
its performance measured.