API: Endpoints¶
This document discusses the interface to the database for clients: the procedure names, the data transfer objects (DTOs), and the endpoints, i.e., the locations where the procedures are called.
repository: https://git.sr.ht/~qeef/dd-hot-tm
documentation: https://dd-hot-tm.tojemoje.site/
The main problem HOT TM solves is helping mappers manage the mapping of a large area. From the mapper’s point of view, the solution is straightforward: split the large area into smaller parts and let the mappers lock and unlock them to communicate which part of the area is being worked on.
HOT TM uses the Project and Task terminology to denote a large area with additional information, like the description of the purpose of mapping, and its smaller parts, respectively. Each task has its state, e.g. locked or unlocked. (In the HOT TM code base the “state” is often called “status”.) However, a task can also be understood as a thing to do instead of a part of the area. To avoid this misunderstanding, we introduce the Action to denote a transition from one task state to another.
Using the terminology of Project, Task and Action introduced above, we can describe the common mapping workflow. First, Celestine creates a new project with all tasks unlocked. Then, Monica, Michal, Marcel and Miriam each lock a random task and map that part of the project. When Miriam and Marcel are finished with their tasks, they unlock them and lock other random tasks. The following figure represents the task states (rounded boxes) with the corresponding actions (arrows):
In addition, Radek and Ramona come into the mapping workflow. They find tasks that have been recently mapped and check that these tasks have been mapped properly. To improve the representation of the workflow, we give meaning to the locked and unlocked states and we name the actions, as shown in the following figure:
The names of the actions are the first half of the database interface. The other half, the DTOs, identify the project, the task, the transition and the mapper. The DTOs should not be confused with the states.
Note that the terminology and the naming of task states only resemble the HOT TM naming – there are more states with slightly different names in the HOT TM code.
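Since the figures may not be reproduced here, the transitions can also be summarized in code. The following is a minimal, purely illustrative Python sketch; the state and action names follow the endpoint descriptions later in this document, and the dictionary itself is not part of the code base.

# Illustrative only: task-state transitions as named in this document.
ACTIONS = {
    # action: (state before, state after)
    "map":    ("unlocked to map",     "locked for mapping"),
    "finish": ("locked for mapping",  "unlocked to check"),
    "check":  ("unlocked to check",   "locked for checking"),
    "good":   ("locked for checking", "unlocked done"),
    "bad":    ("locked for checking", "unlocked to map"),
}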
The purpose of the endpoints¶
The endpoints provide the places where the communication with clients happens. How the communication happens is given by the interface, i.e., the procedure names and DTOs related to the endpoints.
We understand the clients as whatever mappers use to send requests to the server – a web page, the JOSM editor, or a mapper’s script.
We aim at communication between clients and the server via HTTP, using JSON to encode the values of DTOs. This is the de facto standard for client-server communication in the web environment, and leveraging other (or more) technologies would unnecessarily increase technical debt.
Well-defined constraints on where and how the communication happens simplify the implementation of the clients.
The interface to endpoints (how)¶
The interface consists of procedure names and DTOs. The procedure names are given by HTTP: HTTP uses GET to retrieve information, POST to send new information, PUT to update the existing information, and DELETE to destroy the existing information available at the endpoint.
The DTOs concretize the request, providing additional information such as an identifier or a reason for the change.
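As an illustration of what such a DTO carries, here is a minimal, hypothetical Python sketch; the field names mirror the DTOs proposed later in this document and are not a prescribed format.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ActionDto:
    # Hypothetical DTO identifying the requested transition and, optionally,
    # the concrete task it applies to.
    what: str                  # requested action, e.g. "map" or "finish"
    tid: Optional[int] = None  # task identifier, when a concrete task is meant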
The endpoints (where)¶
The endpoints specify the places or resources or objects that can be retrieved (GET), created (POST), changed (PUT), and/or deleted (DELETE).
The endpoints do not necessarily need to reflect the database – they serve a different purpose: looking friendly to the clients.
So, in the end, in the API design we are interested in the endpoints and their interface, aiming at the simplicity of the communication about the mapping workflow between the server and the clients.
Non-goals¶
This document does not cover all endpoints HOT TM uses. It covers only the endpoints related to tasks and the transitions between the task states. It is expected that there are similar documents covering other endpoints related to the other parts of HOT TM, such as groups or campaigns, and that there is another document that puts all these parts together and introduces the overall endpoints schema.
This document does not cover task issues or annotations. It aims solely at the main function of tasks within HOT TM and at keeping the tasks’ history.
This document does not introduce production-ready endpoints and the corresponding interface.
Authentication and authorization are out of the scope of this document.
Balance between endpoints and DTOs¶
Because we limit ourselves to HTTP, one half of the interface is given (GET, POST, PUT, and DELETE). What is left are the endpoints (where the communication happens) and the DTOs (how the communication happens). We try to find a balance between the two.
Endpoints extreme¶
When there is an endpoint for everything, we call it the endpoints extreme. The HOT TM API is close to this extreme. Paraphrasing the HOT TM API, there are the following task-related endpoints accepting POST:
/project/{pid}/task/{tid}/map
    Expects the task tid of the project pid to be in the unlocked to map state, changing the task into the locked for mapping state.
/project/{pid}/task/{tid}/finish
    Expects the task tid of the project pid to be in the locked for mapping state, changing the task into the unlocked to check state.
/project/{pid}/task/{tid}/check
    Expects the task tid of the project pid to be in the unlocked to check state, changing the task into the locked for checking state.
/project/{pid}/task/{tid}/good
    Expects the task tid of the project pid to be in the locked for checking state, changing the task into the unlocked done state.
/project/{pid}/task/{tid}/bad
    Expects the task tid of the project pid to be in the locked for checking state, changing the task into the unlocked to map state.
and the following endpoints accepting GET:
/project/{pid}/tasks/states
    Retrieves the state of all tasks of the project pid.
/project/{pid}/task/{tid}/state
    Retrieves the state of the task tid of the project pid.
Here pid is the project identifier and tid is the task identifier. In such a case, there is little to no information transferred in the DTOs.
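To illustrate how thin the DTOs are in this extreme, a client call can be sketched with the requests library. This is a hypothetical sketch: the host and the identifiers are assumptions, only the endpoint paths are the ones paraphrased above.

import requests

BASE = "http://localhost:8000"  # assumed host, for illustration only

# Lock task 22 of project 1 for mapping; the whole request is in the URL,
# there is (almost) nothing to send in the body.
requests.post(f"{BASE}/project/1/task/22/map").raise_for_status()

# Retrieve the state of all tasks of project 1.
states = requests.get(f"{BASE}/project/1/tasks/states").json()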
DTOs extreme¶
When there is a single endpoint for everything, we call it the DTOs extreme, because all the information is encoded in the DTOs:
/whatever
    Accepts POST and GET.
    For POST, the DTOs must always contain pid – the project identifier, tid – the task identifier, and action, where action can be map, finish, check, good, or bad.
    For GET, the DTO must always contain pid – the project identifier. However, GET does not have a body as POST has, so there is no place to put the values of the DTO. To keep the “DTOs extreme” approach, we need to encode the DTO’s values in the URL of the endpoint, i.e., /whatever?pid={pid}.
    (Please note that /whatever?pid={pid} is indeed different from /whatever/{pid}, because the former is understood as the /whatever path with the pid={pid} query, but the latter only as the /whatever/{pid} path by the URL syntax.)
    When there is no tid – the task identifier – in the DTO (i.e., in the query part of the URL), it is expected that the client requests the state of all the tasks of the project pid. If tid is specified within the DTO, the state of the task tid of the project pid is sent back.
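For comparison, the same two requests in the DTOs extreme put everything into the DTO. Again a hypothetical sketch with an assumed host:

import requests

BASE = "http://localhost:8000"  # assumed host, for illustration only

# Lock task 22 of project 1 for mapping; everything is in the DTO.
requests.post(f"{BASE}/whatever",
              json={"pid": 1, "tid": 22, "action": "map"}).raise_for_status()

# Retrieve the state of all tasks of project 1; GET has no body, so the
# DTO values go to the query part of the URL.
states = requests.get(f"{BASE}/whatever", params={"pid": 1}).json()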
Finding the balance¶
We can see that both extreme approaches suffer from scalability issues:
An example for the endpoints extreme is extending the workflow with a reset action that changes the state of the task tid of the project pid from unlocked done to unlocked to map – a new endpoint needs to be introduced.
An example for the DTOs extreme is whatever extension of the GET request, which is already cumbersome enough in the DTOs extreme example. (Encoding DTO values in the query part of the URL makes sense for a small number of parameters, like when using pagination. It does not scale.)
When finding the balance between endpoints and DTOs, we aim at the simplicity of the implementation in the clients. Having the right balance between endpoints and DTOs improves the scalability and the overall maintainability of the code base.
Our endpoints draft is based on the terminology used at the beginning of this design document – Project, Task, and Action.
The endpoint path, where path reflects the path of the URL syntax, consists of endpoint parts. The convention for an endpoint part is to use plural, like .../projects, for endpoints representing a list of objects, and singular followed by an identifier, like .../project/{pid}, for endpoints representing a particular object, where ... may be zero or more endpoint parts, like .../project/{pid}/tasks or .../project/{pid}/task/{tid}. The values between { and } are the identifiers of the objects.
Considering the mapping workflow and targeting load testing, our API proposal consists of the following endpoints and DTOs:
/project/{pid}/tasks
    Accepts GET, returns the list of tasks and their corresponding states. An example of the returned DTO is:

[
  { "pid": 1, "tid": 1, "state": "unlocked to map" },
  { "pid": 1, "tid": 2, "state": "locked for mapping" },
  ...
  { "pid": 1, "tid": 1000, "state": "unlocked to map" }
]
/project/{pid}/task/{tid}
    Accepts GET, returns the task and its corresponding state. An example of the returned DTO is:

{ "pid": 1, "tid": 1, "state": "unlocked to map" }
/project/{pid}/actions
    Accepts POST, returns the task and its boundary. An example of the DTO from a client to the backend that requests mapping of a random task:

{ "what": "map" }

    An example of the DTO with the reply from the backend to the client:

{ "pid": 1, "tid": 22, "geometry": {"some": "geom"} }

    An example of the DTO from a client to the backend that requests finishing the task, where tid must be specified:

{ "what": "finish", "tid": 22 }
Addressing the scalability issues of the endpoints extreme, extending the workflow means adding support for the
{
"what": "reset",
"tid": 22
}
DTO in the /project/{pid}/actions endpoint.
Addressing the scalability issues of the DTOs extreme, the hierarchy of the exposed objects is leveraged, e.g., having a project with tasks and actions leads to the /project/{pid}/tasks and /project/{pid}/actions endpoints.
Load testing¶
The load_test.py module is in the repository root. Its documentation follows.
HOT TM proposal load testing.
This file is meant to be run using locust -f load_test.py.
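The actual load_test.py lives in the repository; for orientation, a simplified and purely illustrative locust file could look like the sketch below. The project identifiers 1 to 10 are an assumption matching the ten projects created by drop_all_and_create_10_projects.py, and the wait time matches the “average” scenario described further down.

import random

from locust import HttpUser, between, task


class Mapper(HttpUser):
    # "Average" scenario from this document: wait 30 to 60 seconds between requests.
    wait_time = between(30, 60)

    @task
    def list_tasks(self):
        pid = random.randint(1, 10)
        self.client.get(f"/project/{pid}/tasks")

    @task
    def map_and_finish(self):
        pid = random.randint(1, 10)
        reply = self.client.post(f"/project/{pid}/actions",
                                 json={"what": "map"}).json()
        if "tid" in reply:
            self.client.post(f"/project/{pid}/actions",
                             json={"what": "finish", "tid": reply["tid"]})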
To prepare the databases for load testing, build and run the database containers. In the first terminal:
cd hot_tm_proposal/database
docker-compose build --no-cache almost_tm_admin
docker-compose run --rm --name almost_tm_admin almost_tm_admin
In the second terminal:
cd hot_tm_proposal/database
docker-compose build --no-cache actions_history
docker-compose run --rm --name actions_history actions_history
Then, new projects need to be created in the databases. If there is no “testing virtual environment” in the database directory, start with creating one:
cd hot_tm_proposal/database
python3 -m venv tve
. tve/bin/activate
pip install -r requirements.txt
and then, in the database directory (cd hot_tm_proposal/database), run the script to prepare the databases:
python3 drop_all_and_create_10_projects.py
Before load testing, the FastAPI application needs to be started. If there is no “testing virtual environment” in the repository root, it’s time to create one:
python3 -m venv tve
. tve/bin/activate
pip install -r hot_tm_proposal/database/requirements.txt
pip install -r hot_tm_proposal/api/requirements.txt
Then run the application either for the almost_tm_admin database schema:
HOT_TM_DB_SCHEMA=almost_tm_admin PYTHONPATH=hot_tm_proposal/database/ uvicorn hot_tm_proposal.api.balanced:app --workers 4
or for the actions_history database schema:
HOT_TM_DB_SCHEMA=actions_history PYTHONPATH=hot_tm_proposal/database/ uvicorn hot_tm_proposal.api.balanced:app --workers 4
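Before running a full load test, a quick smoke test against the running application can confirm the proposed endpoints respond. This is an optional, illustrative check assuming the default uvicorn port 8000 and an existing project with pid 1:

curl http://localhost:8000/project/1/tasks
curl -X POST -H 'Content-Type: application/json' \
     -d '{"what": "map"}' http://localhost:8000/project/1/actions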
The last step is to run locust.io (in another terminal, but in the virtual environment tve in the repository root):
. tve/bin/activate
locust -f load_test.py
Then, visit the web browser at http://localhost:8089/, set the parameters of a new load test (we use 100 mappers, a ramp up of 10, and http://localhost:8000 as the host address), and start load testing.
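Alternatively, if the web interface is not needed, locust can be started headless with the same parameters directly from the command line (an optional variant, not required by this document):

locust -f load_test.py --headless -u 100 -r 10 --run-time 1h --host http://localhost:8000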
We compare how the balanced API works for the almost_tm_admin and actions_history database schemas. We ran the “average” load testing for one hour for both of them. We also stressed the server with the “extreme” load testing for ten minutes for both of them.
For the average load test (wait 30 to 60 seconds between requests):
    average response time is 21.47 ms
    100%ile response time is 170 ms
    #requests: 8664
    see full report
For the extreme load test (wait 1 to 2.5 seconds between requests):
    average response time is 27.89 ms (v2: 23.27 ms)
    100%ile response time is 3100 ms (v2: 2000 ms)
    #requests: 35711 (v2: 34657)
    see full report
    see full report v2
For the average load test (wait 30 to 60 seconds between requests):
    average response time is 26.2 ms
    100%ile response time is 180 ms
    #requests: 8370
    see full report
For the extreme load test (wait 1 to 2.5 seconds between requests):
    average response time is 22.45 ms (22.24 ms)
    100%ile response time is 170 ms (320 ms)
    #requests: 35578 (v2: 36405)
    see full report
    see full report v2
Conclusion¶
We conducted load testing experiments in order to stress different database designs in an environment of many (simulated) mappers. Recalling the Backend diagram, our code base consists solely of the API endpoints and the Database. In a real application, the three dots ... from the Backend diagram would be a layer connecting the API with the database. We can afford this simplification because a lot of functionality is currently out of the scope of these documents.
When we study the results of the load testing, there is not much difference between almost_tm_admin and actions_history. This is interesting in relation to the database experiments described in Experiments.
The only difference observed is that actions_history looks more stable. The “extreme” load test had to be conducted to find this.
However, we need to be aware that we tested limited functionality. For example, retrieving the history of the actions per project was not tested. Functionality like this should be described in the load test, implemented, and the performance of the implementation measured.