The repository is structured as follows:
- install/: scripts to install the docker containers, load/dump data, etc.
- analysis/: the Python scripts for data analysis
- config/: configuration automatically generated by docker_create_elastic_servers.sh to store the credentials
- preproc/: all the scripts for preprocessing (e.g., flow identification, insertion of extra information by decoding LoRa headers, etc.)
- tools/: auxiliary scripts used elsewhere (e.g., LoRa decoding, elastic search queries, etc.)
We propose two options:
- Docker
- Virtual Machines (e.g., OpenNebula)
In either case, the installation produces config/myconfig.py, which contains all the parameters and credentials. This Python script is typically imported by all other Python scripts to get the local parameters of the installation (IP address of the server, credentials, etc.).
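As an illustration of how a script might pick up these parameters, the sketch below loads the generated file by path with importlib. The helper name load_config and the attribute names in the usage comment (ES_HOST, ES_USER) are hypothetical; check the generated myconfig.py for the actual variable names.

```python
import importlib.util
from pathlib import Path


def load_config(path="config/myconfig.py"):
    """Load the generated configuration file as a Python module.

    The default path follows the layout described above; the helper itself
    is an illustrative assumption, not part of the repository.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found: run the installation first")
    spec = importlib.util.spec_from_file_location("myconfig", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file, defining its variables
    return module


# Usage (attribute names are hypothetical):
#   cfg = load_config()
#   print(cfg.ES_HOST, cfg.ES_USER)
```

Loading by path avoids having to add config/ to sys.path in every script; a plain import works just as well if the directory is already importable.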
You should read the messages printed at the end of the installation process to see the procedure to connect Kibana and Elastic Search (a manual procedure is still needed):
- Use your browser to visit http://IP_VM:5601 and use the enrollment key and the verification code provided by the script (both in the output and in myconfig.py)
- You can use the Kibana interface to dig into the elastic search index: http://IP_VM:5601
We provide the scripts for a docker installation of Kibana and elastic search.
- one container for Elastic Search, another container for Kibana
- you can select the version number in the scripts
- Please run: docker_create_elastic_servers.sh
We provide the scripts for a VM installation of Kibana and elastic search.
- one VM for Kibana + one elastic search node
- other VMs for additional elastic search nodes to create a cluster (not yet supported)
- Please run: vm_create_elastic_servers.sh
We rely on the following Python packages:
- pandas
- requests
- elasticsearch
- seaborn
- matplotlib
You can install them with:
- apt-get install python3-venv (if the package is not yet installed on your system; Debian-like distributions)
- python3 -m venv .venv
- source .venv/bin/activate
- pip install -r install/requirements.txt
We rely on elasticdump to back up and restore the dataset. npm and the elasticdump package are automatically installed when you run the scripts.
We provide the following scripts to dump/load data in elastic search servers:
- elasticdump_from_lora-es_per_month.sh connects to the elastic search server at ICube, and creates one compressed json dump per month
- elasticdump_load_data.sh is a bash script that connects to the local installation and injects the dump into the local elastic search instance (custom index name)
You can explore the dataset in Kibana (cf. install section), by default: http://IP@_ES_SERVER:5601/app/enterprise_search/content/search_indices/lora-index
Be careful: our dataset uses two versions of the mapping (v3 and v4), and the records do not have exactly the same fields. We therefore provide a script to reindex a dataset (version 3 or 4) into an output index, mapping in particular the fields used in the data analysis. In this way, we have a global index for all the years.
- elasticsearch_reindex.sh is a bash script to reindex a v3/v4 lora index into the common index (that will be used for all the analysis)
- The following fields differ between v3 and v4:
  - mqtt_time -> time
  - txInfo.modulation -> txInfo.modulation.type (e.g. LoRa)
  - loRaModulationInfo -> txInfo.modulation.lora
  - rxInfo.gatewayID -> rxInfo.gatewayId
  - rxInfo.uplinkID -> rxInfo.uplinkIdText. Be careful: uplinkID (v4) is a long, uplinkIDText (v3) is text!
  - codeRate -> change 4/5 (v3) into C_4_5 (v4) to be consistent
  - rxInfo.LoRaSNR -> rxInfo.snr
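To make the field mapping above concrete, here is a minimal Python sketch that renames the fields of a single v3 record. It is not the logic of elasticsearch_reindex.sh (which is a bash script), only an illustration. It assumes that loRaModulationInfo and codeRate are nested under txInfo, and that rxInfo is either a dict or a list of dicts; the uplink identifier is deliberately left untouched because of the type difference noted above.

```python
import copy


def code_rate_v3_to_v4(code_rate):
    """Turn a v3 coding rate such as '4/5' into the v4-style 'C_4_5'."""
    if isinstance(code_rate, str) and "/" in code_rate:
        return "C_" + code_rate.replace("/", "_")
    return code_rate  # already in v4 form, or not a string


def v3_to_v4(record):
    """Return a copy of a v3 record using the v4 field names listed above."""
    out = copy.deepcopy(record)  # never modify the caller's record

    # mqtt_time -> time
    if "mqtt_time" in out:
        out["time"] = out.pop("mqtt_time")

    tx = out.get("txInfo")
    if isinstance(tx, dict) and not isinstance(tx.get("modulation"), dict):
        modulation = {}
        # txInfo.modulation (a plain value in v3) -> txInfo.modulation.type
        if "modulation" in tx:
            modulation["type"] = tx["modulation"]
        # ASSUMPTION: loRaModulationInfo lives under txInfo in v3 records.
        lora = tx.pop("loRaModulationInfo", None)
        if isinstance(lora, dict):
            if "codeRate" in lora:
                lora["codeRate"] = code_rate_v3_to_v4(lora["codeRate"])
            modulation["lora"] = lora
        if modulation:
            tx["modulation"] = modulation

    # ASSUMPTION: rxInfo is a dict or a list of dicts (one per gateway).
    rx = out.get("rxInfo")
    for entry in rx if isinstance(rx, list) else [rx]:
        if not isinstance(entry, dict):
            continue
        if "gatewayID" in entry:
            entry["gatewayId"] = entry.pop("gatewayID")
        if "LoRaSNR" in entry:
            entry["snr"] = entry.pop("LoRaSNR")
    return out
```

Records that are already in v4 form pass through unchanged, so the same function could be applied to a mixed set of documents.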
- All scripts use myconfig.py to store the parameters (IP address of the server, credentials) for elastic search.
- All plots are saved in pdf format in the analysis/figures directory.
The scripts should be run in the following order:
We grouped all the pre-processing scripts in the preproc directory:
- insert_extra_infos.py: insert LoRa information, by dissecting LoRa frames (additional fields per record in the index);
- insert_dup_infos: identify duplicated packets, flagged accordingly in the index;
- extract_interpacket_distribution: extract for each address the sequence of packets, chronologically ordered. Disclaimer: several devices may share the same address, thus we implemented an algorithm to infer the different flows sharing the same address.
All these scripts can be stopped and restarted at any time: they will continue where they stopped.
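To give an intuition of what inferring flows behind a shared address involves, the sketch below splits the packets of one address into candidate flows based on frame counter continuity. This is a hypothetical heuristic for illustration only, not the algorithm implemented in the repository. The function name split_flows and the max_gap threshold are assumptions, and duplicated packets are assumed to have been removed beforehand.

```python
def split_flows(packets, max_gap=50):
    """Group (timestamp, frame_counter) pairs of one address into flows.

    Hypothetical heuristic: a packet extends the flow whose last frame
    counter is strictly smaller and closest to its own, within max_gap;
    otherwise it opens a new flow.
    """
    flows = []  # each flow is a list of (timestamp, frame_counter) pairs
    for ts, fcnt in sorted(packets):  # process packets chronologically
        best, best_gap = None, None
        for flow in flows:
            gap = fcnt - flow[-1][1]  # distance to the flow's last counter
            if 0 < gap <= max_gap and (best_gap is None or gap < best_gap):
                best, best_gap = flow, gap
        if best is None:
            flows.append([(ts, fcnt)])  # no compatible flow: start a new one
        else:
            best.append((ts, fcnt))
    return flows
```

For example, two devices sharing one address and sending interleaved packets with counters around 10 and around 500 end up in two separate flows. Counter resets and rollovers are not handled by this sketch.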
You should use a local venv for your Python packages, and activate it (source .venv/bin/activate). Please see the install section.
We implemented the following analyses:
- SF_analysis.py: distribution of traffic for the spreading factors
- traffic_temporal_analysis.py: temporal analysis of the traffic (per day of week, per hour)
- flow-distribution.py: CDF of the traffic per gateway and device
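As a rough idea of what a per-day and per-hour aggregation looks like, here is a minimal standard-library sketch that counts packets per day of week and per hour. It is not the code of traffic_temporal_analysis.py. The function name is hypothetical, and timestamps are assumed to be ISO 8601 strings that datetime.fromisoformat accepts.

```python
from collections import Counter
from datetime import datetime


def traffic_per_weekday_and_hour(timestamps):
    """Count packets per day of week (0 = Monday) and per hour of day."""
    per_weekday = Counter()
    per_hour = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)  # ASSUMPTION: ISO 8601 input
        per_weekday[dt.weekday()] += 1
        per_hour[dt.hour] += 1
    return per_weekday, per_hour
```

The two counters can then be plotted as bar charts, for instance with matplotlib or seaborn, which are already among the required packages.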
All the preprocessed data is stored locally in the directory data/.
distrib_XXX_YYY.parquet contains the list of packets (timestamp, frame counter) for a flow identified by its devAddr (XXX) and the first frame counter of the flow (YYY).
Caution: different flows may have packets with the same frame counter (and the same devAddr). These packets are generated by different devices, and may not be chronologically close.
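The naming convention above can be exploited to index the flows stored in data/. The sketch below parses the file names; the helper names are hypothetical, and it assumes that YYY is written as a decimal integer and that XXX is whatever precedes the last underscore.

```python
from pathlib import Path


def parse_distrib_name(path):
    """Return (devAddr, first frame counter) from distrib_XXX_YYY.parquet."""
    stem = Path(path).stem  # e.g. "distrib_XXX_YYY"
    prefix = "distrib_"
    if not stem.startswith(prefix):
        raise ValueError(f"unexpected file name: {path}")
    # Split on the last underscore, so the address part is kept whole.
    dev_addr, first_fcnt = stem[len(prefix):].rsplit("_", 1)
    return dev_addr, int(first_fcnt)  # ASSUMPTION: decimal frame counter


def list_flows(data_dir="data"):
    """Map each (devAddr, first frame counter) to its parquet file."""
    files = sorted(Path(data_dir).glob("distrib_*.parquet"))
    return {parse_distrib_name(p): p for p in files}
```

Each file can then be loaded with pandas.read_parquet(path), which requires a parquet engine such as pyarrow to be installed. Because flows are keyed by both the address and the first frame counter, two flows sharing a devAddr remain distinct entries.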