A friend of mine misconfigured their exporter.yaml.
After the exporter failed to start, the configuration was fixed and the exporter was started again.
This, however, resulted in a failure with
status = StatusCode.ALREADY_EXISTS
System
master: c246fab86fe451db46507b77bc7fe58aaad3a79e
Reproduction
Have a misconfigured exporter.yaml and try to start the exporter.
Fix the configuration after the failed attempt and start the exporter again.
Observed Behaviour
The coordinator keeps an instance of the exporter and will refuse to accept a new instance of this exporter.
Coordinator
DEBUG:grpc._cython.cygrpc:[_cygrpc] Loaded running loop: id(loop)=139766749596624
INFO:root:exporter connected: ipv4:10.88.0.49:49302
DEBUG:root:exporter in_msg startup {
version: "25.0+264-gc246fab8"
name: "emlix-test"
}
ERROR:root:error in exporter message handler
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/labgrid/remote/coordinator.py", line 426, in request_task
    raise ExporterError(
labgrid.remote.coordinator.ExporterError: exporter with name 'emlix-test' is already connected from ipv4:10.88.0.49:46262
Exporter
ERROR:root:unexpected grpc error in coordinator message pump task
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/labgrid/remote/exporter.py", line 899, in message_pump
    async for out_message in self.stub.ExporterStream(queue_as_aiter(self.out_queue)):
  File "/usr/local/lib/python3.11/dist-packages/grpc/aio/_call.py", line 366, in _fetch_stream_responses
    await self._raise_for_status()
  File "/usr/local/lib/python3.11/dist-packages/grpc/aio/_call.py", line 274, in _raise_for_status
    raise _create_rpc_error(
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
    status = StatusCode.ALREADY_EXISTS
    details = "startup failed: exporter with name 'emlix-test' is already connected from ipv4:10.88.0.49:46262"
    debug_error_string = "UNKNOWN:Error received from peer ipv4:192.168.201.3:20409 {grpc_message:"startup failed: exporter with name \'emlix-test\' is already connected from ipv4:10.88.0.49:46262", grpc_status:6}"
>
DEBUG:root:pump task exited, shutting down exporter
DEBUG:asyncio:Close <_UnixSelectorEventLoop running=False closed=False debug=True>
Expected Behaviour
An exporter that failed to start up properly should not change the state of the coordinator.
Additional information
Even though the exporter clearly fails to start up, its return code after being shut down is 0 if the configuration is correct.
This should be fixed by not masking errors in
except grpc.aio.AioRpcError as e:
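A minimal sketch of the idea, not the actual labgrid code: if the error handler reports a failure instead of swallowing it, the process can exit with a non-zero code. FakeRpcError, message_pump, run_exporter and main are hypothetical stand-ins; in the real exporter the exception would be grpc.aio.AioRpcError.

```python
import asyncio


class FakeRpcError(Exception):
    """Hypothetical stand-in for grpc.aio.AioRpcError."""


async def message_pump(fail: bool) -> None:
    # Stand-in for the exporter's message pump; raises like a rejected startup.
    await asyncio.sleep(0)
    if fail:
        raise FakeRpcError("startup failed: exporter is already connected")


async def run_exporter(fail: bool) -> int:
    """Return a process exit code instead of masking the error."""
    try:
        await message_pump(fail)
    except FakeRpcError as e:
        # Still log the error, but report the failure to the caller.
        print(f"unexpected grpc error: {e}")
        return 1
    return 0


def main(fail: bool) -> int:
    return asyncio.run(run_exporter(fail))
```

An entry point could pass the result of main() to sys.exit(), so that a supervisor such as systemd sees the failed start.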
Please also find the attached tarball
repro.tar.gz
for reproduction and logs.
The cleanup routine
session = self.exporters.pop(peer)
is entered occasionally, so the error will not always be observed and multiple restarts may be required to trigger this behaviour.
The repeated restarts of the exporter are handled by systemd.
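One way to make such a cleanup unconditional is a try/finally around the per-peer handler. The following is a sketch under assumptions: Coordinator, handle_stream and the dictionary layout are simplified, hypothetical versions, and only the self.exporters.pop(peer) call is taken from the snippet above. Whether this matches the real control flow in the coordinator has not been verified.

```python
import asyncio


class Coordinator:
    """Hypothetical, heavily simplified coordinator."""

    def __init__(self) -> None:
        self.exporters: dict[str, str] = {}  # peer -> exporter name

    async def handle_stream(self, peer: str, name: str, fail: bool) -> None:
        self.exporters[peer] = name
        try:
            await asyncio.sleep(0)  # stand-in for processing the stream
            if fail:
                raise RuntimeError("exporter aborted during startup")
        finally:
            # Runs on normal return, on errors, and on task cancellation.
            self.exporters.pop(peer, None)


async def demo() -> dict[str, str]:
    coordinator = Coordinator()
    try:
        await coordinator.handle_stream(
            "ipv4:10.88.0.49:46262", "emlix-test", fail=True
        )
    except RuntimeError:
        pass
    # No stale entry is left behind after the failed startup.
    return coordinator.exporters
```

With the pop in a finally block, a later connection under the same name would not find a stale entry left by the failed attempt.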
There are several ways to address this issue:
- fix the underlying grpc code, or actually propagate the cancelled status and handle it in labgrid/labgrid/remote/coordinator.py (line 486 in c246fab)
- untangle
async def add_resource(self, group_name, resource_name, cls, params):
and
async def run(self) -> None:
so that the configuration is sanity-checked first
- provide a command for labgrid-client to clear a reference to a broken exporter
- add an option to the exporter and a field to the startup message to forcefully register the exporter in the coordinator even though a reference already exists in
if existing := self.get_exporter_by_name(name):
- make the exporter disconnect properly from the coordinator if it fails due to configuration errors
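The "sanity check first" option could look roughly like the sketch below: the configuration is validated before any connection to the coordinator is made, so a broken exporter.yaml never leaves a session behind. All names here (ConfigError, validate_config, start_exporter, the connect callback) are hypothetical, and the rule being checked is a placeholder, not labgrid's real configuration schema.

```python
class ConfigError(Exception):
    """Raised when the exporter configuration is unusable."""


def validate_config(config: dict) -> None:
    # Placeholder rule: groups and resource entries must be mappings.
    for group_name, group in config.items():
        if not isinstance(group, dict):
            raise ConfigError(f"group {group_name!r} must be a mapping")
        for resource_name, params in group.items():
            if not isinstance(params, dict):
                raise ConfigError(
                    f"resource {group_name}/{resource_name} must be a mapping"
                )


def start_exporter(config: dict, connect) -> bool:
    """Validate first; call connect() only once the config is known to be good."""
    try:
        validate_config(config)
    except ConfigError as e:
        print(f"configuration error: {e}")
        return False
    connect()
    return True
```

The point of the design is the ordering: the coordinator is contacted only after validation has passed, so a configuration error cannot change its state.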
There is also an unhandled AttributeError exception in labgrid/labgrid/remote/exporter.py (line 213 in c246fab), since self.child was never set.
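A common way to avoid this kind of AttributeError is to define the attribute in the constructor and guard against None before using it. The class below is a hypothetical, simplified export wrapper and not the class at the referenced line; only the attribute name child comes from the report.

```python
import subprocess


class ResourceExport:
    """Hypothetical export wrapper that manages a child process."""

    def __init__(self) -> None:
        # Always define the attribute so later code can test it safely.
        self.child = None

    def start(self, command: list[str]) -> None:
        self.child = subprocess.Popen(command)

    def stop(self) -> None:
        # Nothing to clean up if start() was never reached.
        if self.child is None:
            return
        self.child.terminate()
        self.child.wait()
        self.child = None
```

Calling stop() on an instance whose start() never ran then returns quietly instead of raising.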
Code locations referenced above (all at commit c246fab):
- labgrid/labgrid/remote/exporter.py, line 935: the except grpc.aio.AioRpcError as e: handler that masks the error
- labgrid/labgrid/remote/coordinator.py, line 491: the cleanup routine, session = self.exporters.pop(peer)
- labgrid/labgrid/remote/coordinator.py, line 486: where the cancelled status could be handled
- labgrid/labgrid/remote/exporter.py, line 1012: add_resource
- labgrid/labgrid/remote/exporter.py, line 844: run
- labgrid/labgrid/remote/coordinator.py, line 425: the check if existing := self.get_exporter_by_name(name):
- labgrid/labgrid/remote/exporter.py, line 213: the unhandled AttributeError