coordinator does not handle exporter side cancel

A friend of mine misconfigured their `exporter.yaml`.  
After the exporter failed to start, the configuration was fixed and another attempt was started.  
This, however, resulted in a failure with

```python
status = StatusCode.ALREADY_EXISTS
```

# System
- coordinator running in container
- exporter running in container
- labgrid version: current `master`: [`c246fab86fe451db46507b77bc7fe58aaad3a79e`](https://github.com/labgrid-project/labgrid/commit/c246fab86fe451db46507b77bc7fe58aaad3a79e)

# Reproduction
Start the exporter with a misconfigured `exporter.yaml`.
After the failed attempt, fix the configuration and start the exporter again.

# Observed Behaviour
The coordinator keeps a reference to the failed exporter and refuses to accept a new instance with the same name.

## Coordinator
```
DEBUG:grpc._cython.cygrpc:[_cygrpc] Loaded running loop: id(loop)=139766749596624
INFO:root:exporter connected: ipv4:10.88.0.49:49302
DEBUG:root:exporter in_msg startup {
  version: "25.0+264-gc246fab8"
  name: "emlix-test"
}

ERROR:root:error in exporter message handler
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/labgrid/remote/coordinator.py", line 426, in request_task
    raise ExporterError(
labgrid.remote.coordinator.ExporterError: exporter with name 'emlix-test' is already connected from ipv4:10.88.0.49:46262
```

## Exporter
```
ERROR:root:unexpected grpc error in coordinator message pump task
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/labgrid/remote/exporter.py", line 899, in message_pump
    async for out_message in self.stub.ExporterStream(queue_as_aiter(self.out_queue)):
  File "/usr/local/lib/python3.11/dist-packages/grpc/aio/_call.py", line 366, in _fetch_stream_responses
    await self._raise_for_status()
  File "/usr/local/lib/python3.11/dist-packages/grpc/aio/_call.py", line 274, in _raise_for_status
    raise _create_rpc_error(
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
	status = StatusCode.ALREADY_EXISTS
	details = "startup failed: exporter with name 'emlix-test' is already connected from ipv4:10.88.0.49:46262"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:192.168.201.3:20409 {grpc_message:"startup failed: exporter with name \'emlix-test\' is already connected from ipv4:10.88.0.49:46262", grpc_status:6}"
>
DEBUG:root:pump task exited, shutting down exporter
DEBUG:asyncio:Close <_UnixSelectorEventLoop running=False closed=False debug=True>
```

# Expected Behaviour

An exporter that failed to start up properly should not change the state of the coordinator.

# Additional information
Even though the exporter clearly fails to start up, its return code after being shut down is `0` when the configuration is correct.
This should be fixed by not masking errors in https://github.com/labgrid-project/labgrid/blob/c246fab86fe451db46507b77bc7fe58aaad3a79e/labgrid/remote/exporter.py#L935
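
As a rough illustration of what "not masking errors" could look like, here is a minimal sketch. It is not labgrid code: `StartupError`, `run_exporter` and `main` are hypothetical names, and the only assumption is that the startup failure surfaces as an exception in the asyncio entry point.

```python
import asyncio
import sys


class StartupError(Exception):
    """Hypothetical error raised when the coordinator rejects the startup."""


async def run_exporter(fail: bool) -> None:
    # Stand-in for the exporter's message pump; not labgrid's implementation.
    await asyncio.sleep(0)
    if fail:
        raise StartupError("startup failed: exporter is already connected")


def main(fail: bool) -> int:
    """Return an exit code that reflects the failure instead of swallowing it."""
    try:
        asyncio.run(run_exporter(fail))
    except StartupError as exc:
        print(f"exporter failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

Calling `sys.exit(main(fail=True))` would then give `systemd` a non-zero status to act on.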

Please also find the attached tarball [repro.tar.gz](https://github.com/user-attachments/files/26924638/repro.tar.gz) for reproduction and logs.

The cleanup routine https://github.com/labgrid-project/labgrid/blob/c246fab86fe451db46507b77bc7fe58aaad3a79e/labgrid/remote/coordinator.py#L491 is entered occasionally, so the error is not always observed and multiple restarts may be required to trigger this behaviour.
In my setup, the exporter restarts are handled by `systemd`.

There are several ways to address this issue:
- fix the underlying gRPC code, or actually propagate the cancelled status and handle it in https://github.com/labgrid-project/labgrid/blob/c246fab86fe451db46507b77bc7fe58aaad3a79e/labgrid/remote/coordinator.py#L486
- untangle https://github.com/labgrid-project/labgrid/blob/c246fab86fe451db46507b77bc7fe58aaad3a79e/labgrid/remote/exporter.py#L1012 and https://github.com/labgrid-project/labgrid/blob/c246fab86fe451db46507b77bc7fe58aaad3a79e/labgrid/remote/exporter.py#L844, i.e. do a sanity check on the configuration first
- provide a command for labgrid-client to clear a reference to a broken exporter
- add an option to the exporter and a field to the startup message to forcefully register the exporter in the coordinator even though a reference exists in https://github.com/labgrid-project/labgrid/blob/c246fab86fe451db46507b77bc7fe58aaad3a79e/labgrid/remote/coordinator.py#L425
- make the exporter disconnect properly from the coordinator if it fails due to configuration errors
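
To make the first option more concrete, here is a toy model of the idea: the stream handler releases its registry entry in a `finally` block, so the entry is dropped on normal end, on error, and on cancellation. `Registry` and `handle_stream` are hypothetical names for illustration and do not mirror the actual coordinator code; whether gRPC delivers the cancellation to the handler in the first place is the open question of this issue.

```python
import asyncio


class ExporterError(Exception):
    """Raised when an exporter name is already registered (toy version)."""


class Registry:
    """Toy stand-in for the coordinator's table of connected exporters."""

    def __init__(self) -> None:
        self.exporters = {}  # exporter name -> peer address

    def register(self, name: str, peer: str) -> None:
        if name in self.exporters:
            raise ExporterError(
                f"exporter with name '{name}' is already connected "
                f"from {self.exporters[name]}"
            )
        self.exporters[name] = peer

    def unregister(self, name: str, peer: str) -> None:
        # Only drop the entry if it still belongs to this connection.
        if self.exporters.get(name) == peer:
            del self.exporters[name]


async def handle_stream(registry: Registry, name: str, peer: str, messages) -> None:
    """Consume an async iterator of messages for one exporter connection."""
    registry.register(name, peer)
    try:
        async for _message in messages:
            pass  # message handling would go here
    finally:
        # Runs on normal end, on exceptions, and on asyncio.CancelledError.
        registry.unregister(name, peer)
```

With this shape, a second connection under the same name succeeds once the first handler task has been cancelled or has failed.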

There is also an unhandled `AttributeError` in https://github.com/labgrid-project/labgrid/blob/c246fab86fe451db46507b77bc7fe58aaad3a79e/labgrid/remote/exporter.py#L213, since `self.child` was never set.
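
One common way to avoid this kind of `AttributeError` is to define the attribute in `__init__` and guard the teardown path. The sketch below uses a hypothetical `ManagedProcess` class and assumes `self.child` holds a `subprocess.Popen` object; it is not the actual labgrid class.

```python
import subprocess


class ManagedProcess:
    """Hypothetical wrapper around a child process; not labgrid's class."""

    def __init__(self, args):
        self.args = args
        self.child = None  # defined even if start() never runs or fails

    def start(self) -> None:
        self.child = subprocess.Popen(self.args)

    def stop(self) -> bool:
        """Terminate the child if one was started; return whether it was."""
        if self.child is None:
            return False  # nothing to clean up, no AttributeError
        self.child.terminate()
        self.child.wait()
        self.child = None
        return True
```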
