ERROR 2021-06-08 21:29:32,819 grpclib.server request_handler 462 : Application error
Traceback (most recent call last):
  File "/home/infatum/deep_learning/venv/lib/python3.8/site-packages/grpclib/server.py", line 440, in request_handler
    await method_func(stream)
  File "/home/infatum/deep_learning/service/deep_learning.py", line 249, in train
    train(X, y, GPUs=self.gpu_count, concat_y=include_y, **self._train_parameters)
  File "/home/infatum/deep_learning/DL/torch/torch_distributed_train.py", line 65, in train
    mp.spawn(
  File "/home/infatum/deep_learning/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/infatum/deep_learning/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/infatum/deep_learning/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/infatum/deep_learning/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/infatum/deep_learning/DL/torch/torch_distributed_train.py", line 44, in parallel_train
    _simple_train(train_loader, **params, parallel=True, rank=rank)
  File "/home/infatum/deep_learning/DL/torch/torch_distributed_train.py", line 129, in _simple_train
    loss.backward()
  File "/home/infatum/deep_learning/venv/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/infatum/deep_learning/venv/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:575] Connection closed by peer [127.0.1.1]:927