Symptom: the loss suddenly becomes 0 during training.

python train.py -b=8
INFO: Using device cpu
INFO: Network:
        1 input channels
        7 output channels (classes)
        Bilinear upscaling
INFO: Creating dataset with 868 examples
INFO: Starting training:
        Epochs:          5
        Batch size:      8
        Learning rate:   0.001
        Training size:   782
        Validation size: 86
        Checkpoints:     True
        Device:          cpu
        Images scaling:  1
Epoch 1/5:  10% |  80/782 [01:33<13:21, 1.14s/img, loss (batch)=0.886]
INFO: Validation cross entropy: 1.86862473487854
Epoch 1/5:  20% | 160/782 [03:34<11:51, 1.14s/img, loss (batch)=2.35e-7]
INFO: Validation cross entropy: 5.887489884504049e-10
Epoch 1/5:  31% | 240/782 [05:41<11:29, 1.27s/img, loss (batch)=0]
INFO: Validation cross entropy: 0.0
Epoch 1/5:  41% | 320/782 [07:49<09:16, 1.20s/img, loss (batch)=0]
INFO: Validation cross entropy: 0.0
Epoch 1/5:  51% | 400/782 [09:55<07:31, 1.18s/img, loss (batch)=0]
INFO: Validation cross entropy: 0.0
Epoch 1/5:  61% | 480/782 [12:02<05:58, 1.19s/img, loss (batch)=0]
INFO: Validation cross entropy: 0.0
Epoch 1/5:  72% | 560/782 [14:04<04:16, 1.15s/img, loss (batch)=0]
INFO: Validation cross entropy: 0.0
Epoch 1/5:  82% | 640/782 [16:11<02:49, 1.20s/img, loss (batch)=0]
INFO: Validation cross entropy: 0.0
Epoch 1/5:  92% | 720/782 [18:21<01:18, 1.26s/img, loss (batch)=0]
INFO: Validation cross entropy: 0.0
Epoch 1/5:  94% | 736/782 [19:17<01:12, 1.57s/img, loss (batch)=0]
Traceback (most recent call last):
  File "train.py", line 182, in <module>
    val_percent=args.val / 100)
  File "train.py", line 66, in train_net
    for batch in train_loader:
  File "/public/home/lidd/.conda/envs/lgg2/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/public/home/lidd/.conda/envs/lgg2/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/public/home/lidd/.conda/envs/lgg2/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/public/home/lidd/.conda/envs/lgg2/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/public/home/lidd/.conda/envs/lgg2/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/public/home/lidd/.conda/envs/lgg2/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/public/home/lidd/.conda/envs/lgg2/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/public/home/lidd/.conda/envs/lgg2/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: Expected object of scalar type Double but got scalar type Byte for sequence element 4 in sequence argument at position #1 'tensors'
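The traceback pinpoints the immediate cause: default_collate cannot stack tensors of mixed dtypes — element 4 of the batch is Byte (uint8) while the others are Double (float64). The usual fix is to cast explicitly in the Dataset's __getitem__; a minimal numpy sketch (array names hypothetical, not from train.py):

```python
import numpy as np

# Hypothetical __getitem__ logic: the image is scaled to float64,
# but a mask loaded from a PNG stays uint8 ("Byte" in torch terms),
# so default_collate cannot torch.stack the batch.
img = np.random.rand(1, 64, 64)            # float64 ("Double")
mask = np.zeros((64, 64), dtype=np.uint8)  # uint8 ("Byte")

# Cast to explicit dtypes before returning from __getitem__:
img = img.astype(np.float32)   # network input
mask = mask.astype(np.int64)   # class indices, as CrossEntropyLoss expects
```

With torch tensors the equivalent casts are `.float()` for the input and `.long()` for the class-index mask.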
The cross-entropy loss measures the discrepancy between the network's output and the labels; taking its derivative gives the direction for gradient descent.
There are two ways the loss can become 0: because of the predicted output, or because the labels are 0.
If the cause is all-zero labels, the loss can be 0 right from the start of training.
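Both cases follow from the definition of cross-entropy, -log p(target). A hand-rolled single-sample stand-in for torch.nn.CrossEntropyLoss makes this concrete:

```python
import numpy as np

def cross_entropy(logits, target):
    # softmax followed by negative log-likelihood, for one sample
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[target])

# An untrained 7-class network is roughly uniform: loss ≈ log(7) ≈ 1.95,
# matching the first validation value in the log above.
uniform_loss = cross_entropy(np.zeros(7), 0)

# If one logit dominates and the target is always that class (e.g. an
# all-zero mask), the loss collapses to numerically 0.
collapsed_loss = cross_entropy(np.array([50., 0, 0, 0, 0, 0, 0]), 0)
```

A loss that collapses from ~log(7) to 0 within a fraction of one epoch means the target class has become trivially predictable, which is exactly what all-zero labels cause.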
Check the parameter initialization.
Check the forward pass of the network.
Check how the loss is computed.
Check the gradient-descent step.
Check whether the gradients have vanished.
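For the label side of that checklist, a quick scan of the masks before training catches the degenerate all-zero case early. A sketch (function name hypothetical):

```python
import numpy as np

def sanity_check_mask(mask, n_classes):
    """Verify a segmentation mask holds valid class indices."""
    values = np.unique(mask)
    assert values.max() < n_classes, "label value out of range"
    if values.size == 1:
        print(f"warning: mask contains only class {values[0]}")
    return values

# An all-zero mask makes the cross-entropy trivially minimisable.
vals = sanity_check_mask(np.zeros((64, 64), dtype=np.uint8), n_classes=7)
```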
In this case it turned out the labels themselves were at fault.
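Faulty labels typically means the masks do not hold the class indices 0..6 that cross-entropy expects. A sketch of the kind of repair involved, under the assumption of binary masks saved as 0/255 images (the concrete values are illustrative, not from the original dataset):

```python
import numpy as np

# Raw mask pixels as exported by an image editor: 0 or 255
raw = np.array([[0, 255],
                [255, 0]], dtype=np.uint8)

# Lookup table mapping raw pixel values to class indices (255 -> class 1)
lut = np.zeros(256, dtype=np.int64)
lut[255] = 1

fixed = lut[raw]   # valid integer targets for CrossEntropyLoss
```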
Addendum: loss becomes nan during PyTorch training
Another tricky situation I ran into: the loss becomes nan during PyTorch training. Possible causes:
1. The learning rate is too high.
2. The loss function itself is buggy.
3. In regression problems, a division by zero may have occurred; adding a small epsilon term can fix it.
4. The data itself may contain NaN or Inf; check both input and target with np.isnan() and np.isinf().
5. The target must be something the loss function can actually evaluate; for example, the target of a sigmoid activation should be greater than 0. Check the dataset for this as well.
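Points 3 and 4 can be sketched together: a finiteness check on each batch, plus a small eps guarding any division by the target. The relative_error loss below is purely illustrative, not from the original code:

```python
import numpy as np

def check_finite(x, name):
    """Reject data containing NaN/Inf before it poisons the loss."""
    if not np.isfinite(x).all():
        raise ValueError(f"{name} contains NaN or Inf")
    return x

def relative_error(pred, target, eps=1e-8):
    """Relative regression error; eps keeps zero targets from dividing by 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.mean(np.abs(pred - target) / (np.abs(target) + eps))

pred = check_finite(np.array([1.0, 2.0, 0.0]), "input")
target = check_finite(np.array([1.0, 0.0, 0.0]), "target")

loss_with_eps = relative_error(pred, target)          # finite
loss_without = relative_error(pred, target, eps=0.0)  # nan from the 0/0 term
```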
The above is my personal experience; I hope it gives you a useful reference, and I hope you will continue to support 腳本之家. Corrections for anything wrong or incomplete are welcome.