Dive into Deep Learning (7): Recurrent Neural Networks and Modern Recurrent Neural Networks
(1) Text Preprocessing

To run statistical prediction on a piece of text, we first have to preprocess it: turn the raw strings into tokens such as words or characters.
The steps are as follows:
1. Read the dataset
Simply read the text in line by line. We can use H. G. Wells's "The Time Machine" as the dataset:
```python
import re
from d2l import torch as d2l

# Register the dataset with d2l's download hub
d2l.DATA_HUB['time_machine'] = (d2l.DATA_URL + 'timemachine.txt',
                                '090b5e7e70c295757f55df93cb0a180b9691891a')

def read_time_machine():
    """Load the time machine dataset into a list of text lines."""
    with open(d2l.download('time_machine'), 'r') as f:
        lines = f.readlines()
    # Replace everything that is not a letter with a space, then lowercase
    return [re.sub('[^A-Za-z]+', ' ', line).strip().lower() for line in lines]

lines = read_time_machine()
```
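As a quick check on what was loaded (`lines` comes from the block above; the exact output depends on the downloaded text):

```python
print(f'# text lines: {len(lines)}')
print(lines[0])
```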
2. Tokenization
Split each line into individual tokens:
```python
def tokenize(lines, token='word'):
    """Split text lines into word or character tokens."""
    if token == 'word':
        return [line.split() for line in lines]
    elif token == 'char':
        return [list(line) for line in lines]
    else:
        print('ERROR: unknown token type: ' + token)

tokens = tokenize(lines)
```
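A quick look at the first few tokenized lines (empty lines simply tokenize to empty lists):

```python
for i in range(11):
    print(tokens[i])
```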
3. Build the vocabulary
Map each token to a numeric index. Indices are assigned by counting how often each token occurs; tokens that occur too rarely are removed and mapped to the unknown token "<unk>". There are also a few other special tokens, such as the padding token ("<pad>"), the beginning-of-sequence token ("<bos>"), and the end-of-sequence token ("<eos>").
```python
import collections

class Vocab:
    """Vocabulary for text."""
    def __init__(self, tokens=None, min_freq=0, reserved_tokens=None):
        if tokens is None:
            tokens = []
        if reserved_tokens is None:
            reserved_tokens = []
        # Sort tokens by frequency, most frequent first
        counter = count_corpus(tokens)
        self._token_freqs = sorted(counter.items(), key=lambda x: x[1],
                                   reverse=True)
        # Index 0 is reserved for the unknown token '<unk>'
        self.idx_to_token = ['<unk>'] + reserved_tokens
        self.token_to_idx = {token: idx
                             for idx, token in enumerate(self.idx_to_token)}
        for token, freq in self._token_freqs:
            if freq < min_freq:
                break  # frequencies are sorted, so everything after is rarer
            if token not in self.token_to_idx:
                self.idx_to_token.append(token)
                self.token_to_idx[token] = len(self.idx_to_token) - 1

    def __len__(self):
        return len(self.idx_to_token)

    def __getitem__(self, tokens):
        if not isinstance(tokens, (list, tuple)):
            # Unknown tokens fall back to the '<unk>' index
            return self.token_to_idx.get(tokens, self.unk)
        return [self.__getitem__(token) for token in tokens]

    def to_tokens(self, indices):
        if not isinstance(indices, (list, tuple)):
            return self.idx_to_token[indices]
        return [self.idx_to_token[index] for index in indices]

    @property
    def unk(self):  # Index of the unknown token
        return 0

    @property
    def token_freqs(self):
        return self._token_freqs

def count_corpus(tokens):
    """Count token frequencies."""
    # `tokens` may be a 1D list or a 2D list of lines
    if len(tokens) == 0 or isinstance(tokens[0], list):
        # Flatten a 2D list of lines into a single list of tokens
        tokens = [token for line in tokens for token in line]
    return collections.Counter(tokens)
```
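With the class above in place, we can build the vocabulary from the tokenized lines and peek at the most frequent tokens:

```python
vocab = Vocab(tokens)
print(list(vocab.token_to_idx.items())[:10])
```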
(2) Language Models and Recurrent Neural Networks

Informally, a language model answers the question: given the earlier part of a sentence, what is the most likely next word? If the model is to understand a whole passage, it cannot simply split the passage into a few words and analyze each one in isolation; each word has to be analyzed together with its preceding context, and possibly its following context as well (adding the latter yields a bidirectional RNN).
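Stated slightly more formally, a language model assigns a probability to an entire sequence by factoring it into next-token predictions via the chain rule (this is standard probability, not specific to any architecture):

$$
P(x_1, x_2, \ldots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \ldots, x_{t-1})
$$

An RNN approximates each conditional probability by compressing the prefix $x_1, \ldots, x_{t-1}$ into its hidden state.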
A basic recurrent neural network, with its output unrolled over time, looks like this:
$$
\begin{aligned}
\mathbf{o}_t &= g(V\mathbf{s}_t) \\
&= Vf(U\mathbf{x}_t + W\mathbf{s}_{t-1}) \\
&= Vf(U\mathbf{x}_t + Wf(U\mathbf{x}_{t-1} + W\mathbf{s}_{t-2})) \\
&= Vf(U\mathbf{x}_t + Wf(U\mathbf{x}_{t-1} + Wf(U\mathbf{x}_{t-2} + W\mathbf{s}_{t-3}))) \\
&= Vf(U\mathbf{x}_t + Wf(U\mathbf{x}_{t-1} + Wf(U\mathbf{x}_{t-2} + Wf(U\mathbf{x}_{t-3} + \ldots))))
\end{aligned}
$$
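To make the unrolling concrete, here is a minimal sketch of the recurrence with toy dimensions (all names and sizes here are illustrative, not from the book; $f$ is taken as tanh and $g$ is left as the identity, whereas a language model would typically apply a softmax):

```python
import torch

input_dim, hidden_dim, output_dim, num_steps = 4, 8, 4, 5
U = torch.randn(hidden_dim, input_dim) * 0.1   # input-to-hidden weights
W = torch.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden weights
V = torch.randn(output_dim, hidden_dim) * 0.1  # hidden-to-output weights

s = torch.zeros(hidden_dim)                    # s_0: initial hidden state
xs = torch.randn(num_steps, input_dim)         # a toy input sequence
outputs = []
for x in xs:                                   # unroll over time steps
    s = torch.tanh(U @ x + W @ s)              # s_t = f(U x_t + W s_{t-1})
    outputs.append(V @ s)                      # o_t = g(V s_t)
```

The key point is that the same weights $U$, $W$, $V$ are reused at every time step, and the hidden state $s_t$ carries the entire input history forward.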
(3) Implementing a Recurrent Neural Network

Here we go straight to the concise implementation using the framework:
```python
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

batch_size, num_steps = 32, 35
train_iter, vocab = d2l.load_data_time_machine(batch_size, num_steps)

# A single RNN layer; inputs are one-hot vectors of size len(vocab)
num_hiddens = 256
rnn_layer = nn.RNN(len(vocab), num_hiddens)

# Hidden state shape: (num_layers * num_directions, batch_size, num_hiddens)
state = torch.zeros((1, batch_size, num_hiddens))
X = torch.rand(size=(num_steps, batch_size, len(vocab)))
Y, state_new = rnn_layer(X, state)  # Y: (num_steps, batch_size, num_hiddens)

class RNNModel(nn.Module):
    """The RNN model."""
    def __init__(self, rnn_layer, vocab_size, **kwargs):
        super(RNNModel, self).__init__(**kwargs)
        self.rnn = rnn_layer
        self.vocab_size = vocab_size
        self.num_hiddens = self.rnn.hidden_size
        # A bidirectional RNN concatenates forward and backward states,
        # so the output layer sees twice as many features
        if not self.rnn.bidirectional:
            self.num_directions = 1
            self.linear = nn.Linear(self.num_hiddens, self.vocab_size)
        else:
            self.num_directions = 2
            self.linear = nn.Linear(self.num_hiddens * 2, self.vocab_size)

    def forward(self, inputs, state):
        X = F.one_hot(inputs.T.long(), self.vocab_size)
        X = X.to(torch.float32)
        Y, state = self.rnn(X, state)
        # Flatten time and batch dims so the output layer maps
        # (num_steps * batch_size, num_hiddens) -> (..., vocab_size)
        output = self.linear(Y.reshape((-1, Y.shape[-1])))
        return output, state

    def begin_state(self, device, batch_size=1):
        if not isinstance(self.rnn, nn.LSTM):
            # nn.RNN / nn.GRU use a single tensor as the hidden state
            return torch.zeros((self.num_directions * self.rnn.num_layers,
                                batch_size, self.num_hiddens), device=device)
        else:
            # nn.LSTM keeps a tuple of (hidden state, memory cell)
            return (torch.zeros((self.num_directions * self.rnn.num_layers,
                                 batch_size, self.num_hiddens), device=device),
                    torch.zeros((self.num_directions * self.rnn.num_layers,
                                 batch_size, self.num_hiddens), device=device))

device = d2l.try_gpu()
net = RNNModel(rnn_layer, vocab_size=len(vocab))
net = net.to(device)
d2l.predict_ch8('time traveller', 10, net, vocab, device)
```
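The `predict_ch8` call above only generates a short continuation of the prefix "time traveller" (nonsense at this point, since the weights are untrained). To actually train the model, the book reuses its chapter-8 training loop; a sketch, assuming the book's hyperparameters:

```python
num_epochs, lr = 500, 1
d2l.train_ch8(net, train_iter, vocab, lr, num_epochs, device)
```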
(4) Modern Recurrent Neural Networks

1. Bidirectional RNNs
Bidirectional RNNs were mentioned earlier: they add a second hidden state that runs backward in time, i.e.:
$$
\begin{aligned}
\mathbf{o}_t &= g(V\mathbf{s}_t + V'\mathbf{s}_t') \\
\mathbf{s}_t &= f(U\mathbf{x}_t + W\mathbf{s}_{t-1}) \\
\mathbf{s}_t' &= f(U'\mathbf{x}_t + W'\mathbf{s}_{t+1}')
\end{aligned}
$$
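In the concise PyTorch API this just means passing `bidirectional=True`; a minimal sketch with toy sizes (not from the book) showing how the output feature dimension doubles:

```python
import torch
from torch import nn

bi_rnn = nn.RNN(input_size=28, hidden_size=256, bidirectional=True)
X = torch.rand(35, 32, 28)   # (num_steps, batch_size, input_size)
Y, state = bi_rnn(X)
print(Y.shape)      # torch.Size([35, 32, 512]): forward/backward states concatenated
print(state.shape)  # torch.Size([2, 32, 256]): one final state per direction
```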
That said, bidirectional RNNs should not be applied blindly to every prediction task; for some tasks they are seriously flawed. For example, when predicting the next token, the backward pass would require exactly the future context we are trying to predict.
2. Deep RNNs
The recurrent networks introduced so far have a single hidden layer; stacking two or more hidden layers yields a deep recurrent neural network, as in the sketch below.
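In the concise API, stacking is again a single argument, `num_layers`; a minimal sketch with toy sizes (illustrative, not from the book):

```python
import torch
from torch import nn

deep_rnn = nn.RNN(input_size=28, hidden_size=256, num_layers=2)
X = torch.rand(35, 32, 28)   # (num_steps, batch_size, input_size)
Y, state = deep_rnn(X)
print(Y.shape)      # torch.Size([35, 32, 256]): outputs of the top layer only
print(state.shape)  # torch.Size([2, 32, 256]): one final state per stacked layer
```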
To be honest, I did not fully digest this chapter; it draws on background I lack, such as Markov models from stochastic processes and Zipf's law, so I can only offer a rough reading summary.