Step 1: Tokenization (split each text into words).
Step 2: Build Dictionary (map each word to a unique index).
Step 3: Encoding (replace each word with its index, so a text becomes a sequence of integers).
Step 4: Align Sequences (pad or truncate every sequence to a fixed length; see the sketch below).
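A minimal sketch of the four steps using Keras (the toy corpus, the dictionary size of 10,000, and the sequence length of 8 are assumptions for illustration):

    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    texts = ["the cat sat on the mat", "dogs chase cats"]  # toy corpus

    tokenizer = Tokenizer(num_words=10000)            # Step 2: cap the dictionary size
    tokenizer.fit_on_texts(texts)                     # Steps 1-2: tokenize, build word -> index dictionary
    sequences = tokenizer.texts_to_sequences(texts)   # Step 3: encode each text as a list of indices
    aligned = pad_sequences(sequences, maxlen=8)      # Step 4: pad/truncate to equal length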
First, represent each word as a one-hot vector.
Second, map the one-hot vectors to low-dimensional word vectors using a parameter matrix (the embedding layer).
Total #parameters: vocabulary_size × embedding_dim
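A sketch to check this count, assuming a vocabulary of 10,000 words and 32-dimensional embeddings:

    from tensorflow.keras.layers import Embedding

    vocab_size, embedding_dim = 10000, 32   # assumed sizes
    layer = Embedding(vocab_size, embedding_dim)
    layer.build(input_shape=(None,))        # allocates the (vocab_size, embedding_dim) matrix
    print(layer.count_params())             # 10000 * 32 = 320,000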
Total #parameters of SimpleRNN: shape(h) × [shape(h) + shape(x)]
SimpleRNN is good at capturing short-term dependencies,
but bad at long-term dependencies.
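A sketch verifying the SimpleRNN count (shape(h) = 32 and shape(x) = 100 are assumed; note that Keras adds shape(h) bias terms on top of the formula above):

    from tensorflow.keras.layers import SimpleRNN

    h, x = 32, 100                             # assumed state and input dimensions
    rnn = SimpleRNN(units=h)
    rnn.build(input_shape=(None, None, x))     # (batch, time, features)
    # Formula: 32 * (32 + 100) = 4224 weights; Keras adds 32 biases.
    print(rnn.count_params())                  # 4256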
Total #parameters of LSTM: 4 × shape(h) × [shape(h) + shape(x)]
(The factor of 4 comes from the forget gate, input gate, output gate, and the new-value block.)
Observation: The embedding layer contributes most of the parameters!
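The same check for LSTM, illustrating the observation, with assumed sizes (vocabulary 10,000, embedding dim 32, state dim 32, sequence length 500):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    model = Sequential([
        Embedding(10000, 32),            # 10000 * 32 = 320,000 parameters
        LSTM(32),                        # 4 * 32 * (32 + 32) = 8192 weights (+ 128 biases in Keras)
        Dense(1, activation="sigmoid"),  # 33 parameters
    ])
    model.build(input_shape=(None, 500))
    model.summary()                      # the embedding layer dominates the total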
Step 1: Train a model on a large dataset.
• Perhaps for a different problem.
• Perhaps with a different model.
Step 2: Keep only the embedding layer.
Step 3: Train new LSTM and output layers on the target task (see the sketch below).
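A sketch of Steps 2-3 under assumed names: embedding_weights.npy is a hypothetical file holding the pretrained embedding matrix, with the same sizes as the example above.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    vocab_size, embedding_dim = 10000, 32
    pretrained = np.load("embedding_weights.npy")   # hypothetical file: (vocab_size, embedding_dim)

    model = Sequential([
        Embedding(vocab_size, embedding_dim),   # Step 2: the layer we keep
        LSTM(32),                               # Step 3: new LSTM layer
        Dense(1, activation="sigmoid"),         # Step 3: new output layer
    ])
    model.build(input_shape=(None, 500))        # assumed sequence length
    model.layers[0].set_weights([pretrained])   # load the pretrained embedding matrix
    model.layers[0].trainable = False           # freeze it; only LSTM and Dense are trained
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

Freezing the embedding layer works because it holds most of the parameters (the observation above): reusing it lets the small target dataset train only the comparatively few LSTM and output-layer weights.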