0. Abstract
- Dynamically Expandable Network (DEN)
- dynamically decides its network capacity as it trains on a sequence of tasks
- trained in an online manner by performing selective retraining
1 Introduction
- lifelong learning → trained in an online manner by performing selective retraining
- [ Strategy 1 ] Fine-tuning : retrain the original network on the new task → performance can degrade (catastrophic forgetting)
- how can we ensure that the knowledge sharing through the network is beneficial for all tasks?
- [ Strategy 2 ] Regularization : prevents the parameters from drastic changes
- [ Our Strategy ]
- retrain the network at each task t such that each new task utilizes and changes only the relevant part of the previously trained network
- while still allowing the network capacity to expand when necessary
- Challenges
- 1) Achieving scalability and efficiency in training
- 2) Deciding when to expand the network, and how many neurons to add
- 3) Preventing semantic drift, or catastrophic forgetting
2 Related Work
(1) Lifelong learning
- continual learning
(2) Preventing catastrophic forgetting
- catastrophic forgetting : retraining the network for new tasks makes it forget what was learned for previous tasks
- Solution
- regularizer (e.g. l2 regularizer)
- Elastic Weight Consolidation (EWC) : regularizes the model parameter at each step
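To make the difference between the two regularizers concrete, here is a minimal sketch. It assumes the commonly cited quadratic form of the EWC penalty, (λ/2) Σ_i F_i (θ_i − θ*_i)², with a diagonal Fisher information estimate F. The function names and toy numbers are illustrative assumptions, not taken from the paper.

```python
def l2_penalty(theta, theta_prev, lam):
    """Plain L2 anchor: every parameter is pulled equally toward its old value."""
    return lam * sum((a - b) ** 2 for a, b in zip(theta, theta_prev))


def ewc_penalty(theta, theta_prev, fisher, lam):
    """EWC-style anchor: each squared change is weighted by a diagonal
    Fisher information estimate, so important parameters are held tighter."""
    return 0.5 * lam * sum(f * (a - b) ** 2
                           for a, b, f in zip(theta, theta_prev, fisher))


theta_prev = [1.0, -2.0, 0.5]  # weights after the previous task (toy values)
theta = [1.5, -2.0, 1.5]       # candidate weights for the new task
fisher = [4.0, 1.0, 0.0]       # hypothetical per-weight importance

print(l2_penalty(theta, theta_prev, lam=1.0))
print(ewc_penalty(theta, theta_prev, fisher, lam=1.0))
```

With these toy values the third weight moves the most, yet it contributes nothing to the EWC-style penalty because its assumed importance is zero, while the plain L2 anchor penalizes it fully.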
(3) Dynamic network expansion
- neural networks that can dynamically increase their capacity during training
- incrementally train a denoising autoencoder
- nonparametric NN model → also find the minimum dimensionality of each layer that can reduce the loss
- multi-task setting → not considered by these prior works
3 Incremental Learning of a Dynamically Expandable Network
- goal : to learn models for a sequence of T tasks ( T : unbounded )
(1) Algorithm 1 Incremental Learning of a Dynamically Expandable Network
- lifelong learning agent at time t : aim to minimize the loss
- let the network maximally utilize the knowledge obtained from the previous tasks → dynamically expand when that knowledge is not enough
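The overall control flow can be sketched as below, assuming the stages described in the following subsections (selective retraining, expansion when the loss stays above a threshold, then split/duplication). The function `train_den`, its arguments, and the toy stand-ins are hypothetical placeholders, not the authors' code.

```python
def train_den(tasks, tau, train_first, selective_retrain, expand, split):
    """Run the DEN stages over a task sequence and log which ones fire.

    All four callables are hypothetical placeholders for the real procedures.
    """
    model, log = None, []
    for t, data in enumerate(tasks, start=1):
        if t == 1:
            # First task: train the whole (sparse) network from scratch.
            model = train_first(data)
            log.append((t, "train_sparse"))
            continue
        model, loss = selective_retrain(model, data)
        log.append((t, "selective_retrain"))
        if loss > tau:
            # Existing features are not enough: grow the network.
            model = expand(model, data)
            log.append((t, "expand"))
        model = split(model, data)
        log.append((t, "split"))
    return model, log


# Toy stand-ins: the "model" is just a unit count, and each task's "data"
# is the loss we pretend selective retraining ends with.
model, log = train_den(
    tasks=[0.1, 0.2, 0.9],
    tau=0.5,
    train_first=lambda data: {"units": 4},
    selective_retrain=lambda m, data: (m, data),
    expand=lambda m, data: {"units": m["units"] + 2},
    split=lambda m, data: m,
)
print(log)
print(model)
```

In this toy run only the third task exceeds the threshold, so it is the only one that triggers expansion.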
(2) Algorithm 2 Selective Retraining
- most naive method : retraining the entire model every time → costly
- DEN : selective retraining (retraining only the weights that are affected by the new task)
- Step1
- l1-regularization for sparsity in the weights (each neuron is connected to only a few neurons)
- Step2
- fit a sparse linear model to predict task t from the topmost hidden units of the network: solve an l1-regularized problem over the new output-layer weights only, keeping all lower layers fixed
- Step 3
- breadth-first search (BFS) on the network starting from the selected nodes → identify all units, down to the input units, that have a path to the task-t output unit
- train only the weights of the selected subnetwork S
- use an l2 regularizer here, since the subnetwork is already sparse
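The subnetwork search in Step 3 can be sketched as a backward, layer-by-layer traversal, which is what BFS amounts to on a layered graph. The representation (`weights[l][i][j]` as nested lists) and the function name `select_subnetwork` are assumptions made for illustration.

```python
def select_subnetwork(weights, top_selected):
    """Find the units that feed the selected top-layer units.

    weights[l][i][j] is the weight from unit i of layer l to unit j of
    layer l + 1 (layer 0 is the input; the last layer is the topmost
    hidden layer). top_selected holds the top-layer units picked by the
    sparse linear model. Returns one set of selected indices per layer.
    """
    selected = [set() for _ in range(len(weights) + 1)]
    selected[-1] = set(top_selected)
    # Walk from the top layer down to the input layer.
    for l in range(len(weights) - 1, -1, -1):
        for i, row in enumerate(weights[l]):
            # Unit i is kept if it has a nonzero edge into a kept unit above.
            if any(row[j] != 0 for j in selected[l + 1]):
                selected[l].add(i)
    return selected


weights = [
    [[0.5, 0.0], [0.0, 0.3], [0.0, 0.0]],  # input (3 units) -> hidden 1 (2 units)
    [[0.7, 0.0], [0.0, 0.2]],              # hidden 1 -> hidden 2 (topmost)
]
print(select_subnetwork(weights, top_selected=[0]))
```

Only the weights between selected units would then be retrained; every other weight stays frozen, which is where the cost saving over full retraining comes from.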
(3) Algorithm 3 Dynamic Network Expansion
- if the new task is highly relevant to the old ones (the aggregated partial knowledge obtained from each task is sufficient to explain the new task) → selective retraining alone is enough
- if the learned features cannot accurately represent the new task → new neurons are needed
- group sparse regularization → to dynamically decide how many neurons to add w/o repeated retraining
- After selective training
- check whether the loss is below a certain threshold
- if it is not → add new units and train them with group sparse regularization
- → unnecessary hidden units are dropped altogether
- → the model is expected to capture new features that were not represented by the network at t-1
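A minimal sketch of the group-sparsity idea, assuming each group is the incoming weight vector of one newly added unit and the penalty is the sum of the groups' l2 norms. The names `group_sparsity_penalty`, `prune_dead_units`, `gamma`, and `tol` are hypothetical.

```python
import math


def group_norm(group):
    """l2 norm of one unit's incoming weights."""
    return math.sqrt(sum(w * w for w in group))


def group_sparsity_penalty(incoming, gamma):
    """gamma times the sum of per-unit l2 norms (one group per new unit)."""
    return gamma * sum(group_norm(g) for g in incoming)


def prune_dead_units(incoming, tol=1e-8):
    """Indices of new units that survive, i.e. whose whole group is not zero."""
    return [j for j, g in enumerate(incoming) if group_norm(g) > tol]


# Incoming weights of three newly added units after training (toy values).
new_units = [[3.0, 4.0], [0.0, 0.0], [0.0, 1.0]]

print(group_sparsity_penalty(new_units, gamma=0.1))
print(prune_dead_units(new_units))
```

Because the penalty acts on whole groups, it tends to drive all incoming weights of an unneeded unit to zero together, so the number of surviving units is decided by training rather than by repeated retraining with different sizes.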
(4) Algorithm 4 Network Split/Duplication
- a crucial problem in continual learning (CL)
- semantic drift = catastrophic forgetting
- regularizer : λ‖W^t − W^{t−1}‖₂²
- l2 regularization → forces the solution W^t to stay close to W^{t−1}
- high λ → try to preserve the knowledge learned at previous tasks
- + Split / Duplicate
- split a drifted unit into two copies → the network has features that are optimal for two different tasks
- can be performed for all hidden units in parallel
- after split/duplicate → need to train the weights again
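The split/duplicate step can be sketched as follows. This assumes drift is measured per hidden unit as the l2 distance between its incoming weights before and after training on the new task, and that a unit is duplicated when the drift exceeds a threshold `sigma`. Both the measure and the names are assumptions based on how DEN is usually described; the notes above only state that units are split or duplicated.

```python
import math


def split_units(w_prev, w_new, sigma):
    """Duplicate hidden units that drifted too far.

    w_prev[i] / w_new[i] are the incoming weights of unit i before and
    after training on the new task. A unit whose l2 drift exceeds sigma
    is restored to its old weights, and its drifted version is appended
    as a new unit. Returns (weights, indices_of_duplicated_units).
    """
    kept, extra, duplicated = [], [], []
    for i, (old, new) in enumerate(zip(w_prev, w_new)):
        drift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, old)))
        if drift > sigma:
            kept.append(list(old))   # old tasks keep the old feature
            extra.append(list(new))  # new task gets its own copy
            duplicated.append(i)
        else:
            kept.append(list(new))
    return kept + extra, duplicated


w_prev = [[1.0, 0.0], [0.0, 1.0]]
w_new = [[1.0, 0.1], [3.0, 5.0]]
weights, duplicated = split_units(w_prev, w_new, sigma=1.0)
print(weights)
print(duplicated)
```

Each unit's drift depends only on its own weights, which is why the check can run for all hidden units in parallel; the loop here is sequential only for readability.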
(5) Timestamped Inference
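- timestamp each newly added unit j with the training stage t at which it was added
- at inference, each task uses only the units introduced up to its own stage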
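A minimal sketch of timestamped inference, assuming each unit stores the stage at which it was added and a task may only use units added at or before its own stage. The name `active_units` and the toy timestamps are hypothetical.

```python
def active_units(timestamps, task):
    """Indices of units that `task` may use at inference time:
    those added at or before that task's training stage."""
    return [j for j, z in enumerate(timestamps) if z <= task]


# Stage at which each unit of one layer was added (toy values).
timestamps = [1, 1, 2, 3, 3]

print(active_units(timestamps, task=1))  # [0, 1]
print(active_units(timestamps, task=2))  # [0, 1, 2]
print(active_units(timestamps, task=3))  # [0, 1, 2, 3, 4]
```

Masking later units this way keeps units added for later tasks out of the predictions for earlier tasks.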
4 Experiment
(1) Baselines and our model
1) Feedforward networks
2) Convolutional networks
(2) Base network settings
(3) Datasets
1) MNIST-Variation
2) CIFAR-100
3) AWA (Animals with Attributes)
4.1 Quantitative Evaluation
(1) Effect of selective retraining
(2) Effect of network expansion
(3) Effect of network split/duplication and timestamped inference
5 Conclusion