
Paper/Continual Learning

[CL] Lifelong Learning with Dynamically Expandable Networks (ICLR 2018)

https://openreview.net/pdf?id=Sk7KsfW0- 

0. Abstract

  • Dynamically Expandable Network (DEN) 
    • dynamically decide its network capacity as it trains on a sequence of tasks
    • trained in an online manner by performing selective retraining

1 Introduction

  • lifelong learning → considered as online / incremental learning over a sequence of tasks
    • [ Strategy 1 ]  Fine-tuning : retrain the existing network on the new task → performance on old tasks degenerates (catastrophic forgetting)
  • how can we ensure that the knowledge sharing through the network is beneficial for all tasks?
    • [ Strategy 2 ] Regularization : prevents the parameters from changing drastically
  • [ Our Strategy ]  
    • retrain the network at each task t so that each new task utilizes and changes only the relevant part of the previously trained network
    • while still allowing the network capacity to expand when necessary
    • Challenges
      • 1) Achieving scalability and efficiency in training 
      • 2) Deciding when to expand the network, and how many neurons to add
      • 3) Preventing semantic drift, or catastrophic forgetting

2 Related Work

(1) Lifelong learning

  • also called continual learning

(2) Preventing catastrophic forgetting

  • catastrophic forgetting : retraining the network for new tasks makes it forget what was learned for previous tasks
    • Solution
      • regularizer (e.g. l2 regularizer)
      • Elastic Weight Consolidation (EWC) : regularizes the model parameters at each learning step with a quadratic penalty that keeps them close to the previous task's solution (see the sketch below)
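A minimal PyTorch sketch of an EWC-style penalty, shown only to illustrate this regularization approach (it is not part of DEN). `old_params` and `fisher` are assumed to be dicts of per-parameter tensors saved after the previous task; all names are placeholders.

```python
import torch

def ewc_penalty(model, old_params, fisher, lam=1.0):
    """EWC-style penalty: (lam/2) * sum_i F_i * (theta_i - theta*_i)^2,
    where theta* are the parameters learned on the previous task and F is
    a diagonal Fisher information estimate (both stored beforehand)."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty
```

The new task's total loss would then be `task_loss + ewc_penalty(model, old_params, fisher, lam)`.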

(3) Dynamic network expansion

  • neural networks that can dynamically increase their capacity during training
    • incrementally train a denoising autoencoder
    • nonparametric NN model → also finds the minimum dimensionality of each layer that can reduce the loss
  • the multi-task (lifelong) setting → not considered in these prior works

3 Incremental Learning of a Dynamically Expandable Network

  • DEN consists of three modules : selective retraining / dynamic network expansion / network split/duplication

  • goal : to learn models for a sequence of T tasks ( T : unbounded )

(1) Algorithm 1 Incremental Learning of a Dynamically Expandable Network

  • lifelong learning agent at time t : aims to minimize the loss

$$\underset{\mathcal{W}^{t}}{\text{minimize}}\;\; \mathcal{L}\big(\mathcal{W}^{t};\, \mathcal{W}^{t-1},\, \mathcal{D}_{t}\big) \;+\; \lambda\, \Omega(\mathcal{W}^{t}), \qquad t = 1, \dots$$

    • $\mathcal{L}$ : task-specific loss function / $\mathcal{W}^{t}$ : weight tensor at task t / $\Omega(\mathcal{W}^{t})$ : regularizer (e.g. element-wise l2-norm)

[Figure: DEN's incremental learning process (selective retraining, dynamic network expansion, network split/duplication)]

  • let the network maximally utilize the knowledge obtained from previous tasks, and dynamically expand its capacity when the accumulated knowledge alone cannot sufficiently explain the new task (the overall loop is sketched below)
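A structural sketch of this incremental loop (Algorithm 1), where the four callbacks stand in for first-task training and Algorithms 2-4; every name, signature, and threshold here is an illustrative placeholder rather than the authors' implementation.

```python
def den_incremental_learning(tasks, net, train_l1, selective_retrain,
                             dynamic_expand, split_duplicate, tau, sigma):
    """Sketch of Algorithm 1. `tasks` yields the datasets D_1..D_T in order;
    `tau` is the loss threshold for expansion, `sigma` the drift threshold."""
    for t, data in enumerate(tasks, start=1):
        if t == 1:
            train_l1(net, data)                  # first task: l1-sparse training
            continue
        loss = selective_retrain(net, data, t)   # Algorithm 2
        if loss > tau:                           # capacity insufficient
            dynamic_expand(net, data, t)         # Algorithm 3: add / prune units
        split_duplicate(net, data, t, sigma)     # Algorithm 4: handle drift
```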

Modules

(2) Algorithm 2 Selective Retraining

  • most naive method : retraining the entire model every time  → costly
  • DEN : selective retraining (retraining only the weights that are affected by the new task)
  • Step1 
    • l1-regularization for sparsity in the weights (each neuron is connected to only a few neurons)

$$\underset{W^{t=1}}{\text{minimize}}\;\; \mathcal{L}\big(W^{t=1};\, \mathcal{D}_{1}\big) \;+\; \mu \sum_{l=1}^{L} \big\|W_{l}^{t=1}\big\|_{1}$$

    • $l$ : layer index ($1 \le l \le L$)

  • Step2
    • fit a sparse linear model predicting task t from the topmost hidden units of the network (all other parameters fixed), by solving the following problem:

$$\underset{W_{L,t}^{t}}{\text{minimize}}\;\; \mathcal{L}\big(W_{L,t}^{t};\, W_{1:L-1}^{t-1},\, \mathcal{D}_{t}\big) \;+\; \mu \big\|W_{L,t}^{t}\big\|_{1}$$

  • Step 3
    • BFS on the network starting from the selected units → identify all units that have paths connecting them to the task-t output
    • train only the weights of the selected subnetwork S (see the sketch below)
    • element-wise l2 regularizer instead of l1 (the network is already sparse)
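A minimal NumPy sketch of this Step 3 search, assuming each layer's weights are stored as a dense (fan_in, fan_out) matrix; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def select_subnetwork(weights, selected_top):
    """Walk down from the selected topmost units (Step 3 of Algorithm 2) and
    collect, per layer, every unit with a nonzero path to the new output.

    weights: list of (fan_in, fan_out) matrices, weights[l] connecting
             layer l to layer l+1.
    selected_top: unit indices in the top hidden layer chosen by the
                  sparse linear fit of Step 2.
    """
    selected = [set() for _ in range(len(weights) + 1)]
    selected[-1] = set(selected_top)
    for l in range(len(weights) - 1, -1, -1):
        for j in selected[l + 1]:
            # unit i in layer l is affected if its weight to unit j is nonzero
            for i in np.nonzero(weights[l][:, j])[0]:
                selected[l].add(int(i))
    return selected
```

Only the weights incident to the returned units form the subnetwork S that gets retrained, so the cost scales with the subnetwork rather than the full model.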

 

(3) Algorithm 3 Dynamic Network Expansion

  • if the new task is highly relevant to the old ones, the aggregated partial knowledge obtained from each task is sufficient to explain it
    • if the new task cannot be accurately represented by the existing features → new neurons are needed
  • group sparse regularization → to dynamically decide how many neurons to add w/o repeated retraining 

  • after selective retraining
    • check if the loss is above a certain threshold τ → if so, expand the capacity by k units per layer
    • train the expanded network with group sparse regularization on the added units
    • unnecessary hidden units (whose weight groups are driven to zero) are dropped altogether
    • → expect the model to capture new features that were not represented at task t-1 (see the sketch below)
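A PyTorch sketch of the group-sparsity idea, assuming the k newly added units' incoming weights are a (fan_in, k) slice `new_w`; the helper names and the pruning threshold `eps` are illustrative.

```python
import torch

def group_sparsity_penalty(new_w, gamma=1.0):
    """Group lasso over the added units: one group per new unit (column),
    so a useless unit's whole incoming weight vector is driven to zero."""
    return gamma * new_w.norm(p=2, dim=0).sum()

def surviving_units(new_w, eps=1e-6):
    """After training with the penalty, keep only units whose incoming
    weights did not collapse to (near) zero; the rest are dropped."""
    return (new_w.norm(p=2, dim=0) > eps).nonzero(as_tuple=True)[0]
```

Adding `group_sparsity_penalty(new_w)` to the task loss during the expansion phase shrinks useless columns to zero, and `surviving_units` then tells which of the k candidate units to keep, without repeated retraining.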

(4) Algorithm 4 Network Split/Duplication

  • a crucial problem in continual learning
    • semantic drift (= catastrophic forgetting)
$$\underset{W^{t}}{\text{minimize}}\;\; \mathcal{L}\big(W^{t};\, \mathcal{D}_{t}\big) \;+\; \lambda \big\|W^{t} - W^{t-1}\big\|_{2}^{2}$$

    • the l2 regularization enforces the solution W^t to be found close to W^(t-1)
    • a high λ tries to preserve the knowledge learned at previous tasks

  • + Split / Duplicate
    • measure the semantic drift ρ_i^t of each hidden unit i (l2 distance between its incoming weights before and after task t) → if ρ_i^t > σ, split the unit into two copies, so that the network keeps features that are optimal for the two different tasks
    • can be performed for all hidden units in parallel
    • after split/duplication, the weights need to be trained again (see the sketch below)
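A simplified PyTorch sketch of the drift test and duplication, assuming incoming weights are columns of a (fan_in, n_units) matrix; in the paper the network is retrained after the split, which this sketch omits.

```python
import torch

def split_drifted_units(W_t, W_prev, sigma):
    """For each hidden unit i, measure semantic drift rho_i as the l2
    distance between its incoming weights before/after task t; units with
    rho_i > sigma are duplicated: the drifted copy serves the new task and
    the restored original preserves the old tasks' feature."""
    rho = (W_t - W_prev).norm(p=2, dim=0)           # drift per unit (column)
    split_idx = (rho > sigma).nonzero(as_tuple=True)[0]
    restored = W_prev[:, split_idx]                 # original features
    W_new = torch.cat([W_t, restored], dim=1)       # append duplicated units
    return W_new, split_idx
```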

(5) Timestamped Inference

  • timestamp each newly added unit j with the stage t at which it was added (z_j = t)
  • at inference for task t, use only the units with timestamp ≤ t → units added for later tasks cannot drift earlier tasks' predictions (see the sketch below)
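A tiny sketch of the timestamp filter, where `timestamps[j]` is assumed to hold the stage at which unit j was added.

```python
def active_units(timestamps, t):
    """Unit j participates in task t's prediction only if it was introduced
    no later than stage t, so units added for later tasks cannot change
    (drift) the predictions of earlier tasks."""
    return [j for j, z_j in enumerate(timestamps) if z_j <= t]

# e.g. units added at stages [1, 1, 2, 3] -> task 2 uses units 0, 1, 2
assert active_units([1, 1, 2, 3], t=2) == [0, 1, 2]
```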

4 Experiment

(1) Baselines and our model

  • 1) Feedforward networks
  • 2) Convolutional networks

(2) Base network settings

(3) Datasets

  • 1) MNIST-Variation
  • 2) CIFAR-100
  • 3) AWA (Animals with Attributes)

4.1 Quantitative Evaluation

(1) Effect of selective retraining

(2) Effect of network expansion

(3) Effect of network split/duplication and timestamped inference

5 Conclusion

6 References
