This commit is contained in:
wassname
2020-02-16 09:40:32 +08:00
parent 0c4e2f0497
commit e9355317ef
18 changed files with 1532 additions and 76 deletions
File diff suppressed because one or more lines are too long
+92 -56
View File
@@ -1,29 +1,58 @@
# Using recurrent attentive neural processes for forecasting power usage
# Neural Processes for sequential data
This repo implements ["Recurrent Attentive Neural Process for Sequential Data"](https://arxiv.org/abs/1910.09323) (ANP-RNN) and tests them on real data.
This repo implements ["Recurrent Attentive Neural Process for Sequential Data"](https://arxiv.org/abs/1910.09323) (ANP-RNN) on a toy regression problem and also tests it on real smart meter data.
![](docs/anp-rnn_4.png)
- [Neural Processes for sequential data](#neural-processes-for-sequential-data)
- [Models](#models)
- [Results](#results)
- [Example outputs](#example-outputs)
- [Example NP](#example-np)
- [Example ANP outputs (sequential)](#example-anp-outputs-sequential)
- [Example ANP-RNN outputs](#example-anp-rnn-outputs)
- [Replicating DeepMind's tensorflow ANP behaviour](#replicating-deepminds-tensorflow-anp-behaviour)
- [Usage](#usage)
- [Smartmeter Data](#smartmeter-data)
- [Code](#code)
- [See also:](#see-also)
## Models
- ANP-RNN ["Recurrent Attentive Neural Process for Sequential Data"](https://arxiv.org/abs/1910.09323)
- ANP: [Attentive Neural Processes](https://arxiv.org/abs/1901.05761)
- NP: [Neural Processes](https://arxiv.org/abs/1807.01622)
This implementation has lots of options, so you can run it as an ANP-RNN, an ANP, or an NP.
I've also made lots of tweaks for flexibility and stability and [replicated the DeepMind ANP results](anp_1d_regression.ipynb) in pytorch. The replication qualitatively seems like a better match than the other pytorch versions of ANP (as of 2019-11-01). You can see other code repositories in the see also section.
![](docs/np_lstm.jpeg)
This implementation has lots of options so you can run it as a [Attentive Neural Process](https://arxiv.org/abs/1901.05761) (ANP), or NP.
I've always made lots of tweaks for flexibility and stability and [replicated the DeepMind ANP results](anp_1d_regression.ipynb) in pytorch. The replication qualitatively seems like a better match than the other pytorch versions of ANP (as of 2019-11-01).
## Results
Results on [*Smartmeter* prediction](./smartmeters-ANP-RNN.ipynb) (lower is better)
|Model|val_np_loss|val_mse_loss|
|--|--|--|
|**ANP-RNN_imp**|**-1.38**|0.00423|
|ANP-RNN|-1.27|0.0047|
|ANP|-1.3|0.0072|
|NP|-1.3|0.0040|
|LSTM| | |
## Usage
Results on [toy 1d regression](./anp-rnn_1d_regression.ipynb) (lower is better)
- clone this repository
- see requirements.txt for requirements and version
- Start and run the notebook [smartmeters.ipynb](https://github.com/wassname/attentive-neural-processes/blob/master/smartmeters.ipynb)
## Data
- Some data is included, you can get more from https://www.kaggle.com/jeanmidev/smart-meters-in-london/version/11
- Inputs are:
- Weather
- Time features: time of day, day of week, month of year, etc
- Bank holidays
- Position in sequence: days since start of window
- Target is: mean power usage on block 0
|model|val_loss|
|-----|---------|
| **ANP-RNN(impr)**| **-1.3217**|
| ANP-RNN| -0.62|
| ANP| -0.4228|
| ANP(impr)| -0.3182|
| NP| -1.2687 |
## Example outputs
@@ -32,58 +61,28 @@ Here the black dots are input data, the dotted line is the true data. The blue l
I chose a difficult example below, it's a window in the test set that deviates from the previous pattern. Given 3 days inputs, it must predict the next day, and the next day has higher power usage than previously. The trained model manages to predict it based on the inputs.
### Example ANP-RNN outputs
### Example NP
![](docs/anp-rnn_2.png)
Here we see underfitting, since the curve doesn't match the data
![](docs/anp-rnn_3.png)
![](docs/np_4.png)
![](docs/anp-rnn_4.png)
### Example ANP outputs (sequential)
![](docs/1.png)
Here we see overfitting, but the uncertainty seems too small, and the fit could be improved.
![](docs/4.png)
![](docs/anp_4.png)
![](docs/7.png)
### Example ANP-RNN outputs
![](docs/12.png)
This has a better calibrated uncertainty and a better fit
![](docs/19.png)
### Example LSTM Baseline outputs
Compare this to a quick LSTM baseline below, which didn't predict this divergence from the pattern. (Bear in mind that I didn't tweak this model as much.) The uncertainty and prediction are also less smooth and the log probability is lower.
An LSTM with an encoder style similar to ANP's:
![](docs/lstm_with_context.png)
and a normal LSTM:
![](docs/lstm_baseline.png)
## Code
This is based on the code listed in the next section, with some changes. The most notable ones add stability, others are to make sure it can handle predicting into the future:
Changes for a predictive use case:
- target points are always in the future, context is in the past
- context and targets are still sampled randomly during training
![](docs/anp-rnn_4.png)
Changes for stability:
- in eval mode, take the mean of the latent space and the mean of the output instead of sampling
- use log_variance where possible (there is a flag to try without this, and it seems to help)
- and add a minimum bound to std (in log domain) to avoid mode collapse (one path using log_var one not)
- use log_prob loss (not mseloss or BCELoss)
- use pytorch attention (which has dropout) instead of custom attention
- use_deterministic option
- use batchnorm and dropout on channel dimensions
- check and skip nonfinite values because for extreme inputs we can still get nan's
## Replicating tensorflow ANP behaviour
## Replicating DeepMind's tensorflow ANP behaviour
I put some work into replicating the behaviour shown in the [original deepmind tensorflow notebook](https://github.com/deepmind/neural-processes/blob/master/attentive_neural_process.ipynb).
@@ -99,6 +98,43 @@ And a ANP-RNN
It's just a qualitative comparison but we see the same kind of overfitting with uncertainty being tight where lots of data points exist, and wide where they do not. However this repo seems to miss points occasionally.
## Usage
- clone this repository
- see requirements.txt for requirements and versions
- Start and run the notebook [smartmeters-ANP-RNN.ipynb](smartmeters-ANP-RNN.ipynb)
- To see a toy 1d regression problem, look at [anp-rnn_1d_regression.ipynb](anp-rnn_1d_regression.ipynb)
## Smartmeter Data
- Some data is included, you can get more from https://www.kaggle.com/jeanmidev/smart-meters-in-london/version/11
- Inputs are:
- Weather
- Time features: time of day, day of week, month of year, etc
- Bank holidays
- Position in sequence: days since start of window
- Target is: mean power usage on block 0
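The inputs listed above could be derived from a timestamp roughly as follows. This is a minimal sketch using only the standard library; the helper name `time_features`, the scaling of each feature, and the `bank_holidays` argument are assumptions for illustration, not the notebook's actual preprocessing.

```python
from datetime import datetime


def time_features(ts, window_start, bank_holidays=frozenset()):
    """Build illustrative calendar features for one reading (hypothetical helper)."""
    return {
        # fraction of the day elapsed, in [0, 1)
        "time_of_day": (ts.hour + ts.minute / 60) / 24,
        # Monday == 0 ... Sunday == 6
        "day_of_week": ts.weekday(),
        "month_of_year": ts.month,
        # bank_holidays is assumed to be a set of datetime.date objects
        "is_bank_holiday": ts.date() in bank_holidays,
        # position in sequence: days since the start of the window
        "days_since_start": (ts - window_start).total_seconds() / 86400,
    }


features = time_features(
    datetime(2020, 2, 16, 12, 0),
    window_start=datetime(2020, 2, 13),
)
```

Weather columns would be joined on the same timestamps; they are omitted here because their format depends on the dataset files.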
## Code
This is based on the code listed in the next section, with some changes. The most notable ones add stability, others are to make sure it can handle predicting into the future:
Changes for a sequential/predictive use case:
- target points are always in the future, context is in the past
- context and targets are still sampled randomly during training
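One way to read the two bullets above: pick a cut point in the window, then sample context indices from before the cut and target indices from after it. The sketch below is an assumption about how such sampling could look, written with the standard library only; `split_context_target` is a hypothetical name, and the repo's own sampling code may differ.

```python
import random


def split_context_target(n_steps, num_context, num_extra_target, rng=random):
    """Sample past context indices and future target indices (illustrative only)."""
    # The cut leaves at least one step on each side of the window.
    cut = rng.randint(1, n_steps - 1)
    past = range(cut)
    future = range(cut, n_steps)
    # Random subsets keep training stochastic, but order in time is preserved.
    context_idx = sorted(rng.sample(past, min(num_context, len(past))))
    target_idx = sorted(rng.sample(future, min(num_extra_target, len(future))))
    return context_idx, target_idx


rng = random.Random(0)
context_idx, target_idx = split_context_target(192, 96, 96, rng)
```

Every context index is smaller than every target index, so the model never sees the future it is asked to predict.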
Changes for stability:
- in eval mode, take the mean of the latent space and the mean of the output instead of sampling
- use log_variance where possible (there is a flag to try without this, and it seems to help)
- and add a minimum bound to std (in log domain) to avoid mode collapse (one path using log_var one not)
- use log_prob loss (not mseloss or BCELoss)
- use pytorch attention (which has dropout and is faster) instead of custom attention
- use_deterministic option, although it seems to do better with this off
- use batchnorm and dropout on channel dimensions
- check for and skip non-finite values, because extreme inputs can still produce NaNs; also apply gradient clipping
- use pytorch lightning for early stopping, hyperparam opt, and reduce learning rate on plateau
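Three of the stability items above (predicting log-variance, bounding the std in the log domain, and using a log-probability loss while skipping non-finite values) can be illustrated without any framework. This is a dependency-free sketch, not the repo's PyTorch code: the function names are hypothetical, the clamp is one assumed way to apply the bound, and `min_std=0.005` simply mirrors the value that appears in the logged hyperparameters.

```python
import math


def gaussian_nll(y, mean, log_var, min_std=0.005):
    """Negative Gaussian log-probability with a floor on the std (illustrative)."""
    # Work in the log domain: log_std = 0.5 * log_var, clamped from below
    # so the predicted std can never collapse towards zero.
    log_std = max(0.5 * log_var, math.log(min_std))
    std = math.exp(log_std)
    return log_std + 0.5 * math.log(2 * math.pi) + 0.5 * ((y - mean) / std) ** 2


def mean_of_finite(losses):
    """Average the losses, skipping NaN and infinite entries."""
    finite = [loss for loss in losses if math.isfinite(loss)]
    return sum(finite) / len(finite) if finite else None


unit_loss = gaussian_nll(y=0.0, mean=0.0, log_var=0.0)
clamped_loss = gaussian_nll(y=0.0, mean=0.0, log_var=-100.0)
batch_loss = mean_of_finite([1.0, float("nan"), 3.0])
```

With the floor in place, a very negative `log_var` yields the same loss as `std = min_std` rather than an unbounded reward for shrinking the uncertainty.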
## See also:
A list of projects I used as reference or modified to make this one:
+923 -1
View File
@@ -28,7 +28,7 @@
"source": [
"Results on *Smartmeter* prediction\n",
"\n",
"|Model|test_loss|\n",
"|Model|val_loss|\n",
"|--|--| \n",
"|ANP-RNN|-1.27|\n",
"|ANP-RNN_imp|-1.38|\n",
@@ -2180,6 +2180,928 @@
"text": [
"step 13169, {'val_loss': '-1.0961132049560547', 'val/kl': '0.0003247860004194081', 'val/mse': '0.006203540600836277', 'val/std': '0.05735539272427559'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 15364, {'val_loss': '-0.8419629335403442', 'val/kl': '0.00034610298462212086', 'val/mse': '0.005819730460643768', 'val/std': '0.05368044599890709'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 17559, {'val_loss': '-1.3719918727874756', 'val/kl': '0.00039435975486412644', 'val/mse': '0.0042431410402059555', 'val/std': '0.059302836656570435'}\n",
"Epoch 7: reducing learning rate of group 0 to 5.8704e-04.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 19754, {'val_loss': '-1.5102424621582031', 'val/kl': '0.00029231738881208', 'val/mse': '0.0033912782091647387', 'val/std': '0.046705130487680435'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 21949, {'val_loss': '-1.423159122467041', 'val/kl': '0.00035497310454957187', 'val/mse': '0.0035869088023900986', 'val/std': '0.047349609434604645'}\n",
"\n",
"logger.metrics [{'val_loss': -1.423159122467041, 'val/kl': 0.00035497310454957187, 'val/mse': 0.0035869088023900986, 'val/std': 0.047349609434604645, 'epoch': 9}]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[I 2020-02-16 07:44:25,994] Finished trial#29 resulted in value: -1.423159122467041. Current best value is -1.5058155059814453 with parameters: {'attention_dropout': 0.2, 'attention_layers': 1, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'ptmultihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'ptmultihead', 'learning_rate': 0.008663362578308754, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 2, 'n_latent_encoder_layers': 2, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': True, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"trial 30 params {'attention_dropout': 0.2, 'attention_layers': 2, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'ptmultihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'ptmultihead', 'learning_rate': 0.0022419054847050398, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 2, 'n_latent_encoder_layers': 2, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': False, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:gpu available: True, used: True\n",
"INFO:root:VISIBLE GPUS: 0\n",
"INFO:root:\n",
" Name Type Params\n",
"0 model LatentModel 94 K\n",
"1 model._latent_encoder LatentEncoder 8 K\n",
"2 model._latent_encoder._input_layer Linear 608 \n",
"3 model._latent_encoder._encoder ModuleList 2 K\n",
"4 model._latent_encoder._encoder.0 NPBlockRelu2d 1 K\n",
".. ... ... ...\n",
"87 model._decoder._decoder.7.act ReLU 0 \n",
"88 model._decoder._decoder.7.dropout Dropout2d 0 \n",
"89 model._decoder._decoder.7.norm BatchNorm2d 192 \n",
"90 model._decoder._mean Linear 97 \n",
"91 model._decoder._std Linear 97 \n",
"\n",
"[92 rows x 3 columns]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validation sanity check', layout=Layout(flex='2'), max=5.…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 0, {'val_loss': '0.7910884022712708', 'val/kl': '0.00010800049494719133', 'val/mse': '0.2537727653980255', 'val/std': '0.654058575630188'}\n",
"\r"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8d7f1df292af4d14ba291db5e80b3bec",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(flex='2'), max=1.0), HTML(value='')), …"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 2194, {'val_loss': '-1.1204354763031006', 'val/kl': '0.0014955311780795455', 'val/mse': '0.00617215083912015', 'val/std': '0.09348662197589874'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 4389, {'val_loss': '-1.3880571126937866', 'val/kl': '0.0010640228865668178', 'val/mse': '0.004546670243144035', 'val/std': '0.06830815225839615'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 6584, {'val_loss': '-1.3446733951568604', 'val/kl': '0.0006483710603788495', 'val/mse': '0.005048147868365049', 'val/std': '0.06271157413721085'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 8779, {'val_loss': '-1.5619605779647827', 'val/kl': '0.0007974352920427918', 'val/mse': '0.003406693460419774', 'val/std': '0.05695287138223648'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 10974, {'val_loss': '-1.567822813987732', 'val/kl': '0.0007052362198010087', 'val/mse': '0.0034684455022215843', 'val/std': '0.0511687695980072'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 13169, {'val_loss': '-1.469671368598938', 'val/kl': '0.0006657325429841876', 'val/mse': '0.0036809651646763086', 'val/std': '0.04665883630514145'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 15364, {'val_loss': '-1.4495060443878174', 'val/kl': '0.0009186150855384767', 'val/mse': '0.003881386946886778', 'val/std': '0.04494263231754303'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 17559, {'val_loss': '-1.4232021570205688', 'val/kl': '0.0007268059998750687', 'val/mse': '0.0037214024923741817', 'val/std': '0.04474468529224396'}\n",
"Epoch 7: reducing learning rate of group 0 to 2.2419e-04.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 19754, {'val_loss': '-1.4388623237609863', 'val/kl': '0.0007217807578854263', 'val/mse': '0.003411452053114772', 'val/std': '0.03921413794159889'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 21949, {'val_loss': '-1.4176470041275024', 'val/kl': '0.0006476619746536016', 'val/mse': '0.003419178072363138', 'val/std': '0.03866790235042572'}\n",
"\n",
"logger.metrics [{'val_loss': -1.4176470041275024, 'val/kl': 0.0006476619746536016, 'val/mse': 0.003419178072363138, 'val/std': 0.03866790235042572, 'epoch': 9}]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[I 2020-02-16 08:18:44,472] Finished trial#30 resulted in value: -1.4176470041275024. Current best value is -1.5058155059814453 with parameters: {'attention_dropout': 0.2, 'attention_layers': 1, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'ptmultihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'ptmultihead', 'learning_rate': 0.008663362578308754, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 2, 'n_latent_encoder_layers': 2, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': True, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"trial 31 params {'attention_dropout': 0, 'attention_layers': 3, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'multihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'ptmultihead', 'learning_rate': 0.0065892056326513826, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 2, 'n_latent_encoder_layers': 2, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': True, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:gpu available: True, used: True\n",
"INFO:root:VISIBLE GPUS: 0\n",
"INFO:root:\n",
" Name Type Params\n",
"0 model LatentModel 124 K\n",
"1 model._latent_encoder LatentEncoder 8 K\n",
"2 model._latent_encoder._input_layer Linear 608 \n",
"3 model._latent_encoder._encoder ModuleList 2 K\n",
"4 model._latent_encoder._encoder.0 NPBlockRelu2d 1 K\n",
".. ... ... ...\n",
"146 model._decoder._decoder.7.act ReLU 0 \n",
"147 model._decoder._decoder.7.dropout Dropout2d 0 \n",
"148 model._decoder._decoder.7.norm BatchNorm2d 192 \n",
"149 model._decoder._mean Linear 97 \n",
"150 model._decoder._std Linear 97 \n",
"\n",
"[151 rows x 3 columns]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validation sanity check', layout=Layout(flex='2'), max=5.…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 0, {'val_loss': '1.5510637760162354', 'val/kl': '0.504607617855072', 'val/mse': '0.33672571182250977', 'val/std': '0.9382523894309998'}\n",
"\r"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e68b6e1b39af43209e02eec071204c46",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(flex='2'), max=1.0), HTML(value='')), …"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 2194, {'val_loss': '-0.9993453621864319', 'val/kl': '0.00492581631988287', 'val/mse': '0.0076696309261024', 'val/std': '0.10437598824501038'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 4389, {'val_loss': '-1.3951125144958496', 'val/kl': '0.0013331789523363113', 'val/mse': '0.004176048096269369', 'val/std': '0.05822036787867546'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 6584, {'val_loss': '-1.4472817182540894', 'val/kl': '0.001394702005200088', 'val/mse': '0.0040388815104961395', 'val/std': '0.05575721710920334'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 8779, {'val_loss': '-1.4240511655807495', 'val/kl': '0.0011677203001454473', 'val/mse': '0.003808689536526799', 'val/std': '0.05027220398187637'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 10974, {'val_loss': '-1.51090669631958', 'val/kl': '0.001719470601528883', 'val/mse': '0.0037067916709929705', 'val/std': '0.05691952630877495'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 13169, {'val_loss': '-1.556814193725586', 'val/kl': '0.0011921778786927462', 'val/mse': '0.003143388545140624', 'val/std': '0.04716252535581589'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 15364, {'val_loss': '-1.5254632234573364', 'val/kl': '0.0011767667019739747', 'val/mse': '0.003485812107101083', 'val/std': '0.048899002373218536'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 17559, {'val_loss': '-1.4824448823928833', 'val/kl': '0.0012150286929681897', 'val/mse': '0.0035667528863996267', 'val/std': '0.0452319011092186'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 19754, {'val_loss': '-1.4969969987869263', 'val/kl': '0.0013163189869374037', 'val/mse': '0.00328625226393342', 'val/std': '0.04428704082965851'}\n",
"Epoch 8: reducing learning rate of group 0 to 6.5892e-04.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 21949, {'val_loss': '-1.4131932258605957', 'val/kl': '0.0012195135932415724', 'val/mse': '0.0033923806622624397', 'val/std': '0.0400848314166069'}\n",
"\n",
"logger.metrics [{'val_loss': -1.4131932258605957, 'val/kl': 0.0012195135932415724, 'val/mse': 0.0033923806622624397, 'val/std': 0.0400848314166069, 'epoch': 9}]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[I 2020-02-16 08:47:42,528] Finished trial#31 resulted in value: -1.4131932258605957. Current best value is -1.5058155059814453 with parameters: {'attention_dropout': 0.2, 'attention_layers': 1, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'ptmultihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'ptmultihead', 'learning_rate': 0.008663362578308754, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 2, 'n_latent_encoder_layers': 2, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': True, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"trial 32 params {'attention_dropout': 0.2, 'attention_layers': 2, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'ptmultihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'multihead', 'learning_rate': 0.004388551085821375, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 4, 'n_latent_encoder_layers': 8, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': True, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:gpu available: True, used: True\n",
"INFO:root:VISIBLE GPUS: 0\n",
"INFO:root:\n",
" Name Type Params\n",
"0 model LatentModel 102 K\n",
"1 model._latent_encoder LatentEncoder 14 K\n",
"2 model._latent_encoder._input_layer Linear 608 \n",
"3 model._latent_encoder._encoder ModuleList 8 K\n",
"4 model._latent_encoder._encoder.0 NPBlockRelu2d 1 K\n",
".. ... ... ...\n",
"127 model._decoder._decoder.7.act ReLU 0 \n",
"128 model._decoder._decoder.7.dropout Dropout2d 0 \n",
"129 model._decoder._decoder.7.norm BatchNorm2d 192 \n",
"130 model._decoder._mean Linear 97 \n",
"131 model._decoder._std Linear 97 \n",
"\n",
"[132 rows x 3 columns]\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validation sanity check', layout=Layout(flex='2'), max=5.…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 0, {'val_loss': '1.6433210372924805', 'val/kl': '0.502083420753479', 'val/mse': '0.3020164668560028', 'val/std': '1.1032137870788574'}\n",
"\r"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5e06748306f340e3809faaac521f4844",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(flex='2'), max=1.0), HTML(value='')), …"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 2194, {'val_loss': '-0.5562781095504761', 'val/kl': '0.004524527583271265', 'val/mse': '0.01594124175608158', 'val/std': '0.20407840609550476'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 4389, {'val_loss': '-1.2800618410110474', 'val/kl': '0.001454153680242598', 'val/mse': '0.004913512151688337', 'val/std': '0.07995617389678955'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 6584, {'val_loss': '-1.4909727573394775', 'val/kl': '0.0005825856351293623', 'val/mse': '0.0038258214481174946', 'val/std': '0.05889587849378586'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 8779, {'val_loss': '-1.4236963987350464', 'val/kl': '0.0012530366657301784', 'val/mse': '0.0038965647108852863', 'val/std': '0.051243484020233154'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 10974, {'val_loss': '-1.4802192449569702', 'val/kl': '0.0008309829281643033', 'val/mse': '0.0034029236994683743', 'val/std': '0.05015803128480911'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 13169, {'val_loss': '1679.2408447265625', 'val/kl': '1679.53564453125', 'val/mse': '606.5656127929688', 'val/std': '752.9714965820312'}\n",
"Epoch 5: reducing learning rate of group 0 to 4.3886e-04.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 15364, {'val_loss': '-1.515012502670288', 'val/kl': '0.0006384316366165876', 'val/mse': '0.0032132412306964397', 'val/std': '0.04366536810994148'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 17559, {'val_loss': '23501292.0', 'val/kl': '23501292.0', 'val/mse': '8748267.0', 'val/std': '6837.646484375'}\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"step 19754, {'val_loss': '-1.4798544645309448', 'val/kl': '0.0004828128730878234', 'val/mse': '0.0033622710034251213', 'val/std': '0.04285237193107605'}\n"
]
}
],
"source": [
+21 -11
@@ -33,7 +33,7 @@ class SequenceDfDataSet(torch.utils.data.Dataset):
self.transforms = transforms
def __len__(self):
return len(self.data) - +self.hparams.window_length - self.hparams.target_length
return len(self.data) - self.hparams.window_length - self.hparams.target_length - 1
def iloc(self, idx):
k = idx + self.hparams.window_length + self.hparams.target_length
@@ -41,8 +41,14 @@ class SequenceDfDataSet(torch.utils.data.Dataset):
i = j - self.hparams.window_length
assert i >= 0
assert idx <= len(self.data)
x_rows = self.data.iloc[i:j].copy()
y_rows = self.data.iloc[k].to_frame().T.copy()
# x_rows = x_rows.drop(columns=self.label_names)
# Note: the NP models have access to the previous labels for the context, so we allow the LSTM the same, although it will likely just return an autoregressive solution for the first half...
x_rows.loc[x_rows.index[self.hparams.window_length:], self.label_names] = 0
assert (x_rows.loc[x_rows.index[self.hparams.window_length:], self.label_names]==0).all().all()
y_rows = self.data[self.label_names].iloc[i+1:j+1].copy()
# print(i,j,k)
# add seconds since start of window index
@@ -56,11 +62,11 @@ class SequenceDfDataSet(torch.utils.data.Dataset):
def __getitem__(self, idx):
x_rows, y_rows = self.iloc(idx)
y = y_rows[self.label_names].astype(np.float32).values
x = x_rows.astype(np.float32).values
y = y_rows[self.label_names].astype(np.float32).values
return (
self.transforms(x).squeeze(0).float(),
self.transforms(y[:, None,])[:, 0, 0].float(),
self.transforms(y).squeeze(0).squeeze(-1).float(),
)
@@ -79,15 +85,15 @@ class LSTMNet(nn.Module):
)
self.hidden_out_size = (
self.hparams.hidden_size
* self.hparams.lstm_layers
* (self.hparams.bidirectional + 1)
)
self.linear = nn.Linear(self.hidden_out_size, 1)
def forward(self, x):
outputs, (h_out, _) = self.lstm1(x)
h_out = h_out.permute((1, 0, 2)).reshape((-1, self.hidden_out_size))
return self.linear(h_out)
# outputs: [B, T, num_direction * H]
y = self.linear(outputs).squeeze(2)
return y
class LSTM_PL(pl.LightningModule):
@@ -122,9 +128,13 @@ class LSTM_PL(pl.LightningModule):
def validation_end(self, outputs):
# TODO: send an image to TensorBoard, like in the lighting_anp.py file
if self.hparams["vis_i"] > 0:
if int(self.hparams["vis_i"]) > 0:
loader = self.val_dataloader()[0]
vis_i = min(self.hparams["vis_i"], len(loader.dataset))
vis_i = min(int(self.hparams["vis_i"]), len(loader.dataset))
if isinstance(self.hparams["vis_i"], str):
image = plot_from_loader(loader, self, vis_i=vis_i)
plt.show()
else:
image = plot_from_loader_to_tensor(loader, self, vis_i=vis_i)
self.logger.experiment.add_image(
"val/image", image, self.trainer.global_step
@@ -227,7 +237,7 @@ class LSTM_PL(pl.LightningModule):
return parser
def plot_from_loader(loader, model, vis_i=670, n=100):
def plot_from_loader(loader, model, vis_i=670, n=1):
dset_test = loader.dataset
label_names = dset_test.label_names
y_trues = []
@@ -241,7 +251,7 @@ def plot_from_loader(loader, model, vis_i=670, n=100):
model.eval()
with torch.no_grad():
y_hat = model.forward(x)
y_hat = y_hat.cpu().numpy()
y_hat = y_hat.cpu().squeeze(0).numpy()
dt = y_rows.iloc[0].name
+1 -1
@@ -17,4 +17,4 @@ class ObjectDict(dict):
@property
def __dict__(self):
return self
return dict(self)
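
A minimal sketch of what the one-line change above appears to affect, assuming `ObjectDict` is a plain `dict` subclass. The class name `ObjectDictSketch` and the sample values are hypothetical, not taken from the repo. Returning `dict(self)` from the `__dict__` property hands callers such as `vars()` a plain-`dict` copy rather than the object itself.

```python
class ObjectDictSketch(dict):
    """Hypothetical stand-in for the repo's ObjectDict; only the property appears in the diff."""

    @property
    def __dict__(self):
        # Return a plain-dict copy instead of the ObjectDict instance itself.
        return dict(self)


# Illustrative hyperparameters (values are made up for the example).
hparams = ObjectDictSketch(hidden_size=32, lstm_layers=2)

# vars() looks up the __dict__ attribute, so it goes through the property above.
snapshot = vars(hparams)

print(type(snapshot).__name__)
print(snapshot is hparams)
```

Mutating `snapshot` leaves `hparams` untouched, which is one plausible reason for the change; the commit itself does not state the motivation.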