results

2026-06-27 16:44:27 +08:00 · 2020-02-16 09:40:32 +08:00
parent 0c4e2f0497
commit e9355317ef
18 changed files with 1532 additions and 76 deletions
@@ -1,29 +1,58 @@
-# Using recurrent attentive neural processes for forecasting power usage
+# Neural Processes for sequential data

-This repo implements ["Recurrent Attentive Neural Process for Sequential Data"](https://arxiv.org/abs/1910.09323) (ANP-RNN) and tests them on real data.
+This repo implements ["Recurrent Attentive Neural Process for Sequential Data"](https://arxiv.org/abs/1910.09323) (ANP-RNN) on a toy regression problem, and also tests it on real smart meter data.
+
+![](docs/anp-rnn_4.png)
+
+
+- [Neural Processes for sequential data](#neural-processes-for-sequential-data)
+  - [Models](#models)
+  - [Results](#results)
+  - [Example outputs](#example-outputs)
+    - [Example NP](#example-np)
+    - [Example ANP outputs (sequential)](#example-anp-outputs-sequential)
+    - [Example ANP-RNN outputs](#example-anp-rnn-outputs)
+  - [Replicating DeepMind's tensorflow ANP behaviour](#replicating-deepminds-tensorflow-anp-behaviour)
+  - [Usage](#usage)
+  - [Smartmeter Data](#smartmeter-data)
+  - [Code](#code)
+  - [See also:](#see-also)
+
+## Models
+
+- ANP-RNN ["Recurrent Attentive Neural Process for Sequential Data"](https://arxiv.org/abs/1910.09323) 
+- ANP: [Attentive Neural Processes](https://arxiv.org/abs/1901.05761)
+- NP: [Neural Processes](https://arxiv.org/abs/1807.01622)
+
+
+This implementation has lots of options, so you can run it as an ANP-RNN, an ANP, or an NP.
+
+I've also made lots of tweaks for flexibility and stability and [replicated the DeepMind ANP results](anp_1d_regression.ipynb) in pytorch. The replication qualitatively seems like a better match than the other pytorch versions of ANP (as of 2019-11-01). You can see other code repositories in the see also section.

 ![](docs/np_lstm.jpeg)

-This implementation has lots of options so you can run it as a [Attentive Neural Process](https://arxiv.org/abs/1901.05761) (ANP), or NP.

-I've always made lots of tweaks for flexibility and stability and [replicated the DeepMind ANP results](anp_1d_regression.ipynb) in pytorch. The replication qualitatively seems like a better match than the other pytorch versions of ANP (as of 2019-11-01).
+## Results

+Results on [*Smartmeter* prediction](./smartmeters-ANP-RNN.ipynb) (lower is better)

+|Model|val_np_loss|val_mse_loss|
+|--|--|--|
+|**ANP-RNN_imp**|**-1.38**|0.00423|
+|ANP-RNN|-1.27|0.0047|
+|ANP|-1.3|0.0072|
+|NP|-1.3|0.0040|
+|LSTM| | |

-## Usage
+Results on [toy 1d regression](./anp-rnn_1d_regression.ipynb)  (lower is better)

- clone this repository
- see requirements.txt for requirements and version
- Start and run the notebook [smartmeters.ipynb](https://github.com/wassname/attentive-neural-processes/blob/master/smartmeters.ipynb)
-
-## Data
- Some data is included, you can get more from https://www.kaggle.com/jeanmidev/smart-meters-in-london/version/11
- Inputs are: 
-  - Weather
-  - Time features: time of day, day of week, month of year, etc
-  - Bank holidays
-  - Position in sequence: days since start of window
- Target is: mean power usage on block 0
+|model|val_loss|
+|-----|---------|
+| **ANP-RNN(impr)**| **-1.3217**|
+| ANP-RNN| -0.62|
+| ANP| -0.4228|
+| ANP(impr)| -0.3182|
+| NP|  -1.2687 |

 ## Example outputs

@@ -32,58 +61,28 @@ Here the black dots are input data, the dotted line is the true data. The blue l
 I chose a difficult example below, it's a window in the test set that deviates from the previous pattern. Given 3 days inputs, it must predict the next day, and the next day has higher power usage than previously. The trained model manages to predict it based on the inputs.


-### Example ANP-RNN outputs
+### Example NP

-![](docs/anp-rnn_2.png)
+Here we see underfitting, since the curve doesn't match the data.

-![](docs/anp-rnn_3.png)
+![](docs/np_4.png)

-![](docs/anp-rnn_4.png)

 ### Example ANP outputs (sequential)

-![](docs/1.png)
+Here we see overfitting, but the uncertainty seems too small, and the fit could be improved.

-![](docs/4.png)
+![](docs/anp_4.png)

-![](docs/7.png)
+### Example ANP-RNN outputs

-![](docs/12.png)****
+This has better-calibrated uncertainty and a better fit.

-![](docs/19.png)
-
-### Example LSTM Baseline outputs
-
-Compare this to a quick LSTM baseline below, which didn't predict this divergance from the pattern. (Bear in mind that I didn't tweak this model as much). The uncertainty and prediction are also less smooth and the log probability is lower.
-
-An LSTM with an encoder style similar to ANP's:
-
-![](docs/lstm_with_context.png)
-
-and a normal LSTM:
-
-![](docs/lstm_baseline.png)
-
-## Code
-
-This is based on the code listed in the next section, with some changes. The most notable ones add stability, others are to make sure it can handle predicting into the future:
-
-Changes for a predictive use case:
- target points are always in the future, context is in the past
- context and targets are still sampled randomly during training
+![](docs/anp-rnn_4.png)


-Changes for stability:
- in eval mode, take mean of latent space, and mean of output isntead of sampling
- use log_variance where possible (there is a flag to try without this, and it seems to help)
-  - and add a minimum bound to std (in log domain) to avoid mode collapse (one path using log_var one not)
- use log_prob loss (not mseloss or BCELoss)
- use pytorch attention (which has dropout) instead of custom attention
- use_deterministic option
- use batchnorm and dropout on channel dimensions
- check and skip nonfinite values because for extreme inputs we can still get nan's

-## Replicating tensorflow ANP behaviour
+## Replicating DeepMind's tensorflow ANP behaviour

 I put some work into replicating the behaviour shown in the [original deepmind tensorflow notebook](https://github.com/deepmind/neural-processes/blob/master/attentive_neural_process.ipynb).

@@ -99,6 +98,43 @@ And a ANP-RNN
 It's just a qualitative comparison but we see the same kind of overfitting with uncertainty being tight where lots of data points exist, and wide where they do not. However this repo seems to miss points occasionally.


+
+## Usage
+
+- clone this repository
+- see requirements.txt for requirements and version
+- Start and run the notebook [smartmeters-ANP-RNN.ipynb](smartmeters-ANP-RNN.ipynb)
+- To see a toy 1d regression problem, look at [anp-rnn_1d_regression.ipynb](anp-rnn_1d_regression.ipynb)
+
+## Smartmeter Data
+- Some data is included, you can get more from https://www.kaggle.com/jeanmidev/smart-meters-in-london/version/11
+- Inputs are: 
+  - Weather
+  - Time features: time of day, day of week, month of year, etc
+  - Bank holidays
+  - Position in sequence: days since start of window
+- Target is: mean power usage on block 0
+
+
+## Code
+
+This is based on the code listed in the next section, with some changes. The most notable ones add stability, others are to make sure it can handle predicting into the future:
+
+Changes for a sequential/predictive use case:
+- target points are always in the future, context is in the past
+- context and targets are still sampled randomly during training
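The two points above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the repo's code: the function name `split_context_target`, its arguments, and the toy window are all hypothetical. It assumes a window is a time-ordered list of `(x, y)` pairs, that context comes from before a split index and targets from after it, and that "sampled randomly" means subsampling points on each side.

```python
import random


def split_context_target(window, split, n_context, n_target, rng=None):
    """Split one time-ordered window into past context and future targets.

    window: list of (x, y) pairs ordered by time.
    split: index of the first "future" step.
    If rng is given, points are subsampled at random (training style);
    otherwise the last n_context past points and the first n_target
    future points are used (evaluation style).
    """
    past, future = window[:split], window[split:]
    if rng is None:
        return past[-n_context:], future[:n_target]
    # Sample indices, then sort them so both sets keep their time order.
    ctx_idx = sorted(rng.sample(range(len(past)), min(n_context, len(past))))
    tgt_idx = sorted(rng.sample(range(len(future)), min(n_target, len(future))))
    return [past[i] for i in ctx_idx], [future[i] for i in tgt_idx]


window = [(t, t * 0.1) for t in range(10)]  # toy (time, value) pairs
context, target = split_context_target(
    window, split=6, n_context=3, n_target=2, rng=random.Random(0)
)
# Every context point precedes every target point.
assert max(x for x, _ in context) < min(x for x, _ in target)
```

Passing `rng=None` gives a deterministic split, which is closer to how a fixed validation window would be evaluated under these assumptions.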
+
+Changes for stability:
+- in eval mode, take mean of latent space, and mean of output instead of sampling
+- use log_variance where possible (there is a flag to try without this, and it seems to help)
+  - and add a minimum bound to std (in log domain) to avoid mode collapse (one code path uses log_var, the other does not)
+- use log_prob loss (not mseloss or BCELoss)
+- use pytorch attention (which has dropout and is faster) instead of custom attention
+- use_deterministic option, although it seems to do better with this off
+- use batchnorm and dropout on channel dimensions
+- check and skip nonfinite values, because extreme inputs can still produce NaNs; also use gradient clipping
+- use pytorch lightning for early stopping, hyperparam opt, and reduce learning rate on plateau
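Two of the stability changes above, the minimum bound on std and the log-probability loss, can be illustrated with a small scalar sketch. It is written in plain Python rather than PyTorch so it runs without dependencies, and it is an assumption about the general idea, not the repo's implementation: `bounded_std` and `gaussian_nll` are hypothetical names, and clamping the log-std is only one possible way to apply the bound. The default `min_std=0.005` mirrors the `min_std` value that appears in the trial parameters logged by the notebook.

```python
import math


def bounded_std(log_std, min_std=0.005):
    """Turn a raw log-std output into a std that cannot fall below min_std."""
    # Clamp in the log domain, then exponentiate.
    return math.exp(max(log_std, math.log(min_std)))


def gaussian_nll(y, mean, std):
    """Negative log-probability of y under Normal(mean, std)."""
    return (
        0.5 * math.log(2 * math.pi)
        + math.log(std)
        + 0.5 * ((y - mean) / std) ** 2
    )


# A very negative log-std would otherwise give a near-zero std and a huge loss.
std = bounded_std(-20.0)
loss = gaussian_nll(0.31, 0.30, std)
assert math.isfinite(loss)
```

Because a Gaussian density can exceed 1 when std is small, this loss can be negative, which is consistent with the negative `val_loss` values in the results tables.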
+
 ## See also:

 A list of projects I used as reference or modified to make this one:
@@ -28,7 +28,7 @@
   "source": [
    "Results on *Smartmeter* prediction\n",
    "\n",
-    "|Model|test_loss|\n",
+    "|Model|val_loss|\n",
    "|--|--| \n",
    "|ANP-RNN|-1.27|\n",
    "|ANP-RNN_imp|-1.38|\n",
@@ -2180,6 +2180,928 @@
     "text": [
      "step 13169, {'val_loss': '-1.0961132049560547', 'val/kl': '0.0003247860004194081', 'val/mse': '0.006203540600836277', 'val/std': '0.05735539272427559'}\n"
     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 15364, {'val_loss': '-0.8419629335403442', 'val/kl': '0.00034610298462212086', 'val/mse': '0.005819730460643768', 'val/std': '0.05368044599890709'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 17559, {'val_loss': '-1.3719918727874756', 'val/kl': '0.00039435975486412644', 'val/mse': '0.0042431410402059555', 'val/std': '0.059302836656570435'}\n",
+      "Epoch     7: reducing learning rate of group 0 to 5.8704e-04.\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 19754, {'val_loss': '-1.5102424621582031', 'val/kl': '0.00029231738881208', 'val/mse': '0.0033912782091647387', 'val/std': '0.046705130487680435'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 21949, {'val_loss': '-1.423159122467041', 'val/kl': '0.00035497310454957187', 'val/mse': '0.0035869088023900986', 'val/std': '0.047349609434604645'}\n",
+      "\n",
+      "logger.metrics [{'val_loss': -1.423159122467041, 'val/kl': 0.00035497310454957187, 'val/mse': 0.0035869088023900986, 'val/std': 0.047349609434604645, 'epoch': 9}]\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "[I 2020-02-16 07:44:25,994] Finished trial#29 resulted in value: -1.423159122467041. Current best value is -1.5058155059814453 with parameters: {'attention_dropout': 0.2, 'attention_layers': 1, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'ptmultihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'ptmultihead', 'learning_rate': 0.008663362578308754, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 2, 'n_latent_encoder_layers': 2, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': True, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}.\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "trial 30 params {'attention_dropout': 0.2, 'attention_layers': 2, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'ptmultihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'ptmultihead', 'learning_rate': 0.0022419054847050398, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 2, 'n_latent_encoder_layers': 2, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': False, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:gpu available: True, used: True\n",
+      "INFO:root:VISIBLE GPUS: 0\n",
+      "INFO:root:\n",
+      "                                  Name           Type Params\n",
+      "0                                model    LatentModel   94 K\n",
+      "1                model._latent_encoder  LatentEncoder    8 K\n",
+      "2   model._latent_encoder._input_layer         Linear  608  \n",
+      "3       model._latent_encoder._encoder     ModuleList    2 K\n",
+      "4     model._latent_encoder._encoder.0  NPBlockRelu2d    1 K\n",
+      "..                                 ...            ...    ...\n",
+      "87       model._decoder._decoder.7.act           ReLU    0  \n",
+      "88   model._decoder._decoder.7.dropout      Dropout2d    0  \n",
+      "89      model._decoder._decoder.7.norm    BatchNorm2d  192  \n",
+      "90                model._decoder._mean         Linear   97  \n",
+      "91                 model._decoder._std         Linear   97  \n",
+      "\n",
+      "[92 rows x 3 columns]\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validation sanity check', layout=Layout(flex='2'), max=5.…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 0, {'val_loss': '0.7910884022712708', 'val/kl': '0.00010800049494719133', 'val/mse': '0.2537727653980255', 'val/std': '0.654058575630188'}\n",
+      "\r"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "8d7f1df292af4d14ba291db5e80b3bec",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(flex='2'), max=1.0), HTML(value='')), …"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 2194, {'val_loss': '-1.1204354763031006', 'val/kl': '0.0014955311780795455', 'val/mse': '0.00617215083912015', 'val/std': '0.09348662197589874'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 4389, {'val_loss': '-1.3880571126937866', 'val/kl': '0.0010640228865668178', 'val/mse': '0.004546670243144035', 'val/std': '0.06830815225839615'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 6584, {'val_loss': '-1.3446733951568604', 'val/kl': '0.0006483710603788495', 'val/mse': '0.005048147868365049', 'val/std': '0.06271157413721085'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 8779, {'val_loss': '-1.5619605779647827', 'val/kl': '0.0007974352920427918', 'val/mse': '0.003406693460419774', 'val/std': '0.05695287138223648'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 10974, {'val_loss': '-1.567822813987732', 'val/kl': '0.0007052362198010087', 'val/mse': '0.0034684455022215843', 'val/std': '0.0511687695980072'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 13169, {'val_loss': '-1.469671368598938', 'val/kl': '0.0006657325429841876', 'val/mse': '0.0036809651646763086', 'val/std': '0.04665883630514145'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 15364, {'val_loss': '-1.4495060443878174', 'val/kl': '0.0009186150855384767', 'val/mse': '0.003881386946886778', 'val/std': '0.04494263231754303'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 17559, {'val_loss': '-1.4232021570205688', 'val/kl': '0.0007268059998750687', 'val/mse': '0.0037214024923741817', 'val/std': '0.04474468529224396'}\n",
+      "Epoch     7: reducing learning rate of group 0 to 2.2419e-04.\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 19754, {'val_loss': '-1.4388623237609863', 'val/kl': '0.0007217807578854263', 'val/mse': '0.003411452053114772', 'val/std': '0.03921413794159889'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 21949, {'val_loss': '-1.4176470041275024', 'val/kl': '0.0006476619746536016', 'val/mse': '0.003419178072363138', 'val/std': '0.03866790235042572'}\n",
+      "\n",
+      "logger.metrics [{'val_loss': -1.4176470041275024, 'val/kl': 0.0006476619746536016, 'val/mse': 0.003419178072363138, 'val/std': 0.03866790235042572, 'epoch': 9}]\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "[I 2020-02-16 08:18:44,472] Finished trial#30 resulted in value: -1.4176470041275024. Current best value is -1.5058155059814453 with parameters: {'attention_dropout': 0.2, 'attention_layers': 1, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'ptmultihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'ptmultihead', 'learning_rate': 0.008663362578308754, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 2, 'n_latent_encoder_layers': 2, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': True, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}.\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "trial 31 params {'attention_dropout': 0, 'attention_layers': 3, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'multihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'ptmultihead', 'learning_rate': 0.0065892056326513826, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 2, 'n_latent_encoder_layers': 2, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': True, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:gpu available: True, used: True\n",
+      "INFO:root:VISIBLE GPUS: 0\n",
+      "INFO:root:\n",
+      "                                   Name           Type Params\n",
+      "0                                 model    LatentModel  124 K\n",
+      "1                 model._latent_encoder  LatentEncoder    8 K\n",
+      "2    model._latent_encoder._input_layer         Linear  608  \n",
+      "3        model._latent_encoder._encoder     ModuleList    2 K\n",
+      "4      model._latent_encoder._encoder.0  NPBlockRelu2d    1 K\n",
+      "..                                  ...            ...    ...\n",
+      "146       model._decoder._decoder.7.act           ReLU    0  \n",
+      "147   model._decoder._decoder.7.dropout      Dropout2d    0  \n",
+      "148      model._decoder._decoder.7.norm    BatchNorm2d  192  \n",
+      "149                model._decoder._mean         Linear   97  \n",
+      "150                 model._decoder._std         Linear   97  \n",
+      "\n",
+      "[151 rows x 3 columns]\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validation sanity check', layout=Layout(flex='2'), max=5.…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 0, {'val_loss': '1.5510637760162354', 'val/kl': '0.504607617855072', 'val/mse': '0.33672571182250977', 'val/std': '0.9382523894309998'}\n",
+      "\r"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "e68b6e1b39af43209e02eec071204c46",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(flex='2'), max=1.0), HTML(value='')), …"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 2194, {'val_loss': '-0.9993453621864319', 'val/kl': '0.00492581631988287', 'val/mse': '0.0076696309261024', 'val/std': '0.10437598824501038'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 4389, {'val_loss': '-1.3951125144958496', 'val/kl': '0.0013331789523363113', 'val/mse': '0.004176048096269369', 'val/std': '0.05822036787867546'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 6584, {'val_loss': '-1.4472817182540894', 'val/kl': '0.001394702005200088', 'val/mse': '0.0040388815104961395', 'val/std': '0.05575721710920334'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 8779, {'val_loss': '-1.4240511655807495', 'val/kl': '0.0011677203001454473', 'val/mse': '0.003808689536526799', 'val/std': '0.05027220398187637'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 10974, {'val_loss': '-1.51090669631958', 'val/kl': '0.001719470601528883', 'val/mse': '0.0037067916709929705', 'val/std': '0.05691952630877495'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 13169, {'val_loss': '-1.556814193725586', 'val/kl': '0.0011921778786927462', 'val/mse': '0.003143388545140624', 'val/std': '0.04716252535581589'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 15364, {'val_loss': '-1.5254632234573364', 'val/kl': '0.0011767667019739747', 'val/mse': '0.003485812107101083', 'val/std': '0.048899002373218536'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 17559, {'val_loss': '-1.4824448823928833', 'val/kl': '0.0012150286929681897', 'val/mse': '0.0035667528863996267', 'val/std': '0.0452319011092186'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 19754, {'val_loss': '-1.4969969987869263', 'val/kl': '0.0013163189869374037', 'val/mse': '0.00328625226393342', 'val/std': '0.04428704082965851'}\n",
+      "Epoch     8: reducing learning rate of group 0 to 6.5892e-04.\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 21949, {'val_loss': '-1.4131932258605957', 'val/kl': '0.0012195135932415724', 'val/mse': '0.0033923806622624397', 'val/std': '0.0400848314166069'}\n",
+      "\n",
+      "logger.metrics [{'val_loss': -1.4131932258605957, 'val/kl': 0.0012195135932415724, 'val/mse': 0.0033923806622624397, 'val/std': 0.0400848314166069, 'epoch': 9}]\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "[I 2020-02-16 08:47:42,528] Finished trial#31 resulted in value: -1.4131932258605957. Current best value is -1.5058155059814453 with parameters: {'attention_dropout': 0.2, 'attention_layers': 1, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'ptmultihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'ptmultihead', 'learning_rate': 0.008663362578308754, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 2, 'n_latent_encoder_layers': 2, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': True, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}.\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "trial 32 params {'attention_dropout': 0.2, 'attention_layers': 2, 'batch_size': 16, 'batchnorm': True, 'context_in_target': True, 'det_enc_cross_attn_type': 'ptmultihead', 'det_enc_self_attn_type': 'multihead', 'dropout': 0, 'grad_clip': 40, 'hidden_dim': 32, 'latent_dim': 64, 'latent_enc_self_attn_type': 'multihead', 'learning_rate': 0.004388551085821375, 'max_nb_epochs': 10, 'min_std': 0.005, 'n_decoder_layers': 8, 'n_det_encoder_layers': 4, 'n_latent_encoder_layers': 8, 'num_context': 96, 'num_extra_target': 96, 'num_heads': 8, 'num_workers': 3, 'use_deterministic_path': False, 'use_lvar': True, 'use_rnn': False, 'use_self_attn': False, 'vis_i': 670, 'x_dim': 17, 'y_dim': 1}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:gpu available: True, used: True\n",
+      "INFO:root:VISIBLE GPUS: 0\n",
+      "INFO:root:\n",
+      "                                   Name           Type Params\n",
+      "0                                 model    LatentModel  102 K\n",
+      "1                 model._latent_encoder  LatentEncoder   14 K\n",
+      "2    model._latent_encoder._input_layer         Linear  608  \n",
+      "3        model._latent_encoder._encoder     ModuleList    8 K\n",
+      "4      model._latent_encoder._encoder.0  NPBlockRelu2d    1 K\n",
+      "..                                  ...            ...    ...\n",
+      "127       model._decoder._decoder.7.act           ReLU    0  \n",
+      "128   model._decoder._decoder.7.dropout      Dropout2d    0  \n",
+      "129      model._decoder._decoder.7.norm    BatchNorm2d  192  \n",
+      "130                model._decoder._mean         Linear   97  \n",
+      "131                 model._decoder._std         Linear   97  \n",
+      "\n",
+      "[132 rows x 3 columns]\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validation sanity check', layout=Layout(flex='2'), max=5.…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 0, {'val_loss': '1.6433210372924805', 'val/kl': '0.502083420753479', 'val/mse': '0.3020164668560028', 'val/std': '1.1032137870788574'}\n",
+      "\r"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "5e06748306f340e3809faaac521f4844",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(flex='2'), max=1.0), HTML(value='')), …"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 2194, {'val_loss': '-0.5562781095504761', 'val/kl': '0.004524527583271265', 'val/mse': '0.01594124175608158', 'val/std': '0.20407840609550476'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 4389, {'val_loss': '-1.2800618410110474', 'val/kl': '0.001454153680242598', 'val/mse': '0.004913512151688337', 'val/std': '0.07995617389678955'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 6584, {'val_loss': '-1.4909727573394775', 'val/kl': '0.0005825856351293623', 'val/mse': '0.0038258214481174946', 'val/std': '0.05889587849378586'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 8779, {'val_loss': '-1.4236963987350464', 'val/kl': '0.0012530366657301784', 'val/mse': '0.0038965647108852863', 'val/std': '0.051243484020233154'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 10974, {'val_loss': '-1.4802192449569702', 'val/kl': '0.0008309829281643033', 'val/mse': '0.0034029236994683743', 'val/std': '0.05015803128480911'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 13169, {'val_loss': '1679.2408447265625', 'val/kl': '1679.53564453125', 'val/mse': '606.5656127929688', 'val/std': '752.9714965820312'}\n",
+      "Epoch     5: reducing learning rate of group 0 to 4.3886e-04.\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 15364, {'val_loss': '-1.515012502670288', 'val/kl': '0.0006384316366165876', 'val/mse': '0.0032132412306964397', 'val/std': '0.04366536810994148'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 17559, {'val_loss': '23501292.0', 'val/kl': '23501292.0', 'val/mse': '8748267.0', 'val/std': '6837.646484375'}\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(FloatProgress(value=0.0, description='Validating', layout=Layout(flex='2'), max=117.0, style=Pr…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "step 19754, {'val_loss': '-1.4798544645309448', 'val/kl': '0.0004828128730878234', 'val/mse': '0.0033622710034251213', 'val/std': '0.04285237193107605'}\n"
+     ]
    }
   ],
   "source": [
@@ -33,7 +33,7 @@ class SequenceDfDataSet(torch.utils.data.Dataset):
        self.transforms = transforms

    def __len__(self):
-        return len(self.data) - +self.hparams.window_length - self.hparams.target_length
+        return len(self.data) - self.hparams.window_length - self.hparams.target_length - 1

    def iloc(self, idx):
        k = idx + self.hparams.window_length + self.hparams.target_length
@@ -41,8 +41,14 @@ class SequenceDfDataSet(torch.utils.data.Dataset):
        i = j - self.hparams.window_length
        assert i >= 0
        assert idx <= len(self.data)
+
        x_rows = self.data.iloc[i:j].copy()
-        y_rows = self.data.iloc[k].to_frame().T.copy()
+        # x_rows = x_rows.drop(columns=self.label_names)
+        # Note: the NP models can see the previous labels for the context, so we let the LSTM see them too, although it will likely just learn an autoregressive solution for the first half.
+        x_rows.loc[x_rows.index[self.hparams.window_length:], self.label_names] = 0
+        assert (x_rows.loc[x_rows.index[self.hparams.window_length:], self.label_names]==0).all().all()
+
+        y_rows = self.data[self.label_names].iloc[i+1:j+1].copy()
        #         print(i,j,k)

        # add seconds since start of window index
@@ -56,11 +62,11 @@ class SequenceDfDataSet(torch.utils.data.Dataset):
    def __getitem__(self, idx):
        x_rows, y_rows = self.iloc(idx)

-        y = y_rows[self.label_names].astype(np.float32).values
        x = x_rows.astype(np.float32).values
+        y = y_rows[self.label_names].astype(np.float32).values
        return (
            self.transforms(x).squeeze(0).float(),
-            self.transforms(y[:, None,])[:, 0, 0].float(),
+            self.transforms(y).squeeze(0).squeeze(-1).float(),
        )


@@ -79,15 +85,15 @@ class LSTMNet(nn.Module):
        )
        self.hidden_out_size = (
            self.hparams.hidden_size
-            * self.hparams.lstm_layers
            * (self.hparams.bidirectional + 1)
        )
        self.linear = nn.Linear(self.hidden_out_size, 1)

    def forward(self, x):
        outputs, (h_out, _) = self.lstm1(x)
-        h_out = h_out.permute((1, 0, 2)).reshape((-1, self.hidden_out_size))
-        return self.linear(h_out)
+        # outputs: [B, T, num_directions * H], one feature vector per time step
+        y = self.linear(outputs).squeeze(2)
+        return y


 class LSTM_PL(pl.LightningModule):
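
The `LSTMNet` hunk above switches from predicting once from the final hidden state to predicting at every time step, which is why `lstm_layers` drops out of `hidden_out_size`. A minimal sketch of the shapes involved, assuming `batch_first=True` (implied by the `[B, T, ...]` comment but not shown in the hunk) and arbitrary, purely illustrative sizes:

```python
import torch
from torch import nn

# Illustrative sizes only; none of these values come from the commit.
batch, steps, features = 4, 10, 17
hidden_size, lstm_layers, bidirectional = 32, 2, True

lstm = nn.LSTM(
    input_size=features,
    hidden_size=hidden_size,
    num_layers=lstm_layers,
    bidirectional=bidirectional,
    batch_first=True,  # assumption: inputs are [B, T, features]
)
# The last layer's per-step output has hidden_size * num_directions features,
# independent of how many layers are stacked.
linear = nn.Linear(hidden_size * (bidirectional + 1), 1)

x = torch.randn(batch, steps, features)
outputs, (h_out, _) = lstm(x)

# outputs: [B, T, num_directions * H]
# h_out:   [num_layers * num_directions, B, H] (final step only)
y = linear(outputs).squeeze(2)  # one prediction per time step: [B, T]
```

With these sizes `outputs` is `[4, 10, 64]` and `y` is `[4, 10]`, which lines up with a target that holds one label per row of the window rather than a single future value.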
@@ -122,9 +128,13 @@ class LSTM_PL(pl.LightningModule):

    def validation_end(self, outputs):
        # TODO send an image to TensorBoard, like in the lighting_anp.py file
-        if self.hparams["vis_i"] > 0:
+        if int(self.hparams["vis_i"]) > 0:
            loader = self.val_dataloader()[0]
-            vis_i = min(self.hparams["vis_i"], len(loader.dataset))
+            vis_i = min(int(self.hparams["vis_i"]), len(loader.dataset))
+        if isinstance(self.hparams["vis_i"], str):
+            image = plot_from_loader(loader, self, vis_i=vis_i)
+            plt.show()
+        else:
            image = plot_from_loader_to_tensor(loader, self, vis_i=vis_i)
            self.logger.experiment.add_image(
                "val/image", image, self.trainer.global_step
@@ -227,7 +237,7 @@ class LSTM_PL(pl.LightningModule):
        return parser


-def plot_from_loader(loader, model, vis_i=670, n=100):
+def plot_from_loader(loader, model, vis_i=670, n=1):
    dset_test = loader.dataset
    label_names = dset_test.label_names
    y_trues = []
@@ -241,7 +251,7 @@ def plot_from_loader(loader, model, vis_i=670, n=100):
        model.eval()
        with torch.no_grad():
            y_hat = model.forward(x)
-            y_hat = y_hat.cpu().numpy()
+            y_hat = y_hat.cpu().squeeze(0).numpy()

        dt = y_rows.iloc[0].name

@@ -17,4 +17,4 @@ class ObjectDict(dict):
    
    @property
    def __dict__(self):
-        return self
+        return dict(self)
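
The last hunk makes `__dict__` return a plain `dict` copy instead of the `ObjectDict` itself. A small sketch of the effect follows; the `__getattr__` and `__setattr__` methods here are hypothetical stand-ins, since the rest of the class is not shown in the hunk:

```python
class ObjectDict(dict):
    """Dict whose keys can also be read and written as attributes (sketch)."""

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    @property
    def __dict__(self):
        # As in the change above: hand out a plain-dict copy, not self.
        return dict(self)


hparams = ObjectDict(hidden_size=32)
hparams.window_length = 96

snapshot = hparams.__dict__
snapshot["extra"] = 1  # mutating the copy leaves hparams untouched
```

Callers that inspect `__dict__` now receive an ordinary `dict`, so changes to it cannot leak back into the hyperparameters, and code that checks for an exact `dict` type should behave as it would for a regular object.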