add seq2seq model

wassname
2020-03-01 11:48:14 +08:00
parent 6bccf20534
commit 45117ff491
13 changed files with 2460 additions and 273 deletions
+174
@@ -0,0 +1,174 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
Binary file not shown.


+31 -8
@@ -20,6 +20,8 @@ This repository has lots of options so you can run it as a ANP-RNN, or ANP or NP
I've also made lots of tweaks for flexibility and stability and [replicated the DeepMind ANP results](anp_1d_regression.ipynb) in pytorch. The replication qualitatively seems like a better match than the other pytorch versions of ANP (as of 2019-11-01). You can see other code repositories in the see also section.
It's not heavily documented, because most of my code never gets read or used. If you are using it, and it's confusing, make a GitHub issue and we will add comments or docs together.
- [Neural Processes for sequential data](#neural-processes-for-sequential-data)
- [Experiment: Comparing models on real world data](#experiment-comparing-models-on-real-world-data)
@@ -37,7 +39,9 @@ I've also made lots of tweaks for flexibility and stability and [replicated the
- [Smartmeter Data](#smartmeter-data)
- [Code](#code)
- [ANP-RNN diagram](#anp-rnn-diagram)
- [Tips](#tips)
- [See also:](#see-also)
- [Citing](#citing)
## Experiment: Comparing models on real world data
@@ -192,6 +196,19 @@ Changes for stability:
![](docs/anp-rnn-mcdropout.png)
## Tips
- Make sure you normalise all data, ideally the output too; this seems to be very important
- Batchnorm, lvar, dropout: it's unclear to me how to make these help reliably
- The deterministic path had unclear value; I found it best to leave it out
- The size and comparative size of the context and target are important for performance.
- If the context is too long and complex, the model cannot summarize it
- If the target is too long and complex, the model cannot fit it well
- If the context is in the target, the model may collapse to just fitting this. To fix:
- make it small
- or downweight the loss on this part; this seems like the best approach since x_context->y_context may still be a useful secondary task
- or do not include context in target
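The downweighting tip above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the function name `downweighted_mean_loss` and the `context_weight` parameter are hypothetical, and the weight of 0.01 only mirrors the "divide by 100" choice that appears in the seq2seq code of this commit.

```python
def downweighted_mean_loss(per_step_loss, num_context, context_weight=0.01):
    """Mean loss over one sequence where the context steps count less.

    per_step_loss: per-timestep losses, context steps first, then target steps.
    num_context: how many leading steps belong to the context.
    context_weight: hypothetical weight applied to the context steps.
    """
    weighted = [
        loss * (context_weight if t < num_context else 1.0)
        for t, loss in enumerate(per_step_loss)
    ]
    return sum(weighted) / len(weighted)


# Two context steps (downweighted) followed by two target steps.
print(downweighted_mean_loss([2.0, 2.0, 1.0, 1.0], num_context=2))
```

With a small weight the context steps still contribute a gradient (the secondary task), but the model cannot lower the loss much by fitting them alone.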
## See also:
A list of projects I used as reference or modified to make this one:
@@ -206,14 +223,20 @@ I'm very grateful for all these authors for sharing their work. It was a pleasur
Neural process papers:
- [2019, Attentive Neural Processes](https://arxiv.org/abs/1910.09323) (using attention to prevent underfitting)
- [2019, Functional Neural Processes](https://arxiv.org/abs/1906.08324)
- [2019, Recurrent Neural Processes](https://arxiv.org/abs/1906.05915) (2d and 3d over time)
- [2019, Spatiotemporal Modeling using Recurrent
Neural Processes](https://www.ri.cmu.edu/wp-content/uploads/2019/08/msr_thesis_document.pdf) (infilling spatial information, using a RNN for time information, no code)
- [2018, Conditional Neural Processes](https://arxiv.org/abs/1807.01613) [code](https://github.com/deepmind/neural-processes)
- [2018, Neural Processes](https://arxiv.org/abs/1807.01622)
- [2019-10-17, "Recurrent Attentive Neural Process for Sequential Data"](https://arxiv.org/abs/1910.09323) - LSTM on X before encoder, no code
- [2019-10-29, "Convolutional Conditional Neural Processes"](https://arxiv.org/abs/1910.13556). [code](https://github.com/cambridge-mlg/convcnp)
- [2019-10-01, "Wasserstein Neural Processes"](https://arxiv.org/abs/1910.00668) would be helpful if the output dist never converges for your problem
- [2019-08-08, "Spatiotemporal Modeling using Recurrent Neural Processes"](https://www.ri.cmu.edu/wp-content/uploads/2019/08/msr_thesis_document.pdf) (infilling spatial information, using a RNN for time information, no code)
- [2019-06-13, "Recurrent Neural Processes"](https://arxiv.org/abs/1906.05915) (2d and 3d over time, using LSTM in encoder/decoder, no code)
- [2019-06-19, "The Functional Neural Processes"](https://arxiv.org/abs/1906.08324)
- [2019-01-17, "Attentive Neural Processes"](https://arxiv.org/abs/1901.05761) (using attention to prevent underfitting) [code](https://github.com/deepmind/neural-processes)
- [2018-07-04, "Conditional Neural Processes"](https://arxiv.org/abs/1807.01613) [code](https://github.com/deepmind/neural-processes)
- [2018-07-04, "Neural Processes"](https://arxiv.org/abs/1807.01622)
Blogposts:
- [2018, Neural Processes as distributions over functions
- [2018-08-10, "Neural Processes as distributions over functions"
](https://kasparmartens.rbind.io/post/np/)
# Citing
If you like our work and end up using this code for your research give us a shout-out by citing or acknowledging
+20 -3
@@ -1227,9 +1227,26 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2020-02-16T13:45:00.767510Z",
"start_time": "2020-02-16T13:45:00.758163Z"
}
},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'loss' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-1-de191f53719d>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mloss\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNameError\u001b[0m: name 'loss' is not defined"
]
}
],
"source": []
},
{
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
+177 -137
File diff suppressed because one or more lines are too long
+14 -13
@@ -23,16 +23,18 @@ def collate_fns(max_num_context, max_num_extra_target, sample, sort=True, contex
x = torch.from_numpy(x).float()
y = torch.from_numpy(y).float()
x[:, :max_num_context, -1] = 0 # Feature to let the model know this is past data
n=x[:, max_num_context:, -1].shape[1]
x[:, max_num_context:, -1] = torch.arange(1, n + 1) / 1.0 / n # Feature to let the model know this is past data
# Last feature will show how far in time a point is from our last context
assert (np.diff(x[:, :, 0], 1)>=0).all(), 'first features should be ordered e.g. seconds'
assert (x[:, max_num_context, -1]==0.).all(), 'last features should be empty'
time = x[:, :, 0]
t0 = x[:, max_num_context, 0][:, None]
x[:, :, -1] = time - t0 # Time relative to the first target step: negative for context (past), zero or positive for targets
x_context = x[:, :max_num_context]
y_context = y[:, :max_num_context]
x_target_extra = x[:, max_num_context:]
y_target_extra = y[:, max_num_context:]
if sample:
@@ -53,9 +55,8 @@ def collate_fns(max_num_context, max_num_extra_target, sample, sort=True, contex
x_target = x_target_extra
y_target = y_target_extra
assert (x_context[:, :, -1]==0).all()
assert (x[:, -1, -1] > 0).all()
# assert (x[:, 0, -1] == 0).all()
assert (x[:, 0, -1] < 0).all()
return x_context, y_context, x_target, y_target
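The relative-time feature computed in this hunk (`time - t0`, with `t0` taken at index `max_num_context`) can be illustrated without torch. The sketch below is an assumption-labelled toy version for a single sequence: `relative_time_feature` is a hypothetical helper, not a function in the repository.

```python
def relative_time_feature(times, num_context):
    """Return each timestamp relative to the first target step.

    times: ordered timestamps for one sequence (context first, then target).
    num_context: number of context steps; times[num_context] is the reference.
    """
    t0 = times[num_context]
    return [t - t0 for t in times]


feature = relative_time_feature([10, 20, 30, 40, 50], num_context=3)
print(feature)  # → [-30, -20, -10, 0, 10]
```

Context steps get negative values and steps after the first target get positive values, which is what the two assertions at the end of `collate_fns` check.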
@@ -75,19 +76,19 @@ class SmartMeterDataSet(torch.utils.data.Dataset):
rows = rows.sort_values('tstp')
# make sure tstp, which is our x axis, is the first value
columns = ['tstp'] + list(set(rows.columns) - set(['tstp']))
columns = ['tstp'] + list(set(rows.columns) - set(['tstp'])) + ['future']
rows['future'] = 0.
rows = rows[columns]
# This will be the last row, and will change it upon sample to let the model know some points are in the future
rows['future']=1
x = rows.drop(columns=self.label_names)
y = rows[self.label_names]
x = rows.drop(columns=self.label_names).copy()
y = rows[self.label_names].copy()
return x, y
def __getitem__(self, i):
x,y = self.get_rows(i)
x, y = self.get_rows(i)
return x.values, y.values
def __len__(self):
@@ -140,7 +141,7 @@ def get_smartmeter_df(indir=Path('./data/smart-meters-in-london'), use_logy=Fals
# Also find bank holidays
df_hols = pd.read_csv(indir/'uk_bank_holidays.csv', parse_dates=[0])
holidays = set(df_hols['Bank holidays'].dt.round('D'))
holidays = set(df_hols['Bank holidays'].dt.round('D'))
df['holiday'] = df.tstp.apply(lambda dt:dt.floor('D') in holidays).astype(int)
@@ -158,7 +159,7 @@ def get_smartmeter_df(indir=Path('./data/smart-meters-in-london'), use_logy=Fals
df = df.dropna()
if use_logy:
df['energy(kWh/hh)'] = np.log(df['energy(kWh/hh)']+eps)
df['energy(kWh/hh)'] = np.log(df['energy(kWh/hh)']+1e-4)
df = df.sort_values('tstp')
# split data
+59 -38
@@ -25,13 +25,14 @@ class LatentModelPL(pl.LightningModule):
def training_step(self, batch, batch_idx):
assert all(torch.isfinite(d).all() for d in batch)
context_x, context_y, target_x, target_y = batch
y_pred, kl, loss, loss_mse, y_std = self.forward(context_x, context_y, target_x, target_y)
y_pred, losses, extra = self.forward(context_x, context_y, target_x, target_y)
y_std = extra['dist'].scale
tensorboard_logs = {
"train/loss": loss,
"train/kl": kl.mean(),
"train/std": y_std.mean(),
"train/mse": loss_mse.mean(),
"train/mse": F.mse_loss(y_pred, target_y).mean(),
"train_loss": losses['loss'],
"train/kl": losses['loss_kl'].mean(),
"train/std": losses['y_std'].mean(),
"train/mse": losses['loss_mse'].mean(),
}
assert torch.isfinite(loss)
# print('device', next(self.model.parameters()).device)
@@ -40,45 +41,65 @@ class LatentModelPL(pl.LightningModule):
def validation_step(self, batch, batch_idx):
assert all(torch.isfinite(d).all() for d in batch)
context_x, context_y, target_x, target_y = batch
y_pred, kl, loss, loss_mse, y_std = self.forward(context_x, context_y, target_x, target_y)
y_pred, losses, extra = self.forward(context_x, context_y, target_x, target_y)
y_std = extra['dist'].scale
tensorboard_logs = {
"val_loss": loss,
"val/kl": kl.mean(),
"val/mse": loss_mse.mean(),
"val/std": y_std.mean(),
"val/mse": F.mse_loss(y_pred, target_y).mean(),
"val_loss": losses['loss'], # This exact key is needed for metrics
"val/kl": losses['loss_kl'].mean(),
"val/mse": losses['loss_mse'].mean(),
"val/std": losses['y_std'].mean(),
}
return {"val_loss": loss, "log": tensorboard_logs}
# def training_end(self, outputs):
# logs = self.agg_logs(outputs)
# tensorboard_logs_str = {k: f'{v}' for k, v in logs["log"].items()}
# print(f"step train {self.trainer.global_step}, {tensorboard_logs_str}")
# return logs
def validation_end(self, outputs):
if int(self.hparams["vis_i"]) > 0:
# https://github.com/PytorchLightning/pytorch-lightning/blob/f8d9f8f/pytorch_lightning/core/lightning.py#L293
loader = self.val_dataloader()[0]
vis_i = min(int(self.hparams["vis_i"]), len(loader.dataset))
# print('vis_i', vis_i)
if isinstance(self.hparams["vis_i"], str):
image = plot_from_loader(loader, self, i=int(vis_i))
plt.show()
self.show_image()
logs = self.agg_logs(outputs)
tensorboard_logs_str = {k: f'{v}' for k, v in logs["log"].items()}
print(f"step val {self.trainer.global_step}, {tensorboard_logs_str}")
return logs
def show_image(self):
# https://github.com/PytorchLightning/pytorch-lightning/blob/f8d9f8f/pytorch_lightning/core/lightning.py#L293
loader = self.val_dataloader()[0]
vis_i = min(int(self.hparams["vis_i"]), len(loader.dataset))
# print('vis_i', vis_i)
if isinstance(self.hparams["vis_i"], str):
image = plot_from_loader(loader, self, i=int(vis_i))
plt.show()
else:
image = plot_from_loader_to_tensor(loader, self, i=vis_i)
self.logger.experiment.add_image('val/image', image, self.trainer.global_step)
def agg_logs(self, outputs):
if isinstance(outputs, dict):
outputs = [outputs]
aggs = {}
for j in outputs[0]:
if isinstance(outputs[0][j], dict):
# Take mean of sub dicts
keys = outputs[0][j].keys()
aggs[j] = {k: torch.stack([x[j][k] for x in outputs if k in x[j]]).mean() for k in keys}
else:
image = plot_from_loader_to_tensor(loader, self, i=vis_i)
self.logger.experiment.add_image('val/image', image, self.trainer.global_step)
keys = outputs[0]["log"].keys()
# tensorboard_logs = {}
# for k in keys:
# tensorboard_logs[k] = torch.stack([x["log"][k] for x in outputs if k in x["log"]]).mean()
tensorboard_logs = {k: torch.stack([x["log"][k] for x in outputs if k in x["log"]]).mean() for k in keys}
# Take mean of numbers
aggs[j] = torch.stack([x[j] for x in outputs if j in x]).mean()
return aggs
avg_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
assert torch.isfinite(avg_loss)
tensorboard_logs_str = {k: f'{v}' for k, v in tensorboard_logs.items()}
print(f"step {self.trainer.global_step}, {tensorboard_logs_str}")
# Log hparams with metric, doesn't work
# self.logger.experiment.add_hparams(self.hparams.__dict__, {"avg_val_loss": avg_loss})
return {"avg_val_loss": avg_loss, "log": tensorboard_logs}
# # Log hparams with metric, doesn't work
# # self.logger.experiment.add_hparams(self.hparams.__dict__, {"avg_val_loss": avg_loss})
# if f"{name}_loss" in outputs[0].keys():
# avg_loss = torch.stack([x[f"{name}_loss"] for x in outputs]).mean()
# assert torch.isfinite(avg_loss)
# else:
# avg_loss = 0
# return {f"avg_{name}_loss": avg_loss, "log": tensorboard_logs, "progress_bar": {}}
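The aggregation logic in `agg_logs` can be sketched with plain floats instead of tensors. This is a simplified stand-in under stated assumptions: `aggregate_logs` is a hypothetical name, and `statistics.mean` replaces `torch.stack(...).mean()`.

```python
from statistics import mean


def aggregate_logs(outputs):
    """Average per-batch outputs; nested dicts are averaged key by key."""
    if isinstance(outputs, dict):
        outputs = [outputs]
    aggs = {}
    for key, first in outputs[0].items():
        if isinstance(first, dict):
            # Take the mean of each entry in the sub-dict (e.g. "log").
            aggs[key] = {
                k: mean(o[key][k] for o in outputs if k in o[key])
                for k in first
            }
        else:
            # Take the mean of plain numbers (e.g. "val_loss").
            aggs[key] = mean(o[key] for o in outputs if key in o)
    return aggs


batches = [
    {"val_loss": 1.0, "log": {"val/mse": 2.0}},
    {"val_loss": 3.0, "log": {"val/mse": 4.0}},
]
print(aggregate_logs(batches))
```

As in the diff, the keys of the first batch decide what gets aggregated, so a key that only appears in later batches is ignored.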
def test_step(self, *args, **kwargs):
return self.validation_step(*args, **kwargs)
@@ -87,8 +108,8 @@ class LatentModelPL(pl.LightningModule):
return self.validation_end(*args, **kwargs)
def configure_optimizers(self):
optim = torch.optim.AdamW(self.parameters(), lr=self.hparams["learning_rate"])
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, patience=2, verbose=True, min_lr=1e-5) # note early stopping has patient 3
optim = torch.optim.Adam(self.parameters(), lr=self.hparams["learning_rate"], weight_decay=0)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, patience=1, verbose=True, min_lr=1e-7) # note early stopping has patience 3
return [optim], [scheduler]
def _get_cache_dfs(self):
+256
@@ -0,0 +1,256 @@
import os
import numpy as np
import pandas as pd
import torch
from tqdm.auto import tqdm
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from test_tube import Experiment, HyperOptArgumentParser
from src.data.smart_meter import collate_fns, SmartMeterDataSet, get_smartmeter_df
import torchvision.transforms as transforms
from src.plot import plot_from_loader_to_tensor, plot_from_loader
from argparse import ArgumentParser
import json
import pytorch_lightning as pl
import math
from matplotlib import pyplot as plt
import io
import PIL
from torchvision.transforms import ToTensor
from src.utils import ObjectDict
def log_prob_sigma(value, loc, log_scale):
"""A slightly more stable (not confirmed yet) log prob taking in log_var instead of scale.
modified from https://github.com/pytorch/pytorch/blob/2431eac7c011afe42d4c22b8b3f46dedae65e7c0/torch/distributions/normal.py#L65
"""
var = torch.exp(log_scale * 2)
return (
-((value - loc) ** 2) / (2 * var) - log_scale - math.log(math.sqrt(2 * math.pi))
)
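Since the docstring marks `log_prob_sigma` as not yet confirmed, a quick scalar check against a reference Gaussian density may help. The sketch below reimplements the same formula with the `math` module (the name `log_prob_from_log_scale` is hypothetical) and compares it with `statistics.NormalDist` from the standard library; it says nothing about numerical stability on tensors.

```python
import math
from statistics import NormalDist


def log_prob_from_log_scale(value, loc, log_scale):
    """Gaussian log density, parameterised by log(sigma) instead of sigma."""
    var = math.exp(2 * log_scale)
    return (
        -((value - loc) ** 2) / (2 * var)
        - log_scale
        - math.log(math.sqrt(2 * math.pi))
    )


value, loc, scale = 0.3, 0.1, 0.5
reference = math.log(NormalDist(mu=loc, sigma=scale).pdf(value))
ours = log_prob_from_log_scale(value, loc, math.log(scale))
print(math.isclose(ours, reference))
```

The formula never exponentiates to get sigma and then takes its log again, which is the intended benefit of passing `log_scale` directly.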
class Seq2SeqNet(nn.Module):
def __init__(self, hparams, _min_std = 0.05):
super().__init__()
self.hparams = hparams
self._min_std = _min_std
self.encoder = nn.LSTM(
input_size=self.hparams.input_size,
hidden_size=self.hparams.hidden_size,
batch_first=True,
num_layers=self.hparams.lstm_layers,
bidirectional=self.hparams.bidirectional,
dropout=self.hparams.lstm_dropout,
)
self.decoder = nn.LSTM(
input_size=self.hparams.input_size_decoder,
hidden_size=self.hparams.hidden_size,
batch_first=True,
num_layers=self.hparams.lstm_layers,
bidirectional=self.hparams.bidirectional,
dropout=self.hparams.lstm_dropout,
)
self.hidden_out_size = (
self.hparams.hidden_size
* (self.hparams.bidirectional + 1)
)
self.mean = nn.Linear(self.hidden_out_size, self.hparams.output_size)
self.std = nn.Linear(self.hidden_out_size, self.hparams.output_size)
def forward(self, context_x, context_y, target_x, target_y=None):
x = torch.cat([context_x, context_y], -1)
_, (h_out, cell) = self.encoder(x)
# h_out = [n layers * n directions, batch size, hid dim] (batch_first only affects inputs/outputs, not states)
# cell = [n layers * n directions, batch size, hid dim]
outputs, (_, _) = self.decoder(target_x, (h_out, cell))
# output = [batch size, seq len, hid dim * n directions]
# outputs: [B, T, num_direction * H]
mean = self.mean(outputs)
log_sigma = self.std(outputs)
log_sigma = torch.clamp(log_sigma, math.log(self._min_std), -math.log(self._min_std))
sigma = torch.exp(log_sigma)
y_dist=torch.distributions.Normal(mean, sigma)
# Loss
loss_mse =loss_p = None
if target_y is not None:
loss_mse = F.mse_loss(mean, target_y, reduction='none')
loss_p = -log_prob_sigma(target_y, mean, log_sigma)
if self.hparams["context_in_target"]:
# Downweight the context steps along the time axis (dim 1), not the batch axis
loss_p[:, :context_x.size(1)] /= 100
loss_mse[:, :context_x.size(1)] /= 100
# # Don't catch loss on context window
# mean = mean[:, self.hparams.num_context:]
# log_sigma = log_sigma[:, self.hparams.num_context:]
y_pred = y_dist.rsample() if self.training else y_dist.loc
return y_pred, dict(loss_p=loss_p.mean(), loss_mse=loss_mse.mean()), dict(log_sigma=log_sigma, dist=y_dist)
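The clamp on `log_sigma` in `forward` bounds the predicted standard deviation symmetrically in log space, so with `_min_std = 0.05` sigma stays between 0.05 and 1 / 0.05 = 20. A scalar sketch of that behaviour, using a hypothetical helper `clamp_sigma` in place of `torch.clamp` and `torch.exp`:

```python
import math


def clamp_sigma(log_sigma, min_std=0.05):
    """Clamp log(sigma) to [log(min_std), -log(min_std)] and return sigma."""
    lo, hi = math.log(min_std), -math.log(min_std)
    return math.exp(min(max(log_sigma, lo), hi))


for raw in (-100.0, 0.0, 100.0):
    print(raw, clamp_sigma(raw))
```

The lower bound guards against the distribution collapsing onto the mean; the upper bound is a side effect of reusing the same constant and may need its own setting if the labels are not normalised.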
class LSTMSeq2Seq_PL(pl.LightningModule):
def __init__(self, hparams):
# TODO make label name configurable
# TODO make data source configurable
super().__init__()
self.hparams = ObjectDict()
self.hparams.update(
hparams.__dict__ if hasattr(hparams, "__dict__") else hparams
)
self.model = Seq2SeqNet(self.hparams)
self._dfs = None
def forward(self, context_x, context_y, target_x, target_y):
return self.model(context_x, context_y, target_x, target_y)
def training_step(self, batch, batch_idx):
# REQUIRED
assert all(torch.isfinite(d).all() for d in batch)
context_x, context_y, target_x, target_y = batch
y_dist, losses, extra = self.forward(context_x, context_y, target_x, target_y)
loss = losses['loss_p'] # + loss_mse
tensorboard_logs = {
"train/loss": loss,
'train/loss_mse': losses['loss_mse'],
"train/loss_p": losses['loss_p'],
"train/sigma": torch.exp(extra['log_sigma']).mean()}
return {"loss": loss, "log": tensorboard_logs}
def validation_step(self, batch, batch_idx):
context_x, context_y, target_x, target_y = batch
assert all(torch.isfinite(d).all() for d in batch)
y_dist, losses, extra = self.forward(context_x, context_y, target_x, target_y)
loss = losses['loss_p'] # + loss_mse
tensorboard_logs = {
"val_loss": loss,
'val/loss_mse': losses['loss_mse'],
"val/loss_p": losses['loss_p'],
"val/sigma": torch.exp(extra['log_sigma']).mean()}
return {"val_loss": loss, "log": tensorboard_logs}
def validation_end(self, outputs):
if int(self.hparams["vis_i"]) > 0:
self.show_image()
avg_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
keys = outputs[0]["log"].keys()
tensorboard_logs = {
k: torch.stack([x["log"][k] for x in outputs if k in x["log"]]).mean()
for k in keys
}
tensorboard_logs_str = {k: f"{v}" for k, v in tensorboard_logs.items()}
print(f"step {self.trainer.global_step}, {tensorboard_logs_str}")
assert torch.isfinite(avg_loss)
return {"avg_val_loss": avg_loss, "log": tensorboard_logs}
def show_image(self):
# https://github.com/PytorchLightning/pytorch-lightning/blob/f8d9f8f/pytorch_lightning/core/lightning.py#L293
loader = self.val_dataloader()[0]
vis_i = min(int(self.hparams["vis_i"]), len(loader.dataset))
# print('vis_i', vis_i)
if isinstance(self.hparams["vis_i"], str):
image = plot_from_loader(loader, self, i=int(vis_i))
plt.show()
else:
image = plot_from_loader_to_tensor(loader, self, i=vis_i)
self.logger.experiment.add_image('val/image', image, self.trainer.global_step)
def test_step(self, *args, **kwargs):
return self.validation_step(*args, **kwargs)
def test_end(self, *args, **kwargs):
return self.validation_end(*args, **kwargs)
def configure_optimizers(self):
optim = torch.optim.Adam(self.parameters(), lr=self.hparams["learning_rate"])
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
optim, patience=2, verbose=True, min_lr=1e-5
) # note early stopping has patience 3
return [optim], [scheduler]
def _get_cache_dfs(self):
if self._dfs is None:
df_train, df_test = get_smartmeter_df()
# self._dfs = dict(df_train=df_train[:600], df_test=df_test[:600])
self._dfs = dict(df_train=df_train, df_test=df_test)
return self._dfs
@pl.data_loader
def train_dataloader(self):
df_train = self._get_cache_dfs()['df_train']
data_train = SmartMeterDataSet(
df_train, self.hparams["num_context"], self.hparams["num_extra_target"]
)
return torch.utils.data.DataLoader(
data_train,
batch_size=self.hparams["batch_size"],
shuffle=True,
collate_fn=collate_fns(
self.hparams["num_context"], self.hparams["num_extra_target"], sample=True, context_in_target=self.hparams["context_in_target"]
),
num_workers=self.hparams["num_workers"],
)
@pl.data_loader
def val_dataloader(self):
df_test = self._get_cache_dfs()['df_test']
data_test = SmartMeterDataSet(
df_test, self.hparams["num_context"], self.hparams["num_extra_target"]
)
return torch.utils.data.DataLoader(
data_test,
batch_size=self.hparams["batch_size"],
shuffle=False,
collate_fn=collate_fns(
self.hparams["num_context"], self.hparams["num_extra_target"], sample=False, context_in_target=self.hparams["context_in_target"]
),
)
@pl.data_loader
def test_dataloader(self):
df_test = self._get_cache_dfs()['df_test']
data_test = SmartMeterDataSet(
df_test, self.hparams["num_context"], self.hparams["num_extra_target"]
)
return torch.utils.data.DataLoader(
data_test,
batch_size=self.hparams["batch_size"],
shuffle=False,
collate_fn=collate_fns(
self.hparams["num_context"], self.hparams["num_extra_target"], sample=False, context_in_target=self.hparams["context_in_target"]
),
)
@staticmethod
def add_model_specific_args(parent_parser):
"""
Specify the hyperparams for this LightningModule
"""
# MODEL specific
parser = HyperOptArgumentParser(parents=[parent_parser])
parser.add_argument("--learning_rate", default=0.002, type=float)
parser.add_argument("--batch_size", default=16, type=int)
parser.add_argument("--lstm_dropout", default=0.5, type=float)
parser.add_argument("--hidden_size", default=16, type=int)
parser.add_argument("--input_size", default=8, type=int)
parser.add_argument("--input_size_decoder", default=8, type=int)
parser.add_argument("--lstm_layers", default=8, type=int)
parser.add_argument("--bidirectional", default=False, type=bool)
# training specific (for this model)
parser.add_argument("--num_context", type=int, default=12)
parser.add_argument("--num_extra_target", type=int, default=2)
parser.add_argument("--max_nb_epochs", default=10, type=int)
parser.add_argument("--num_workers", default=4, type=int)
return parser
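One caveat on `--bidirectional` with `type=bool`: in the standard library's `argparse`, `type=bool` calls `bool()` on the raw string, so any non-empty value, including `"False"`, parses as `True`. The sketch below shows this with a plain `ArgumentParser` and contrasts it with `action="store_true"`. Whether `HyperOptArgumentParser` behaves identically is an assumption here (it builds on `ArgumentParser`); the flag name `--typed_flag` and the function `build_parser` are hypothetical.

```python
from argparse import ArgumentParser


def build_parser():
    parser = ArgumentParser()
    # Pitfall: bool("False") is True, because the string is non-empty.
    parser.add_argument("--typed_flag", default=False, type=bool)
    # Common alternative: the flag is True only when present.
    parser.add_argument("--bidirectional", action="store_true")
    return parser


args = build_parser().parse_args(["--typed_flag", "False"])
print(args.typed_flag, args.bidirectional)
```

With `store_true`, omitting the flag gives `False` and passing `--bidirectional` gives `True`, which avoids silently enabling a bidirectional LSTM from a config string.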
+33 -27
@@ -35,25 +35,28 @@ def kl_loss_var(prior_mu, log_var_prior, post_mu, log_var_post):
class LatentModel(nn.Module):
def __init__(self,
x_dim,
y_dim,
hidden_dim=32,
latent_dim=32,
latent_enc_self_attn_type="dot",
det_enc_self_attn_type="dot",
det_enc_cross_attn_type="dot",
n_latent_encoder_layers=3,
n_det_encoder_layers=3,
n_decoder_layers=3,
use_deterministic_path=True,
min_std=0.01,
x_dim, # features in input
y_dim, # number of features in output
hidden_dim=32, # size of hidden space
latent_dim=32, # size of latent space
latent_enc_self_attn_type="ptmultihead", # type of attention: "uniform", "dot", "multihead", or "ptmultihead"; see the Attentive Neural Processes paper
det_enc_self_attn_type="ptmultihead",
det_enc_cross_attn_type="ptmultihead",
n_latent_encoder_layers=2,
n_det_encoder_layers=2, # number of deterministic encoder layers
n_decoder_layers=2,
use_deterministic_path=False,
min_std=0.01, # To avoid collapse, use a minimum standard deviation; it should be much smaller than the variation in the labels
dropout=0,
use_self_attn=False,
attention_dropout=0,
batchnorm=False,
use_lvar=False,
attention_layers=2,
use_rnn=False,
use_lvar=False, # Alternative loss calculation, may be more stable
attention_layers=2,
use_rnn=True, # use RNN/LSTM?
use_lstm_le=False, # use another LSTM in latent encoder instead of MLP
use_lstm_de=False, # use another LSTM in deterministic encoder instead of MLP
use_lstm_d=False, # use another lstm in decoder instead of MLP
**kwargs,
):
@@ -84,6 +87,7 @@ class LatentModel(nn.Module):
batchnorm=batchnorm,
min_std=min_std,
use_lvar=use_lvar,
use_lstm=use_lstm_le,
)
self._deterministic_encoder = DeterministicEncoder(
@@ -98,6 +102,7 @@ class LatentModel(nn.Module):
dropout=dropout,
batchnorm=batchnorm,
attention_dropout=attention_dropout,
use_lstm=use_lstm_de,
)
self._decoder = Decoder(
@@ -111,37 +116,34 @@ class LatentModel(nn.Module):
use_lvar=use_lvar,
n_decoder_layers=n_decoder_layers,
use_deterministic_path=use_deterministic_path,
use_lstm=use_lstm_d,
)
self._use_deterministic_path = use_deterministic_path
self._use_lvar = use_lvar
def forward(self, context_x, context_y, target_x, target_y=None):
num_targets = target_x.size(1)
if self._use_rnn:
# see https://arxiv.org/abs/1910.09323 where x is substituted with h = RNN(x)
# x need to be provided as [B, T, H]
x = torch.cat([context_x, target_x], dim=1)
# h: [B, T, num_direction * H]
h, _ = self._lstm(x)
context_x = h[:, :context_x.shape[1], :]
target_x = h[:, context_x.shape[1]:, :]
target_x, _ = self._lstm(target_x)
context_x, _ = self._lstm(context_x)
dist_prior, log_var_prior = self._latent_encoder(context_x, context_y)
if target_y is not None:
dist_post, log_var_post = self._latent_encoder(target_x,
target_y)
dist_post, log_var_post = self._latent_encoder(target_x, target_y)
z = dist_post.loc
else:
z = dist_prior.loc
num_targets = target_x.size(1)
z = z.unsqueeze(1).repeat(1, num_targets, 1) # [B, T_target, H]
if self._use_deterministic_path:
r = self._deterministic_encoder(context_x, context_y,
target_x) # [B, T_target, H]
target_x) # [B, T_target, H]
else:
r = None
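The two `_use_rnn` variants in this hunk differ in what the LSTM sees: one runs it over the concatenated context and target sequence and then splits the hidden states, the other runs it over each part separately. A minimal sketch of the difference, using a stand-alone `nn.LSTM` with placeholder shapes (all names and sizes here are assumptions, not the model's own):

```python
import torch
from torch import nn

torch.manual_seed(0)
B, T_context, T_target, X, H = 2, 12, 14, 8, 16
lstm = nn.LSTM(input_size=X, hidden_size=H, batch_first=True)
context_x = torch.randn(B, T_context, X)
target_x = torch.randn(B, T_target, X)

# Variant A: one pass over [context, target], then split along the time axis.
h, _ = lstm(torch.cat([context_x, target_x], dim=1))  # [B, T_context + T_target, H]
n_context = context_x.shape[1]
context_h_a, target_h_a = h[:, :n_context], h[:, n_context:]

# Variant B: independent passes; each one starts from a zero hidden state.
context_h_b, _ = lstm(context_x)
target_h_b, _ = lstm(target_x)
```

With a unidirectional LSTM the context states match in both variants; only the target states differ, because in variant A they carry over the hidden state from the end of the context.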
@@ -150,14 +152,18 @@ class LatentModel(nn.Module):
if self._use_lvar:
log_p = log_prob_sigma(target_y, dist.loc, log_sigma).mean(-1) # [B, T_target, Y].mean(-1)
if self.hparams["context_in_target"]:
log_p[:, :context_x.size(1)] /= 100
kl_loss = kl_loss_var(dist_prior.loc, log_var_prior,
dist_post.loc, log_var_post).mean(-1) # [B, R].mean(-1)
else:
log_p = dist.log_prob(target_y).mean(-1)
if self.hparams["context_in_target"]:
log_p[:, :context_x.size(1)] /= 100 # Down-weight the context points: otherwise the model is tempted to fit only the context, where it knows the answer, and learn very low uncertainty.
kl_loss = torch.distributions.kl_divergence(
dist_post, dist_prior).mean(-1)
dist_post, dist_prior).mean(-1) # [B, R].mean(-1)
kl_loss = kl_loss[:, None].expand(log_p.shape)
mse_loss = F.mse_loss(dist.loc, target_y)
mse_loss = F.mse_loss(dist.loc, target_y, reduce=None)[:, :context_x.size(1)].mean()
loss = (kl_loss - log_p).mean()
else:
@@ -167,4 +173,4 @@ class LatentModel(nn.Module):
loss = None
y_pred = dist.rsample() if self.training else dist.loc
return y_pred, kl_loss, loss, mse_loss, dist.scale
return y_pred, dict(loss=loss, loss_p=loss_p.mean(), loss_kl=loss_kl, loss_mse=mse_loss.mean()), dict(log_sigma=log_sigma, dist=dist)
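The `mse_loss` line in this hunk slices the result of `F.mse_loss(..., reduce=None)` along the time axis. In PyTorch, `reduce=None` leaves the default mean reduction in place, so the call returns a 0-dim scalar that cannot be sliced; `reduction="none"` is the argument that keeps per-element losses. A sketch with made-up shapes (the tensors are placeholders, not the model's outputs):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T_context, T_extra, Y = 4, 12, 2, 1
pred = torch.randn(B, T_context + T_extra, Y)
target = torch.randn(B, T_context + T_extra, Y)

# Default reduction (what reduce=None falls back to): a 0-dim scalar.
scalar_loss = F.mse_loss(pred, target)

# reduction="none" keeps the [B, T, Y] shape, so the time axis can be sliced.
per_element = F.mse_loss(pred, target, reduction="none")
context_mse = per_element[:, :T_context].mean()
extra_mse = per_element[:, T_context:].mean()
```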
+45 -43
@@ -6,6 +6,24 @@ import numpy as np
# from .attention import Attention as PtAttention
class LSTMBlock(nn.Module):
def __init__(
self, in_channels, out_channels, dropout=0, batchnorm=False, bias=False, num_layers=1
):
super().__init__()
self._lstm = nn.LSTM(
input_size=in_channels,
hidden_size=out_channels,
num_layers=num_layers,
dropout=dropout,
batch_first=True,
bias=bias
)
def forward(self, x):
return self._lstm(x)[0]
class NPBlockRelu2d(nn.Module):
"""Block for Neural Processes."""
@@ -208,17 +226,14 @@ class LatentEncoder(nn.Module):
use_lvar=False,
use_self_attn=False,
attention_layers=2,
use_lstm=False
):
super().__init__()
self._input_layer = nn.Linear(input_dim, hidden_dim)
self._encoder = nn.ModuleList(
[
NPBlockRelu2d(
hidden_dim, hidden_dim, batchnorm=batchnorm, dropout=dropout
)
for _ in range(n_encoder_layers)
]
)
# self._input_layer = nn.Linear(input_dim, hidden_dim)
if use_lstm:
self._encoder = LSTMBlock(input_dim, hidden_dim, batchnorm=batchnorm, dropout=dropout, num_layers=n_encoder_layers)
else:
self._encoder = BatchMLP(input_dim, hidden_dim, batchnorm=batchnorm, dropout=dropout, num_layers=n_encoder_layers)
if use_self_attn:
self._self_attention = Attention(
hidden_dim,
@@ -232,15 +247,14 @@ class LatentEncoder(nn.Module):
self._log_var = nn.Linear(hidden_dim, latent_dim)
self._min_std = min_std
self._use_lvar = use_lvar
self._use_lstm = use_lstm
self._use_self_attn = use_self_attn
def forward(self, x, y):
encoder_input = torch.cat([x, y], dim=-1)
# Pass final axis through MLP
encoded = self._input_layer(encoder_input)
for layer in self._encoder:
encoded = torch.relu(layer(encoded))
encoded = self._encoder(encoder_input)
# Aggregator: take the mean over all points
if self._use_self_attn:
@@ -282,21 +296,15 @@ class DeterministicEncoder(nn.Module):
batchnorm=False,
dropout=0,
attention_dropout=0,
use_lstm=False,
):
super().__init__()
self._use_self_attn = use_self_attn
self._input_layer = nn.Linear(input_dim, hidden_dim)
self._d_encoder = nn.ModuleList(
[
NPBlockRelu2d(
hidden_dim,
hidden_dim,
batchnorm=batchnorm,
dropout=attention_dropout,
)
for _ in range(n_d_encoder_layers)
]
)
# self._input_layer = nn.Linear(input_dim, hidden_dim)
if use_lstm:
self._d_encoder = LSTMBlock(input_dim, hidden_dim, batchnorm=batchnorm, dropout=dropout, num_layers=n_d_encoder_layers)
else:
self._d_encoder = BatchMLP(input_dim, hidden_dim, batchnorm=batchnorm, dropout=dropout, num_layers=n_d_encoder_layers)
if use_self_attn:
self._self_attention = Attention(
hidden_dim,
@@ -317,14 +325,12 @@ class DeterministicEncoder(nn.Module):
d_encoder_input = torch.cat([context_x, context_y], dim=-1)
# Pass final axis through MLP
d_encoded = self._input_layer(d_encoder_input)
for layer in self._d_encoder:
d_encoded = torch.relu(layer(d_encoded))
d_encoded = self._d_encoder(d_encoder_input)
if self._use_self_attn:
d_encoded = self._self_attention(d_encoded, d_encoded, d_encoded)
# Apply attention
# Apply attention as mean aggregation
h = self._cross_attention(context_x, d_encoded, target_x)
return h
@@ -343,6 +349,7 @@ class Decoder(nn.Module):
use_lvar=False,
batchnorm=False,
dropout=0,
use_lstm=False,
):
super(Decoder, self).__init__()
self._target_transform = nn.Linear(x_dim, hidden_dim)
@@ -350,14 +357,11 @@ class Decoder(nn.Module):
hidden_dim_2 = 2 * hidden_dim + latent_dim
else:
hidden_dim_2 = hidden_dim + latent_dim
self._decoder = nn.ModuleList(
[
NPBlockRelu2d(
hidden_dim_2, hidden_dim_2, batchnorm=batchnorm, dropout=dropout
)
for _ in range(n_decoder_layers)
]
)
if use_lstm:
self._decoder = LSTMBlock(hidden_dim_2, hidden_dim_2, batchnorm=batchnorm, dropout=dropout, num_layers=n_decoder_layers)
else:
self._decoder = BatchMLP(hidden_dim_2, hidden_dim_2, batchnorm=batchnorm, dropout=dropout, num_layers=n_decoder_layers)
self._mean = nn.Linear(hidden_dim_2, y_dim)
self._std = nn.Linear(hidden_dim_2, y_dim)
self._use_deterministic_path = use_deterministic_path
@@ -371,19 +375,17 @@ class Decoder(nn.Module):
if self._use_deterministic_path:
z = torch.cat([r, z], dim=-1)
representation = torch.cat([z, x], dim=-1)
r = torch.cat([z, x], dim=-1)
# Pass final axis through MLP
for layer in self._decoder:
representation = torch.relu(layer(representation))
r = self._decoder(r)
# Get the mean and the variance
mean = self._mean(representation)
log_sigma = self._std(representation)
mean = self._mean(r)
log_sigma = self._std(r)
# Bound or clamp the variance
if self._use_lvar:
log_sigma = torch.clamp(log_sigma, math.log(self._min_std), -math.log(1e-5))
log_sigma = torch.clamp(log_sigma, math.log(self._min_std), -math.log(self._min_std))
sigma = torch.exp(log_sigma)
else:
sigma = self._min_std + (1 - self._min_std) * F.softplus(log_sigma)
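The two branches above bound the predicted standard deviation in different ways: the `use_lvar` branch clamps the log-sigma and exponentiates, while the other branch shifts a softplus so the result never drops below `min_std`. A scalar sketch in plain Python, assuming the clamp variant whose upper bound is `-math.log(min_std)`; `softplus` and `bound_sigma` are illustrative names, not functions from the repository:

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)) for a scalar."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bound_sigma(raw, min_std=0.01, use_lvar=False):
    """Map an unbounded network output to a standard deviation >= min_std."""
    if use_lvar:
        # Clamp log-sigma to [log(min_std), -log(min_std)], then exponentiate.
        lo, hi = math.log(min_std), -math.log(min_std)
        return math.exp(min(max(raw, lo), hi))
    # Softplus is positive, so sigma stays above min_std and is unbounded above.
    return min_std + (1 - min_std) * softplus(raw)


for raw in (-50.0, 0.0, 50.0):
    print(raw, bound_sigma(raw), bound_sigma(raw, use_lvar=True))
```

With `min_std=0.01` the clamped branch keeps sigma within roughly [0.01, 100], whereas the softplus branch only enforces the lower bound.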
+18 -4
@@ -13,8 +13,10 @@ eps = 1e-5
def plot_rows(
target_y_rows: pd.DataFrame,
x_context_rows: pd.DataFrame,
x_target_rows: pd.DataFrame,
context_y_rows: pd.DataFrame,
target_y_rows: pd.DataFrame,
pred_y: np.array,
std: np.array,
undo_log=False,
@@ -23,8 +25,10 @@ def plot_rows(
"""Plots the predicted mean and variance and the context points.
Args:
target_y_rows
x_context_rows
x_target_rows
context_y_rows: dataframe with datetime index, and labels
target_y_rows:
pred_y: An array of shape [B,num_targets,1] that contains the
predicted means of the y values at the target points in target_x.
std: An array of shape [B,num_targets,1] that contains the
@@ -38,6 +42,8 @@ def plot_rows(
j = 0
label = "energy(kWh/hh)"
# Plot input data
# Start with true data and use it to get ylimits (that way they are constant)
plt.plot(target_y_rows.index, target_y_rows.values, "k:", linewidth=2, label="true")
plt.plot(context_y_rows.index, context_y_rows.values, "k:", linewidth=2, label="true")
@@ -97,6 +103,8 @@ def plot_from_loader(
max_num_context = context_x.shape[1]
y_context_rows = y_rows[:max_num_context]
y_target_extra_rows = y_rows[max_num_context:]
x_context_rows = x_rows[:max_num_context]
x_target_extra_rows = x_rows[max_num_context:]
dt = y_target_extra_rows.index[0]
# # for the plotting we are going to run prediction on the context points too
@@ -104,17 +112,23 @@ def plot_from_loader(
target_x = torch.cat([context_x, target_x_extra], 1)
target_y = torch.cat([context_y, target_y_extra], 1)
y_target_rows = y_rows
x_target_rows = x_rows
model.eval()
with torch.no_grad():
y_pred, kl, loss_test, loss_mse, y_std = model(context_x, context_y, target_x, target_y)
y_pred, losses, extra = model(context_x, context_y, target_x, target_y)
loss_test = losses["loss"] if "loss" in losses else 0.
y_std = extra["dist"].scale
if plot:
plt.figure()
plt.title(title + f" loss={loss_test: 2.2g} {dt}")
plot_rows(
y_target_rows,
x_context_rows,
x_target_rows,
y_context_rows,
y_target_rows,
y_pred.detach().cpu().numpy(),
y_std.detach().cpu().numpy(),
undo_log=False,