readme and results

2026-06-27 16:29:15 +08:00 · 2020-02-16 07:25:45 +08:00
parent f6b822e3c4
commit 0c4e2f0497
5 changed files with 12471 additions and 173 deletions
@@ -1,12 +1,12 @@
 # Using recurrent attentive neural processes for forecasting power usage

-And also implements [Recurrent Attentive Neural Process for Sequential Data](https://arxiv.org/abs/1910.09323) (ANP-RNN)
+This repo implements ["Recurrent Attentive Neural Process for Sequential Data"](https://arxiv.org/abs/1910.09323) (ANP-RNN) and tests it on real data.

 ![](docs/np_lstm.jpeg)

 This implementation has lots of options, so you can also run it as an [Attentive Neural Process](https://arxiv.org/abs/1901.05761) (ANP) or as a plain Neural Process (NP).
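One way to see how the ANP relates to the NP: in the ANP formulation, the NP's mean aggregation of context representations corresponds to uniform cross-attention, and the `--det_enc_cross_attn_type` options in this repo include `'uniform'` and `'dot'`. The NumPy sketch below is illustrative only; `dot_product_attention` and `uniform_attention` are hypothetical helpers, not this repo's code, and it assumes queries and keys are already-encoded x values while values are per-context-point representations.

```python
import numpy as np

def dot_product_attention(q, k, v):
    """Scaled dot-product attention: each query gets a softmax-weighted mix of values."""
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (n_target, n_context)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ v                              # (n_target, d_v)

def uniform_attention(q, k, v):
    """Uniform attention: every query gets the plain mean of the values (NP-style)."""
    return np.repeat(v.mean(axis=0, keepdims=True), q.shape[0], axis=0)

rng = np.random.default_rng(0)
k = rng.normal(size=(5, 4))  # context keys (encoded context x)
v = rng.normal(size=(5, 3))  # context values (per-point representations)
q = rng.normal(size=(2, 4))  # target queries (encoded target x)

print(uniform_attention(q, k, v))      # identical rows: the context mean
print(dot_product_attention(q, k, v))  # rows generally differ per target
```

If every attention score is equal (for example, all-zero queries), the softmax weights become uniform and dot-product attention reduces exactly to the NP's mean aggregation.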

-I've always made lots of weaks for flexibility and stability and [replicated the DeepMind ANP results](anp_1d_regression.ipynb) in pytorch. The replication, qualitativily seems like a better match than the other pytorch versions of ANP (as of 2019-11-01).
+I've also made lots of tweaks for flexibility and stability, and [replicated the DeepMind ANP results](anp_1d_regression.ipynb) in PyTorch. Qualitatively, the replication seems a better match than the other PyTorch versions of ANP (as of 2019-11-01).



@@ -29,10 +29,10 @@ I've always made lots of weaks for flexibility and stability and [replicated the

 Here the black dots are the input data and the dotted line is the true data. The blue line is the prediction, and the blue shading is the uncertainty to one standard deviation.
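To read the shaded band: if the model predicts a Gaussian with a mean and standard deviation at each target point, the band spans mean - std to mean + std, and for a well-calibrated model roughly 68% of the true values should fall inside it. A small NumPy sketch of that check, assuming per-point Gaussian outputs (`one_sigma_band` and `band_coverage` are hypothetical names, and the data is synthetic):

```python
import math
import numpy as np

def one_sigma_band(mean, std):
    """Lower and upper edges of the shaded +/- 1 standard deviation band."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    return mean - std, mean + std

def band_coverage(y_true, mean, std):
    """Fraction of true values that fall inside the +/- 1 std band."""
    lower, upper = one_sigma_band(mean, std)
    y = np.asarray(y_true, dtype=float)
    return float(np.mean((y >= lower) & (y <= upper)))

# For a Gaussian, P(|y - mean| <= std) = erf(1 / sqrt(2)), roughly 0.683.
EXPECTED_COVERAGE = math.erf(1 / math.sqrt(2))

# Synthetic check: draw "true" values from the predicted distribution itself.
rng = np.random.default_rng(0)
mean = np.sin(np.linspace(0, 6, 10_000))
std = np.full_like(mean, 0.2)
y_true = mean + std * rng.standard_normal(mean.shape)
print(band_coverage(y_true, mean, std), EXPECTED_COVERAGE)
```

To draw the band itself, `matplotlib.pyplot.fill_between(x, lower, upper, alpha=0.3)` shades the area between the two curves.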

-I chose a a difficult example below, it's a window in the test set that deviates from the previous pattern. Given 3 days inputs, it must predict the next day, and the next day has higher power usage than previously. The trained model manages to predict it based on the inputs.
+I chose a difficult example below: a window in the test set that deviates from the previous pattern. Given 3 days of inputs, it must predict the next day, which has higher power usage than the days before. The trained model manages to predict this from the inputs.


-### ANP-RNN Results
+### Example ANP-RNN outputs

 ![](docs/anp-rnn_2.png)

@@ -40,7 +40,7 @@ I chose a a difficult example below, it's a window in the test set that deviates

 ![](docs/anp-rnn_4.png)

-### ANP Results (sequential)
+### Example ANP outputs (sequential)

 ![](docs/1.png)

@@ -52,7 +52,7 @@ I chose a a difficult example below, it's a window in the test set that deviates

 ![](docs/19.png)

-### LSTM Baseline
+### Example LSTM Baseline outputs

 Compare this to a quick LSTM baseline below, which didn't predict this divergence from the pattern. (Bear in mind that I didn't tweak this model as much.) Its uncertainty and prediction are also less smooth, and its log probability is lower.
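The log probability here presumably means the Gaussian log-likelihood of the true values under the predicted mean and standard deviation, summed (or averaged) over target points; the exact metric the repo logs isn't shown in this commit, so treat that as an assumption. Higher is better: it rewards predictions that are both accurate and appropriately confident. A stdlib-only sketch (`gaussian_log_prob` is a hypothetical helper, and the numbers are made up):

```python
import math

def gaussian_log_prob(y, mean, std):
    """Sum over target points of log N(y_i | mean_i, std_i**2)."""
    total = 0.0
    for yi, mu, sigma in zip(y, mean, std):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - (yi - mu) ** 2 / (2 * sigma ** 2)
    return total

# Same targets, two hypothetical models with different predictions.
y = [1.0, 1.2, 0.9]
sharp_and_right = gaussian_log_prob(y, mean=[1.0, 1.1, 0.9], std=[0.1, 0.1, 0.1])
wide_and_off = gaussian_log_prob(y, mean=[0.8, 0.8, 0.8], std=[0.5, 0.5, 0.5])
print(sharp_and_right > wide_and_off)  # True
```

In PyTorch, `torch.distributions.Normal(mean, std).log_prob(y).sum()` computes the same quantity on tensors.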

@@ -70,7 +70,7 @@ This is based on the code listed in the next section, with some changes. The mos

 Changes for a predictive use case:
 - target points are always in the future, context is in the past
- context and and targets are still sampled randomly during training
+- context and targets are still sampled randomly during training
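A minimal sketch of that setup, assuming an hourly series (so the argparser defaults `--num_context 24 * 2` and `--num_extra_target 24` would mean two days of context and one day of targets) and assuming `--context_in_target` means the target set also contains the context points, which is how DeepMind's 1D regression data is built. `sample_forecast_window` is hypothetical; the repo's actual data loader isn't shown in this commit, and "sampled randomly" is interpreted here as a random window start.

```python
import random

def sample_forecast_window(series, num_context, num_extra_target,
                           context_in_target=True, rng=random):
    """Pick a random window; context is the past part, targets are what follows.

    Returns (context_x, context_y, target_x, target_y) with integer time indices as x.
    """
    window = num_context + num_extra_target
    if len(series) < window:
        raise ValueError("series is shorter than one context + target window")
    # Random start during training: a different slice of history each call.
    start = rng.randrange(len(series) - window + 1)
    xs = list(range(start, start + window))
    ys = list(series[start:start + window])
    context_x, context_y = xs[:num_context], ys[:num_context]
    if context_in_target:
        # Targets cover the whole window, so the model is also scored on the context.
        target_x, target_y = xs, ys
    else:
        # Targets are strictly in the future.
        target_x, target_y = xs[num_context:], ys[num_context:]
    return context_x, context_y, target_x, target_y

hourly = [float(h % 24) for h in range(24 * 30)]  # toy 30-day hourly series
cx, cy, tx, ty = sample_forecast_window(hourly, 24 * 2, 24, rng=random.Random(0))
print(len(cx), len(tx))  # 48 72
```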


 Changes for stability:
@@ -90,9 +90,12 @@ I put some work into replicating the behaviour shown in the [original deepmind t
 Compare deepmind:
 - ![](docs/deepmind1.png)

-And this repo (anp_1d_regression.ipynb)
+And this repo with an ANP (anp_1d_regression.ipynb)
 - ![](docs/replicate2.png)

+And an ANP-RNN
+- ![](docs/anp_rnn_1d.png)
+
 It's just a qualitative comparison, but we see the same kind of overfitting, with uncertainty tight where lots of data points exist and wide where they do not. However, this repo seems to miss points occasionally.


@@ -105,4 +108,4 @@ A list of projects I used as reference or modified to make this one:
 - Second PyTorch implementation, by KurochkinAlexey (has some bugs currently): https://github.com/KurochkinAlexey/Attentive-neural-processes/blob/master/anp_1d_regression.ipynb
 - If you want to try vanilla neural processes: https://github.com/EmilienDupont/neural-processes/blob/master/example-1d.ipynb

-I'm very gratefull for all these authors for sharing their work. It was a pleasure to dive deep into these models compare the differen't implementations.
+I'm very grateful to all these authors for sharing their work. It was a pleasure to dive deep into these models and compare the different implementations.
@@ -87,7 +87,7 @@ class LatentModelPL(pl.LightningModule):
        return self.validation_end(*args, **kwargs)

    def configure_optimizers(self):
-        optim = torch.optim.Adam(self.parameters(), lr=self.hparams["learning_rate"])
+        optim = torch.optim.AdamW(self.parameters(), lr=self.hparams["learning_rate"])
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, patience=2, verbose=True, min_lr=1e-5) # note: early stopping has patience 3
        return [optim], [scheduler]

@@ -151,13 +151,8 @@ class LatentModelPL(pl.LightningModule):
        """
        # MODEL specific
        parser = HyperOptArgumentParser(strategy=parent_parser.strategy, parents=[parent_parser], add_help=False)
-        parser.opt_range("--learning_rate", default=1e-4, type=float, tunable=True, high=1e-2, low=1e-5, log_base=10)
-        parser.add_argument("--batch_size", default=16, type=int)
-
-        parser.add_argument("--x_dim", default=16, type=int)
-        parser.add_argument("--y_dim", default=1, type=int)
-        parser.add_argument("--vis_i", default=670, type=int)
-
+        parser.opt_range("--learning_rate", default=1e-3, type=float, tunable=True, high=1e-2, low=1e-5, log_base=10)
+        
        parser.opt_list("--hidden_dim", default=128, type=int, tunable=True, options=[8*2**i for i in range(8)])
        parser.opt_list("--latent_dim", default=128, type=int, tunable=True, options=[8*2**i for i in range(8)])
        parser.add_argument("--num_heads", default=8, type=int)
@@ -177,13 +172,22 @@ class LatentModelPL(pl.LightningModule):
        parser.opt_list("--det_enc_cross_attn_type", default="multihead", type=str, tunable=True, options=['uniform', 'dot', 'multihead', 'ptmultihead'])

        parser.opt_list("--use_lvar", default=False, type=bool, tunable=True, options=[False, True])
+        parser.opt_list("--use_rnn", default=False, type=bool, tunable=True, options=[False, True])
        parser.opt_list("--use_deterministic_path", default=True, tunable=True, type=bool, options=[False, True])
-
+        parser.opt_list("--use_self_attn", default=True, tunable=True, type=bool, options=[False, True])
+        parser.opt_list("--batchnorm", default=True, tunable=True, type=bool, options=[False, True])
+        
        # training specific (for this model)
+        parser.add_argument("--context_in_target", default=True, type=bool)
        parser.add_argument("--grad_clip", default=0, type=float)
        parser.add_argument("--num_context", type=int, default=24 * 2)
        parser.add_argument("--num_extra_target", type=int, default=24)
        parser.add_argument("--max_nb_epochs", default=20, type=int)
        parser.add_argument("--num_workers", default=4, type=int)
+
+        parser.add_argument("--batch_size", default=16, type=int)
+        parser.add_argument("--x_dim", default=16, type=int)
+        parser.add_argument("--y_dim", default=1, type=int)
+        parser.add_argument("--vis_i", default=670, type=int)
        return parser
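One caveat with the boolean flags above (`--use_rnn`, `--use_self_attn`, `--batchnorm`, `--context_in_target`, and so on): with plain `argparse`, `type=bool` calls `bool()` on the command-line string, and `bool("False")` is `True`, so `--use_rnn False` would switch the option on. Assuming `HyperOptArgumentParser` forwards `type` to `argparse` unchanged (not verified here), a small converter avoids this. `str2bool` and `--naive_flag` are hypothetical, not part of argparse, test_tube, or this repo:

```python
import argparse

def str2bool(value):
    """Parse common true/false spellings; type=bool would treat 'False' as True."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

parser = argparse.ArgumentParser()
parser.add_argument("--naive_flag", default=False, type=bool)  # the pitfall
parser.add_argument("--use_rnn", default=False, type=str2bool)  # the fix

args = parser.parse_args(["--naive_flag", "False", "--use_rnn", "False"])
print(args.naive_flag)  # True, because bool("False") is True
print(args.use_rnn)     # False
```

Raising `argparse.ArgumentTypeError` lets argparse report a clean usage error for unrecognised values instead of silently guessing.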