Attempting to replicate "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" by Jiang et al. 2017 [1].
tl;dr I managed to get 8% growth on training data, but it disappeared on test data. However, RL papers can be very difficult to replicate due to bugs, framework differences, and hyperparameter sensitivity.
About
This paper trains an agent to choose a good portfolio of cryptocurrencies. It reports 4-fold returns in 50 days, and the paper seems to do all the right things, so I wanted to see whether I could achieve the same results.
This repo includes an environment for portfolio management (with unit tests). Hopefully others will find this useful, as I am not aware of any other implementations (as of 2017-07-17).
Author: wassname
License: AGPLv3
[1] Jiang, Zhengyao, Dixing Xu, and Jinjun Liang. "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem." arXiv preprint arXiv:1706.10059 (2017).
Differences in implementation
The main differences from Jiang et al. 2017 are:
- The first step in a deep learning project should be to make sure the model can overfit, as this provides a sanity check. So I am first trying to achieve good results with no trading costs.
- I have not used portfolio vector memory. For ease of implementation, I made the previous portfolio weights available by replacing the oldest timestep of the observation. Your model can slice them out, or Dense and CNN models can simply be given the information as is.
- Instead of DPG (deterministic policy gradient), I tried DDPG (deep deterministic policy gradient), VPG (vanilla policy gradient) with generalized advantage estimation, and PPO.
- I tried to replicate the best-performing CNN model from the paper and haven't attempted the LSTM or RNN models.
- Instead of selecting 12 assets for each window, I chose the 5 assets that have existed for the longest time.
- My topology had an extra layer (see issue 3).
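A minimal sketch of the "replace the oldest timestep" idea, assuming an EIIE-style observation of shape (assets, window_length, features) with the oldest step first and one previous weight per asset. The function names, the weight layout, and how cash is handled are hypothetical and may not match `portfolio.py`.

```python
import numpy as np


def add_weights_to_obs(obs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical helper: overwrite the oldest timestep with the previous weights.

    obs: (n_assets, window_length, n_features) price history, oldest step first.
    weights: (n_assets,) previous portfolio weights (assumed layout).
    """
    out = obs.copy()
    # Broadcast each asset's weight across every feature of the oldest timestep.
    out[:, 0, :] = weights[:, None]
    return out


def split_obs(obs: np.ndarray):
    """Hypothetical helper: how a model could slice the weights back out."""
    weights = obs[:, 0, 0]   # the weights stored in the oldest slot
    history = obs[:, 1:, :]  # the remaining window_length - 1 price steps
    return history, weights


obs = np.random.rand(5, 50, 3)
w = np.full(5, 0.2)
history, weights = split_obs(add_weights_to_obs(obs, w))
print(history.shape, weights)  # (5, 49, 3) [0.2 0.2 0.2 0.2 0.2]
```

The cost of this shortcut is one timestep of price history; in exchange the environment stays a plain gym environment with no external memory to manage.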
Results
I have managed to overfit to the training data with no trading costs but it could not generalise to the test data. So far there have been poor results. I have not yet tried hyperparameter optimisation so it could be that parameter tweaking will allow the model to fit, or I may have subtle bugs.
- VPG model, training: 1.008 in 15 hours, 1.9 in 50 days
- VPG model, test: 1.00 in 15 hours, 1.0 in 50 days
- DDPG model, test
This test period directly follows the training period, and it looks like the usefulness of the model's learned knowledge may decay as it moves away from its training interval.
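The two VPG training figures are consistent with simple compounding. Assuming one environment step is one 30-minute bar (as in `poloniex_30m.hdf`), 15 hours is 30 steps and 50 days is 2,400 steps, so a 1.008 return per 15 hours compounds over 80 such periods. This is a back-of-the-envelope check, not necessarily how the notebooks compute the figure.

```python
# Back-of-the-envelope check: compound a 15-hour return over 50 days.
STEP_MINUTES = 30  # assumed: one step per 30-minute bar

steps_per_15h = 15 * 60 // STEP_MINUTES       # 30 steps
steps_per_50d = 50 * 24 * 60 // STEP_MINUTES  # 2400 steps

growth_15h = 1.008
periods = steps_per_50d / steps_per_15h       # 80 fifteen-hour periods in 50 days
growth_50d = growth_15h ** periods

print(f"{growth_50d:.2f}")  # 1.89
```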
Installing
git clone https://github.com/wassname/rl-portfolio-management.git
cd rl-portfolio-management
pip install -r requirements/requirements.txt
jupyter-notebook
- Then open tensorforce-VPG.ipynb in Jupyter
- Or try an alternative agent with tensorforce-PPO.ipynb and train
Files
- environments/portfolio.py - contains an OpenAI Gym environment for portfolio trading
- tensorforce-VPG.ipynb - notebook to try a policy gradient agent
- tensorforce-PPO.ipynb - notebook to try a PPO agent
- data/poloniex_30m.hdf - hdf file with cryptocurrency 30 minutes prices
Using the environment
Three environments are defined here. To use them:
import gym
import rl_portfolio_management.environments # this registers them
env = gym.envs.spec('CryptoPortfolioEIIE-v0').make()
print("CryptoPortfolioEIIE has a state shape suitable for an EIIE model (see https://arxiv.org/abs/1706.10059)")
print("shape =", env.reset().shape)
# shape = (5, 50, 3)
env = gym.envs.spec('CryptoPortfolioMLP-v0').make()
print("CryptoPortfolioMLP has a state shape for a dense model")
print("shape =", env.reset().shape)
# shape = (750,)
env = gym.envs.spec('CryptoPortfolioAtari-v0').make()
print("CryptoPortfolioAtari has a state shape for models that expect an image, such as ones tuned on Atari games")
print("shape =", env.reset().shape)
# shape = (50, 50, 3)
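The MLP shape is consistent with flattening the EIIE observation: 5 assets × 50 steps × 3 features = 750. The sketch below assumes that is how the MLP mode is built, which is not confirmed here; how the Atari-style (50, 50, 3) observation is padded or resized is not described, so it is left out.

```python
import numpy as np

# Stand-in for the EIIE observation: (assets, window_length, features).
eiie_obs = np.zeros((5, 50, 3), dtype=np.float32)

# Assumed MLP view: the same values flattened into one vector for a dense model.
mlp_obs = eiie_obs.reshape(-1)
print(mlp_obs.shape)  # (750,)
```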
Or define your own:
import pandas as pd
from rl_portfolio_management.environments import PortfolioEnv
# load the training split of the 30-minute price data
df_train = pd.read_hdf('./data/poloniex_30m.hf', key='train')
env = PortfolioEnv(
df=df_train,
steps=256,
scale=True,
augment=0.00,
trading_cost=0.0025,
time_cost=0.00,
window_length=50,
output_mode='mlp'
)
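For intuition about `trading_cost`, the reward in Jiang et al. [1] is the log of the per-step portfolio growth, reduced by a transaction-cost factor. The sketch below uses a simple first-order approximation of that factor, cost ≈ `trading_cost` × Σ|Δw| over non-cash assets (the paper solves for the factor more precisely), and assumes index 0 is cash. It is illustrative only; `step_reward` is a hypothetical helper and may differ from what `PortfolioEnv` computes. Setting `trading_cost=0.0` recovers the cost-free setting used for the overfitting check.

```python
import numpy as np


def step_reward(w_prev: np.ndarray, w_new: np.ndarray, y: np.ndarray,
                trading_cost: float = 0.0025):
    """Hypothetical one-step reward; index 0 is cash with price relative 1.

    w_prev: weights held before rebalancing (already drifted by the last prices).
    w_new: weights chosen by the agent, summing to 1.
    y: price relatives close_t / close_{t-1} for this step, with y[0] == 1.
    Returns the log return for the step and the weights after prices move.
    """
    # First-order cost approximation: pay a fraction of the non-cash turnover.
    mu = 1.0 - trading_cost * np.abs(w_new[1:] - w_prev[1:]).sum()
    growth = mu * np.dot(y, w_new)              # portfolio value multiplier
    w_drifted = (y * w_new) / np.dot(y, w_new)  # weights after the price move
    return float(np.log(growth)), w_drifted


y = np.array([1.0, 1.02, 0.99])      # cash, asset A up 2%, asset B down 1%
w_prev = np.array([1.0, 0.0, 0.0])   # start fully in cash
w_new = np.array([0.0, 0.5, 0.5])    # move everything into the two assets
r, w_next = step_reward(w_prev, w_new, y, trading_cost=0.0025)
print(round(r, 5))  # 0.00248
```

Moving the whole portfolio costs 0.25% here, roughly halving the 0.5% gross return for the step, which is why the cost-free case is the easier one to overfit first.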
Tests
We have partial test coverage of the environment, just run:
python -m pytest
