So last year at NIPS I was slated to give a talk at the Deep RL Workshop, and I wasn't sure what I was going to talk about, because everything I had prepared I had already talked about so many times that I just didn't want to give another talk on it. So I asked Pieter for his advice on what I should talk about, and he said that Andrew Ng had given a talk earlier in the conference called "Nuts and Bolts of Deep Learning," where he went through the flowchart of what you do when you see a new problem: if you're overfitting, you regularize, and if you're underfitting, you use a bigger model, and so on. So Pieter suggested I write a talk called "The Nuts and Bolts of Deep RL Research," where I would talk about some similar lessons and tips and tricks for the RL setting. I put together a talk for that, and people actually seemed to like it, so I'll give a slightly updated version of that talk right now.

I'm going to talk about a few different things. Some of them are general and apply to using reinforcement learning in general, and some of them pertain to particular classes of methods, like policy gradient methods. These are just little tips and tricks for how you get your algorithm to work and what you do day to day.

So let's say you have a totally new problem you're trying to solve. You have some new task, you've defined an observation and an action space, and you have your neural network policy or Q-function, and you want to start learning how to solve it, but you've never tried it before. Or you have a new algorithm you're trying to get working that you've never used before. So what's the first thing you do if you have a new algorithm? My first advice would be to use small problems, so you can run a lot of experiments really quickly and do a hyper
parameter search. It's also really useful to be able to visualize the learning process in as many ways as possible, so look at the state visitation and how that's evolving over time, look at how well your value function is fitting, and so on. I spent a lot of time looking at the pendulum problem, where you're trying to swing up a pendulum, because this problem has a 2D state space: it's just the angle and the angular velocity of the pendulum. I would visualize exactly what the value function looks like, exactly what the state distribution looks like, and how they evolve over time. That way I would get a sense of, if my algorithm isn't working, is it because it's oscillating in some funny way, or maybe it's just giving a bad fit, or maybe the value function it's learning isn't smooth enough, and so on. So I would say try to visualize everything, and maybe use small problems where you can visualize everything.

It's also useful to construct toy problems where your idea is going to be the strongest, where you think, OK, if this idea has any possibility of working, it's going to work there. For example, let's say you're trying to do something with hierarchical reinforcement learning. Then construct some problem where there's some kind of obvious hierarchy that it should learn, and you'll be able to tell if it's doing the right thing. Also construct the problems where it's going to be weakest, obviously. And as a counterpoint to that, don't overfit your method to some contrived problem. Let's say you've come up with some toy problem where your method is really good: then realize that it's a toy problem, and don't tweak everything just to work perfectly on this toy problem.

It's also pretty useful to have medium-sized problems that you're very familiar with, where you know exactly how fast the learning should be and what the reward should be at every iteration, and so on. So a few
problems that I use a lot are training on Pong in Atari and the Hopper problem, which is a simulated robotics problem with a hopping robot. I know exactly how fast an algorithm that's working should learn on these problems, so it makes it easier to tune things.

OK, that's if you have a new algorithm. Let's say you have a new task. I would recommend just making the task easier until you start seeing some signs of life, until you see it learning something. There are various ways you can make it easier. You can try doing some feature engineering, so that you think the policy should be a simple function of your input features. Let's say you're trying to get Pong to work, and you tried setting it up with the images as input and you weren't learning anything. Then you can set up the problem where you pass in XY coordinates as input and try running your algorithm. It's a much simpler function you're trying to learn, so that's much more likely to work, and then you can make it harder and harder until you're solving the full problem.

Another way you can make it easier is by shaping the reward function. That means you come up with some reward function that gives you fast feedback on whether you're doing the right thing or not. Let's say we define a task where we have this reaching robot, and it gets a reward of one if it hits the target and zero otherwise. That might be hard to learn, because you're not getting any feedback as you're flailing around. But we could define a better-shaped reward function based on the distance to the target, and then learning is going to be much faster on that problem.

There's also the question of exactly how to turn your problem into a POMDP in the first place. Often it's not clear what your observation features should be, and it's not even clear what the
reward function should be, or it's not clear if the problem you're trying to solve is feasible at all. So let's say you have some game or some robotics task or something new, and you want to turn it into a reinforcement learning problem, but you're not sure if this is feasible at all. The first thing to do is to just visualize a random policy acting on this problem and see what happens. If the random policy occasionally does the right thing, then there's a high chance that reinforcement learning is going to work, because a policy gradient method is just going to take this random behavior and make the good behaviors more likely, so it'll gradually hone in on the good behaviors. Whereas if you're never doing the right thing, then RL isn't going to get any signal that tells it to do the right thing. Sometimes RL is able to learn even though it's not clear how it's going to learn. Take learning how to walk: it's not clear that that should work, because you would think that you really have to have the whole policy in place before it does anything useful. But as it turns out, you learn to take one step and then fall over, and then take two steps and then fall over, and so on, until you've got a proper walking gait.

OK, another thing to do is to make sure your observations are usable. Try to look at them as a human, and see if you can control the system using the same observations you're giving to the agent. Let's say you're doing some preprocessing on your images: look at those preprocessed images yourself and make sure you're not losing too much detail when you downsample them, or in the color transformations, and so on.

Another thing to do is to make sure that everything is reasonably scaled. As a rule of thumb, you usually want everything
to be mean 0 and standard deviation 1 for the observations, and for the rewards, well, it's a little less obvious, but that's a reasonable heuristic. So you might want to rescale them using some kind of filter; that's another good thing you can do. But if you don't want to mess with filters on your observations and rewards, and you're allowed to define those yourself, then you might want to just scale them yourself. What I'd recommend doing is plotting histograms of all of your observations and your rewards, and making sure that each component of the observations and rewards is scaled properly, so that it has the right mean and standard deviation and doesn't have crazy outliers.

OK, another thing to do is to have some good baselines that you can use whenever you see a new problem. It's not clear beforehand which algorithm is going to work, so make sure you have a bunch of well-tuned things that you can run on each problem.

OK, the question was: if you're going to do some kind of reward normalization, should you do this over all of your training data or just the recent data? There's a lot of subtlety there. I would say use all of your data so far, because you're making everything nonstationary if you do some kind of filtering. Actually, I'm going to talk about this on a later slide.

So anyway, as just a few baselines, I would recommend having the cross-entropy method, some policy gradient methods, and some kind of Q-learning or SARSA-type method. There's a lot of code online now, so you can use other people's code that's already written. We have this OpenAI Baselines repository, and rllab also has a bunch of algorithms.

OK, another thing, which people often get tripped up on, especially when they're trying to reproduce published work, is this. So you
implement the algorithm based on the paper, and then it doesn't really learn anything at all, and you think, oh, maybe my code is wrong, or what happened? I would say early on you might need to run with more samples than expected. One hyperparameter that you can usually adjust is how big a batch size to use, or how many samples to use, and sometimes you should use more samples than you think you're going to need, because things almost always work better when you have more samples. Often when you're trying to reproduce a published paper, you've got it mostly right but not exactly right. Maybe you haven't scaled everything properly, or there's some really obscure hyperparameter that you have wrong, and then you just find that the code doesn't learn anything. So I would say just try to make it work a little bit, and then you can work from there and tweak all the hyperparameters to get fully up to the published performance. But if you want to just get something working at all, often you need to use bigger batch sizes than you thought, because if your batch size is too small, then the noise will overwhelm the signal and you won't learn anything. For example, with TRPO I wasn't seeing any learning for a while, and then it turned out it was just because I was using too small a batch size, and I had to use a hundred thousand time steps per batch. And for Atari, for DQN, the hyperparameters that were found to be best were that you update your Q-function every ten thousand time steps and you have one million time steps in your replay buffer, which is a lot.

OK, so now I'll talk about some guidelines for the ongoing development and tuning process, as opposed to the initial process of "I have a totally new problem or a new algorithm that I want to see some signs of life on." So let's say
you get something working. I recommend looking at how sensitive your algorithm is to every hyperparameter. If it's too sensitive, it's not actually a robust algorithm, and you shouldn't be happy with it; you probably just got lucky on that one problem. It's actually quite possible to have a method that is a fluke, and it works on one problem because of some funny dynamics, but then it doesn't work in general, so it needs some serious improvements.

I'm going to talk about more of these kinds of diagnostics a little later, but there are some indicators that'll tell you if your algorithm is working besides just looking at the final performance. Look for other indicators that are going to tell you that your optimization process is healthy. This is going to vary based on the algorithm, but for example, you can look at whether your value function is actually accurate, like whether it's actually predicting returns well. You can look at how big the updates are, in terms of either parameter space or the output space. And there are standard diagnostics for deep networks, like looking at norms of gradients and so on.

OK, one thing that takes some discipline but is very useful is to have a system for continually benchmarking your code, and that includes all of your code, not just the one thing you're tuning right now. It's easy to tune your algorithm to work well on one problem and then mess up the performance on other problems, and it's really easy to overfit on single problems when you're just adjusting hyperparameters. So I'd really recommend having some kind of benchmark you can run frequently and some kind of battery of benchmarks that you run occasionally.

Along similar lines to overfitting, there's reading too far into noise, or over-interpreting noise. It's really easy to just
think you're improving your algorithm, or you're making it worse, when really you're just seeing random noise. So you can see seven different tasks. These are the Gym MuJoCo tasks, like HalfCheetah and Hopper and so on, and you have three different algorithms here: the red one, the green one, and the blue one. We can see that the performance is a little different on all the problems. Which one looks like the best? Well, it kind of varies by problem: the blue one looks better on this problem, and the red one is worse on this problem, and so on. But as it turns out, these are all the exact same algorithm, just with different random seeds. So it's easy to imagine that you're just looking at one of these problems, and you see that blue curve, and you get really excited and think you've found some huge improvement to your algorithm, but really you just got a lucky seed on that one run. So really you've got to run your algorithm multiple times and average. And even if you're averaging over a lot of seeds, even if you had 20 seeds here, there's still a pretty big error bar, so that makes it particularly hard. I'd recommend having multiple tasks and multiple seeds, and if you don't do that, then you're probably just overfitting, unless you see a really drastically large improvement.

Another thing: it's easy to keep adding little modifications to your algorithm until it gets really complicated, and then you think you have this really complicated algorithm which is perfect, but it turns out that most of the things you did are unnecessary, because some of the tricks substitute for each other. This is often true because a lot of tricks help by normalizing things in a better way or by improving your optimization, like making your optimization less susceptible to big spikes. I don't
know, a lot of different modifications you make have similar effects, so often you can remove them and simplify your algorithm, and this is pretty important. Especially with regard to changes that do whitening: these kind of all substitute for each other, and also substitute for changes to your optimization algorithm. I would simplify things, because then it's more likely that your insights will generalize to other problems.

And lastly, it's pretty useful to automate your experiments, because otherwise you're going to end up spending your whole day just watching your code print out numbers. It's really tempting to spend all day doing that, but especially if you need to run multiple random seeds, you really need to get your workflow down, so that you're automating this process and launching lots of experiments at the same time. So I'd recommend getting set up with one of these cloud computing services, so you can launch experiments on remote instances and pull the results back when you're done.

Question? Oh yeah, the question is whether I have a recommendation on what framework to use to keep track of your experiment results. I personally use no framework at all. I just have IPython notebooks and scripts that collect a bunch of data that's stored in various log files, so I just have scripts that read all my log files and plot them. Some people like having databases where they store all their hyperparameter results, but I don't find it necessary personally.

OK, so now I'm going to talk about general tuning strategies for RL, and after that I'll talk about some specific tuning strategies for different classes of algorithms.

One thing is whitening or standardizing your data. If your observations have unknown range, you should definitely standardize them. I would do that by
computing a running estimate of the mean and the standard deviation and then just z-transforming it, that is, subtracting the mean and dividing by the standard deviation. I would recommend computing the mean and standard deviation over all the data you've seen so far, not just your recent data, because otherwise you're effectively changing your data in some way that your policy gradient algorithm doesn't know about. Your policy gradient algorithm is optimizing some objective, so if you go and change the problem out from under it, you're often going to make things a lot worse. If you rescale your observations, your optimization algorithm doesn't know about that, so you might just collapse the performance. That's why I would recommend using all of your data from the start of time, so that at least your scalings change more and more slowly over time. So that's what I would recommend doing with the observations.

For the rewards, I'd recommend rescaling them but not shifting them, because shifting affects the agent's will to live. If you shift the mean reward, that'll affect how long it wants to survive, so you're actually changing the problem. You might also want to try to standardize prediction targets in the same way, though that's a little more complicated to do.

OK, so the question is: what about PCA whitening instead of just this elementwise scaling? Yeah, that could definitely help. I haven't experimented with that. It's hard to predict with neural nets whether it's going to help or not, because they seem to be pretty good at disentangling things. But I know that if you have things that are terribly scaled, like some coordinates range from negative one thousand to one thousand and other coordinates range from negative 0.1 to 0.1, then learning is going to be slow, so this kind of scaling helps a lot even though you're
using neural networks.

OK, there are some parameters that are really generally important, like the discount factor. That determines how far away you're doing credit assignment, so whether you're paying attention to effects that are delayed by a certain time. If your discount is gamma = 0.99, then you're basically ignoring effects that are delayed by more than a hundred time steps, so you're kind of shortsighted; gamma is controlling your shortsightedness. You might want to actually look at how long that corresponds to in real time. Usually in reinforcement learning you're discretizing time in a certain way, and it's worth paying attention to whether that hundred time steps is, like, three seconds of real time or what, and what happens during that time. Also note that if you have TD(lambda)-type methods, either for value function estimation or for policy gradient methods, you can get away with using a gamma that's really close to one, like 0.999, and things aren't going to go unstable, because a lower lambda, like 0.9, makes the algorithm still stable even though gamma is really close to one.

As I mentioned, in practice we're usually discretizing some continuous-time system, so it's worth seeing if the problem can actually be solved at this discretization level. For example, in a game, let's say you're doing frame skip, meaning that you repeat the action multiple times. As a human, can you control it at this rate, or is it just impossible to control, because you're doing the action too many times in a row and your responses are too slow? I would also look at what the random exploration looks like and make sure that you're exploring. The discretization is going to determine how far your Brownian motion goes, because if you're doing the same action many times in a row, then
you're going to tend to explore further. So it's worth just looking at what the random exploration does and choosing your time discretization in a sensible way, so that it does interesting things.

Question? Yeah, so the question is: if you have a DQN, how would you get started tuning all the hyperparameters? Actually, I'm going to talk about DQN pretty soon, so I'll get to that.

OK, also look at the episode returns very closely. Don't just look at the mean; look at the minimum and the maximum. Especially if you have a deterministic system, the maximum return is basically something that your policy can hone in on pretty straightforwardly, because if you just do that every time, then you're going to increase your mean return to that level. So it's useful to look at the max return to see if your policy is ever doing the right thing, or if it's just stuck and never discovering the high-return strategy. Also look at the episode length, which is sometimes more informative than the episode reward. If you have a game, you might be losing every time, so you're never seeing yourself win, but the episode length will tell you if you're losing slower. So you might see an improvement in episode length at the beginning but not in reward.

OK, for policy gradient methods there are specific diagnostics that are really helpful. Look at the entropy really carefully. If your entropy is going down too fast, that means your policy is becoming deterministic and it's not going to explore anything, so be careful. And if it's not going down, your policy is never going to be that good, because it's always really random. You can alleviate this issue by using an entropy bonus
or a KL penalty. By stopping yourself from changing the policy's probability distribution too fast, as a side effect you also prevent the entropy from going down too fast when you use the KL penalty. I'd also look at the KL as a diagnostic: look at how big an update you're doing in terms of KL divergence. If your KL is like 0.01, that's a pretty small update, but if it's like 10, that's a really big update.

Question? Oh yeah, the question is how you measure entropy. For most policies you can compute the entropy analytically. If you have a discrete action space, you can usually just compute it analytically, and if you have a continuous policy, you're usually using a Gaussian distribution or something, so you can compute the differential entropy analytically. Here we're talking about entropy in action space, so the average over state space of the action-space entropy. What you might care about even more is the entropy in state space, but you have no hope of actually calculating that, except maybe with some really crude approximation.

OK, so KL is really useful. Also look at explained variance: whether your value function is actually a good predictor of the returns, or if it's worse than predicting nothing. If you just predict zeros, then your explained variance is zero, but sometimes if you have some neural network that's predicting, you find that it's actually negative, because it's overfitting or it's just noisy and not doing anything useful. That probably means you need to tune some hyperparameters so that your neural network is actually predicting better than the constant prediction of zero.

Question? OK, the question is why a KL spike gives you a loss in performance. Well, it's not always a loss in performance; sometimes it's a gain in performance. But in practice it's usually a loss in performance, because usually
your policy gradient is just taking you way outside the region where your local approximation to the policy performance is accurate, so you're probably just overshooting. If you take your policy and you take a really big step in any direction, you're probably making it worse. Like if you have a convex function, if you take a big step in any direction, you're probably going to make it worse.

OK, initializing your policy: that's pretty important, more important than in supervised learning, because it determines what data you're going to see initially and learn from at the beginning. I would recommend initializing the final layer to be either zero or really small, so that at least you have the maximum entropy and you explore randomly at the beginning, as opposed to having some particular policy that has a strong opinion on the right thing to do, which is based on no information at all.

OK, that's for policy gradients. For Q-learning, a few things. One is that it often helps to have a really big replay buffer, and to be able to do this you need to be a little careful about memory usage, so it's worth putting in the extra effort to do that. Learning rate schedules are often quite helpful here in practice, as are exploration schedules. In DQN you're usually using epsilon-greedy, and it often helps to play with the schedule on that. Also, it converges pretty slowly, and it often has a mysterious warm-up period at the beginning. I actually have a lot of admiration for the people who got this to work originally, because they had to just let their code run for a while before it did anything, so you have to have a lot of patience and a lot of bravery to do that.

OK, this is just miscellaneous advice, not necessarily for tuning
algorithms but just for personal development. I recommend reading older textbooks and theses, not just the latest conference papers, because often they're a denser source of useful information, whereas each conference paper just has one idea.

OK, don't get too stuck on problems, because often you actually have a legitimately good algorithm, but it has some flaws, so it might fail miserably at some easy problem. In RL there are some simple problems like cart-pole swing-up, where you have this stick and you're trying to swing it up by moving the cart around. You might have a great algorithm, but some of the state-of-the-art algorithms are going to fail on that problem unless you really tune them carefully. And that's just because maybe the thing that makes this problem hard is not the thing that your algorithm is doing that's interesting. You might have come up with a better policy gradient method, but still it'll converge to the same local minimum on that swing-up problem, and you're not going to fix that. So I would say just don't get too stuck on a single problem that your method fails on. Maybe the ultimate algorithm will solve all of these problems, but we're not there yet, so you might as well just try to improve on some decently large subset of problems.

Also, one funny thing is that DQN performs pretty poorly on a lot of problems, especially with continuous control. I think for cart-pole it probably solves it pretty well with a reasonable amount of tuning, but it fails on some of the other fairly small continuous control problems. That doesn't mean it's a bad algorithm, because it solves a different problem extremely well. So I would say, at least right now, you shouldn't expect to be
able to solve everything with the same method without any tuning.

Also, techniques from supervised learning often don't transfer over to reinforcement learning, so don't be surprised if you find that. I said this slide was going to be about personal development, and that's not about personal development, but I guess this is just a grab bag of miscellaneous advice. So, like batch norm: a lot of people look at what people are doing in RL and they think, why aren't you using batch norm or dropout or big networks? Why are you using two layers of 64 units? And it's not like people didn't think of trying these other things. They tried them, and then they found that those other architectures and methods don't actually help here. If you figure out how to make batch norm and dropout actually help in RL, that would be really great and a big development, but it's not totally straightforward.

All right, that's all. Thank you. OK, I have a few minutes for questions.

Yeah, so the question is: how long do you wait until you've decided that your algorithm is not going to work, either because your code is wrong or because the problem is just too hard? I don't have a good general answer to that. I think the problem is worse for some algorithms than others. I'd say for policy gradient methods you don't see that burn-in period as much; often if it's going to learn, it'll learn at the beginning. But that's not always true either. Sometimes it will take some time to get into the right numerical regime. So I don't have general advice. I would say go back and start with the easy problems, and you'll get some intuition about whether you should expect a burn-in period where it's not learning anything.

I want to get some people in the back. OK, oh yeah, the question is: do I use unit tests? I use unit tests for code that's doing a very
particular mathematical thing that you can actually write a test for. Let's say I'm computing the KL divergence: then I'll write a test to check it, and there are various ways of testing it. It's easy to get those things wrong, like you're off by a constant or something. So I write tests for things where there's a very well-defined correct thing to do. It's harder to write a test for an algorithm that has a lot of different moving parts, where it's not clear how fast it should learn and there's also some randomness involved. If you try to write a test saying "I should be at performance 100 after this many iterations," it might fail just out of random noise. But yeah, I think unit tests are probably a good idea.

Oh yeah, so the question is: do I have guidelines on matching the algorithm to the task, like when to use policy gradients versus a value-iteration-style method? It's hard to give general guidelines, and the guidelines I give you might just be historical accidents, like someone got this to work here and that to work there. Certainly, if you don't care that much about sample complexity or about using off-policy data, then policy gradient methods are probably the safest bet, because it's more understandable exactly what they're doing: it's just doing gradient descent. Whereas with Q-learning, what it's doing is a little bit indirect, and in practice it's more finicky. If you do care about sample complexity, though, or need off-policy data, then Q-learning is usually better. Sample complexity is relevant if your simulator is expensive, of course. I would also say that people have found that DQN and relatives have worked well on game-like tasks with
images as input, whereas policy gradient methods work better on continuous control tasks like these robotic locomotion problems, though this might not be fundamental; it might be more of a historical accident.

Let's see. Oh yeah, recommendations on older textbooks. Let's see, there's Bertsekas's books: approximate dynamic... what is it? Optimal control... why am I blanking on the name? Optimal control and dynamic programming, something like that. And then, I mean, Sutton and Barto is a good one to read. Puterman has a textbook, kind of a classic textbook on Markov decision processes. That's in the RL space. Then there's books on numerical optimization that are good, and I'd say obviously the machine learning textbooks have a lot of good material that might be useful in the RL setting too.

Oh yeah, can I comment on evolution strategies and the OpenAI blog post on it? Let's see, do you have any specific questions about it, or like how it compares? Oh yeah, okay. So there's a lot of policy gradient methods out there, and some of them are quite complicated. We've had a couple of talks on them so far, all these different, successively more complicated policy gradient methods. But then there's this old algorithm called evolution strategies, which is an extremely simple algorithm, and there's a paper by some of my colleagues, called "Evolution Strategies as a Scalable Alternative to Reinforcement Learning", which really meant an alternative to policy gradient methods. They claimed that it worked basically as well as policy gradient methods, or at least in the same order. Oh, and [name unclear] is one of the authors of that paper. So the claim was that it works similarly well to policy gradient methods, so why should we bother with these policy gradient methods if ES works just as well? Well, I think in practice it works, but not as well. Like, I mean, the
sample complexity is worse by some constant factor, or it's not clear whether it's a constant factor or whether this factor scales with the size of the network. But it is significantly slower, and the question is just what that constant factor is. Is it like one, or is it three, or is it 10 or 100? That's going to vary between problems. And also, that paper had some innovations in exactly how to parameterize the networks and so forth that made everything better numerically, everything better scaled, so that ES did work well. But I would say that it's usually a pretty decent constant factor slower than policy gradient methods, especially the more advanced ones like PPO and ACKTR. So I think it's not really a clear win in the RL setting where policy gradients work. I think if policy gradients work, they're usually going to be a lot better, and ES is going to be better on problems where policy gradients aren't going to work for some reason. Like if you've got really long time dependencies, where the discounting is going to ignore them, then ES might be less sensitive to that.

Let's see. Okay, last question. Oh yeah, favorite hyperparameter optimization framework. I've used some of these, but I just like to use uniform random sampling. Yeah, that works really well. I mean, you just run a bunch of experiments with random hyperparameters, and then you look at the results the next day and do some regression to figure out which parameters actually mattered, and then you run another experiment with better parameter ranges, and so on. So I use the human version of it, because often it's useful to be able to look at the results yourself to figure out which parameters actually matter, so you're not wasting a lot of computation, because that information transfers between problems. All right.
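The workflow in that last answer (sample hyperparameters uniformly at random, run the experiments, then regress the results on each parameter) could look roughly like the sketch below. Everything named here is an assumption for illustration: `run_experiment` is a hypothetical stand-in for a real training run, the three hyperparameters and their ranges are made up, and the "regression" is just the R^2 of a one-variable linear fit per parameter, which is only one of many ways to do that step.

```python
import random


def sample_config(rng):
    """Draw one configuration uniformly at random (hypothetical ranges).

    Scale-like parameters are sampled uniformly in log space.
    """
    return {
        "log10_lr": rng.uniform(-5.0, -2.0),
        "gamma": rng.uniform(0.9, 0.999),
        "log2_batch": rng.uniform(5.0, 10.0),
    }


def run_experiment(config, rng):
    """Hypothetical stand-in for a real training run; returns a final score.

    In this toy, only the learning rate matters (the score improves toward
    the top of its range), plus a little noise.
    """
    return -(config["log10_lr"] + 2.0) ** 2 + rng.gauss(0.0, 0.1)


def r_squared(xs, ys):
    """R^2 of a one-variable least-squares line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def random_search(n_trials=200, seed=0):
    """Run n_trials experiments with independently sampled configurations."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((config, run_experiment(config, rng)))
    return trials


def rank_hyperparameters(trials):
    """Rank hyperparameters by how much of the score each one explains."""
    scores = [score for _, score in trials]
    names = list(trials[0][0].keys())
    r2 = {
        name: r_squared([config[name] for config, _ in trials], scores)
        for name in names
    }
    return dict(sorted(r2.items(), key=lambda item: item[1], reverse=True))


trials = random_search()
importance = rank_hyperparameters(trials)
for name, value in importance.items():
    print(f"{name}: R^2 = {value:.2f}")
```

A parameter with R^2 near zero is a candidate for fixing at a default in the next round, while a parameter with high R^2 is one whose range is worth narrowing. A linear fit will miss a parameter whose optimum sits in the middle of its range, which is one reason to also look at the results by eye, as the answer suggests.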
[Applause]
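As an illustration of the earlier answer on unit tests, here is a minimal sketch of a test for a KL divergence computation. The talk doesn't say which distributions or which checks were used, so the choices here are assumptions: univariate Gaussians, a check that the KL of a distribution with itself is zero, and a comparison of the closed form against a Monte Carlo estimate, which is the kind of check that catches an off-by-a-constant mistake. The function names are hypothetical.

```python
import math
import random


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) for univariate Gaussians p and q."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def gaussian_log_prob(x, mu, sigma):
    """Log density of a univariate Gaussian at x."""
    return (
        -0.5 * math.log(2.0 * math.pi)
        - math.log(sigma)
        - 0.5 * ((x - mu) / sigma) ** 2
    )


def monte_carlo_kl(mu_p, sigma_p, mu_q, sigma_q, n_samples=100_000, seed=0):
    """Estimate KL(p || q) as the average of log p(x) - log q(x) for x ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu_p, sigma_p)
        total += gaussian_log_prob(x, mu_p, sigma_p)
        total -= gaussian_log_prob(x, mu_q, sigma_q)
    return total / n_samples


def test_kl_is_zero_for_identical_distributions():
    assert abs(gaussian_kl(0.3, 1.7, 0.3, 1.7)) < 1e-12


def test_kl_matches_monte_carlo_estimate():
    exact = gaussian_kl(0.0, 1.0, 1.0, 2.0)
    estimate = monte_carlo_kl(0.0, 1.0, 1.0, 2.0)
    # Loose tolerance: the estimate is noisy, but a dropped constant
    # (for example the -0.5 term) would be far outside it.
    assert abs(exact - estimate) < 0.02


test_kl_is_zero_for_identical_distributions()
test_kl_matches_monte_carlo_estimate()
```

The fixed seed keeps the Monte Carlo check deterministic, which sidesteps the random-failure problem the answer raises for tests of full learning runs.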