What are some hyperparameter tuning strategies?
There are several strategies for hyperparameter tuning, but I would argue that the three most popular nowadays are the following:

Grid Search is an exhaustive approach: for each hyperparameter, the user manually provides a list of values for the algorithm to try. Grid search then evaluates the algorithm on every combination of those values and returns the combination that gives the best result (e.g. lowest MAE). Because every combination is evaluated, the number of runs is the product of the list lengths, so this can be quite computationally expensive. It can also lead to suboptimal results, since only the values the user listed are ever tried; choosing them is prone to error and requires domain knowledge.
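
To make the "every combination" part concrete, here is a minimal sketch in plain Python. The objective `validation_mae` and the hyperparameter names `learning_rate` and `depth` are made up for illustration; in practice the objective would train a model and score it on held-out data.

```python
import itertools

def validation_mae(learning_rate, depth):
    # Hypothetical stand-in for "train a model, return MAE on validation data".
    return abs(learning_rate - 0.1) * 10 + abs(depth - 5) * 0.2

# The user lists the exact values to try for each hyperparameter.
grid = {
    "learning_rate": [0.01, 0.1, 0.5],
    "depth": [3, 5, 7],
}

def grid_search(objective, grid):
    names = list(grid)
    best_params, best_score = None, float("inf")
    n_evaluations = 0
    # itertools.product enumerates every combination of the listed values.
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        n_evaluations += 1
        if score < best_score:  # lower MAE is better
            best_params, best_score = params, score
    return best_params, best_score, n_evaluations

best_params, best_score, n_evaluations = grid_search(validation_mae, grid)
print(best_params, best_score, n_evaluations)
```

With 3 values for each of 2 hyperparameters this is 3 x 3 = 9 evaluations; adding a third hyperparameter with 3 values would make it 27, which is where the cost comes from.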

Random Search is similar to grid search, but rather than specifying which values to try for each hyperparameter, the user gives a lower and upper bound for each one. Values within these bounds are then sampled at random (typically with uniform probability) for a fixed number of trials, and, as with grid search, the best combination is returned to the user. Although this seems less intuitive, far less domain knowledge is necessary (only sensible bounds), and for the same number of evaluations much more of the parameter space can be explored.
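
A rough sketch of the same idea in plain Python, again with a made-up objective and made-up hyperparameter names. The convention that integer bounds mean an integer-valued hyperparameter is an assumption of this sketch, not a general rule.

```python
import random

def validation_mae(learning_rate, depth):
    # Hypothetical stand-in for "train a model, return MAE on validation data".
    return abs(learning_rate - 0.1) * 10 + abs(depth - 5) * 0.2

# The user gives only a (lower, upper) bound per hyperparameter.
bounds = {
    "learning_rate": (0.001, 1.0),  # continuous
    "depth": (1, 10),               # integer-valued
}

def random_search(objective, bounds, n_trials, seed=0):
    rng = random.Random(seed)  # seeded so the run is reproducible
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {}
        for name, (low, high) in bounds.items():
            if isinstance(low, int) and isinstance(high, int):
                params[name] = rng.randint(low, high)   # uniform over integers
            else:
                params[name] = rng.uniform(low, high)   # uniform over the interval
        score = objective(**params)
        if score < best_score:  # lower MAE is better
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(validation_mae, bounds, n_trials=50)
print(best_params, best_score)
```

Unlike grid search, the budget (`n_trials`) is chosen independently of how many hyperparameters there are, and each trial tries a fresh value of every hyperparameter.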

In a completely different framework, Bayesian Optimization is a more statistical approach to optimization and is commonly used with neural networks, since a single evaluation of a neural network can be computationally costly. In numerous research papers, this method outperforms Grid Search and Random Search, and it is offered by the hyperparameter tuning services on the Google Cloud Platform as well as AWS. Because an in-depth explanation requires a heavy background in Bayesian statistics and Gaussian processes (and maybe even some game theory), a “simple” explanation is that a much simpler/faster surrogate model (such as a Gaussian process) approximates the computationally expensive, original algorithm, and an acquisition function (such as probability of improvement or GP-UCB) uses that surrogate to intelligently choose which hyperparameter values to try next on the original algorithm. The result of running the initial combination of values on the expensive/original algorithm is used to update the surrogate, which then serves as prior knowledge when the acquisition function picks the set of hyperparameters for the next iteration. This process continues either for a specified number of iterations or for a specified amount of time, and, as before, the combination of hyperparameters that performs best on the expensive/original algorithm is chosen.
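
The loop described above can be sketched in plain Python for a single hyperparameter. This is a toy illustration under several assumptions, not how any production service implements it: the objective `expensive_objective` is made up, the surrogate is a zero-mean Gaussian process with an RBF kernel and a fixed length scale, and the acquisition function is a lower confidence bound (the minimization counterpart of GP-UCB) evaluated on a fixed list of candidate values.

```python
import math

def expensive_objective(x):
    # Hypothetical stand-in for a costly training run; lower is better.
    return (x - 0.3) ** 2

def rbf(a, b, length_scale=0.2):
    # Kernel: how strongly results at a and b are assumed to be correlated.
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    # Surrogate: posterior mean and standard deviation at x_new given past results.
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    mean = sum(ki * wi for ki, wi in zip(k, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 0.0))

def bayesian_optimization(objective, candidates, n_iterations=10, kappa=2.0):
    xs = [candidates[0], candidates[-1]]  # two initial expensive evaluations
    ys = [objective(x) for x in xs]
    for _ in range(n_iterations):
        def lower_confidence_bound(x):
            # Acquisition: cheap to compute, trades off low mean vs. high uncertainty.
            mean, std = gp_posterior(xs, ys, x)
            return mean - kappa * std
        untried = [c for c in candidates if c not in xs]
        if not untried:
            break
        x_next = min(untried, key=lower_confidence_bound)
        xs.append(x_next)
        ys.append(objective(x_next))  # the only expensive call per iteration
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], len(xs)

candidates = [i / 20 for i in range(21)]  # values in [0, 1] in steps of 0.05
best_x, best_y, n_evaluations = bayesian_optimization(expensive_objective, candidates)
print(best_x, best_y, n_evaluations)
```

The point of the design is that the acquisition function is evaluated on every candidate each iteration, while the expensive objective is evaluated only once per iteration. The kernel, `kappa`, and `noise` values here are arbitrary illustrative choices; real libraries fit these to the data.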