Refine cross-validation strategy descriptions and improve manuscript clarity in the methodology section

af3b872e · Joaquín Irazábal González · ac17fc59 · af3b872e
Commit af3b872e authored May 26, 2026 by Joaquín Irazábal González

Showing 1 changed file with 9 additions and 9 deletions

Manuscript/ComparisonSurrogatesOptimizationBDSL.tex +9 -9

--- a/Manuscript/ComparisonSurrogatesOptimizationBDSL.tex
+++ b/Manuscript/ComparisonSurrogatesOptimizationBDSL.tex
@@ -256,7 +256,7 @@ For every geometry family, a separate regression model is trained for each targe
 Hyperparameters are optimized using Bayesian optimization \cite{Snoek2012}, with 40 evaluations per model and RMSE as the refit criterion. The first 10 evaluations are randomly sampled to explore the search space, while the remaining 30 are guided by the Bayesian surrogate model. This strategy provides a more efficient alternative to exhaustive grid search, particularly considering the number of geometry families, target outputs and candidate algorithms analysed. It also allows a broader exploration of continuous hyperparameter ranges. The hyperparameter search spaces considered are summarized in Appendix \ref{app:hyperparameter_search_spaces}.
-The cross-validation strategy is adapted to the dataset size. Leave-One-Out validation is used for $N\leq20$, repeated four-fold cross-validation with five repetitions for $21\leq N\leq80$, and shuffled five-fold cross-validation for larger datasets. For small datasets, the search spaces of tree-based models are additionally restricted to reduce overfitting.
+The cross-validation strategy is adapted to the dataset size. Leave-One-Out validation is used for $N\leq20$, repeated four-fold cross-validation with five repetitions for $21\leq N\leq80$ and five-fold cross-validation for larger datasets. For small datasets, the search spaces of tree-based models are additionally restricted to reduce overfitting.
 Model selection is performed in two stages. First, for each candidate algorithm, Bayesian hyperparameter optimization is carried out using cross-validated Root Mean Squared Error (RMSE) as the refit criterion. The best hyperparameter configuration is therefore the one with the lowest mean RMSE. Then, the best configurations from all candidate algorithms are compared. The model with the lowest mean RMSE defines a competitive threshold and all models with an RMSE within 5\% of this value are retained. Among these competitive models, the final selection is based on the lowest relative RMSE dispersion, computed as the standard deviation of the fold-wise RMSE divided by the mean RMSE. If two models have the same dispersion, the one with the lower mean RMSE is preferred. This procedure, summarized in Figure \ref{fig:BayesianSearchCV}, favours surrogates that are both accurate and stable.
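The dataset-size rule and the two-stage model selection described in this hunk can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `choose_cv` and `select_model` are hypothetical, the use of the population standard deviation for the dispersion is an assumption (the text only says "standard deviation of the fold-wise RMSE"), and the fold-wise scores in the usage example are made-up numbers. With scikit-learn, the three schemes would presumably correspond to `LeaveOneOut()`, `RepeatedKFold(n_splits=4, n_repeats=5)` and `KFold(n_splits=5)`, but the library used is not stated in this part of the manuscript.

```python
from statistics import mean, pstdev


def choose_cv(n_samples):
    """Return (scheme, n_splits, n_repeats) following the size thresholds in the text."""
    if n_samples <= 20:
        # Leave-One-Out: one fold per sample, no repetition.
        return ("leave-one-out", n_samples, 1)
    if n_samples <= 80:
        # Repeated four-fold cross-validation with five repetitions.
        return ("repeated-k-fold", 4, 5)
    # Five-fold cross-validation for larger datasets.
    return ("k-fold", 5, 1)


def select_model(fold_rmse, tolerance=0.05):
    """Pick the final model from the best configuration of each algorithm.

    fold_rmse maps a model name to its list of fold-wise RMSE values.
    Assumption: dispersion uses the population standard deviation.
    """
    stats = {}
    for name, scores in fold_rmse.items():
        avg = mean(scores)
        # Store (relative dispersion, mean RMSE) so ties fall back to mean RMSE.
        stats[name] = (pstdev(scores) / avg, avg)

    # Stage 1: the lowest mean RMSE defines the competitive threshold (within 5%).
    best_mean = min(avg for _, avg in stats.values())
    competitive = {
        name: s for name, s in stats.items() if s[1] <= (1.0 + tolerance) * best_mean
    }

    # Stage 2: lowest relative dispersion wins; equal dispersion -> lower mean RMSE.
    return min(competitive, key=lambda name: competitive[name])


# Usage with made-up fold-wise RMSE values (illustrative only).
example = {
    "SVR": [1.00, 1.02, 0.98, 1.00],
    "GPR": [0.90, 1.10, 0.80, 1.12],
    "RF": [2.00, 2.00, 2.00, 2.00],
}
selected = select_model(example)
```

In this example GPR has the lowest mean RMSE, SVR lies within 5% of it and has a much smaller fold-to-fold dispersion, and RF is perfectly stable but falls outside the competitive threshold, so the rule retains SVR.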
@@ -286,7 +286,7 @@ For each output variable, a final RBF surrogate is trained using all available F
 The proposed methodology seeks to balance damage among the dissipative windows while keeping it below a prescribed threshold. At the same time, it limits damage in the surrounding frame and promotes the highest possible energy dissipation through the activation of the windows. The geometric optimization is carried out using DE \cite{Storn1997}, a population-based global optimizer that does not require gradient information and is therefore suitable for nonlinear and non-convex surrogate response surfaces. In the current implementation, DE is run with a maximum of 500 iterations, a population size factor of 25 and a convergence tolerance of $10^{-6}$. Once an optimal candidate is obtained, an adaptive FEM validation loop is applied to verify the predicted geometry before acceptance.
-For each candidate geometry $\mathbf{x}$, the trained surrogate models predict the window distortions $\hat{\varepsilon}_{xy,i}$, the window damage indicators $\hat{\mathcal{D}}_i$, and the frame damage indicator $\hat{\mathcal{D}}_f$. Damage is therefore controlled in all regions of the device, but with different mechanical relevance: frame damage is penalized more severely because it may compromise the structural integrity of the damper, whereas the window damage penalties are formulated to promote comparable damage levels among windows and avoid concentrating the dissipative demand in a single region. The dissipative contribution is estimated from $\hat{\varepsilon}_{xy,i}^2$, the window thickness and the corresponding area of the window, since the energy dissipated by each window depends not only on the distortion level but also on the amount of material involved. This term is several orders of magnitude smaller than the damage penalties and is intentionally left unscaled. As a result, damage control remains the dominant criterion, while the dissipative term acts as a tie-breaker among geometries with similar damage performance, favouring those with higher distortion and, consequently, greater energy dissipation capacity.
+For each candidate geometry $\mathbf{x}$, the trained surrogate models predict the window distortions $\hat{\varepsilon}_{xy,i}$, the window damage indicators $\hat{\mathcal{D}}_i$ and the frame damage indicator $\hat{\mathcal{D}}_f$. Damage is therefore controlled in all regions of the device, but with different mechanical relevance: frame damage is penalized more severely because it may compromise the structural integrity of the damper, whereas the window damage penalties are formulated to promote comparable damage levels among windows and avoid concentrating the dissipative demand in a single region. The dissipative contribution is estimated from $\hat{\varepsilon}_{xy,i}^2$, the window thickness and the corresponding area of the window, since the energy dissipated by each window depends not only on the distortion level but also on the amount of material involved. This term is several orders of magnitude smaller than the damage penalties and is intentionally left unscaled. As a result, damage control remains the dominant criterion, while the dissipative term acts as a tie-breaker among geometries with similar damage performance, favouring those with higher distortion and, consequently, greater energy dissipation capacity.
 The implemented objective function to be minimized is
 \begin{equation}
@@ -341,7 +341,7 @@ The supervised-learning comparison shows a clear hierarchy among the candidate s
 These results indicate that kernel-based models are particularly well suited to the present surrogate task. SVR provides the best compromise between accuracy and computational cost, while GPR is the second most competitive supervised strategy, especially in some higher-dimensional cases. Tree-based models, although robust, are less frequently selected and MLP models are not competitive in terms of computational efficiency for the dataset sizes considered here. As mentioned in previous sections, this behaviour motivated the additional evaluation of RBF interpolation as a simpler surrogate alternative. In contrast to the supervised models, RBF models were trained in less than one second per output, making them especially attractive for repeated surrogate updates within the adaptive optimization loop.
-The FEM validation of the optimized geometries is summarized in Table~\ref{tab:final_surrogate_comparison}, while the complete optimization results for all adaptive iterations are provided in Appendix~\ref{appendix:optimization_results}. For each geometry family and surrogate strategy, Table~\ref{tab:final_surrogate_comparison} reports the final accepted adaptive iteration, the optimized window thicknesses $\mathbf{t}_w^{\star}$, the surrogate-predicted objective value $J_{\mathrm{surr}}$, the corresponding FEM-recomputed objective value $J_{\mathrm{FEM}}$, and the associated validation errors, $|e_J|$ and $e_{\max}$. The maximum variable error, $e_{\max}$, is defined as the largest relative error among all quantities entering the objective function, namely ${\Exy}_i$, $\TFD_i$ and $\TFD_f$, whereas $|e_J|$ denotes the absolute objective-function error. The optimization required between two and three adaptive iterations depending on the geometry family and surrogate type, with most cases converging after three iterations. No systematic difference in the number of iterations was observed between RBF and supervised ML surrogates.
+The FEM validation of the optimized geometries is summarized in Table~\ref{tab:final_surrogate_comparison}, while the complete optimization results for all adaptive iterations are provided in Appendix~\ref{appendix:optimization_results}. For each geometry family and surrogate strategy, Table~\ref{tab:final_surrogate_comparison} reports the final accepted adaptive iteration, the optimized window thicknesses $\mathbf{t}_w^{\star}$, the surrogate-predicted objective value $J_{\mathrm{surr}}$, the corresponding FEM-recomputed objective value $J_{\mathrm{FEM}}$ and the associated validation errors, $|e_J|$ and $e_{\max}$. The maximum variable error, $e_{\max}$, is defined as the largest relative error among all quantities entering the objective function, namely ${\Exy}_i$, $\TFD_i$ and $\TFD_f$, whereas $|e_J|$ denotes the absolute objective-function error. The optimization required between two and three adaptive iterations depending on the geometry family and surrogate type, with most cases converging after three iterations. No systematic difference in the number of iterations was observed between RBF and supervised ML surrogates.
 \begin{table*}[ht!]
 \centering
@@ -410,7 +410,7 @@ Figure~\ref{fig:optimized_window_thickness_evolution} shows the evolution of the
  \label{fig:optimized_window_thickness_evolution}
 \end{figure*}
-From a methodological point of view, these results highlight the trade-off between surrogate complexity, accuracy and computational efficiency. Supervised models, particularly SVR and GPR, provide high predictive accuracy and robustness, but require hyperparameter optimization and cross-validation for every output variable and adaptive iteration. RBF interpolation, in contrast, has a much lower training cost and provides very competitive final predictions once the relevant regions of the design domain have been adaptively sampled. Therefore, the comparison does not identify a universally superior surrogate strategy. Instead, it suggests that RBF interpolation is especially suitable for low- to moderate-dimensional design spaces with well-distributed FEM samples and relatively smooth input--output relationships, as occurs for the response variables analysed in this work. Supervised ML surrogates remain valuable when greater robustness is required, or when the response surface is expected to involve stronger nonlinear interactions, local irregularities or higher-dimensional dependencies.
+From a methodological point of view, these results highlight the trade-off between surrogate complexity, accuracy and computational efficiency. Supervised models, particularly SVR and GPR, provide high predictive accuracy and robustness, but require hyperparameter optimization and cross-validation for every output variable and adaptive iteration. RBF interpolation, in contrast, has a much lower training cost and provides very competitive final predictions once the relevant regions of the design domain have been adaptively sampled. Therefore, the comparison does not identify a universally superior surrogate strategy. Instead, it suggests that RBF interpolation is especially suitable for low- to moderate-dimensional design spaces with well-distributed FEM samples and relatively smooth input--output relationships, as occurs for the response variables analysed in this work. Supervised ML surrogates remain valuable when greater robustness is required or when the response surface is expected to involve stronger nonlinear interactions, local irregularities or higher-dimensional dependencies.
 The objective-function values should also be interpreted carefully. The objective-function error measures the consistency between surrogate predictions and FEM validation, not whether the final objective value is necessarily close to zero. Some geometry families, such as $F_2$, retain non-negligible penalty contributions because the prescribed damage targets cannot be fully achieved within the admissible design bounds. Nevertheless, the close agreement between surrogate and FEM objective values indicates that the accepted designs are not artifacts of surrogate extrapolation, but FEM-consistent optimized candidates within the explored design space.
@@ -433,9 +433,9 @@ A comprehensive comparison of surrogate strategies was performed in terms of pre
 The proposed adaptive validation loop proved to be necessary and effective. Several initially optimized candidates did not satisfy the prescribed error tolerances. After incorporating the new FEM results into the training dataset and retraining the surrogates, the prediction errors decreased and all final optimized geometries satisfied the acceptance criteria after only two or three iterations. Therefore, the final designs are not accepted solely on the basis of surrogate predictions, but are explicitly verified through FEM in the region of the design space where the optimum is located.
-It is also worth noting that, although the framework allows the DoE to be expanded with additional FEM simulations when the surrogate accuracy is insufficient, this was not required in the present application. The initial DoE datasets were already adequate to obtain accurate optimized designs after adaptive validation and retraining. In particular, the final geometries were obtained from only 8 initial FEM simulations for the two-window devices, 16 for the three-window devices, and 64 for the five-window devices, with a maximum of three adaptive retraining iterations in all cases. This demonstrates that the proposed strategy can achieve FEM-consistent optimized designs with a limited number of high-fidelity simulations, making it highly competitive from a computational point of view.
+It is also worth noting that, although the framework allows the DoE to be expanded with additional FEM simulations when the surrogate accuracy is insufficient, this was not required in the present application. The initial DoE datasets were already adequate to obtain accurate optimized designs after adaptive validation and retraining. In particular, the final geometries were obtained from only 8 initial FEM simulations for the two-window devices, 16 for the three-window devices and 64 for the five-window devices, with a maximum of three adaptive retraining iterations in all cases. This demonstrates that the proposed strategy can achieve FEM-consistent optimized designs with a limited number of simulations, making it competitive from a computational point of view.
-The proposed methodology also has some limitations that should be acknowledged. First, its reliability depends on the quality of the calibrated FEM model used to generate the training data and validate the optimized designs. Second, the TFDMap is used here as a post-processing damage indicator rather than as a constitutive fracture model; therefore, the optimized configurations should be interpreted in terms of relative damage control and proximity to critical states, not as direct predictions of crack initiation. Third, only the window thicknesses are considered as design variables. Although this leads to a controlled and interpretable optimization problem, it does not exploit the full geometric flexibility of BDSL dampers. Finally, the optimized geometries should ultimately be validated experimentally before being used to establish general design recommendations.
+The proposed methodology also has some limitations that should be acknowledged. First, its reliability depends on the quality of the calibrated FEM model used to generate the training data and validate the optimized designs. Second, the TFDMap is used here as a post-processing damage indicator rather than as a constitutive fracture model; therefore, the optimized configurations should be interpreted in terms of relative damage control and proximity to critical states, not as direct predictions of crack initiation. Third, only the window thicknesses are considered as design variables; although this leads to a controlled and interpretable optimization problem, it does not exploit the full geometric flexibility of BDSL dampers. Finally, the optimized geometries should ultimately be validated experimentally before being used to establish general design recommendations.
 Future work should extend the design space by including additional geometric and mechanical variables, such as window height, window spacing, frame thickness or global device proportions. This extension would increase the dimensionality and complexity of the surrogate task. In those cases, the performance of RBF interpolation should therefore be reassessed. While RBF models performed very well in the present study, their efficiency and accuracy may decrease as the input space becomes larger or the response surfaces develop stronger local nonlinearities. In such cases, supervised ML models or hybrid surrogate strategies may become more advantageous.
@@ -470,7 +470,7 @@ Additional information related to this study is available from the corresponding
 \label{app:hyperparameter_search_spaces}
 \vspace*{12pt}
-Table~\ref{tab:cv_hyperparameter_settings} summarizes the hyperparameter search spaces used for the Bayesian optimization of the supervised surrogate models. The same ranges are used for kernel-based and neural-network models across all dataset sizes, whereas the search spaces of tree-based models are automatically restricted for small datasets to reduce overfitting. The cross-validation strategy is also adapted to the dataset size: Leave-One-Out validation is used for $N\leq20$, repeated four-fold cross-validation with five repetitions for $21\leq N\leq80$, and shuffled five-fold cross-validation for larger datasets.
+Table~\ref{tab:cv_hyperparameter_settings} summarizes the hyperparameter search spaces used for the Bayesian optimization of the supervised surrogate models. The same ranges are used for kernel-based and neural-network models across all dataset sizes, whereas the search spaces of tree-based models are automatically restricted for small datasets to reduce overfitting. The cross-validation strategy is also adapted to the dataset size: Leave-One-Out validation is used for $N\leq20$, repeated four-fold cross-validation with five repetitions for $21\leq N\leq80$ and five-fold cross-validation for larger datasets.
 \begin{table*}[htbp]
 \centering
@@ -557,7 +557,7 @@ Model & Preprocessing / kernel & Hyperparameter & Search space \\
 \end{tabular}
 \vspace{2mm}
-\parbox{0.95\textwidth}{\footnotesize \textit{Note:} For tree-based models, the hyperparameter ranges are adapted to the dataset size: S denotes small sample $N\leq20$, M denotes medium sample $21\leq N\leq80$, and L denotes large sample $N>80$. For the remaining models, the same search spaces are used for all dataset sizes.}
+\parbox{0.95\textwidth}{\footnotesize \textit{Note:} For tree-based models, the hyperparameter ranges are adapted to the dataset size: S denotes small sample $N\leq20$, M denotes medium sample $21\leq N\leq80$ and L denotes large sample $N>80$. For the remaining models, the same search spaces are used for all dataset sizes.}
 \end{table*}
@@ -565,7 +565,7 @@ Model & Preprocessing / kernel & Hyperparameter & Search space \\
 \label{appendix:optimization_results}
 \vspace*{12pt}
-This appendix summarizes the surrogate-predicted optimized configurations, the corresponding FEM validation values and the associated errors for all adaptive iterations. The source ``Surr.'' denotes the values predicted by the surrogate model during the optimization, whereas ``FEM'' denotes the values recomputed with the high-fidelity numerical model. Error rows report the relative error for the response variables and the absolute error for the objective function.
+This appendix summarizes the surrogate-predicted optimized configurations, the corresponding FEM validation values and the associated errors for all adaptive iterations. The source ``Surr.'' denotes the values predicted by the surrogate model during the optimization, whereas ``FEM'' denotes the values recomputed with the numerical model. Error rows report the relative error for the response variables and the absolute error for the objective function.
 \begin{table*}[htbp]
 \centering