This study describes the software methodology designed for systematic benchmarking of bipedal systems through the computation of performance indicators from data collected during an experimentation stage. Under the umbrella of the European project Eurobench, we collected approximately 30 protocols, together with related testbeds and scoring algorithms, aimed at characterizing the performance of humanoids, exoskeletons, and/or prostheses under different conditions. The main challenge addressed in this study is the standardization of the scoring process, so that experiments can be benchmarked systematically. The complexity of this process mainly stems from the lack of consistency in how experimental data are stored and organized, how the inputs and outputs of benchmarking algorithms are defined, and how these algorithms are implemented. We propose a simple but efficient methodology for preparing scoring algorithms that ensures the reproducibility and replicability of results. This methodology mainly constrains the interface of the software, enabling engineers to develop their metrics in their preferred language. Continuous integration and deployment tools are then used to verify the replicability of the software and to generate, through dockerization, an executable instance that is independent of the implementation language. This article presents this methodology and points to all the metric and documentation repositories designed under this policy in Eurobench. Applying this approach to other protocols and metrics would ease the reproduction, replication, and comparison of experiments.
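To make the interface constraint concrete, the sketch below shows what a scoring algorithm prepared under such a policy could look like: a self-contained script that receives input data files and an output folder on the command line, and writes each performance indicator to its own result file. The entry-point name, column name, and file formats used here are illustrative assumptions, not the Eurobench specification itself.

```python
#!/usr/bin/env python3
"""Hypothetical scoring algorithm with a constrained interface:
all inputs are data files passed on the command line, and every
performance indicator is written to a separate file in an output
folder. Names and formats are assumptions for illustration."""
import csv
import sys
from pathlib import Path


def compute_step_time_mean(gait_events_file: Path) -> float:
    """Toy metric: mean interval between consecutive heel-strike
    times read from a one-column CSV (column name is assumed)."""
    with gait_events_file.open(newline="") as f:
        times = [float(row["heel_strike_s"]) for row in csv.DictReader(f)]
    intervals = [b - a for a, b in zip(times, times[1:])]
    return sum(intervals) / len(intervals)


def main() -> int:
    if len(sys.argv) != 3:
        print("usage: run_pi <gait_events.csv> <output_folder>", file=sys.stderr)
        return 1
    events_file, out_dir = Path(sys.argv[1]), Path(sys.argv[2])
    out_dir.mkdir(parents=True, exist_ok=True)
    score = compute_step_time_mean(events_file)
    # One indicator per output file, in a simple key/value (YAML-like) form.
    (out_dir / "step_time_mean.yaml").write_text(
        f"type: scalar\nvalue: {score:.4f}\n"
    )
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because such an interface reduces to "data files in, indicator files out", wrapping the script in a container image and exercising it in a continuous integration pipeline becomes straightforward, which is what enables the language-independent, replicable execution described above.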