Semiparametric and distance-based methodologies with applications to bioinformatics, finance and risk management


This project centers on the study of a collection of statistical methods, among which distance-based ones play a predominant role. Their main characteristics are:

1)    They originate in the need to solve problems arising in quite diverse scientific disciplines, ranging from Medicine and Sociology to Actuarial and Financial Risk Management.

2)    They can handle very large datasets and heterogeneous data drawn from several information sources, possibly of a partially or totally non-numeric nature (e.g., functional data, financial series, DNA sequences).

3)    They combine concepts from Classical or Bayesian Statistics with tools from Machine Learning, which prompts us to name them Semiparametric Methods.

The balanced theoretical and practical orientation of this research facilitates the transfer of its results to application areas.

Research topics:

Distance-Based Methods, Goodness-of-fit, Semiparametric Methods for Volatility Models

Research team:

Universidad Carlos III de Madrid

Universitat de Barcelona

Inst. Català d’Oncologia

J Miguel Marín

Eva Boj

Josep Fortiana

Anna Esteve

Recent contributions (international journals only):

[1]   Albarrán, I., Alonso, P.J. and Marin, J.M. (2011) Non-linear models of disability and age applied to Census Data. Journal of Applied Statistics, 38, 10, p. 2151-2163.

[2]   Arribas-Gil, A. and Romo, J. (2012) Robust depth-based estimation in the time warping problem. Biostatistics, 13(3), 398-414.

[3]   Arribas-Gil, A. and Matias, C. (2012) A context dependent pair hidden Markov model for statistical alignment. Statistical Applications in Genetics and Molecular Biology, 11(1), 1-29.

[4]   Batista-Foguet, J.M., Fortiana, J., Currie, C., Villalbí, J.R. (2004). Socio-Economic indexes in Surveys for Comparisons between Countries. Social Indicators Research 67, 315 - 332.

[5]   Boj, E., Claramunt, M.M., Fortiana, J. (2007). Selection of predictors in distance-based regression. Communications in Statistics-Simulation and Computation 36 (1), 87–98.

[6]   Boj, E., Claramunt, M.M., Grané, A., Fortiana, J. (2007). Implementing PLS for distance-based regression: computational issues. Computational Statistics 22 (2), 237–248.

[7]   Boj, E., Claramunt, M.M., Grané, A., Fortiana, J. (2009). Projection Error Term in Gower's Interpolation. Journal of Statistical Planning and Inference 139, 1867–1878.

[8]   Boj, E., Delicado, P., Fortiana, J. (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429–437.

[9]   Buzon MJ, Erkizia I, Pou C, Minuesa G, Puertas MC, Esteve A, Castello A, Santos JR, Prado JG, Izquierdo-Useros N, Pattery T, Van Houtte M, Carrasco L, Clotet B, Ruiz L, Martinez-Picado J. (2012) A non-infectious cell-based phenotypic assay for the assessment of HIV-1 susceptibility to protease inhibitors. Journal of Antimicrobial Chemotherapy 67(1):32-8.

[10] Carnicer-Pont D, Almeda J, Luis Marin J, Martinez C, Gonzalez-Soler MV, Montoliu A, Muñoz R, Casabona J; HIV NADO working group (2011) Unlinked anonymous testing to estimate HIV prevalence among pregnant women in Catalonia, Spain, 1994 to 2009. Eurosurveillance, 16(32), pii: 19940. Erratum in: Eurosurveillance, 2011,16(33).pii/19945.

[11] Esteve, A., Boj, E., Fortiana, J. (2009). Interaction Terms in Distance-Based Regression. Communications in Statistics Part A-Theory and Methods 38, 3499-3509.

[12] Folch C, Casabona J, Brugal MT, Majó X, Esteve A, Meroño M, Gonzalez V; REDAN Study Group (2011) Sexually transmitted infections and sexual practices among injecting drug users in harm reduction centers in Catalonia. European Addiction Research, 17(5), 271-8. Epub 2011 Jul 27.

[13] Fortiana, J., Grané, A. (2002). A scale-free goodness-of-fit statistic for the exponential distribution based on maximum correlations, Journal of Statistical Planning and Inference 108, 85 – 97.

[14] Fortiana, J., Grané, A. (2003). Goodness-of-fit tests based on maximum correlations and their orthogonal decompositions. Journal of the Royal Statistical Society Series B-Methodological 65, 115 -126.

[15] Gómez, E., Gómez-Villegas, M.A., Marín, J.M. (2002). Continuous elliptical and exponential power linear dynamic models. Journal of Multivariate Analysis 83, 22 -36.

[16] Gómez, E., Gómez-Villegas, M.A., Marín, J.M. (2002). A matrix variate generalization of the power exponential family of distributions. Communications in Statistics-Theory and Methods 31, 2167 - 2182.

[17] Gómez, E., Gómez-Villegas, M.A., Marín, J.M. (2006). Sequences of elliptical distributions and mixtures of normal distributions. Journal of Multivariate Analysis 97, 295 -310.

[18] Gómez, E., Gómez-Villegas, M.A., Marín, J.M. (2008). A multivariate exponential power distribution as mixture of normal distributions with Bayesian applications. Communications in Statistics-Theory and Methods 37, 972 - 985.

[19] Grané, A. (2012) Exact goodness-of-fit tests for censored data. Annals of the Institute of Statistical Mathematics 64, 1187-1203.

[20] Grané, A., Fortiana, J. (2006). An adaptive goodness-of-fit test. Communications in Statistics-Theory and Methods 35, 1141-1155.

[21] Grané, A., Fortiana, J. (2008). Karhunen-Loève basis in goodness-of-fit tests decomposition: an evaluation. Communications in Statistics-Theory and Methods 37, 3144-3163.

[22] Grané, A., Fortiana, J. (2009). A location and scale-free goodness-of-fit statistic for the exponential distribution based on maximum correlations. Statistics 43, 1-12.

[23] Grané, A., Fortiana, J. (2011). A directional test of exponentiality based on maximum correlations. Metrika 73, 255 - 274.

[24] Grané, A., Tchirina, A. (2013). Asymptotic properties of a goodness-of-fit test. To appear in Statistics, 44(1), 202-215.

[25] Grané, A., Veiga, H. (2008). Accurate minimum capital risk requirements: A comparison of several approaches. Journal of Banking and Finance 32, 2482-2492.

[26] Grané, A., Veiga, H. (2010). Wavelet-based detection of outliers in financial time series. Computational Statistics and Data Analysis, 54, 2580 - 2593.

[27] Grané, A., Veiga, H. (2012) Asymmetry, realised volatility and stock return risk estimates. Portuguese Economic Journal, 11, 147-164.


[28] Guitart, C., A. Hernández-del-Valle, J.M. Marin and J. Benedicto (2012) Tracking Temporal Trend Breaks of Anthropogenic Change in Mussel Watch (MW) Databases. Environmental Science & Technology, 46(21), 11515-11523.

[29] HIV-CAUSAL Collaboration, Cain LE, Logan R, Robins JM, Sterne JA, Sabin C, Bansi L, Justice A, Goulet J, van Sighem A, de Wolf F, Bucher HC, von Wyl V, Esteve A, Casabona J, del Amo J, Moreno S, Seng R, Meyer L, Perez-Hoyos S, Muga R, Lodi S, Lanoy E, Costagliola D, Hernan MA. (2011) When to initiate combined antiretroviral therapy to reduce mortality and AIDS-defining illness in HIV-infected persons in developed countries: an observational study. Annals of Internal Medicine, 19, 154(8), 509-15.

[30] Llibre JM, Buzón MJ, Massanella M, Esteve A, Dahl V, Puertas MC, Domingo P, Gatell JM, Larrouse M, Gutierrez M, Palmer S, Stevenson M, Blanco J, Martinez-Picado J, Clotet B. (2011) Treatment intensification with raltegravir in subjects with sustained HIV-1 viremia suppression: a randomized 48 weeks study. Antiviral Therapy (in press).

[31] Marín, J.M., Montes, R., Ríos, D. (2003). Bayesian Methods in Plant Conservation Biology. Biological Conservation 113, 379- 387.

[32] Marín, J.M., Plà, L.M., Ríos, D. (2004). Inference for some stochastic process models related with sow management. Journal of Applied Statistics 32, 797 - 812.

[33] Marín, J.M. and Rodríguez-Bernal, M.T. (2012). Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis. Computational Statistics and Data Analysis.

[34] Marín, J.M., Rodríguez-Bernal, M.T., Wiper, M. (2005). Using Weibull mixture distributions to model heterogeneous survival data. Communications in Statistics-Simulation and Computation 34, 673 - 684.

[35] Marín, J.M., Nieto, C. (2008). Spatial matching of multiple configurations of points with Bioinformatics application. Communications in Statistics-Theory and Methods 37, 1977 -1995.

[36] Pérez, A. Ruiz, E., Veiga, H. (2009). A note on the properties of power-transformed returns in long-memory stochastic volatility models with leverage effect. Computational Statistics and Data Analysis 53, 3593 - 3600.

[37] Pedrosa E, Carretero-Iglesia L, Boada A, Colobran R, Faner R, Pujol-Autonell I, Palou E, Esteve A, Pujol-Borrell R, Ferrándiz C, Juan M, Carrascosa JM. (2011) CCL4L polymorphisms and CCL4/CCL4L serum levels are associated with psoriasis severity. Journal of Investigative Dermatology, 131(9), 1830-7, doi: 10.1038/jid.2011.127.

[38] Pérez A, Giménez M, Sala P, Sierra M, Esteve A, Rodrigo C. (2011) Increase in invasive nonvaccine pneumococcal serotypes at two hospitals in Barcelona: was replacement disease to blame? Acta Paediatrica, 100(12):1572-5, doi: 10.1111/j.1651-2227.2011.02365.x

[39] Ramos, S. B. and Veiga, H. (2011) Risk factors in oil and gas industry returns: International evidence. Energy Economics, 33(3), 525-542.

[40] Romero A, González V, Esteve A, Martró E, Matas L, Tural C, Pumarola T, Casanova A, Ferrer E, Caballero E, Ribera E, Margall N, Domingo P, Farré J, Puig T, Sauca MG, Barrufet P, Amengual MJ, Navarro G, Navarro M, Vilaró J, Ortín X, Amat Ortí, Pujol F, Prats JM, Massabeu A, Simó JM, Villaverde CA, Benítez MA, Garcia I, Díaz O, Becerra J, Ros R, Sala R, Rodrigo I, Miró JM, Casabona J, and the AERI Study group. (2011) Identification of recent HIV-1 infection among newly diagnosed cases in Catalonia, Spain (2006-2008). European Journal of Public Health (in press).

[41] Ruiz, E., Veiga, H. (2008). Modelling long memory volatilities with leverage effect: A-LMSV versus FIEGARCH. Computational Statistics and Data Analysis 52, 2846 - 2862.

[42] Sánchez-Niubò, A., Fortiana, J., Barrio, G., Suelves, J.M., Correa, J.F., Domingo-Salvany, A. (2009). Problematic heroin use incidence trends in Spain. Addiction 104, 248 - 255.

[43] Vives, N., Carnicer-Pont, D., García de Olalla, P., Camps, N., Esteve, A., Casabona, J. and the HIV and STI Surveillance Group. (2011) Factors associated to late diagnosis of HIV infection in Catalonia, Spain. International Journal of STD & AIDS (in press).

[44] Wiper, M.P., Palacios, A.P. and Marín, J.M. (2012). Bayesian software reliability prediction using software metrics information. Quality Technology and Quantitative Management 9, 1, p. 35-44.


