Workshop on Statistical Models for Complex Data

Madrid, Puerta de Toledo - May 26, 2026


Erasmus+ Agreement UC3M - Università degli Studi di Napoli Federico II

Innovation in Applied Statistics

About the Workshop

This workshop is organized within the framework of the current Erasmus+ agreement between the Università degli Studi di Napoli Federico II and the Universidad Carlos III de Madrid, in the area of Statistics.

During the event, faculty and researchers from both institutions will present work on the statistical modeling of complex data, covering innovative methodological approaches and relevant applications.

Objectives

  • Strengthen the academic and scientific ties between the two universities.
  • Foster the exchange of knowledge.
  • Identify joint lines of research for future collaborations.

Registration

If you would like to attend the workshop, please complete the registration form at the following link:

Register now


Event Program

Time Talk and Abstract Institution
09:30 – 10:05 Maria Iannario: Longitudinal modeling of ordinal ratings and ‘Don’t Know’ responses: A multivariate mixed approach

Statistical inference for longitudinal multivariate ordinal data is complicated by departures from the ordinal scale assumption induced by partially ordered responses, including “don’t know” (DK) categories. Treating such responses as missing or excluding them may induce misspecification and bias in inference, particularly when DK responses reflect latent uncertainty. We propose a class of longitudinal multivariate mixed-effects models for partially ordered repeated measures that explicitly incorporates DK responses within a joint likelihood framework. The model is formulated as a two-component mixture, combining a binary submodel for the DK mechanism with a cumulative logit specification for the ordinal responses. Dependence across time and outcomes is induced through a shared random-effects structure, allowing for general covariance patterns. The likelihood is constructed by marginalizing over latent indicators governing the DK mechanism and integrating over the random effects. Covariate effects are allowed to enter both the DK and ordinal components, accommodating distinct drivers of response uncertainty and preference. Parameter estimation is based on maximum likelihood, implemented via adaptive Gaussian quadrature. Identifiability is discussed with reference to the joint specification of the binary and ordinal components and the associated random-effects structure. Finite-sample performance is evaluated through simulation under a range of data-generating mechanisms. When the model is correctly specified, estimators exhibit negligible bias and satisfactory coverage. In contrast, misspecification induced by ignoring DK responses leads to biased estimation of threshold and variance components and undercoverage, particularly as the proportion of DK responses increases, while regression parameters remain comparatively stable. 
An application to survey data illustrates the practical relevance of the proposed approach, highlighting its ability to disentangle preference and uncertainty components and to improve interpretability relative to standard ordinal mixed models. The proposed framework extends mixed-effects models for ordinal data to accommodate semi-ordinal outcomes, providing a unified approach for modelling latent uncertainty and data quality in longitudinal settings.
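
The two-component structure described in this abstract can be illustrated with a minimal sketch for a single observation. It is not the authors' implementation: the logit link for the DK submodel, the parametrization of the cumulative logit part, and all names (`eta_dk`, `eta_ord`, `thresholds`) are assumptions made for illustration. In the full model the linear predictors would also contain shared random effects, which are integrated out of the likelihood.

```python
import math


def logistic_cdf(x):
    """Standard logistic CDF, the link behind a cumulative logit model."""
    return 1.0 / (1.0 + math.exp(-x))


def response_probabilities(eta_dk, eta_ord, thresholds):
    """Return (P(DK), [P(Y = 1), ..., P(Y = K)]) for one observation.

    eta_dk     : linear predictor of the binary DK submodel (hypothetical name)
    eta_ord    : linear predictor of the ordinal submodel (hypothetical name)
    thresholds : increasing cutpoints tau_1 < ... < tau_{K-1}
    """
    # Assumption: the DK mechanism uses a logit link.
    p_dk = logistic_cdf(eta_dk)
    # Cumulative probabilities P(Y <= k | not DK), padded with 0 and 1.
    cum = [0.0] + [logistic_cdf(t - eta_ord) for t in thresholds] + [1.0]
    # Each ordinal category is weighted by the probability of not answering DK.
    ordinal = [(1.0 - p_dk) * (cum[k + 1] - cum[k]) for k in range(len(cum) - 1)]
    return p_dk, ordinal


# Toy values: four ordinal categories plus the DK option.
p_dk, p_ord = response_probabilities(eta_dk=-1.0, eta_ord=0.5,
                                     thresholds=[-1.0, 0.0, 1.5])
```

The DK probability and the ordinal category probabilities sum to one, so the DK option is treated as an outcome in its own right rather than as missing data.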

Università degli Studi di Napoli Federico II
10:05 – 10:40 Rosa Lillo: Counting votes without asking for votes: Network Scale-Up methods for estimating voting intentions

Estimating the size of hidden or hard-to-reach populations is a central problem in public health, social research, and policy design. The Network Scale-Up Method (NSUM) provides an indirect statistical solution by using aggregated relational data collected from the general population, avoiding the need to directly sample sensitive or difficult-to-access groups. This talk discusses how this framework can be adapted to electoral estimation, where voting preferences may become privacy-sensitive information and direct survey responses can be affected by nonresponse, social desirability, or strategic misreporting. The methodology is illustrated with recent applications to the 2023 Spanish general elections and to real data on voting intentions in the 2024 European elections in Spain and Italy. Particular attention is paid to robust multivariate NSUM estimators, which exploit the dependence structure among political groups while reducing the influence of multivariate outliers and contaminated responses.
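
As a point of reference for the scale-up idea, here is a minimal sketch of the basic NSUM estimator, which scales the share of group members in respondents' personal networks up to the whole population. This is the simple textbook estimator, not the robust multivariate estimators discussed in the talk; the function name and the toy data are hypothetical.

```python
def basic_scale_up(known_in_group, network_sizes, population_size):
    """Basic network scale-up estimate of a group's size.

    known_in_group  : per respondent, number of acquaintances in the group
                      (e.g. acquaintances expected to vote for a given party)
    network_sizes   : per respondent, estimated personal network size
    population_size : size of the general population
    """
    total_degree = sum(network_sizes)
    if total_degree <= 0:
        raise ValueError("network sizes must sum to a positive number")
    # Share of the group inside the pooled networks, scaled to the population.
    return population_size * sum(known_in_group) / total_degree


# Toy data: four respondents in a population of one million.
known = [2, 0, 5, 1]
degrees = [100, 50, 250, 100]
estimate = basic_scale_up(known, degrees, 1_000_000)
```

In an electoral setting, dividing the estimate by the population size gives an estimated share for the group without asking any respondent about their own vote.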

Universidad Carlos III de Madrid
10:40 – 11:15 Stefania Capecchi: Assessing the gender pay gap in the EU: An analysis from EIGE data

The study investigates the gender pay gap (GPG) in the European Union, focusing specifically on the impact of caregiving activities. Using data from the CARE 2022 survey of the European Institute for Gender Equality (EIGE), which provides detailed information on unpaid care, household responsibilities and social activities, the analysis explores the contribution of caregiving duties to income inequalities between men and women. To account for the hierarchical structure of the data, whereby individuals are nested within countries, the study employs multilevel ordinal logistic regression models, in which income is measured in ordered categories. Individual-level covariates include gender, education, employment status and the presence of children, while country-level heterogeneity is also considered. This approach makes it possible to disentangle the individual and contextual factors associated with income disparities. The results indicate that gender continues to be a significant factor in determining income, with men being considerably more likely to be in higher income classes. Caregiving responsibilities, proxied by the presence of children and related variables, exhibit a measurable impact on income distribution, suggesting that unequal allocation of unpaid care work contributes to the persistence of the gender pay gap. Education emerges as a strong positive driver of income, while cross-country differences indicate the relevance of institutional and socio-economic contexts. Overall, the findings emphasise the importance of incorporating caregiving dynamics into the analysis of the gender pay gap, highlighting that unpaid care responsibilities are a key mechanism underlying gender-based income inequalities in the EU.

Università degli Studi di Napoli Federico II
11:15 – 11:45 ☕ Coffee break
11:45 – 12:20 Francesca Di Iorio: Monitoring public finance main aggregates: a nowcast strategy using administrative monthly data sources

This reform increases the need for timely and accurate monitoring of public finances to detect deviations from sustainable fiscal paths. Fiscal sustainability analysis is closely tied to GDP and its components, making real-time economic measurement increasingly important. In this context, nowcasting (Giannone et al. 2008) has gained prominence. European institutions, particularly the European Commission, have shown growing interest in nowcasting for real-time GDP monitoring, inflation forecasting, and regional economic analysis. Traditional forecasting approaches rely on delayed official data and are often inadequate for real-time policymaking. Nowcasting overcomes these limitations by combining high-frequency information with advanced econometric techniques to deliver more timely insights. Despite extensive research on macroeconomic nowcasting (see, among others, Baffigi et al. 2004, Banbura et al. 2013, Kuzin et al. 2011), its application to public finance remains limited. Monthly fiscal data are scarce, and monitoring typically focuses on the deficit rather than on its underlying components, such as revenues and public investment. This paper seeks to address this gap by applying nowcasting methods directly to key public finance variables. It focuses on direct and indirect tax revenues and public investment expenditure, assessing whether established techniques, such as Chow-Lin temporal disaggregation and MIDAS models, can produce reliable nowcasts of these components. Future work will explore whether these nowcasts can be used for fiscal scenario analysis within a framework inspired by the MeMo-It model developed by Istat (Istat 2023).
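
To give a flavour of how a MIDAS model links monthly indicators to a lower-frequency target, the sketch below builds exponential Almon lag weights and uses them to collapse a few monthly observations into one regressor. The exponential Almon scheme is one common choice of MIDAS weighting and is an assumption here, since the abstract does not say which specification is used; function names and numbers are hypothetical.

```python
import math


def exp_almon_weights(theta1, theta2, n_lags):
    """Exponential Almon lag weights, normalized to sum to one."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def midas_aggregate(high_freq_lags, weights):
    """Weighted sum of high-frequency lags, most recent observation first."""
    return sum(w * x for w, x in zip(weights, high_freq_lags))


# theta1 = theta2 = 0 reduces to a plain average of the monthly values.
flat = exp_almon_weights(0.0, 0.0, 3)
quarterly_regressor = midas_aggregate([3.0, 6.0, 9.0], flat)

# A negative theta2 gives weights that decay with the lag.
decaying = exp_almon_weights(0.0, -0.1, 4)
```

Because only two parameters govern the whole weight profile, many monthly lags can enter the regression without estimating one coefficient per lag.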

Università degli Studi di Napoli Federico II
12:20 – 12:55 Helena Veiga: Asymmetric correlation propagation in factor stochastic volatility models

This paper introduces a new family of asymmetric stochastic volatility (SV) models that capture how both the sign and magnitude of past shocks influence future volatility. Under normality, we establish stationarity conditions, derive closed-form expressions for key moments (including variance and kurtosis), and obtain a leverage-propagation function that summarizes shock transmission over time. A Monte Carlo study under Gaussian and heavy-tailed shocks shows that Bayesian MCMC provides accurate finite-sample estimates. Empirically, using daily returns on the DAX and S&P 500 indices, models in the Leverage Propagation SV family (the general LPSV and its Downside-LPSV restriction) generally match or improve upon standard asymmetric SV benchmarks in terms of in-sample fit and out-of-sample volatility forecasts, with the preferred specification varying across markets and regimes: LPSV tends to be favored for the DAX, whereas Downside-LPSV dominates for the S&P 500 over the full sample and during the crisis period. LPSV offers a clearer description of time-varying leverage transmission. In an application to daily PM2.5 concentrations in Madrid, SV models yield broadly well-calibrated one-step-ahead exceedance probabilities. A counterfactual experiment further shows that extreme pollution surprises generate a persistent increase in volatility, implying high uncertainty for several subsequent days.

Universidad Carlos III de Madrid
12:55 – 13:30 Michael P. Wiper: Measuring efficiency of Peruvian universities: A stochastic frontier analysis

In the last two decades, Latin American higher education systems have invested heavily in improving teaching and research quality, yet comparatively less is known about how efficiently public universities transform resources into educational and research outputs, particularly outside the region’s largest systems. This talk examines the efficiency of Peru’s public universities from 2011 to 2023 using a multi-output stochastic frontier framework. We estimate a distance-function SFA model and decompose inefficiency into persistent (structural) and transient (managerial) components, allowing us to distinguish long-run institutional constraints from short-run performance gaps. The results indicate substantial heterogeneity in efficiency across institutions and over time. Institutional maturity and research output are positively associated with efficiency, while regional socioeconomic conditions are also strongly related to performance differences. We further provide evidence consistent with efficiency gains following Peru’s 2014 higher education reform, with improvements emerging progressively in the years after implementation. By combining a multi-output frontier with a persistent–transient decomposition in a reforming system, the study extends empirical approaches to efficiency in higher education and offers policy-relevant insights for targeting capacity-building and resource allocation in emerging higher education systems.

Universidad Carlos III de Madrid
13:30 – 15:00 🍴 Cocktail lunch
15:00 – 15:35 Vanessa Guerrero: Automatic Knot Selection in Generalized Additive Models via B-Splines

B-spline regression is a widely used framework for nonparametric modeling, whose performance depends on the number and placement of knots. These are typically chosen either explicitly through knot-selection algorithms or implicitly via penalization methods such as P-splines, which are standard in generalized additive models. In this work, we propose a novel knot-selection approach for generalized additive models that avoids multi-dimensional grid search and does not rely on backfitting. The method is evaluated on synthetic and real datasets and compared with existing knot-selection techniques and P-splines. Results show comparable accuracy and computational cost, while yielding simpler models.

Universidad Carlos III de Madrid
15:35 – 16:10 Eduardo García-Portugués: A family of toroidal diffusions with exact likelihood inference

We provide a class of diffusion processes for continuous time-varying multivariate angular data with explicit transition probability densities, enabling exact likelihood inference. The presented diffusions are time-reversible and can be constructed for any pre-specified stationary distribution on the torus, including highly-multimodal mixtures. We give results on asymptotic likelihood theory allowing one-sample inference and tests of linear hypotheses for 𝑘 groups of diffusions, including homogeneity. We show that exact and direct diffusion bridge simulation is possible too. A class of circular jump processes with similar properties is also proposed. Several numerical experiments illustrate the methodology for the circular and two-dimensional torus cases. The new family of diffusions is applied (i) to test several homogeneity hypotheses on the movement of ants and (ii) to simulate bridges between the three-dimensional backbones of two related proteins.

Universidad Carlos III de Madrid
16:10 – 16:45 Lucia Guastadisegni: Pairwise likelihood methods for latent variable models in panel data

Multivariate longitudinal data can be modeled using generalized linear latent variable models (GLLVMs), which provide a flexible framework for analyzing complex data structures. However, likelihood-based inference is often computationally infeasible due to the presence of high-dimensional integrals. To address this issue, we consider composite likelihood methods. We focus on the pairwise likelihood approach, which is based on bivariate densities, and its variant, the d-order pairwise likelihood, which restricts the set of pairs of observations by retaining only those separated by at most d time units. This approach preserves the most informative dependence structure while substantially reducing computational complexity. The model considered is a multidimensional longitudinal GLLVM with time-specific latent variables capturing cross-item dependence and item-specific random effects modeling serial dependence. The proposed methods are implemented using a separate maximization strategy. The composite likelihoods are decomposed into specific components, and each component is maximized independently. The resulting estimators are then combined to perform inference on the full parameter vector. Identifiability issues arising from this strategy are investigated, and simulation studies are conducted to assess and corroborate the main theoretical findings.
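
The pair restriction behind the d-order pairwise likelihood can be sketched in a few lines: only pairs of observations whose time points are at most d units apart are retained. This is an illustrative sketch, not the authors' code; the function name and the toy time grid are hypothetical, and the bivariate densities that would be evaluated on each retained pair are left out.

```python
from itertools import combinations


def d_order_pairs(time_points, d):
    """Index pairs (s, t), s < t, whose time points are at most d units apart."""
    return [
        (s, t)
        for s, t in combinations(range(len(time_points)), 2)
        if abs(time_points[t] - time_points[s]) <= d
    ]


times = [1, 2, 3, 4, 5]

# With d spanning the whole time range, every pair is kept (full pairwise likelihood).
all_pairs = d_order_pairs(times, d=4)

# With d = 1, only adjacent time points contribute.
lag1_pairs = d_order_pairs(times, d=1)
```

For T equally spaced time points the full pairwise likelihood uses T(T-1)/2 pairs per item, while d = 1 keeps only T-1 of them, which is where the reduction in computational cost comes from.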

Universidad Carlos III de Madrid
16:45 ☕ Coffee break & Closing

Location

The event will take place at the Puerta de Toledo Campus of the Universidad Carlos III de Madrid, Room 1.A.08.

Address: Ronda de Toledo, 1, 28005 Madrid.

Tip: How to get there (Puerta de Toledo)

The campus is easy to reach by public transport, as it is located in central Madrid:

  • Metro: Puerta de Toledo station (Line 5), right in front of the building.
  • Cercanías Renfe: Pirámides or Embajadores stations (Lines C-5, C-1, C-10), about a 10-12 minute walk away.
  • Bus: Lines 3, 17, 18, 23, 35, 41, 60, 148, C1 and C2.
  • BiciMAD: Nearby stations on Ronda de Toledo itself.