Kirch, C., Lahiri, S., Binder, H., Brannath, W., Cribben, I., Dette, H., Doebler, P., Feng, O., Gandy, A., Greven, S., Hammer, B., Harmeling, S., Hotz, T., Kauermann, G., Krause, J., Krempl, G., Nieto-Reyes, A., Okhrin, O., et al. (2025). Challenges and Opportunities for Statistics in the Era of Data Science. Harvard Data Science Review.
to appear
article URL
Abstract:
Statistics as a scientific discipline is currently facing the great challenge of finding its place in data science once more. While at the beginning of the last century, the development of the discipline of statistics was initiated by data-related research questions, nowadays, it is often viewed to have not kept up with the current developments in data science, which are largely focused on algorithmic, exploratory and computational aspects and often driven by other disciplines, such as computer science. However, statistics can—and should—contribute to the advances of data science. Of most interest are the strengths of statistics, such as the mathematical focus that leads to theoretical guarantees. This includes methods for formal modeling, hypothesis tests, uncertainty quantification and statistical inference. Of particular interest are also established statistical frameworks to handle causality or data deficiencies such as dependence, missingness, biases or confounding. This paper summarizes the findings of a discussion workshop on the topic that was held in June 2023 in Hannover, Germany. The discussion centered around the following questions: How must statistics be set up so that it can contribute (more) to modern data science? In which direction should it develop further? Which strengths can already be used now? What conditions must be created so that this can succeed? What can be done to arrive at a common language? What is the added value of formal modeling, inference, and the mathematical perspective taken in statistics?
BibTeX:
@article{Lahiri2025,
  author = {Claudia Kirch and Soumendra Lahiri and Harald Binder and Werner Brannath and Ivor Cribben and Holger Dette and Philipp Doebler and Oliver Feng and Axel Gandy and Sonja Greven and Barbara Hammer and Stefan Harmeling and Thomas Hotz and Göran Kauermann and Joscha Krause and Georg Krempl and Alicia Nieto-Reyes and Ostap Okhrin and Hernando Ombao and Florian Pein and Michal Pešta and Dimitris Politis and Li-Xuan Qin and Tom Rainforth and Holger Rauhut and Henry Reeve and David Salinas and Johannes Schmidt-Hieber and Clayton Scott and Johan Segers and Myra Spiliopoulou and Adalbert Wilhelm and Ines Wilms and Yi Yu and Johannes Lederer},
  title = {Challenges and Opportunities for Statistics in the Era of Data Science},
  year = {2025},
  abstract = {Statistics as a scientific discipline is currently facing the great challenge of finding its place in data science once more. While at the beginning of the last century, the development of the discipline of statistics was initiated by data-related research questions, nowadays, it is often viewed to have not kept up with the current developments in data science, which are largely focused on algorithmic, exploratory and computational aspects and often driven by other disciplines, such as computer science. However, statistics can—and should—contribute to the advances of data science. Of most interest are the strengths of statistics, such as the mathematical focus that leads to theoretical guarantees. This includes methods for formal modeling, hypothesis tests, uncertainty quantification and statistical inference. Of particular interest are also established statistical frameworks to handle causality or data deficiencies such as dependence, missingness, biases or confounding.

This paper summarizes the findings of a discussion workshop on the topic that was held in June 2023 in Hannover, Germany. The discussion centered around the following questions: How must statistics be set up so that it can contribute (more) to modern data science? In which direction should it develop further? Which strengths can already be used now? What conditions must be created so that this can succeed? What can be done to arrive at a common language? What is the added value of formal modeling, inference, and the mathematical perspective taken in statistics?},
  journal = {Harvard Data Science Review},
  pages = {to appear},
  url = {https://www.wiwi.hu-berlin.de/en/Professorships/vwl/statistik/news/challenges-and-opportunities-for-statistics-in-the-era-of-data-science}
}



Leipold, F. M., Kieslich, P. J., Henninger, F., Fernández-Fontelo, A., Greven, S., Kreuter, F. (2025). Detecting Respondent Burden in Online Surveys: How Different Sources of Question Difficulty Influence Cursor Movements. Social Science Computer Review.
43 (1):191-213
article DOI
URL
Accessed 2025-03-06
Abstract:
Online surveys are a widely used mode of data collection. However, as no interviewer is present, respondents face any difficulties they encounter alone, which may lead to measurement error and biased or (at worst) invalid conclusions. Detecting response difficulty is therefore vital. Previous research has predominantly focused on response times to detect general response difficulty. However, response difficulty may stem from different sources, such as overly complex wording or similarity between response options. So far, the question of whether indicators can discriminate between these sources has not been addressed. The goal of the present study, therefore, was to evaluate whether specific characteristics of participants’ cursor movements are related to specific properties of survey questions that increase response difficulty. In a preregistered online experiment, we manipulated the length of the question text, the complexity of the question wording, and the difficulty of the response options orthogonally between questions. We hypothesized that these changes would lead to increased response times, hovers (movement pauses), and y-flips (changes in vertical movement direction), respectively. As expected, each manipulation led to an increase in the corresponding measure, although the other dependent variables were affected as well. However, the strengths of the effects did differ as expected between the mouse-tracking indices: Hovers were more sensitive to complex wording than to question difficulty, while the opposite was true for y-flips. These results indicate that differentiating sources of response difficulty might indeed be feasible using mouse-tracking.
BibTeX:
@article{doi:10.1177/08944393241247425,
author = {Franziska M. Leipold and Pascal J. Kieslich and Felix Henninger and Amanda Fernández-Fontelo and Sonja Greven and Frauke Kreuter},
title = {Detecting Respondent Burden in Online Surveys: How Different Sources of Question Difficulty Influence Cursor Movements},
journal = {Social Science Computer Review},
volume = {43},
number = {1},
pages = {191-213},
year = {2025},
doi = {10.1177/08944393241247425},
url = {https://doi.org/10.1177/08944393241247425},
urldate = {2025-03-06},
    abstract = { Online surveys are a widely used mode of data collection. However, as no interviewer is present, respondents face any difficulties they encounter alone, which may lead to measurement error and biased or (at worst) invalid conclusions. Detecting response difficulty is therefore vital. Previous research has predominantly focused on response times to detect general response difficulty. However, response difficulty may stem from different sources, such as overly complex wording or similarity between response options. So far, the question of whether indicators can discriminate between these sources has not been addressed. The goal of the present study, therefore, was to evaluate whether specific characteristics of participants’ cursor movements are related to specific properties of survey questions that increase response difficulty. In a preregistered online experiment, we manipulated the length of the question text, the complexity of the question wording, and the difficulty of the response options orthogonally between questions. We hypothesized that these changes would lead to increased response times, hovers (movement pauses), and y-flips (changes in vertical movement direction), respectively. As expected, each manipulation led to an increase in the corresponding measure, although the other dependent variables were affected as well. However, the strengths of the effects did differ as expected between the mouse-tracking indices: Hovers were more sensitive to complex wording than to question difficulty, while the opposite was true for y-flips. These results indicate that differentiating sources of response difficulty might indeed be feasible using mouse-tracking. }
}


Maier, E., Stöcker, A., Fitzenberger, B., Greven, S. (2025). Additive Density-on-Scalar Regression in Bayes Hilbert Spaces with an Application to Gender Economics. Annals of Applied Statistics.
to appear
article URL
Abstract:
Motivated by research on gender identity norms and the distribution of the woman's share in a couple's total labor income, we consider functional additive regression models for probability density functions as responses with scalar covariates. To preserve nonnegativity and integration to one under summation and scalar multiplication, we formulate the model for densities in a Bayes Hilbert space with respect to an arbitrary finite measure. This enables us to not only consider continuous densities, but also, e.g., discrete or mixed densities. Mixed densities occur in our application, as the woman's income share is a continuous variable having discrete point masses at zero and one for single-earner couples. We discuss interpretation of effect functions in our model in terms of odds-ratios. Estimation is based on a gradient boosting algorithm that allows for a potentially large number of flexible covariate effects. We show how to handle the challenge of estimation for mixed densities within our framework using an orthogonal decomposition. Applying this approach to data from the German Socio-Economic Panel Study (SOEP) shows a more symmetric distribution in East German than in West German couples after reunification, differences between couples with and without minor children, as well as trends over time.
BibTeX:
@article{maier2021density,
      title={Additive Density-on-Scalar Regression in Bayes Hilbert Spaces with an Application to Gender Economics}, 
      author={Eva-Maria Maier and Almond Stöcker and Bernd Fitzenberger and Sonja Greven},
      journal={Annals of Applied Statistics}, 
      year={2025},
      pages = {to appear},
      abstract={Motivated by research on gender identity norms and the distribution of the woman's share in a couple's total labor income, we consider functional additive regression models for probability density functions as responses with scalar covariates. To preserve nonnegativity and integration to one under summation and scalar multiplication, we formulate the model for densities in a Bayes Hilbert space with respect to an arbitrary finite measure. This enables us to not only consider continuous densities, but also, e.g., discrete or mixed densities. Mixed densities occur in our application, as the woman's income share is a continuous variable having discrete point masses at zero and one for single-earner couples. We discuss interpretation of effect functions in our model in terms of odds-ratios. Estimation is based on a gradient boosting algorithm that allows for a potentially large number of flexible covariate effects. We show how to handle the challenge of estimation for mixed densities within our framework using an orthogonal decomposition. Applying this approach to data from the German Socio-Economic Panel Study (SOEP) shows a more symmetric distribution in East German than in West German couples after reunification, differences between couples with and without minor children, as well as trends over time.},
      url = {https://arxiv.org/abs/2110.11771}
}


Grygar, T. M., Radojičić, U., Pavlů, I., Greven, S., Nešlehová, J. G., Tůmová, Š., Hron, K. (2024). Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements. Journal of Geochemical Exploration.
259: 107416
article DOI
URL
Abstract:
Geochemical mapping of risk element concentrations in soils is performed in many countries around the world. It results in numerous large datasets of high analytical quality, which can be used to identify soils that violate individual legislative limits for safe food production. However, there is a lack of advanced data mining tools that would be suitable for sensitive exploratory data analysis of big data while respecting the natural variability of soil composition. To distinguish anthropogenic contamination from natural variations, the analysis of the entire data distribution for smaller subareas is key. In this article, we propose a new data mining methodology for geochemical mapping data based on functional data analysis of probability densities in the framework of Bayes spaces after post-stratification of a big dataset to smaller districts. The tools we propose allow us to analyse the entire distribution, going well beyond a superficial detection of extreme concentration anomalies. We illustrate the proposed methodology on a dataset gathered according to the Czech national legislation (1990–2009), whose information content has not yet been fully exploited. Taking into account specific properties of probability density functions and recent results for orthogonal decomposition of multivariate densities enabled us to reveal real contamination patterns that were so far only suspected in Czech agricultural soils. We process the above Czech soil composition dataset for Cu, Pb, and Zn by first compartmentalizing it into spatial units, the so-called districts, and by subsequently clustering these districts according to diagnostic features of their uni- and multivariate distributions at high concentration levels. These clusters were seen to correspond to compartments that show known features of contamination, such as historical metallurgy of non-ferrous metals and iron and steel production. 
Comparison between compartments, notably neighbouring districts with similar natural factors controlling soil variability, is key to the reliable distinction of diffuse contamination. In this work, we used soil contamination by Cu-bearing pesticides as an example for empirical testing of the proposed data mining approach. In general, there are no natural and justifiable thresholds of risk element concentrations that would be valid for geographical areas with too much natural heterogeneity. Therefore, national (or larger) soil geochemistry datasets cannot be processed as a whole. As we demonstrate in this paper, empirical knowledge and careful tailoring of statistical tools for the characteristic types of soil contamination are essential for unequivocal identification of the anthropogenic component in real datasets.
BibTeX:
@article{GRYGAR2024107416,
title = {Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements},
journal = {Journal of Geochemical Exploration},
volume = {259},
pages = {107416},
year = {2024},
issn = {0375-6742},
doi = {10.1016/j.gexplo.2024.107416},
url = {https://www.sciencedirect.com/science/article/pii/S0375674224000323},
author = {Tomáš Matys Grygar and Una Radojičić and Ivana Pavlů and Sonja Greven and Johanna G. Nešlehová and {\v{S}}těpánka Tůmová and Karel Hron},
keywords = {FDA for geochemical maps, FDA of univariate and multivariate densities, Compartmentalisation, Identification of Czech agricultural soil contamination, Cu-bearing pesticides, Bayes spaces},
abstract = {Geochemical mapping of risk element concentrations in soils is performed in many countries around the world. It results in numerous large datasets of high analytical quality, which can be used to identify soils that violate individual legislative limits for safe food production. However, there is a lack of advanced data mining tools that would be suitable for sensitive exploratory data analysis of big data while respecting the natural variability of soil composition. To distinguish anthropogenic contamination from natural variations, the analysis of the entire data distribution for smaller subareas is key. In this article, we propose a new data mining methodology for geochemical mapping data based on functional data analysis of probability densities in the framework of Bayes spaces after post-stratification of a big dataset to smaller districts. The tools we propose allow us to analyse the entire distribution, going well beyond a superficial detection of extreme concentration anomalies. We illustrate the proposed methodology on a dataset gathered according to the Czech national legislation (1990–2009), whose information content has not yet been fully exploited. Taking into account specific properties of probability density functions and recent results for orthogonal decomposition of multivariate densities enabled us to reveal real contamination patterns that were so far only suspected in Czech agricultural soils. We process the above Czech soil composition dataset for Cu, Pb, and Zn by first compartmentalizing it into spatial units, the so-called districts, and by subsequently clustering these districts according to diagnostic features of their uni- and multivariate distributions at high concentration levels. These clusters were seen to correspond to compartments that show known features of contamination, such as historical metallurgy of non-ferrous metals and iron and steel production. 
Comparison between compartments, notably neighbouring districts with similar natural factors controlling soil variability, is key to the reliable distinction of diffuse contamination. In this work, we used soil contamination by Cu-bearing pesticides as an example for empirical testing of the proposed data mining approach. In general, there are no natural and justifiable thresholds of risk element concentrations that would be valid for geographical areas with too much natural heterogeneity. Therefore, national (or larger) soil geochemistry datasets cannot be processed as a whole. As we demonstrate in this paper, empirical knowledge and careful tailoring of statistical tools for the characteristic types of soil contamination are essential for unequivocal identification of the anthropogenic component in real datasets.}
}


Gertheiss, J., Rügamer, D., Liew, B. X. W., Greven, S. (2024). Functional Data Analysis: An Introduction and Recent Developments. Biometrical Journal.
66 (7):e202300363
article DOI
URL
Abstract:
Functional data analysis (FDA) is a statistical framework that allows for the analysis of curves, images, or functions on higher dimensional domains. The goals of FDA, such as descriptive analyses, classification, and regression, are generally the same as for statistical analyses of scalar-valued or multivariate data, but FDA brings additional challenges due to the high- and infinite dimensionality of observations and parameters, respectively. This paper provides an introduction to FDA, including a description of the most common statistical analysis techniques, their respective software implementations, and some recent developments in the field. The paper covers fundamental concepts such as descriptives and outliers, smoothing, amplitude and phase variation, and functional principal component analysis. It also discusses functional regression, statistical inference with functional data, functional classification and clustering, and machine learning approaches for functional data analysis. The methods discussed in this paper are widely applicable in fields such as medicine, biophysics, neuroscience, and chemistry and are increasingly relevant due to the widespread use of technologies that allow for the collection of functional data. Sparse functional data methods are also relevant for longitudinal data analysis. All presented methods are demonstrated using available software in R by analyzing a dataset on human motion and motor control. To facilitate the understanding of the methods, their implementation, and hands-on application, the code for these practical examples is made available through a code and data supplement and on GitHub.
BibTeX:
@article{gertheiss2023functional,
author = {Gertheiss, Jan and Rügamer, David and Liew, Bernard X. W. and Greven, Sonja},
title = {Functional Data Analysis: An Introduction and Recent Developments},
journal = {Biometrical Journal},
volume = {66},
number = {7},
pages = {e202300363},
keywords = {curve data, functional regression, image data, longitudinal data analysis, object-oriented data analysis},
doi = {10.1002/bimj.202300363},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/bimj.202300363},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/bimj.202300363},
abstract = {Functional data analysis (FDA) is a statistical framework that allows for the analysis of curves, images, or functions on higher dimensional domains. The goals of FDA, such as descriptive analyses, classification, and regression, are generally the same as for statistical analyses of scalar-valued or multivariate data, but FDA brings additional challenges due to the high- and infinite dimensionality of observations and parameters, respectively. This paper provides an introduction to FDA, including a description of the most common statistical analysis techniques, their respective software implementations, and some recent developments in the field. The paper covers fundamental concepts such as descriptives and outliers, smoothing, amplitude and phase variation, and functional principal component analysis. It also discusses functional regression, statistical inference with functional data, functional classification and clustering, and machine learning approaches for functional data analysis. The methods discussed in this paper are widely applicable in fields such as medicine, biophysics, neuroscience, and chemistry and are increasingly relevant due to the widespread use of technologies that allow for the collection of functional data. Sparse functional data methods are also relevant for longitudinal data analysis. All presented methods are demonstrated using available software in R by analyzing a dataset on human motion and motor control. To facilitate the understanding of the methods, their implementation, and hands-on application, the code for these practical examples is made available through a code and data supplement and on GitHub.},
year = {2024}
}


Eckardt, M., Mateu, J., Greven, S. (2024). Generalized functional additive mixed models with (functional) compositional covariates for areal Covid-19 incidence curves. Journal of the Royal Statistical Society Series C: Applied Statistics.
73 (4):880-901
article DOI
URL
Abstract:
We extend the generalized functional additive mixed model to include compositional and functional compositional (density) covariates carrying relative information of a whole. Relying on the isometric isomorphism of the Bayes Hilbert space of probability densities with a sub-space of the L2, we include functional compositions as transformed functional covariates with constrained yet interpretable effect function. The extended model allows for the estimation of linear, non-linear, and time-varying effects of scalar and functional covariates, as well as (correlated) functional random effects, in addition to the compositional effects. We use the model to estimate the effect of the age, sex, and smoking (functional) composition of the population on regional Covid-19 incidence data for Spain, while accounting for climatological and socio-demographic covariate effects and spatial correlation.
BibTeX:
@article{10.1093/jrsssc/qlae016,
    author = {Eckardt, Matthias and Mateu, Jorge and Greven, Sonja},
    title = {Generalized functional additive mixed models with (functional) compositional covariates for areal Covid-19 incidence curves},
    journal = {Journal of the Royal Statistical Society Series C: Applied Statistics},
    volume = {73},
    number = {4},
    pages = {880-901},
    year = {2024},
    month = {03},
    abstract = {We extend the generalized functional additive mixed model to include compositional and functional compositional (density) covariates carrying relative information of a whole. Relying on the isometric isomorphism of the Bayes Hilbert space of probability densities with a sub-space of the L2, we include functional compositions as transformed functional covariates with constrained yet interpretable effect function. The extended model allows for the estimation of linear, non-linear, and time-varying effects of scalar and functional covariates, as well as (correlated) functional random effects, in addition to the compositional effects. We use the model to estimate the effect of the age, sex, and smoking (functional) composition of the population on regional Covid-19 incidence data for Spain, while accounting for climatological and socio-demographic covariate effects and spatial correlation.},
    issn = {0035-9254},
    doi = {10.1093/jrsssc/qlae016},
    url = {https://doi.org/10.1093/jrsssc/qlae016},
    eprint = {https://academic.oup.com/jrsssc/article-pdf/73/4/880/58805394/qlae016.pdf},
}


Stöcker, A., Steyer, L., Greven, S. (2024). Comments on: shape-based functional data analysis. TEST.
33: 48-58
article DOI
Abstract:
We would first like to thank the authors, Yuexuan Wu, Chao Huang and Anuj Srivastava, for an interesting article, as well as for the relevant work, in particular on the square-root-velocity (SRV) framework, that it summarizes, and which we have found very useful in our own work. We consider the SRV framework a milestone in object data analysis. While many aspects in the article are worthy of attention, we focus our discussion in the following on several aspects and challenges we have encountered in our work in this field. In particular, we will discuss the univariate versus the multivariate case in Sect. 2, the problem of sparsely sampled curves in Sect. 3, and regression respecting invariances in Sect. 4, before concluding with a discussion in Sect. 5.
BibTeX:
@article{Stoecker2023,
author={St{\"o}cker, Almond and Steyer, Lisa and Greven, Sonja},
title={Comments on: shape-based functional data analysis},
journal={TEST},
year={2024},
volume={33},
pages={48-58},
doi={10.1007/s11749-023-00901-x},
abstract={We would first like to thank the authors, Yuexuan Wu, Chao Huang and Anuj Srivastava, for an interesting article, as well as for the relevant work, in particular on the square-root-velocity (SRV) framework, that it summarizes, and which we have found very useful in our own work. We consider the SRV framework a milestone in object data analysis.

While many aspects in the article are worthy of attention, we focus our discussion in the following on several aspects and challenges we have encountered in our work in this field. In particular, we will discuss the univariate versus the multivariate case in Sect. 2, the problem of sparsely sampled curves in Sect. 3, and regression respecting invariances in Sect. 4, before concluding with a discussion in Sect. 5.}
}


Gertheiss, J., Rügamer, D., Greven, S. (2023). Methoden für die Analyse funktionaler Daten. Moderne Verfahren der Angewandten Statistik.
Springer Berlin Heidelberg, Berlin, Heidelberg, 1-35
inbook DOI
URL
Abstract:
Functional data arise as discrete measurements of inherently smooth functions, such as movement profiles or infrared absorption spectra. Using concrete examples, this chapter covers some fundamental methods for analysing such data. The focus is on regression models in which at least some of the covariates and/or the response are functional. In addition, further methods such as functional principal component analysis and cluster analysis for functional data are introduced.
BibTeX:
@Inbook{Gertheiss2023,
author="Gertheiss, Jan and R{\"u}gamer, David and Greven, Sonja",
editor="Gertheiss, Jan and Schmid, Matthias and Spindler, Martin",
title="Methoden f{\"u}r die Analyse funktionaler Daten",
bookTitle="Moderne Verfahren der Angewandten Statistik",
year="2023",
publisher="Springer Berlin Heidelberg",
address="Berlin, Heidelberg",
pages="1--35",
abstract="Funktionale Daten entstehen als diskrete Messungen von inh{\"a}rent glatten Funktionen wie z. B. Bewegungsprofilen oder Infrarot-Absorptionsspektren. Dieses Kapitel behandelt anhand konkreter Beispiele einige grundlegende Analyseverfahren f{\"u}r derartige Daten. Dabei wird der Fokus auf Regressionsmodelle gelegt, bei denen zumindest einige der Einflussgr{\"o}{\ss}en und/oder die Zielgr{\"o}{\ss}e funktional sind. Dar{\"u}ber hinaus wird in weitere Verfahren wie die funktionale Hauptkomponentenanalyse und die Clusteranalyse f{\"u}r funktionale Daten eingef{\"u}hrt.",
isbn="978-3-662-63496-7",
doi="10.1007/978-3-662-63496-7_5-1",
url="https://doi.org/10.1007/978-3-662-63496-7_5-1"
}


Fernández-Fontelo, A., Kieslich, P. J., Henninger, F., Kreuter, F., Greven, S. (2023). Predicting Question Difficulty in Web Surveys: A Machine Learning Approach Based on Mouse Movement Features. Social Science Computer Review.
41 (1):141-162
article DOI
Abstract:
Survey research aims to collect robust and reliable data from respondents. However, despite researchers’ efforts in designing questionnaires, survey instruments may be imperfect, and question structure not as clear as could be, thus creating a burden for respondents. If it were possible to detect such problems, this knowledge could be used to predict problems in a questionnaire during pretesting, inform real-time interventions through responsive questionnaire design, or to indicate and correct measurement error after the fact. Previous research has used paradata, specifically response times, to detect difficulties and help improve user experience and data quality. Today, richer data sources are available, for example, movements respondents make with their mouse, as an additional detailed indicator for the respondent–survey interaction. This article uses machine learning techniques to explore the predictive value of mouse-tracking data regarding a question’s difficulty. We use data from a survey on respondents’ employment history and demographic information, in which we experimentally manipulate the difficulty of several questions. Using measures derived from mouse movements, we predict whether respondents have answered the easy or difficult version of a question, using and comparing several state-of-the-art supervised learning methods. We have also developed a personalization method that adjusts for respondents’ baseline mouse behavior and evaluate its performance. For all three manipulated survey questions, we find that including the full set of mouse movement measures and accounting for individual differences in these measures improve prediction performance over response-time-only models.
BibTeX:
@article{doi:10.1177/08944393211032950,
author = {Amanda Fernández-Fontelo and Pascal J. Kieslich and Felix Henninger and Frauke Kreuter and Sonja Greven},
title = {Predicting Question Difficulty in Web Surveys: A Machine Learning Approach Based on Mouse Movement Features},
journal = {Social Science Computer Review},
volume = {41},
number = {1},
pages = {141-162},
year = {2023},
doi = {10.1177/08944393211032950},
url = {https://doi.org/10.1177/08944393211032950},
    abstract = { Survey research aims to collect robust and reliable data from respondents. However, despite researchers’ efforts in designing questionnaires, survey instruments may be imperfect, and question structure not as clear as could be, thus creating a burden for respondents. If it were possible to detect such problems, this knowledge could be used to predict problems in a questionnaire during pretesting, inform real-time interventions through responsive questionnaire design, or to indicate and correct measurement error after the fact. Previous research has used paradata, specifically response times, to detect difficulties and help improve user experience and data quality. Today, richer data sources are available, for example, movements respondents make with their mouse, as an additional detailed indicator for the respondent–survey interaction. This article uses machine learning techniques to explore the predictive value of mouse-tracking data regarding a question’s difficulty. We use data from a survey on respondents’ employment history and demographic information, in which we experimentally manipulate the difficulty of several questions. Using measures derived from mouse movements, we predict whether respondents have answered the easy or difficult version of a question, using and comparing several state-of-the-art supervised learning methods. We have also developed a personalization method that adjusts for respondents’ baseline mouse behavior and evaluate its performance. For all three manipulated survey questions, we find that including the full set of mouse movement measures and accounting for individual differences in these measures improve prediction performance over response-time-only models. }
}


Olesiewicz, M., Kooroshy, J., Greven, S. Navigating the corporate disclosure gap: Modelling of Missing Not at Random Carbon Data 2023 The Journal of Impact & ESG Investing
4 8-34
article DOI
Abstract:
Corporate carbon emissions data is disclosed by approximately 65% of large and mid-sized companies globally, despite being a key indicator of corporate climate performance. With investors increasingly looking to integrate climate risk into their investment strategies and risk reporting, this creates demand for robust prediction models that can generate reliable estimates for missing carbon disclosures. However, these estimates lack transparency and are frequently used in the investment decisions process with the same confidence as corporate reported data. As disclosures remain mostly voluntary and the propensity to disclose is shaped by several factors (e.g. size, sector, geography), missing emissions data should be assumed to be missing not at random (MNAR). However, widely used estimation methods (e.g. linear regression models) typically do not correct for MNAR bias and do not accurately reflect the uncertainty of estimated data. The objective of this paper is to address these issues: (1) account for the uncertainty of the missing data and thus obtain regression coefficients by multiple imputation (MI) (2) correct for potential bias by using MI algorithms based on Heckman's sample selection model introduced by Galimard et al. (3) estimate missing carbon disclosures with linear models based on MI and report on the uncertainty of predicted values, measured as the length of the prediction interval. In the simulation, our approach resulted in an accuracy gain based on root mean squared error of up to 30%, and up to a 40% higher coverage rate than the existing models. When applied to commercial data, the results suggested up to 20% higher coverage for proposed methods.
BibTeX:
@article{greven2021carbon,
      title={Navigating the corporate disclosure gap: Modelling of Missing Not at Random Carbon Data},
      author={Malgorzata Olesiewicz and Jaakko Kooroshy and Sonja Greven},
      year={2023},
doi = {10.48550/arXiv.2112.07784},
      abstract={Corporate carbon emissions data is disclosed by approximately 65\% of large and mid-sized companies globally, despite being a key indicator of corporate climate performance. With investors increasingly looking to integrate climate risk into their investment strategies and risk reporting, this creates demand for robust prediction models that can generate reliable estimates for missing carbon disclosures. However, these estimates lack transparency and are frequently used in the investment decisions process with the same confidence as corporate reported data. As disclosures remain mostly voluntary and the propensity to disclose is shaped by several factors (e.g. size, sector, geography), missing emissions data should be assumed to be missing not at random (MNAR). However, widely used estimation methods (e.g. linear regression models) typically do not correct for MNAR bias and do not accurately reflect the uncertainty of estimated data. The objective of this paper is to address these issues: (1) account for the uncertainty of the missing data and thus obtain regression coefficients by multiple imputation (MI) (2) correct for potential bias by using MI algorithms based on Heckman's sample selection model introduced by Galimard et al. (3) estimate missing carbon disclosures with linear models based on MI and report on the uncertainty of predicted values, measured as the length of the prediction interval. In the simulation, our approach resulted in an accuracy gain based on root mean squared error of up to 30\%, and up to a 40\% higher coverage rate than the existing models. When applied to commercial data, the results suggested up to 20\% higher coverage for proposed methods.},
      journal = {The Journal of Impact \& ESG Investing},
      volume={4},
      pages={8--34}
}





Steyer, L., Stöcker, A., Greven, S. Elastic analysis of irregularly or sparsely sampled curves 2023 Biometrics
79 (3):2103-2115
article DOI
URL
Abstract:
We provide statistical analysis methods for samples of curves in two or more dimensions, where the image, but not the parameterization of the curves, is of interest and suitable alignment/registration is thus necessary. Examples are handwritten letters, movement paths, or object outlines. We focus in particular on the computation of (smooth) means and distances, allowing, for example, classification or clustering. Existing parameterization invariant analysis methods based on the elastic distance of the curves modulo parameterization, using the square-root-velocity framework, have limitations in common realistic settings where curves are irregularly and potentially sparsely observed. We propose using spline curves to model smooth or polygonal (Fréchet) means of open or closed curves with respect to the elastic distance and show identifiability of the spline model modulo parameterization. We further provide methods and algorithms to approximate the elastic distance for irregularly or sparsely observed curves, via interpreting them as polygons. We illustrate the usefulness of our methods on two datasets. The first application classifies irregularly sampled spirals drawn by Parkinson's patients and healthy controls, based on the elastic distance to a mean spiral curve computed using our approach. The second application clusters sparsely sampled GPS tracks based on the elastic distance and computes smooth cluster means to find new paths on the Tempelhof field in Berlin. All methods are implemented in the R-package “elasdics” and evaluated in simulations.
BibTeX:
@article{steyer2023elastic,
author = {Steyer, Lisa and Stöcker, Almond and Greven, Sonja},
title = {Elastic analysis of irregularly or sparsely sampled curves},
journal = {Biometrics},
volume = {79},
number = {3},
pages = {2103--2115},
keywords = {curve alignment, Fisher–Rao Riemannian metric, functional data analysis, multivariate functional data, registration, square-root-velocity transformation, warping},
doi = {10.1111/biom.13706},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.13706},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1111/biom.13706},
abstract = {We provide statistical analysis methods for samples of curves in two or more dimensions, where the image, but not the parameterization of the curves, is of interest and suitable alignment/registration is thus necessary. Examples are handwritten letters, movement paths, or object outlines. We focus in particular on the computation of (smooth) means and distances, allowing, for example, classification or clustering. Existing parameterization invariant analysis methods based on the elastic distance of the curves modulo parameterization, using the square-root-velocity framework, have limitations in common realistic settings where curves are irregularly and potentially sparsely observed. We propose using spline curves to model smooth or polygonal (Fréchet) means of open or closed curves with respect to the elastic distance and show identifiability of the spline model modulo parameterization. We further provide methods and algorithms to approximate the elastic distance for irregularly or sparsely observed curves, via interpreting them as polygons. We illustrate the usefulness of our methods on two datasets. The first application classifies irregularly sampled spirals drawn by Parkinson's patients and healthy controls, based on the elastic distance to a mean spiral curve computed using our approach. The second application clusters sparsely sampled GPS tracks based on the elastic distance and computes smooth cluster means to find new paths on the Tempelhof field in Berlin. All methods are implemented in the R-package “elasdics” and evaluated in simulations.},
year = {2023}
}





Henninger, F., Kieslich, P. J., Fernández-Fontelo, A., Greven, S., Kreuter, F. Privacy Attitudes toward Mouse-Tracking Paradata Collection 2023 Public Opinion Quarterly
87 (S1):602-618
article DOI
Abstract:
Survey participants’ mouse movements provide a rich, unobtrusive source of paradata, offering insight into the response process beyond the observed answers. However, the use of mouse tracking may require participants’ explicit consent for their movements to be recorded and analyzed. Thus, the question arises of how its presence affects the willingness of participants to take part in a survey at all—if prospective respondents are reluctant to complete a survey if additional measures are recorded, collecting paradata may do more harm than good. Previous research has found that other paradata collection modes reduce the willingness to participate, and that this decrease may be influenced by the specific motivation provided to participants for collecting the data. However, the effects of mouse movement collection on survey consent and participation have not been addressed so far. In a vignette experiment, we show that reported willingness to participate in a survey decreased when mouse tracking was part of the overall consent. However, a larger proportion of the sample indicated willingness to both take part and provide mouse-tracking data when these decisions were combined, compared to an independent opt-in to paradata collection, separated from the decision to complete the study. This suggests that survey practitioners may face a trade-off between maximizing their overall participation rate and maximizing the number of participants who also provide mouse-tracking data. Explaining motivations for paradata collection did not have a positive effect and, in some cases, even reduced participants’ reported willingness to take part in the survey.
BibTeX:
@article{henninger2023privacy,
    author = {Henninger, Felix and Kieslich, Pascal J and Fernández-Fontelo, Amanda and Greven, Sonja and Kreuter, Frauke},
    title = "{Privacy Attitudes toward Mouse-Tracking Paradata Collection}",
    journal = {Public Opinion Quarterly},
    volume = {87},
    number = {S1},
    pages = {602--618},
    year = {2023},
    month = {08},
    abstract = "{Survey participants’ mouse movements provide a rich, unobtrusive source of paradata, offering insight into the response process beyond the observed answers. However, the use of mouse tracking may require participants’ explicit consent for their movements to be recorded and analyzed. Thus, the question arises of how its presence affects the willingness of participants to take part in a survey at all—if prospective respondents are reluctant to complete a survey if additional measures are recorded, collecting paradata may do more harm than good. Previous research has found that other paradata collection modes reduce the willingness to participate, and that this decrease may be influenced by the specific motivation provided to participants for collecting the data. However, the effects of mouse movement collection on survey consent and participation have not been addressed so far. In a vignette experiment, we show that reported willingness to participate in a survey decreased when mouse tracking was part of the overall consent. However, a larger proportion of the sample indicated willingness to both take part and provide mouse-tracking data when these decisions were combined, compared to an independent opt-in to paradata collection, separated from the decision to complete the study. This suggests that survey practitioners may face a trade-off between maximizing their overall participation rate and maximizing the number of participants who also provide mouse-tracking data. Explaining motivations for paradata collection did not have a positive effect and, in some cases, even reduced participants’ reported willingness to take part in the survey.}",
    issn = {0033-362X},
    doi = {10.1093/poq/nfad034},
    eprint = {https://academic.oup.com/poq/article-pdf/87/S1/602/51344699/nfad034.pdf},
}




Volkmann, A., Stöcker, A., Scheipl, F., Greven, S. Multivariate functional additive mixed models 2023 Statistical Modelling
23 (4):303-326
article DOI
Abstract:
Multivariate functional data can be intrinsically multivariate like movement trajectories in 2D or complementary such as precipitation, temperature and wind speeds over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by, for example, repeated measurements or crossed study designs. Modelling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improves the estimation of random effects, and yields corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modelling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit.
BibTeX:
@article{volkmann2023multifamm,
author = {Alexander Volkmann and Almond Stöcker and Fabian Scheipl and Sonja Greven},
title ={Multivariate functional additive mixed models},
journal = {Statistical Modelling},
volume = {23},
number = {4},
pages = {303--326},
year = {2023},
doi = {10.1177/1471082X211056158},
abstract = { Multivariate functional data can be intrinsically multivariate like movement trajectories in 2D or complementary such as precipitation, temperature and wind speeds over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by, for example, repeated measurements or crossed study designs. Modelling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improves the estimation of random effects, and yields corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modelling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit. }
}




Stöcker, A., Steyer, L., Greven, S. Functional additive models on manifolds of planar shapes and forms 2023 Journal of Computational and Graphical Statistics
32 (4):1600-1612
article DOI
Abstract:
The "shape" of a planar curve and/or landmark configuration is considered its equivalence class under translation, rotation and scaling, its "form" its equivalence class under translation and rotation while scale is preserved. We extend generalized additive regression to models for such shapes/forms as responses respecting the resulting quotient geometry by employing the squared geodesic distance as loss function and a geodesic response function to map the additive predictor to the shape/form space. For fitting the model, we propose a Riemannian L2-Boosting algorithm well suited for a potentially large number of possibly parameter-intensive model terms, which also yields automated model selection. We provide novel intuitively interpretable visualizations for (even non-linear) covariate effects in the shape/form space via suitable tensor-product factorization. The usefulness of the proposed framework is illustrated in an analysis of 1) astragalus shapes of wild and domesticated sheep and 2) cell forms generated in a biophysical model, as well as 3) in a realistic simulation study with response shapes and forms motivated from a dataset on bottle outlines.
BibTeX:
@Article{ssg23,
  author  = {Almond Stöcker and Lisa Steyer and Sonja Greven},
  journal = {Journal of Computational and Graphical Statistics},
  title   = {Functional additive models on manifolds of planar shapes and forms},
  year    = {2023},
  volume   = {32},
  number  = {4},
  pages   = {1600--1612},
  doi     = {10.1080/10618600.2023.2175687},
abstract = {The "shape" of a planar curve and/or landmark configuration is considered its equivalence class under translation, rotation and scaling, its "form" its equivalence class under translation and rotation while scale is preserved. We extend generalized additive regression to models for such shapes/forms as responses respecting the resulting quotient geometry by employing the squared geodesic distance as loss function and a geodesic response function to map the additive predictor to the shape/form space. For fitting the model, we propose a Riemannian L2-Boosting algorithm well suited for a potentially large number of possibly parameter-intensive model terms, which also yields automated model selection. We provide novel intuitively interpretable visualizations for (even non-linear) covariate effects in the shape/form space via suitable tensor-product factorization. The usefulness of the proposed framework is illustrated in an analysis of 1) astragalus shapes of wild and domesticated sheep and 2) cell forms generated in a biophysical model, as well as 3) in a realistic simulation study with response shapes and forms motivated from a dataset on bottle outlines. }
}


Zhang, B., Hepp, T., Greven, S., Bergherr, E. Adaptive Step-Length Selection in Gradient Boosting for Gaussian Location and Scale Models 2022 Computational Statistics
37 2295–2332
article DOI
Abstract:
Tuning of model-based boosting algorithms relies mainly on the number of iterations, while the step-length is fixed at a predefined value. For complex models with several predictors such as Generalized additive models for location, scale and shape (GAMLSS), imbalanced updates of predictors, where some distribution parameters are updated more frequently than others, can be a problem that prevents some submodels to be appropriately fitted within a limited number of boosting iterations. We propose an approach using adaptive step-length (ASL) determination within a non-cyclical boosting algorithm for Gaussian location and scale models, as an important special case of the wider class of GAMLSS, to prevent such imbalance. Moreover, we discuss properties of the ASL and derive a semi-analytical form of the ASL that avoids manual selection of the search interval and numerical optimization to find the optimal step-length, and consequently improves computational efficiency. We show competitive behavior of the proposed approaches compared to penalized maximum likelihood and boosting with a fixed step-length for Gaussian location and scale models in two simulations and two applications, in particular for cases of large variance and/or more variables than observations. In addition, the underlying concept of the ASL is also applicable to the whole GAMLSS framework and to other models with more than one predictor like zero-inflated count models, and brings up insights into the choice of the reasonable defaults for the step-length in the simpler special case of (Gaussian) additive models.
BibTeX:
@Article{zhang2021adaptive,
      title={Adaptive Step-Length Selection in Gradient Boosting for Gaussian Location and Scale Models},
      author={Boyao Zhang and Tobias Hepp and Sonja Greven and Elisabeth Bergherr},
      year={2022},
      journal  = {Computational Statistics},
      volume = {37},
      pages = {2295--2332},
      doi = {10.1007/s00180-022-01199-3}, 
      abstract={Tuning of model-based boosting algorithms relies mainly on the number of iterations, while the step-length is fixed at a predefined value. For complex models with several predictors such as Generalized additive models for location, scale and shape (GAMLSS), imbalanced updates of predictors, where some distribution parameters are updated more frequently than others, can be a problem that prevents some submodels to be appropriately fitted within a limited number of boosting iterations. We propose an approach using adaptive step-length (ASL) determination within a non-cyclical boosting algorithm for Gaussian location and scale models, as an important special case of the wider class of GAMLSS, to prevent such imbalance. Moreover, we discuss properties of the ASL and derive a semi-analytical form of the ASL that avoids manual selection of the search interval and numerical optimization to find the optimal step-length, and consequently improves computational efficiency. We show competitive behavior of the proposed approaches compared to penalized maximum likelihood and boosting with a fixed step-length for Gaussian location and scale models in two simulations and two applications, in particular for cases of large variance and/or more variables than observations. In addition, the underlying concept of the ASL is also applicable to the whole GAMLSS framework and to other models with more than one predictor like zero-inflated count models, and brings up insights into the choice of the reasonable defaults for the step-length in the simpler special case of (Gaussian) additive models.}
}






Rügamer, D., Baumann, P. F. M., Greven, S. Selective Inference for Additive and Linear Mixed Models 2022 Computational Statistics & Data Analysis
167
article DOI
Abstract:
After model selection, subsequent inference in statistical models tends to be overconfident if selection is not accounted for. One possible solution to address this problem is selective inference, which constitutes a post-selection inference framework and yields valid inference statements by conditioning on the selection event. Existing work on selective inference is, however, not directly applicable to additive and linear mixed models. A novel extension to recent work on selective inference to the class of additive and linear mixed models is thus presented. The approach can be applied for any type of model selection mechanism that can be expressed as a function of the outcome variable (and potentially of covariates on which the model conditions). Properties of the method are validated in simulation studies and in an application to a data set in monetary economics. The approach is particularly useful in cases of non-standard selection procedures, as present in the motivating application.
BibTeX:
@Article{rugamer2020selective,
  author        = {David Rügamer and Philipp F. M. Baumann and Sonja Greven},
  title         = {Selective Inference for Additive and Linear Mixed Models},
  journal       = {Computational Statistics \& Data Analysis},
  year          = {2022},
  volume = {167},
  doi = {10.1016/j.csda.2021.107350},
  abstract      = {After model selection, subsequent inference in statistical models tends to be overconfident if selection is not accounted for. One possible solution to address this problem is selective inference, which constitutes a post-selection inference framework and yields valid inference statements by conditioning on the selection event. Existing work on selective inference is, however, not directly applicable to additive and linear mixed models. A novel extension to recent work on selective inference to the class of additive and linear mixed models is thus presented. The approach can be applied for any type of model selection mechanism that can be expressed as a function of the outcome variable (and potentially of covariates on which the model conditions). Properties of the method are validated in simulation studies and in an application to a data set in monetary economics. The approach is particularly useful in cases of non-standard selection procedures, as present in the motivating application.}
}




Alas, H. D., Stöcker, A., Umlauf, N., Senaweera, O., Pfeifer, S., Greven, S., Wiedensohler, A. Pedestrian exposure to black carbon and PM2.5 emissions in urban hot spots: New findings using mobile measurement techniques and flexible Bayesian regression model 2022 Journal of Exposure Science and Environmental Epidemiology
32 604-614
article DOI
Abstract:
Background: Data from extensive mobile measurements (MM) of air pollutants provide spatially resolved information on pedestrians’ exposure to particulate matter (black carbon (BC) and PM2.5 mass concentrations). Objective: We present a distributional regression model in a Bayesian framework that estimates the effects of spatiotemporal factors on the pollutant concentrations influencing pedestrian exposure. Methods: We modeled the mean and variance of the pollutant concentrations obtained from MM in two cities and extended commonly used lognormal models with a lognormal-normal convolution (logNNC) extension for BC to account for instrument measurement error. Results: The logNNC extension significantly improved the BC model. From these model results, we found local sources and, hence, local mitigation efforts to improve air quality, have more impact on the ambient levels of BC mass concentrations than on the regulated PM2.5. Significance: Firstly, this model (logNNC in bamlss package available in R) could be used for the statistical analysis of MM data from various study areas and pollutants with the potential for predicting pollutant concentrations in urban areas. Secondly, with respect to pedestrian exposure, it is crucial for BC mass concentration to be monitored and regulated in areas dominated by traffic-related air pollution.
BibTeX:
@Article{asuspgw21,
  author        = {Honey Dawn Alas and Almond Stöcker and Nikolaus Umlauf and Oshada Senaweera and Sascha Pfeifer and Sonja Greven and Alfred Wiedensohler},
  title         = {Pedestrian exposure to black carbon and PM2.5 emissions in urban hot spots: New findings using mobile measurement techniques and flexible Bayesian regression model},
  journal       = {Journal of Exposure Science and Environmental Epidemiology},
  year          = {2022},
  volume   = {32},
  pages    = {604--614},
  doi={10.1038/s41370-021-00379-5},
  abstract={\paragraph{Background:} Data from extensive mobile measurements (MM) of air pollutants provide spatially resolved information on pedestrians’ exposure to particulate matter (black carbon (BC) and PM2.5 mass concentrations).
  
\paragraph{Objective:} We present a distributional regression model in a Bayesian framework that estimates the effects of spatiotemporal factors on the pollutant concentrations influencing pedestrian exposure.

\paragraph{Methods:} We modeled the mean and variance of the pollutant concentrations obtained from MM in two cities and extended commonly used lognormal models with a lognormal-normal convolution (logNNC) extension for BC to account for instrument measurement error.

\paragraph{Results:} The logNNC extension significantly improved the BC model. From these model results, we found local sources and, hence, local mitigation efforts to improve air quality, have more impact on the ambient levels of BC mass concentrations than on the regulated PM2.5.

\paragraph{Significance:} Firstly, this model (logNNC in bamlss package available in R) could be used for the statistical analysis of MM data from various study areas and pollutants with the potential for predicting pollutant concentrations in urban areas. Secondly, with respect to pedestrian exposure, it is crucial for BC mass concentration to be monitored and regulated in areas dominated by traffic-related air pollution.}

}




Säfken, B., Rügamer, D., Kneib, T., Greven, S. Conditional Model Selection in Mixed-Effects Models with cAIC4 2021 Journal of Statistical Software
99
article DOI
Abstract:
Model selection in mixed models based on the conditional distribution is appropriate for many practical applications and has been a focus of recent statistical research. In this paper we introduce the R package cAIC4 that allows for the computation of the conditional Akaike information criterion (cAIC). Computation of the conditional AIC needs to take into account the uncertainty of the random effects variance and is therefore not straightforward. We introduce a fast and stable implementation for the calculation of the cAIC for (generalized) linear mixed models estimated with lme4 and (generalized) additive mixed models estimated with gamm4. Furthermore, cAIC4 offers a stepwise function that allows for an automated stepwise selection scheme for mixed models based on the cAIC. Examples of many possible applications are presented to illustrate the practical impact and easy handling of the package.
BibTeX:
@Article{Saefken,
  author  = {Säfken, B. and Rügamer, D. and Kneib, T. and Greven, S.},
  journal = {Journal of Statistical Software},
  title   = {Conditional Model Selection in Mixed-Effects Models with cAIC4},
  year    = {2021},
  volume   = {99},
doi     = {10.18637/jss.v099.i08},
abstract = {Model selection in mixed models based on the conditional distribution is appropriate for many practical applications and has been a focus of recent statistical research. In this paper we introduce the R package cAIC4 that allows for the computation of the conditional Akaike information criterion (cAIC). Computation of the conditional AIC needs to take into account the uncertainty of the random effects variance and is therefore not straightforward. We introduce a fast and stable implementation for the calculation of the cAIC for (generalized) linear mixed models estimated with lme4 and (generalized) additive mixed models estimated with gamm4. Furthermore, cAIC4 offers a stepwise function that allows for an automated stepwise selection scheme for mixed models based on the cAIC. Examples of many possible applications are presented to illustrate the practical impact and easy handling of the package.}

}




Stöcker, A., Brockhaus, S., Schaffer, S. A., Bronk, B. v., Opitz, M., Greven, S. Boosting functional response models for location, scale and shape with an application to bacterial competition 2021 Statistical Modelling
21 (5):385-404
article DOI
Abstract:
We extend generalized additive models for location, scale and shape (GAMLSS) to regression with functional response. This allows us to simultaneously model point-wise mean curves, variances and other distributional parameters of the response in dependence of various scalar and functional covariate effects. In addition, the scope of distributions is extended beyond exponential families. The model is fitted via gradient boosting, which offers inherent model selection and is shown to be suitable for both complex model structures and highly auto-correlated response curves. This enables us to analyse bacterial growth in Escherichia coli in a complex interaction scenario, fruitfully extending usual growth models.
BibTeX:
@Article{Stoecker2021,
  author   = {Almond Stöcker and Sarah Brockhaus and Sophia Anna Schaffer and Benedikt von Bronk and Madeleine Opitz and Sonja Greven},
  journal  = {Statistical Modelling},
  title    = {Boosting functional response models for location, scale and shape with an application to bacterial competition},
  year     = {2021},
  number   = {5},
  pages    = {385-404},
  volume   = {21},
  abstract = {We extend generalized additive models for location, scale and shape (GAMLSS) to regression with functional response. This allows us to simultaneously model point-wise mean curves, variances and other distributional parameters of the response in dependence of various scalar and functional covariate effects. In addition, the scope of distributions is extended beyond exponential families. The model is fitted via gradient boosting, which offers inherent model selection and is shown to be suitable for both complex model structures and highly auto-correlated response curves. This enables us to analyse bacterial growth in Escherichia coli in a complex interaction scenario, fruitfully extending usual growth models.},
  doi      = {10.1177/1471082X20917586}
}




Fernández-Fontelo, A., Kieslich, P. J., Henninger, F., Kreuter, F., Greven, S. Predicting question difficulty in web surveys: A machine-learning approach based on mouse movement features 2021 Social Science Computer Review
41 (1):141-162
article DOI
URL
Abstract:
Survey research aims to collect robust and reliable data from respondents. However, despite researchers’ efforts in designing questionnaires, survey instruments may be imperfect, and question structure not as clear as could be, thus creating a burden for respondents. If it were possible to detect such problems, this knowledge could be used to predict problems in a questionnaire during pre-testing, inform real-time interventions through responsive questionnaire design, or to indicate and correct measurement error after the fact. Previous research has used paradata, specifically response times, to detect difficulties and help improve user experience and data quality. Today richer data sources are available, for example movements respondents make with their mouse, as an additional detailed indicator for the respondent-survey interaction. This paper uses machine learning techniques to explore the predictive value of mouse-tracking data with regard to a question’s difficulty. We use data from a survey on respondents’ employment history and demographic information, in which we experimentally manipulate the difficulty of several questions. Using measures derived from mouse movements, we predict whether respondents have answered the easy or difficult version of a question, using and comparing several state-of-the-art supervised learning methods. We have also developed a personalization method that adjusts for respondents’ baseline mouse behavior, and evaluate its performance. For all three manipulated survey questions, we find that including the full set of mouse movement measures and accounting for individual differences in these measures improve prediction performance over response-time-only models.
BibTeX:
@Article{FernandezFontelo2021,
  author = {Fern{\'{a}}ndez-Fontelo, Amanda and Kieslich, Pascal J. and Henninger, Felix and Kreuter, Frauke and Greven, Sonja},
  title = {Predicting question difficulty in web surveys: A machine-learning approach based on mouse movement features},
  year = {2021},
  abstract = {Survey research aims to collect robust and reliable data from respondents. However, despite researchers’ efforts in designing questionnaires, survey instruments may be imperfect, and question structure not as clear as could be, thus creating a burden for respondents. If it were possible to detect such problems, this knowledge could be used to predict problems in a questionnaire during pre-testing, inform real-time interventions through responsive questionnaire design, or to indicate and correct measurement error after the fact. Previous research has used paradata, specifically response times, to detect difficulties and help improve user experience and data quality. Today richer data sources are available, for example movements respondents make with their mouse, as an additional detailed indicator for the respondent-survey interaction. This paper uses machine learning techniques to explore the predictive value of mouse-tracking data with regard to a question’s difficulty. We use data from a survey on respondents’ employment history and demographic information, in which we experimentally manipulate the difficulty of several questions. Using measures derived from mouse movements, we predict whether respondents have answered the easy or difficult version of a question, using and comparing several state-of-the-art supervised learning methods. We have also developed a personalization method that adjusts for respondents’ baseline mouse behavior, and evaluate its performance. For all three manipulated survey questions, we find that including the full set of mouse movement measures and accounting for individual differences in these measures improve prediction performance over response-time-only models.},
journal = {Social Science Computer Review},
volume = {41},
number = {1},
pages = {141-162},
doi = {10.1177/08944393211032950},
URL = {https://doi.org/10.1177/08944393211032950}
}


Brockhaus, S., Rügamer, D., Greven, S. Boosting Functional Regression Models with FDboost 2020 Journal of Statistical Software, Articles
94 (10):1-50
article DOI
Abstract:
The R add-on package FDboost is a flexible toolbox for the estimation of functional regression models by model-based boosting. It provides the possibility to fit regression models for scalar and functional response with effects of scalar as well as functional covariates, i.e., scalar-on-function, function-on-scalar and function-on-function regression models. In addition to mean regression, quantile regression models as well as generalized additive models for location scale and shape can be fitted with FDboost. Furthermore, boosting can be used in high-dimensional data settings with more covariates than observations. We provide a hands-on tutorial on model fitting and tuning, including the visualization of results. The methods for scalar-on-function regression are illustrated with spectrometric data of fossil fuels and those for functional response regression with a data set including bioelectrical signals for emotional episodes.
BibTeX:
@article{JSSv094i10,
   author = {Sarah Brockhaus and David Rügamer and Sonja Greven},
   title = {Boosting Functional Regression Models with FDboost},
   journal = {Journal of Statistical Software, Articles},
   volume = {94},
   number = {10},
   year = {2020},
   keywords = {functional data analysis; function-on-function regression; function-on-scalar regression; gradient boosting; model-based boosting; scalar-on-function regression},
   abstract = {The R add-on package FDboost is a flexible toolbox for the estimation of functional regression models by model-based boosting. It provides the possibility to fit regression models for scalar and functional response with effects of scalar as well as functional covariates, i.e., scalar-on-function, function-on-scalar and function-on-function regression models. In addition to mean regression, quantile regression models as well as generalized additive models for location scale and shape can be fitted with FDboost. Furthermore, boosting can be used in high-dimensional data settings with more covariates than observations. We provide a hands-on tutorial on model fitting and tuning, including the visualization of results. The methods for scalar-on-function regression are illustrated with spectrometric data of fossil fuels and those for functional response regression with a data set including bioelectrical signals for emotional episodes.},
   issn = {1548-7660},
   pages = {1--50},
   doi = {10.18637/jss.v094.i10}
}





Fernández-Fontelo, A., Henninger, F., Kieslich, P. J., Kreuter, F., Greven, S. A new model for multivariate functional data classification with application to the prediction of difficulty in web surveys using mouse movement trajectories 2020 Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020
Bilbao, Basque Country, Spain, 73-78
inproceedings URL
Abstract:
A semi-metric-based model for multivariate functional data classification is presented and used to improve difficulty prediction in web surveys with mouse movement trajectories.
BibTeX:
@InProceedings{fernandez2020functionaldataclassification,
  author    = {Amanda Fern\'andez-Fontelo and Felix Henninger and Pascal J. Kieslich and Frauke Kreuter and Sonja Greven},
  title     = {A new model for multivariate functional data classification with application to the prediction of difficulty in web surveys using mouse movement trajectories},
  booktitle = {Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020},
  address   = {Bilbao, Basque Country, Spain},
  year      = {2020},
  pages     = {73-78},
  abstract  = {A semi-metric-based model for multivariate functional data classification is presented and used to improve difficulty prediction in web surveys with mouse movement trajectories.},
  URL       = {http://www.statmod.org/workshops_archive_proceedings_2020.html}
}





Greven, S., Scheipl, F. Comments on: Inference and computation with Generalized Additive Models and their extensions 2020 TEST
29 (2):343-350
article DOI
Abstract:
Simon Wood describes a very general framework for additive regression modeling. We wholeheartedly would like to congratulate him not only on this well-written overview but also on the work that it summarizes, much of it his own. In particular, this includes the methodological and theoretical developments, but also the availability of an implementation of much of what is described in the R package mgcv (Wood 2019). This allows these versatile modeling tools to be the basis for a whole ecosystem of follow-up work by other researchers. It also ensures that the methods are not only used by statisticians, but are truly useful for researchers with all kinds of applications ranging from ecology (Pedersen et al. 2018) to linguistics (Winter and Wieling 2016; Baayen et al. 2018). The model class that Wood describes in Section 3.3, as it is based on the general concept of penalized regression, is even larger than might be apparent from the many examples given. Together with the comprehensive and extendable implementation, this means that many further models can be fitted. In the following sections, we describe two such extensions from our own work, which rely on the inferential techniques presented here: regression with functional data in Sect. 2 and time-to-event models in Sect. 3. We close with some comments on statistical inference and thoughts on potential extensions from our own perspective in Sects. 4 and 5.
BibTeX:
@Article{Greven2020,
  author   = {Greven, S. and Scheipl, F.},
  title    = {Comments on: Inference and computation with Generalized Additive Models and their extensions},
  journal  = {TEST},
  year     = {2020},
  volume   = {29},
  number   = {2},
  pages    = {343-350},
  month    = apr,
  issn     = {1863-8260},
  abstract = {Simon Wood describes a very general framework for additive regression modeling. We wholeheartedly would like to congratulate him not only on this well-written overview but also on the work that it summarizes, much of it his own. In particular, this includes the methodological and theoretical developments, but also the availability of an implementation of much of what is described in the R package mgcv (Wood 2019). This allows these versatile modeling tools to be the basis for a whole ecosystem of follow-up work by other researchers. It also ensures that the methods are not only used by statisticians, but are truly useful for researchers with all kinds of applications ranging from ecology (Pedersen et al. 2018) to linguistics (Winter and Wieling 2016; Baayen et al. 2018).

The model class that Wood describes in Section 3.3, as it is based on the general concept of penalized regression, is even larger than might be apparent from the many examples given. Together with the comprehensive and extendable implementation, this means that many further models can be fitted. In the following sections, we describe two such extensions from our own work, which rely on the inferential techniques presented here: regression with functional data in Sect. 2 and time-to-event models in Sect. 3. We close with some comments on statistical inference and thoughts on potential extensions from our own perspective in Sects. 4 and 5.},
  doi      = {10.1007/s11749-020-00714-2}
}





Maier, E., Stöcker, A., Fitzenberger, B., Greven, S. Density-on-Scalar Regression Models with an Application in Gender Economics 2020 Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020
Bilbao, Basque Country, Spain, 153-158
inproceedings URL
Abstract:
We provide a gradient boosting approach to estimate functional additive regression models with probability density functions as response variables and scalar covariates. To respect the special properties of densities, we formulate the regression model in a Bayes Hilbert space. This allows for a variety of applications, in particular for mixed densities, which have positive probability masses at some points of an interval. We illustrate how to handle this challenge by means of a motivating data set from the German Socio-Economic Panel Study (SOEP). In this application, we analyze the distribution of the woman's share in a couple's total labor income, which has positive probability masses at zero and one, using covariate effects for year, federal state, and age of the youngest child.
BibTeX:
@InProceedings{maier2020densityregression,
  author    = {Eva-Maria Maier and Almond St\"ocker and Bernd Fitzenberger and Sonja Greven},
  title     = {Density-on-Scalar Regression Models with an Application in Gender Economics},
  booktitle = {Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020},
  address   = {Bilbao, Basque Country, Spain},
  year      = {2020},
  pages     = {153--158},
  abstract  = {We provide a gradient boosting approach to estimate functional additive regression models with probability density functions as response variables and scalar covariates. To respect the special properties of densities, we formulate the regression model in a Bayes Hilbert space. This allows for a variety of applications, in particular for mixed densities, which have positive probability masses at some points of an interval. We illustrate how to handle this challenge by means of a motivating data set from the German Socio-Economic Panel Study (SOEP). In this application, we analyze the distribution of the woman's share in a couple's total labor income, which has positive probability masses at zero and one, using covariate effects for year, federal state, and age of the youngest child.},
  URL       = {http://www.statmod.org/workshops_archive_proceedings_2020.html}
}




Rügamer, D., Greven, S. Inference for L2-Boosting 2020 Statistics and Computing
30 (2):279-289
article DOI
Abstract:
We propose a statistical inference framework for the component-wise functional gradient descent algorithm (CFGD) under normality assumption for model errors, also known as L2-Boosting. The CFGD is one of the most versatile tools to analyze data, because it scales well to high-dimensional data sets, allows for a very flexible definition of additive regression models and incorporates inbuilt variable selection. Due to the variable selection, we build on recent proposals for post-selection inference. However, the iterative nature of component-wise boosting, which can repeatedly select the same component to update, necessitates adaptations and extensions to existing approaches. We propose tests and confidence intervals for linear, grouped and penalized additive model components selected by L2-Boosting. Our concepts also transfer to slow-learning algorithms more generally, and to other selection techniques which restrict the response space to more complex sets than polyhedra. We apply our framework to an additive model for sales prices of residential apartments and investigate the properties of our concepts in simulation studies.
BibTeX:
@Article{articlereference.2019-06-05.8633378311,
  author = {R{\"{u}}gamer, David and Greven, Sonja},
  authorURLs = {https://www.biostat.statistik.uni-muenchen.de/personen/mitarbeiter/ruegamer/index.html and https://www.wiwi.hu-berlin.de/en/professuren/vwl/statistik/team/grevenso},
  title = {Inference for L2-Boosting},
  year = {2020},
  abstract = {We propose a statistical inference framework for the component-wise functional gradient descent algorithm (CFGD) under normality assumption for model errors, also known as L2-Boosting. The CFGD is one of the most versatile tools to analyze data, because it scales well to high-dimensional data sets, allows for a very flexible definition of additive regression models and incorporates inbuilt variable selection. Due to the variable selection, we build on recent proposals for post-selection inference. However, the iterative nature of component-wise boosting, which can repeatedly select the same component to update, necessitates adaptations and extensions to existing approaches. We propose tests and confidence intervals for linear, grouped and penalized additive model components selected by L2-Boosting. Our concepts also transfer to slow-learning algorithms more generally, and to other selection techniques which restrict the response space to more complex sets than polyhedra. We apply our framework to an additive model for sales prices of residential apartments and investigate the properties of our concepts in simulation studies.},
  journal = {Statistics and Computing},
  volume = {30},
  number = {2},
  pages = {279-289},
  doi = {10.1007/s11222-019-09882}
}





Steyer, L., Stöcker, A., Greven, S. Elastic analysis of irregularly and sparsely sampled curves 2020 Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020
Bilbao, Basque Country, Spain, 218-221
inproceedings URL
Abstract:
We provide methods and algorithms to approximate the elastic distance between irregularly and sparsely sampled curves and to fit smooth elastic means for collections of such curves. Moreover, we illustrate both methods by applying them to a dataset comprising GPS tracks, where we first cluster the tracks based on the elastic distance between them and then estimate elastic means for each cluster.
BibTeX:
@InProceedings{steyer2020ElasticAnalysisSparseCurves,
  author    = {Lisa Steyer and Almond Stöcker and Sonja Greven},
  title     = {Elastic analysis of irregularly and sparsely sampled curves},
  booktitle = {Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020},
  address   = {Bilbao, Basque Country, Spain},
  year      = {2020},
  pages     = {218-221},
  abstract  = {We provide methods and algorithms to approximate the elastic distance between irregularly and sparsely sampled curves and to fit smooth elastic means for collections of such curves. Moreover, we illustrate both methods by applying them to a dataset comprising GPS tracks, where we first cluster the tracks based on the elastic distance between them and then estimate elastic means for each cluster.},
  URL       = {http://www.statmod.org/workshops_archive_proceedings_2020.html}
}





Happ, C., Scheipl, F., Gabriel, A., Greven, S. A general framework for multivariate functional principal component analysis of amplitude and phase variation 2019 Stat
8 (1):e220
e220 sta4.220
article DOI
Abstract:
Functional data typically contain amplitude and phase variation. In many data situations, phase variation is treated as a nuisance effect and is removed during preprocessing, although it may contain valuable information. In this note, we focus on joint principal component analysis (PCA) of amplitude and phase variation. As the space of warping functions has a complex geometric structure, one key element of the analysis is transforming the warping functions to . We present different transformation approaches and show how they fit into a general class of transformations. This allows us to compare their strengths and limitations. In the context of PCA, our results offer arguments in favour of the centred log-ratio transformation. We further embed two existing approaches from the literature for joint PCA of amplitude and phase variation into the framework of multivariate functional PCA, where we study the properties of the estimators based on an appropriate metric. The approach is illustrated through an application from seismology.
BibTeX:
@Article{https://doi.org/10.1002/sta4.220,
  author   = {Happ, Clara and Scheipl, Fabian and Gabriel, Alice-Agnes and Greven, Sonja},
  title    = {A general framework for multivariate functional principal component analysis of amplitude and phase variation},
  journal  = {Stat},
  year     = {2019},
  volume   = {8},
  number   = {1},
  pages    = {e220},
  note     = {e220 sta4.220},
  abstract = {Functional data typically contain amplitude and phase variation. In many data situations, phase variation is treated as a nuisance effect and is removed during preprocessing, although it may contain valuable information. In this note, we focus on joint principal component analysis (PCA) of amplitude and phase variation. As the space of warping functions has a complex geometric structure, one key element of the analysis is transforming the warping functions to . We present different transformation approaches and show how they fit into a general class of transformations. This allows us to compare their strengths and limitations. In the context of PCA, our results offer arguments in favour of the centred log-ratio transformation. We further embed two existing approaches from the literature for joint PCA of amplitude and phase variation into the framework of multivariate functional PCA, where we study the properties of the estimators based on an appropriate metric. The approach is illustrated through an application from seismology.},
  doi      = {10.1002/sta4.220},
  eprint   = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/sta4.220},
  keywords = {Bayes Hilbert space, Fréchet variance, functional data analysis, registration, seismology, transformation of warping functions}
}





Brendel, M., Sauerbeck, J., Greven, S., Kotz, S., Scheiwein, F., Blautzik, J., Delker, A., Pogarell, O., Ishii, K., Bartenstein, P., Rominger, A., for the Alzheimer's Disease Neuroimaging Initiative Serotonin selective reuptake inhibitor treatment improves cognition and grey matter atrophy but not amyloid burden during two-year follow-up in mild cognitive impairment and Alzheimer's disease patients with depressive symptoms 2018 Journal of Alzheimer's Disease
65 (3):793-806
article DOI
Abstract:
Late-life depression, even when of subsyndromal severity, has shown strong associations with mild cognitive impairment (MCI) and Alzheimer's disease (AD). Preclinical studies have suggested that serotonin selective reuptake inhibitors (SSRIs) can attenuate amyloidogenesis. Therefore, we aimed to investigate the effect of SSRI medication on amyloidosis and grey matter volume in subsyndromal depressed subjects with MCI and AD during an interval of two years. 256 cognitively affected subjects (225 MCI/31 AD) undergoing [18F]-AV45-PET and MRI at baseline and 2-year follow-up were selected from the ADNI database. Subjects with a positive depression item (DEP(+); n = 73) in the Neuropsychiatric Inventory Questionnaire were subdivided into those receiving SSRI medication (SSRI(+); n = 24) and those without SSRI treatment (SSRI(-); n = 49). Longitudinal cognition (Δ-ADAS), amyloid deposition rate (standardized uptake value, using white matter as reference region; SUVR WM) and changes in grey matter volume were compared using common covariates. Analyses were performed separately in all subjects and in the subgroup of amyloid-positive subjects. Cognitive performance in DEP(+)/SSRI(+) subjects (Δ-ADAS: -5.0%) showed less deterioration with 2-year follow-up when compared to DEP(+)/SSRI(-) subjects (Δ-ADAS: +18.6%, p < 0.05), independent of amyloid SUVR WM at baseline. With SSRI treatment, the progression of grey matter atrophy was reduced (-0.9% versus -2.7%, p < 0.05), notably in frontooral cortex. A slight trend towards lower amyloid deposition rate was observed in DEP(+)/SSRI(+) subjects versus DEP(+)/SSRI(-). Despite the lack of effect on amyloid PET, SSRI medication distinctly rescued the declining cognitive performance in cognitively affected patients with depressive symptoms, and likewise attenuated grey matter atrophy. © 2018 - IOS Press and the authors. All rights reserved.
BibTeX:
@Article{Brendel2018793,
  author = {Brendel, M and Sauerbeck, J and Greven, S and Kotz, S and Scheiwein, F and Blautzik, J and Delker, A and Pogarell, O and Ishii, K and Bartenstein, P and Rominger, A and {for the Alzheimer's Disease Neuroimaging Initiative}},
  title = {Serotonin selective reuptake inhibitor treatment improves cognition and grey matter atrophy but not amyloid burden during two-year follow-up in mild cognitive impairment and Alzheimer's disease patients with depressive symptoms},
  year = {2018},
  abstract = {Late-life depression, even when of subsyndromal severity, has shown strong associations with mild cognitive impairment (MCI) and Alzheimer's disease (AD). Preclinical studies have suggested that serotonin selective reuptake inhibitors (SSRIs) can attenuate amyloidogenesis. Therefore, we aimed to investigate the effect of SSRI medication on amyloidosis and grey matter volume in subsyndromal depressed subjects with MCI and AD during an interval of two years. 256 cognitively affected subjects (225 MCI/31 AD) undergoing [18F]-AV45-PET and MRI at baseline and 2-year follow-up were selected from the ADNI database. Subjects with a positive depression item (DEP(+); n = 73) in the Neuropsychiatric Inventory Questionnaire were subdivided into those receiving SSRI medication (SSRI(+); n = 24) and those without SSRI treatment (SSRI(-); n = 49). Longitudinal cognition ($\Delta$-ADAS), amyloid deposition rate (standardized uptake value, using white matter as reference region; SUVR WM) and changes in grey matter volume were compared using common covariates. Analyses were performed separately in all subjects and in the subgroup of amyloid-positive subjects. Cognitive performance in DEP(+)/SSRI(+) subjects ($\Delta$-ADAS: -5.0\%) showed less deterioration with 2-year follow-up when compared to DEP(+)/SSRI(-) subjects ($\Delta$-ADAS: +18.6\%, p < 0.05), independent of amyloid SUVR WM at baseline. With SSRI treatment, the progression of grey matter atrophy was reduced (-0.9\% versus -2.7\%, p < 0.05), notably in frontooral cortex. A slight trend towards lower amyloid deposition rate was observed in DEP(+)/SSRI(+) subjects versus DEP(+)/SSRI(-). Despite the lack of effect on amyloid PET, SSRI medication distinctly rescued the declining cognitive performance in cognitively affected patients with depressive symptoms, and likewise attenuated grey matter atrophy. {\textcopyright} 2018 - IOS Press and the authors. All rights reserved.},
  journal = {Journal of Alzheimer's Disease},
  volume = {65},
  number = {3},
  pages = {793-806},
  keywords = {amyloid beta protein;  florbetapir f 18;  placebo;  serotonin uptake inhibitor, aged;  Alzheimer disease;  amyloidosis;  Article;  brain atrophy;  brain size;  case control study;  cognition;  cohort analysis;  controlled study;  disease burden;  drug efficacy;  female;  follow up;  gray matter;  human;  late life depression;  longitudinal study;  major clinical study;  male;  mental patient;  mild cognitive impairment;  neuroimaging;  neuropsychiatric inventory;  nuclear magnetic resonance imaging;  positron emission tomography;  priority journal;  psychopharmacotherapy;  retrospective study;  standardized uptake value ratio;  subsyndromal depression;  white matter},
  doi = {10.3233/JAD-170387},
  issn = {13872877}
}





Brockhaus, S., Fuest, A., Mayr, A., Greven, S. Signal regression models for location, scale and shape with an application to stock returns 2018 Journal of the Royal Statistical Society. Series C: Applied Statistics
67 (3):665-686
article DOI
Abstract:
We discuss scalar-on-function regression models where all parameters of the assumed response distribution can be modelled depending on covariates. We thus combine signal regression models with generalized additive models for location, scale and shape. Our approach is motivated by a time series of stock returns, where it is of interest to model both the expectation and the variance depending on lagged response values and functional liquidity curves. We compare two fundamentally different methods for estimation, a gradient boosting and a penalized-likelihood-based approach, and address practically important points like identifiability and model choice. Estimation by a componentwise gradient boosting algorithm allows for high dimensional data settings and variable selection. Estimation by a penalized-likelihood-based approach has the advantage of directly provided statistical inference. © 2017 Royal Statistical Society
BibTeX:
@Article{Brockhaus2018665,
  author = {Brockhaus, S and Fuest, A and Mayr, A and Greven, S},
  title = {Signal regression models for location, scale and shape with an application to stock returns},
  year = {2018},
  abstract = {We discuss scalar-on-function regression models where all parameters of the assumed response distribution can be modelled depending on covariates. We thus combine signal regression models with generalized additive models for location, scale and shape. Our approach is motivated by a time series of stock returns, where it is of interest to model both the expectation and the variance depending on lagged response values and functional liquidity curves. We compare two fundamentally different methods for estimation, a gradient boosting and a penalized-likelihood-based approach, and address practically important points like identifiability and model choice. Estimation by a componentwise gradient boosting algorithm allows for high dimensional data settings and variable selection. Estimation by a penalized-likelihood-based approach has the advantage of directly provided statistical inference. {\textcopyright} 2017 Royal Statistical Society},
  journal = {Journal of the Royal Statistical Society. Series C: Applied Statistics},
  volume = {67},
  number = {3},
  pages = {665-686},
  doi = {10.1111/rssc.12252},
  issn = {00359254}
}




Cederbaum, J., Scheipl, F., Greven, S. Fast symmetric additive covariance smoothing 2018 Computational Statistics and Data Analysis
120 25-41
article DOI
Abstract:
A fast bivariate smoothing approach for symmetric surfaces is proposed that has a wide range of applications. It is shown how it can be applied to estimate the covariance function in longitudinal data as well as multiple additive covariances in functional data with complex correlation structures. The proposed symmetric smoother can handle (possibly noisy) data sampled on a common, dense grid as well as irregularly or sparsely sampled data. Estimation is based on bivariate penalized spline smoothing using a mixed model representation and the symmetry is used to reduce computation time compared to the usual non-symmetric smoothers. The application of the approach in functional principal component analysis for very general functional linear mixed models is outlined and its practical value is demonstrated in two applications. The approach is evaluated in extensive simulations. Documented open source software is provided that implements the fast symmetric bivariate smoother building on established algorithms for additive models. © 2017 Elsevier B.V.
BibTeX:
@Article{Cederbaum201825,
  author = {Cederbaum, J and Scheipl, F and Greven, S},
  title = {Fast symmetric additive covariance smoothing},
  year = {2018},
  abstract = {A fast bivariate smoothing approach for symmetric surfaces is proposed that has a wide range of applications. It is shown how it can be applied to estimate the covariance function in longitudinal data as well as multiple additive covariances in functional data with complex correlation structures. The proposed symmetric smoother can handle (possibly noisy) data sampled on a common, dense grid as well as irregularly or sparsely sampled data. Estimation is based on bivariate penalized spline smoothing using a mixed model representation and the symmetry is used to reduce computation time compared to the usual non-symmetric smoothers. The application of the approach in functional principal component analysis for very general functional linear mixed models is outlined and its practical value is demonstrated in two applications. The approach is evaluated in extensive simulations. Documented open source software is provided that implements the fast symmetric bivariate smoother building on established algorithms for additive models. {\textcopyright} 2017 Elsevier B.V.},
  journal = {Computational Statistics and Data Analysis},
  volume = {120},
  pages = {25--41},
  keywords = {Bit error rate;  Open source software;  Open systems;  Software engineering, Complex correlation;  Covariance function;  Extensive simulations;  Functional datas;  Functional principal component analysis;  Longitudinal data;  Penalized splines;  Principal Components, Principal component analysis},
  doi = {10.1016/j.csda.2017.11.002},
  issn = {0167-9473}
}




Happ, C., Greven, S. Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains 2018 Journal of the American Statistical Association
113 (522):649-659
article DOI
Abstract:
Existing approaches for multivariate functional principal component analysis are restricted to data on the same one-dimensional interval. The presented approach focuses on multivariate functional data on different domains that may differ in dimension, such as functions and images. The theoretical basis for multivariate functional principal component analysis is given in terms of a Karhunen–Loève Theorem. For the practically relevant case of a finite Karhunen–Loève representation, a relationship between univariate and multivariate functional principal component analysis is established. This offers an estimation strategy to calculate multivariate functional principal components and scores based on their univariate counterparts. For the resulting estimators, asymptotic results are derived. The approach can be extended to finite univariate expansions in general, not necessarily orthonormal bases. It is also applicable for sparse functional data or data with measurement error. A flexible R implementation is available on CRAN. The new method is shown to be competitive to existing approaches for data observed on a common one-dimensional domain. The motivating application is a neuroimaging study, where the goal is to explore how longitudinal trajectories of a neuropsychological test score covary with FDG-PET brain scans at baseline. Supplementary material, including detailed proofs, additional simulation results, and software is available online. © 2018, © 2018 American Statistical Association.
BibTeX:
@Article{Happ2018649,
  author = {Happ, C. and Greven, S.},
  title = {Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains},
  year = {2018},
  abstract = {Existing approaches for multivariate functional principal component analysis are restricted to data on the same one-dimensional interval. The presented approach focuses on multivariate functional data on different domains that may differ in dimension, such as functions and images. The theoretical basis for multivariate functional principal component analysis is given in terms of a Karhunen{\textendash}Lo{\`{e}}ve Theorem. For the practically relevant case of a finite Karhunen{\textendash}Lo{\`{e}}ve representation, a relationship between univariate and multivariate functional principal component analysis is established. This offers an estimation strategy to calculate multivariate functional principal components and scores based on their univariate counterparts. For the resulting estimators, asymptotic results are derived. The approach can be extended to finite univariate expansions in general, not necessarily orthonormal bases. It is also applicable for sparse functional data or data with measurement error. A flexible R implementation is available on CRAN. The new method is shown to be competitive to existing approaches for data observed on a common one-dimensional domain. The motivating application is a neuroimaging study, where the goal is to explore how longitudinal trajectories of a neuropsychological test score covary with FDG-PET brain scans at baseline. Supplementary material, including detailed proofs, additional simulation results, and software is available online. {\textcopyright} 2018, {\textcopyright} 2018 American Statistical Association.},
  journal = {Journal of the American Statistical Association},
  volume = {113},
  number = {522},
  pages = {649--659},
  doi = {10.1080/01621459.2016.1273115},
  issn = {0162-1459}
}





Happ, C., Greven, S., Schmid, V. The impact of model assumptions in scalar-on-image regression 2018 Statistics in Medicine
37 (28):4298-4317
article DOI
Abstract:
Complex statistical models such as scalar-on-image regression often require strong assumptions to overcome the issue of nonidentifiability. While in theory, it is well understood that model assumptions can strongly influence the results, this seems to be underappreciated, or played down, in practice. This article gives a systematic overview of the main approaches for scalar-on-image regression with a special focus on their assumptions. We categorize the assumptions and develop measures to quantify the degree to which they are met. The impact of model assumptions and the practical usage of the proposed measures are illustrated in a simulation study and in an application to neuroimaging data. The results show that different assumptions indeed lead to quite different estimates with similar predictive ability, raising the question of their interpretability. We give recommendations for making modeling and interpretation decisions in practice based on the new measures and simulations using hypothetic coefficient images and the observed data. © 2018 John Wiley & Sons, Ltd.
BibTeX:
@Article{Happ20184298,
  author = {Happ, C. and Greven, S. and Schmid, V. J.},
  title = {The impact of model assumptions in scalar-on-image regression},
  year = {2018},
  abstract = {Complex statistical models such as scalar-on-image regression often require strong assumptions to overcome the issue of nonidentifiability. While in theory, it is well understood that model assumptions can strongly influence the results, this seems to be underappreciated, or played down, in practice. This article gives a systematic overview of the main approaches for scalar-on-image regression with a special focus on their assumptions. We categorize the assumptions and develop measures to quantify the degree to which they are met. The impact of model assumptions and the practical usage of the proposed measures are illustrated in a simulation study and in an application to neuroimaging data. The results show that different assumptions indeed lead to quite different estimates with similar predictive ability, raising the question of their interpretability. We give recommendations for making modeling and interpretation decisions in practice based on the new measures and simulations using hypothetic coefficient images and the observed data. {\textcopyright} 2018 John Wiley \& Sons, Ltd.},
  journal = {Statistics in Medicine},
  volume = {37},
  number = {28},
  pages = {4298--4317},
  keywords = {article;  neuroimaging;  quantitative analysis;  sensitivity analysis;  simulation},
  doi = {10.1002/sim.7915},
  issn = {0277-6715}
}




Köhler, M., Umlauf, N., Greven, S. Nonlinear association structures in flexible Bayesian additive joint models 2018 Statistics in Medicine
37 (30):4771-4788
article DOI
Abstract:
Joint models of longitudinal and survival data have become an important tool for modeling associations between longitudinal biomarkers and event processes. The association between marker and log hazard is assumed to be linear in existing shared random effects models, with this assumption usually remaining unchecked. We present an extended framework of flexible additive joint models that allows the estimation of nonlinear covariate specific associations by making use of Bayesian P-splines. Our joint models are estimated in a Bayesian framework using structured additive predictors for all model components, allowing for great flexibility in the specification of smooth nonlinear, time-varying, and random effects terms for longitudinal submodel, survival submodel, and their association. The ability to capture truly linear and nonlinear associations is assessed in simulations and illustrated on the widely studied biomedical data on the rare fatal liver disease primary biliary cirrhosis. All methods are implemented in the R package bamlss to facilitate the application of this flexible joint model in practice. © 2018 John Wiley & Sons, Ltd.
BibTeX:
@Article{Khler20184771,
  author = {K{\"{o}}hler, M. and Umlauf, N. and Greven, S.},
  title = {Nonlinear association structures in flexible Bayesian additive joint models},
  year = {2018},
  abstract = {Joint models of longitudinal and survival data have become an important tool for modeling associations between longitudinal biomarkers and event processes. The association between marker and log hazard is assumed to be linear in existing shared random effects models, with this assumption usually remaining unchecked. We present an extended framework of flexible additive joint models that allows the estimation of nonlinear covariate specific associations by making use of Bayesian P-splines. Our joint models are estimated in a Bayesian framework using structured additive predictors for all model components, allowing for great flexibility in the specification of smooth nonlinear, time-varying, and random effects terms for longitudinal submodel, survival submodel, and their association. The ability to capture truly linear and nonlinear associations is assessed in simulations and illustrated on the widely studied biomedical data on the rare fatal liver disease primary biliary cirrhosis. All methods are implemented in the R package bamlss to facilitate the application of this flexible joint model in practice. {\textcopyright} 2018 John Wiley \& Sons, Ltd.},
  journal = {Statistics in Medicine},
  volume = {37},
  number = {30},
  pages = {4771--4788},
  keywords = {Article;  Bayes theorem;  biliary cirrhosis;  human;  linear system;  liver disease;  major clinical study;  Markov chain;  mathematical computing;  mathematical model;  Monte Carlo method;  nonlinear system;  survival time},
  doi = {10.1002/sim.7967},
  issn = {0277-6715}
}





Rügamer, D., Brockhaus, S., Gentsch, K., Scherer, K., Greven, S. Boosting factor-specific functional historical models for the detection of synchronization in bioelectrical signals 2018 Journal of the Royal Statistical Society. Series C: Applied Statistics
67 (3):621-642
article DOI
Abstract:
The link between different psychophysiological measures during emotion episodes is not well understood. To analyse the functional relationship between electroencephalography and facial electromyography, we apply historical function-on-function regression models to electroencephalography and electromyography data that were simultaneously recorded from 24 participants while they were playing a computerized gambling task. Given the complexity of the data structure for this application, we extend simple functional historical models to models including random historical effects, factor-specific historical effects and factor-specific random historical effects. Estimation is conducted by a componentwise gradient boosting algorithm, which scales well to large data sets and complex models. © 2017 Royal Statistical Society
BibTeX:
@Article{Rgamer2018621,
  author = {R{\"{u}}gamer, D. and Brockhaus, S. and Gentsch, K. and Scherer, K. and Greven, S.},
  title = {Boosting factor-specific functional historical models for the detection of synchronization in bioelectrical signals},
  year = {2018},
  abstract = {The link between different psychophysiological measures during emotion episodes is not well understood. To analyse the functional relationship between electroencephalography and facial electromyography, we apply historical function-on-function regression models to electroencephalography and electromyography data that were simultaneously recorded from 24 participants while they were playing a computerized gambling task. Given the complexity of the data structure for this application, we extend simple functional historical models to models including random historical effects, factor-specific historical effects and factor-specific random historical effects. Estimation is conducted by a componentwise gradient boosting algorithm, which scales well to large data sets and complex models. {\textcopyright} 2017 Royal Statistical Society},
  journal = {Journal of the Royal Statistical Society. Series C: Applied Statistics},
  volume = {67},
  number = {3},
  pages = {621--642},
  doi = {10.1111/rssc.12241},
  issn = {0035-9254}
}





Rügamer, D., Greven, S. Selective inference after likelihood- or test-based model selection in linear models 2018 Statistics and Probability Letters
140 7-12
article DOI
Abstract:
Statistical inference after model selection requires an inference framework that takes the selection into account in order to be valid. Following recent work on selective inference, we derive analytical expressions for inference after likelihood- or test-based model selection for linear models. © 2018 Elsevier B.V.
BibTeX:
@Article{Rgamer20187,
  author = {R{\"{u}}gamer, D. and Greven, S.},
  title = {Selective inference after likelihood- or test-based model selection in linear models},
  year = {2018},
  abstract = {Statistical inference after model selection requires an inference framework that takes the selection into account in order to be valid. Following recent work on selective inference, we derive analytical expressions for inference after likelihood- or test-based model selection for linear models. {\textcopyright} 2018 Elsevier B.V.},
  journal = {Statistics and Probability Letters},
  volume = {140},
  pages = {7--12},
  doi = {10.1016/j.spl.2018.04.010},
  issn = {0167-7152}
}





Augustin, N., Mattocks, C., Faraway, J., Greven, S., Ness, A. Modelling a response as a function of high-frequency count data: The association between physical activity and fat mass 2017 Statistical Methods in Medical Research
26 (5):2210-2226
article DOI
Abstract:
Accelerometers are widely used in health sciences, ecology and other application areas. They quantify the intensity of physical activity as counts per epoch over a given period of time. Currently, health scientists use very lossy summaries of the accelerometer time series, some of which are based on coarse discretisation of activity levels, and make certain implicit assumptions, including linear or constant effects of physical activity. We propose the histogram as a functional summary for achieving a near lossless dimension reduction, comparability between individual time series and easy interpretability. Using the histogram as a functional summary avoids registration of accelerometer counts in time. In our novel method, a scalar response is regressed on additive multi-dimensional functional predictors, including the histogram of the high-frequency counts, and additive non-linear predictors for other continuous covariates. The method improves on the current state of the art, as it can deal with high-frequency time series of different lengths and missing values and yields a flexible way to model the physical activity effect with fewer assumptions. It also allows the commonly made modelling assumptions to be tested. We investigate the relationship between the response fat mass and physical activity measured by accelerometer, in data from the Avon Longitudinal Study of Parents and Children. Our method allows testing of whether the effect of physical activity varies over its intensity by gender, by time of day or by day of the week. We show that meaningful interpretation requires careful treatment of identifiability constraints in the light of the sum-to-one property of a histogram. We find that the (not necessarily causal) effect of physical activity on kg fat mass is not linear and not constant over the activity intensity. © The Author(s) 2017.
BibTeX:
@Article{Augustin20172210,
  author = {Augustin, N. H. and Mattocks, C. and Faraway, J. J. and Greven, S. and Ness, A. R.},
  title = {Modelling a response as a function of high-frequency count data: The association between physical activity and fat mass},
  year = {2017},
  abstract = {Accelerometers are widely used in health sciences, ecology and other application areas. They quantify the intensity of physical activity as counts per epoch over a given period of time. Currently, health scientists use very lossy summaries of the accelerometer time series, some of which are based on coarse discretisation of activity levels, and make certain implicit assumptions, including linear or constant effects of physical activity. We propose the histogram as a functional summary for achieving a near lossless dimension reduction, comparability between individual time series and easy interpretability. Using the histogram as a functional summary avoids registration of accelerometer counts in time. In our novel method, a scalar response is regressed on additive multi-dimensional functional predictors, including the histogram of the high-frequency counts, and additive non-linear predictors for other continuous covariates. The method improves on the current state of the art, as it can deal with high-frequency time series of different lengths and missing values and yields a flexible way to model the physical activity effect with fewer assumptions. It also allows the commonly made modelling assumptions to be tested. We investigate the relationship between the response fat mass and physical activity measured by accelerometer, in data from the Avon Longitudinal Study of Parents and Children. Our method allows testing of whether the effect of physical activity varies over its intensity by gender, by time of day or by day of the week. We show that meaningful interpretation requires careful treatment of identifiability constraints in the light of the sum-to-one property of a histogram. We find that the (not necessarily causal) effect of physical activity on kg fat mass is not linear and not constant over the activity intensity. {\textcopyright} The Author(s) 2017.},
  journal = {Statistical Methods in Medical Research},
  volume = {26},
  number = {5},
  pages = {2210--2226},
  keywords = {Article;  Bayes theorem;  body fat distribution;  body mass;  clinical protocol;  covariance;  fat mass;  human;  mathematical analysis;  mathematical model;  maximum likelihood method;  nonlinear system;  physical activity;  prediction;  regression analysis;  sedentary lifestyle;  time series analysis;  accelerometry;  adipose tissue;  adult;  anatomy and histology;  child;  exercise;  female;  longitudinal study;  male;  sex factor;  statistical analysis;  statistical model;  statistics;  time factor, Accelerometry;  Adipose Tissue;  Adult;  Child;  Data Interpretation, Statistical;  Exercise;  Female;  Humans;  Longitudinal Studies;  Male;  Models, Statistical;  Sex Factors;  Statistics as Topic;  Time Factors},
  doi = {10.1177/0962280215595832},
  issn = {0962-2802}
}





Brockhaus, S., Melcher, M., Leisch, F., Greven, S. Boosting flexible functional regression models with a high number of functional historical effects 2017 Statistics and Computing
27 (4):913-926
article DOI
Abstract:
We propose a general framework for regression models with functional response containing a potentially large number of flexible effects of functional and scalar covariates. Special emphasis is put on historical functional effects, where functional response and functional covariate are observed over the same interval and the response is only influenced by covariate values up to the current grid point. Historical functional effects are mostly used when functional response and covariate are observed on a common time interval, as they account for chronology. Our formulation allows for flexible integration limits including, e.g., lead or lag times. The functional responses can be observed on irregular curve-specific grids. Additionally, we introduce different parameterizations for historical effects and discuss identifiability issues. The models are estimated by a component-wise gradient boosting algorithm which is suitable for models with a potentially high number of covariate effects, even more than observations, and inherently does model selection. By minimizing corresponding loss functions, different features of the conditional response distribution can be modeled, including generalized and quantile regression models as special cases. The methods are implemented in the open-source R-package FDboost. The methodological developments are motivated by biotechnological data on Escherichia coli fermentations, but cover a much broader model class. © 2016, Springer Science+Business Media New York.
BibTeX:
@Article{Brockhaus2017913,
  author = {Brockhaus, S. and Melcher, M. and Leisch, F. and Greven, S.},
  title = {Boosting flexible functional regression models with a high number of functional historical effects},
  year = {2017},
  abstract = {We propose a general framework for regression models with functional response containing a potentially large number of flexible effects of functional and scalar covariates. Special emphasis is put on historical functional effects, where functional response and functional covariate are observed over the same interval and the response is only influenced by covariate values up to the current grid point. Historical functional effects are mostly used when functional response and covariate are observed on a common time interval, as they account for chronology. Our formulation allows for flexible integration limits including, e.g., lead or lag times. The functional responses can be observed on irregular curve-specific grids. Additionally, we introduce different parameterizations for historical effects and discuss identifiability issues. The models are estimated by a component-wise gradient boosting algorithm which is suitable for models with a potentially high number of covariate effects, even more than observations, and inherently does model selection. By minimizing corresponding loss functions, different features of the conditional response distribution can be modeled, including generalized and quantile regression models as special cases. The methods are implemented in the open-source R package FDboost. The methodological developments are motivated by biotechnological data on Escherichia coli fermentations, but cover a much broader model class. {\textcopyright} 2016, Springer Science+Business Media New York.},
  journal = {Statistics and Computing},
  volume = {27},
  number = {4},
  pages = {913--926},
  doi = {10.1007/s11222-016-9662-1},
  issn = {0960-3174}
}





Greven, S., Scheipl, F. A general framework for functional regression modelling 2017 Statistical Modelling
17 (1-2):1-35
article DOI
Abstract:
Researchers are increasingly interested in regression models for functional data. This article discusses a comprehensive framework for additive (mixed) models for functional responses and/or functional covariates based on the guiding principle of reframing functional regression in terms of corresponding models for scalar data, allowing the adaptation of a large body of existing methods for these novel tasks. The framework encompasses many existing as well as new models. It includes regression for 'generalized' functional data, mean regression, quantile regression as well as generalized additive models for location, scale and shape (GAMLSS) for functional data. It admits many flexible linear, smooth or interaction terms of scalar and functional covariates as well as (functional) random effects and allows flexible choices of bases—particularly splines and functional principal components—and corresponding penalties for each term. It covers functional data observed on common (dense) or curve-specific (sparse) grids. Penalized-likelihood-based and gradient-boosting-based inference for these models are implemented in R packages refund and FDboost, respectively. We also discuss identifiability and computational complexity for the functional regression models covered. A running example on a longitudinal multiple sclerosis imaging study serves to illustrate the flexibility and utility of the proposed model class. Reproducible code for this case study is made available online. © 2017, © 2017 SAGE Publications.
BibTeX:
@Article{Greven20171,
  author = {Greven, S. and Scheipl, F.},
  title = {A general framework for functional regression modelling},
  year = {2017},
  abstract = {Researchers are increasingly interested in regression models for functional data. This article discusses a comprehensive framework for additive (mixed) models for functional responses and/or functional covariates based on the guiding principle of reframing functional regression in terms of corresponding models for scalar data, allowing the adaptation of a large body of existing methods for these novel tasks. The framework encompasses many existing as well as new models. It includes regression for {`}generalized{'} functional data, mean regression, quantile regression as well as generalized additive models for location, scale and shape (GAMLSS) for functional data. It admits many flexible linear, smooth or interaction terms of scalar and functional covariates as well as (functional) random effects and allows flexible choices of bases{\textemdash}particularly splines and functional principal components{\textemdash}and corresponding penalties for each term. It covers functional data observed on common (dense) or curve-specific (sparse) grids. Penalized-likelihood-based and gradient-boosting-based inference for these models are implemented in R packages refund and FDboost, respectively. We also discuss identifiability and computational complexity for the functional regression models covered. A running example on a longitudinal multiple sclerosis imaging study serves to illustrate the flexibility and utility of the proposed model class. Reproducible code for this case study is made available online. {\textcopyright} 2017, {\textcopyright} 2017 SAGE Publications.},
  journal = {Statistical Modelling},
  volume = {17},
  number = {1-2},
  pages = {1--35},
  doi = {10.1177/1471082X16681317},
  issn = {1471-082X}
}




Greven, S., Scheipl, F. Rejoinder 2017 Statistical Modelling
17 (1-2):100-115
article DOI
BibTeX:
@Article{Greven2017100,
  author = {Greven, S. and Scheipl, F.},
  title = {Rejoinder},
  year = {2017},
  journal = {Statistical Modelling},
  volume = {17},
  number = {1-2},
  pages = {100--115},
  doi = {10.1177/1471082X16689188},
  issn = {1471-082X}
}




Köhler, M., Beyerlein, A., Vehik, K., Greven, S., et al. Joint modeling of longitudinal autoantibody patterns and progression to type 1 diabetes: results from the TEDDY study 2017 Acta Diabetologica
54 (11):1009-1017
article DOI
Abstract:
Aims: The onset of clinical type 1 diabetes (T1D) is preceded by the occurrence of disease-specific autoantibodies. The level of autoantibody titers is known to be associated with progression time from the first emergence of autoantibodies to the onset of clinical symptoms, but detailed analyses of this complex relationship are lacking. We aimed to fill this gap by applying advanced statistical models. Methods: We investigated data of 613 children from the prospective TEDDY study who were persistently positive for IAA, GADA and/or IA2A autoantibodies. We used a novel approach of Bayesian joint modeling of longitudinal and survival data to assess the potentially time- and covariate-dependent association between the longitudinal autoantibody titers and progression time to T1D. Results: For all autoantibodies we observed a positive association between the titers and the T1D progression risk. This association was estimated as time-constant for IA2A, but decreased over time for IAA and GADA. For example the hazard ratio [95% credibility interval] for IAA (per transformed unit) was 3.38 [2.66, 4.38] at 6 months after seroconversion, and 2.02 [1.55, 2.68] at 36 months after seroconversion. Conclusions: These findings indicate that T1D progression risk stratification based on autoantibody titers should focus on time points early after seroconversion. Joint modeling techniques allow for new insights into these associations. © 2017, Springer-Verlag Italia S.r.l.
BibTeX:
@Article{Khler20171009,
  author = {K{\"{o}}hler, M. and Beyerlein, A. and Vehik, K. and Greven, S. and others},
  title = {Joint modeling of longitudinal autoantibody patterns and progression to type 1 diabetes: results from the TEDDY study},
  year = {2017},
  abstract = {Aims: The onset of clinical type 1 diabetes (T1D) is preceded by the occurrence of disease-specific autoantibodies. The level of autoantibody titers is known to be associated with progression time from the first emergence of autoantibodies to the onset of clinical symptoms, but detailed analyses of this complex relationship are lacking. We aimed to fill this gap by applying advanced statistical models. Methods: We investigated data of 613 children from the prospective TEDDY study who were persistently positive for IAA, GADA and/or IA2A autoantibodies. We used a novel approach of Bayesian joint modeling of longitudinal and survival data to assess the potentially time- and covariate-dependent association between the longitudinal autoantibody titers and progression time to T1D. Results: For all autoantibodies we observed a positive association between the titers and the T1D progression risk. This association was estimated as time-constant for IA2A, but decreased over time for IAA and GADA. For example the hazard ratio [95\% credibility interval] for IAA (per transformed unit) was 3.38 [2.66, 4.38] at 6 months after seroconversion, and 2.02 [1.55, 2.68] at 36 months after seroconversion. Conclusions: These findings indicate that T1D progression risk stratification based on autoantibody titers should focus on time points early after seroconversion. Joint modeling techniques allow for new insights into these associations. {\textcopyright} 2017, Springer-Verlag Italia S.r.l.},
  journal = {Acta Diabetologica},
  volume = {54},
  number = {11},
  pages = {1009--1017},
  keywords = {autoantibody;  glutamate decarboxylase antibody;  HLA antigen;  insulin antibody;  islet antigen 2 antibody;  unclassified drug;  autoantibody;  glutamate decarboxylase, antibody titer;  Article;  autoimmunity;  Bayesian joint modeling;  child;  clinical study;  controlled study;  disease association;  disease course;  female;  hazard ratio;  human;  immunopathogenesis;  insulin dependent diabetes mellitus;  longitudinal study;  major clinical study;  male;  Markov chain;  preschool child;  priority journal;  risk factor;  school child;  seroconversion;  statistical model;  survival analysis;  time;  blood;  disease exacerbation;  disease predisposition;  immunology;  infant;  insulin dependent diabetes mellitus;  metabolism;  pathology;  theoretical model, Autoantibodies;  Child, Preschool;  Diabetes Mellitus, Type 1;  Disease Progression;  Disease Susceptibility;  Female;  Glutamate Decarboxylase;  Humans;  Infant;  Longitudinal Studies;  Male;  Models, Theoretical;  Risk Factors;  Seroconversion},
  doi = {10.1007/s00592-017-1033-7},
  issn = {09405429}
}





Köhler, M., Umlauf, N., Beyerlein, A., Winkler, C., Ziegler, A.-G., Greven, S. Flexible Bayesian additive joint models with an application to type 1 diabetes research 2017 Biometrical Journal
59 (6):1144-1165
article DOI
Abstract:
The joint modeling of longitudinal and time-to-event data is an important tool of growing popularity to gain insights into the association between a biomarker and an event process. We develop a general framework of flexible additive joint models that allows the specification of a variety of effects, such as smooth nonlinear, time-varying and random effects, in the longitudinal and survival parts of the models. Our extensions are motivated by the investigation of the relationship between fluctuating disease-specific markers, in this case autoantibodies, and the progression to the autoimmune disease type 1 diabetes. Using Bayesian P-splines, we are in particular able to capture highly nonlinear subject-specific marker trajectories as well as a time-varying association between the marker and event process allowing new insights into disease progression. The model is estimated within a Bayesian framework and implemented in the R-package bamlss. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
BibTeX:
@Article{Khler20171144,
  author = {K{\"{o}}hler, M and Umlauf, N and Beyerlein, A and Winkler, C and Ziegler, A.-G and Greven, S},
  title = {Flexible Bayesian additive joint models with an application to type 1 diabetes research},
  year = {2017},
  abstract = {The joint modeling of longitudinal and time-to-event data is an important tool of growing popularity to gain insights into the association between a biomarker and an event process. We develop a general framework of flexible additive joint models that allows the specification of a variety of effects, such as smooth nonlinear, time-varying and random effects, in the longitudinal and survival parts of the models. Our extensions are motivated by the investigation of the relationship between fluctuating disease-specific markers, in this case autoantibodies, and the progression to the autoimmune disease type 1 diabetes. Using Bayesian P-splines, we are in particular able to capture highly nonlinear subject-specific marker trajectories as well as a time-varying association between the marker and event process allowing new insights into disease progression. The model is estimated within a Bayesian framework and implemented in the R-package bamlss. {\textcopyright} 2017 WILEY-VCH Verlag GmbH \& Co. KGaA, Weinheim},
  journal = {Biometrical Journal},
  volume = {59},
  number = {6},
  pages = {1144-1165},
  keywords = {Bayes theorem;  biometry;  human;  insulin dependent diabetes mellitus;  longitudinal study;  procedures;  statistical model, Bayes Theorem;  Biometry;  Diabetes Mellitus, Type 1;  Humans;  Longitudinal Studies;  Models, Statistical},
  doi = {10.1002/bimj.201600224},
  issn = {03233847}
}





Pouplier, M., Cederbaum, J., Hoole, P., Marin, S., Greven, S. Mixed modeling for irregularly sampled and correlated functional data: Speech science applications 2017 Journal of the Acoustical Society of America
142 (2):935-946
article DOI
Abstract:
The speech sciences often employ complex experimental designs requiring models with multiple covariates and crossed random effects. For curve-like data such as time-varying signals, single-time-point feature extraction is commonly used as data reduction technique to make the data amenable to statistical hypothesis testing, thereby discarding a wealth of information. The present paper discusses the application of functional linear mixed models, a functional analogue to linear mixed models. This type of model allows for the holistic evaluation of curve dynamics for data with complex correlation structures due to repeated measures on subjects and stimulus items. The nonparametric, spline-based estimation technique allows for correlated functional data to be observed irregularly, or even sparsely. This means that information on variation in the temporal domain is preserved. Functional principal component analysis is used for parsimonious data representation and variance decomposition. The basic functionality and usage of the model is illustrated based on several case studies with different data types and experimental designs. The statistical method is broadly applicable to any types of data that consist of groups of curves, whether they are articulatory or acoustic time series data, or generally any types of data suitably modeled based on penalized splines. © 2017 Acoustical Society of America.
BibTeX:
@Article{Pouplier2017935,
  author = {Pouplier, M and Cederbaum, J and Hoole, P and Marin, S and Greven, S},
  title = {Mixed modeling for irregularly sampled and correlated functional data: Speech science applications},
  year = {2017},
  abstract = {The speech sciences often employ complex experimental designs requiring models with multiple covariates and crossed random effects. For curve-like data such as time-varying signals, single-time-point feature extraction is commonly used as data reduction technique to make the data amenable to statistical hypothesis testing, thereby discarding a wealth of information. The present paper discusses the application of functional linear mixed models, a functional analogue to linear mixed models. This type of model allows for the holistic evaluation of curve dynamics for data with complex correlation structures due to repeated measures on subjects and stimulus items. The nonparametric, spline-based estimation technique allows for correlated functional data to be observed irregularly, or even sparsely. This means that information on variation in the temporal domain is preserved. Functional principal component analysis is used for parsimonious data representation and variance decomposition. The basic functionality and usage of the model is illustrated based on several case studies with different data types and experimental designs. The statistical method is broadly applicable to any types of data that consist of groups of curves, whether they are articulatory or acoustic time series data, or generally any types of data suitably modeled based on penalized splines. {\textcopyright} 2017 Acoustical Society of America.},
  journal = {Journal of the Acoustical Society of America},
  volume = {142},
  number = {2},
  pages = {935-946},
  keywords = {Financial data processing;  Random processes;  Statistics;  Testing, Data representations;  Estimation techniques;  Functional principal component analysis;  Holistic evaluations;  Reduction techniques;  Statistical hypothesis testing;  Variance decomposition;  Wealth of information, Principal component analysis},
  doi = {10.1121/1.4998555},
  issn = {00014966}
}





Cederbaum, J., Pouplier, M., Hoole, P., Greven, S. Functional linear mixed models for irregularly or sparsely sampled data 2016 Statistical Modelling
16 (1):67-88
article DOI
Abstract:
We propose an estimation approach to analyse correlated functional data, which are observed on unequal grids or even sparsely. The model we use is a functional linear mixed model, a functional analogue of the linear mixed model. Estimation is based on dimension reduction via functional principal component analysis and on mixed model methodology. Our procedure allows the decomposition of the variability in the data as well as the estimation of mean effects of interest, and borrows strength across curves. Confidence bands for mean effects can be constructed conditionally on estimated principal components. We provide R-code implementing our approach in an online appendix. The method is motivated by and applied to data from speech production research. © 2016, © 2016 SAGE Publications.
BibTeX:
@Article{Cederbaum201667,
  author = {Cederbaum, J and Pouplier, M and Hoole, P and Greven, S},
  title = {Functional linear mixed models for irregularly or sparsely sampled data},
  year = {2016},
  abstract = {We propose an estimation approach to analyse correlated functional data, which are observed on unequal grids or even sparsely. The model we use is a functional linear mixed model, a functional analogue of the linear mixed model. Estimation is based on dimension reduction via functional principal component analysis and on mixed model methodology. Our procedure allows the decomposition of the variability in the data as well as the estimation of mean effects of interest, and borrows strength across curves. Confidence bands for mean effects can be constructed conditionally on estimated principal components. We provide R-code implementing our approach in an online appendix. The method is motivated by and applied to data from speech production research. {\textcopyright} 2016, {\textcopyright} 2016 SAGE Publications.},
  journal = {Statistical Modelling},
  volume = {16},
  number = {1},
  pages = {67-88},
  doi = {10.1177/1471082X15617594},
  issn = {1471082X}
}
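The abstract above describes dimension reduction of correlated curves via functional principal component analysis (FPCA). The paper's own implementation (with R code in its online appendix) handles irregular and sparse grids via smoothing; as a hedged orientation only, the dense-grid core idea can be sketched in a few lines of numpy, with all simulation settings below being illustrative assumptions:

```python
# Minimal FPCA sketch: eigendecomposition of the empirical covariance of
# curves observed on a common dense grid. This is NOT the paper's method,
# which additionally smooths and handles sparse/irregular observations.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)
n = 100
# Simulate curves: a mean function plus two principal components with
# decaying score variances, plus small measurement noise (all hypothetical).
scores = rng.normal(size=(n, 2)) * np.array([2.0, 0.5])
phi = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])  # true eigenfunctions
X = np.sin(np.pi * t) + scores @ phi + 0.05 * rng.normal(size=(n, len(t)))

Xc = X - X.mean(axis=0)            # center the curves
cov = Xc.T @ Xc / (n - 1)          # empirical covariance on the grid
vals, vecs = np.linalg.eigh(cov)   # eigendecomposition (ascending order)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
explained = vals[:2].sum() / vals.sum()  # variance explained by 2 FPCs
est_scores = Xc @ vecs[:, :2]            # estimated FPC scores per curve
```

With a two-component truth and little noise, the first two estimated components capture nearly all the variability, which is what makes the low-dimensional score representation useful for subsequent mixed-model estimation.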





Greven, S., Scheipl, F. Smoothing parameter uncertainty in general smooth models. Invited comment on Wood et al. (2016) 2016 Journal of the American Statistical Association
111 (516):1568-1573
article DOI
BibTeX:
@Article{Greven20161568,
  author  = {Greven, S and Scheipl, F},
  title   = {Smoothing parameter uncertainty in general smooth models. Invited comment on Wood et al. (2016)},
  journal = {Journal of the American Statistical Association},
  year    = {2016},
  volume  = {111},
  number  = {516},
  pages   = {1568-1573},
  issn    = {01621459},
  doi     = {10.1080/01621459.2016.1250580}
}






Brockhaus, S., Fuest, A., Mayr, A., Greven, S. Functional regression models for location, scale and shape applied to stock returns 2015 Proceedings of the 30th International Workshop on Statistical Modelling
117-122
inproceedings
BibTeX:
@Inproceedings{Brockhaus2015iwsm,
  author = {Brockhaus, Sarah and Fuest, Andreas and Mayr, Andreas and Greven, Sonja},
  title = {Functional regression models for location, scale and shape applied to stock returns},
  year = {2015},
  booktitle = {Proceedings of the 30th International Workshop on Statistical Modelling},
  editor = {Friedl, Herwig and Wagner, Helga},
  pages = {117-122}
}





Brockhaus, S., Scheipl, F., Hothorn, T., Greven, S. The functional linear array model 2015 Statistical Modelling
15 (3):279-300
article DOI
Abstract:
The functional linear array model (FLAM) is a unified model class for functional regression models including function-on-scalar, scalar-on-function and function-on-function regression. Mean, median, quantile as well as generalized additive regression models for functional or scalar responses are contained as special cases in this general framework. Our implementation features a broad variety of covariate effects, such as, linear, smooth and interaction effects of grouping variables, scalar and functional covariates. Computational efficiency is achieved by representing the model as a generalized linear array model. While the array structure requires a common grid for functional responses, missing values are allowed. Estimation is conducted using a boosting algorithm, which allows for numerous covariates and automatic, data-driven model selection. To illustrate the flexibility of the model class we use three applications on curing of resin for car production, heat values of fossil fuels and Canadian climate data (the last one in the electronic supplement). These require function-on-scalar, scalar-on-function and function-on-function regression models, respectively, as well as additional capabilities such as robust regression, spatial functional regression, model selection and accommodation of missings. An implementation of our methods is provided in the R add-on package FDboost. © 2015 SAGE Publications
BibTeX:
@Article{Brockhaus2015279,
  author = {Brockhaus, S and Scheipl, F and Hothorn, T and Greven, S},
  title = {The functional linear array model},
  year = {2015},
  abstract = {The functional linear array model (FLAM) is a unified model class for functional regression models including function-on-scalar, scalar-on-function and function-on-function regression. Mean, median, quantile as well as generalized additive regression models for functional or scalar responses are contained as special cases in this general framework. Our implementation features a broad variety of covariate effects, such as, linear, smooth and interaction effects of grouping variables, scalar and functional covariates. Computational efficiency is achieved by representing the model as a generalized linear array model. While the array structure requires a common grid for functional responses, missing values are allowed. Estimation is conducted using a boosting algorithm, which allows for numerous covariates and automatic, data-driven model selection. To illustrate the flexibility of the model class we use three applications on curing of resin for car production, heat values of fossil fuels and Canadian climate data (the last one in the electronic supplement). These require function-on-scalar, scalar-on-function and function-on-function regression models, respectively, as well as additional capabilities such as robust regression, spatial functional regression, model selection and accommodation of missings. An implementation of our methods is provided in the R add-on package FDboost. {\textcopyright} 2015 SAGE Publications},
  journal = {Statistical Modelling},
  volume = {15},
  number = {3},
  pages = {279-300},
  doi = {10.1177/1471082X14566913},
  issn = {1471082X}
}
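The computational efficiency the FLAM abstract attributes to the generalized linear array model rests on a standard Kronecker identity: the large design matrix for a grid of responses never needs to be formed. A hedged numpy sketch of that identity (illustrative only, not the FDboost implementation; all dimensions are made up):

```python
# Linear array model trick: evaluating (B2 kron B1) @ vec(Theta) via two
# small matrix products, using vec(B1 @ Theta @ B2.T) = (B2 kron B1) vec(Theta)
# with column-major vectorization.
import numpy as np

rng = np.random.default_rng(0)
B1 = rng.normal(size=(7, 4))     # basis over the first index (e.g., time grid)
B2 = rng.normal(size=(5, 3))     # basis over the second index (e.g., a covariate)
Theta = rng.normal(size=(4, 3))  # coefficient matrix

# Naive route: form the full Kronecker design and multiply vec(Theta).
naive = (np.kron(B2, B1) @ Theta.reshape(-1, order="F")).reshape(7, 5, order="F")

# Array route: two small products instead of one large matrix-vector product.
fast = B1 @ Theta @ B2.T

assert np.allclose(naive, fast)
```

For a common grid of size G1 x G2 this reduces the cost from O(G1 G2 K1 K2) storage for the Kronecker design to working only with the two marginal bases, which is what makes boosting over many such terms feasible.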





Greven, S. A General Framework for Functional Regression 2015 Proceedings of the 30th International Workshop on Statistical Modelling
39-54
inproceedings
BibTeX:
@Inproceedings{Greven2015iwsm,
  author = {Greven, Sonja},
  title = {A General Framework for Functional Regression},
  year = {2015},
  booktitle = {Proceedings of the 30th International Workshop on Statistical Modelling},
  editor = {Friedl, Herwig and Wagner, Helga},
  pages = {39-54}
}





Scheipl, F., Staicu, A., Greven, S. Functional Additive Mixed Models 2015 Journal of Computational and Graphical Statistics
24 (2):477-501
article DOI
Abstract:
We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, for example, spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well, and also scales to larger datasets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach. © 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
BibTeX:
@Article{Scheipl2015477,
  author = {Scheipl, F and Staicu, A.-M and Greven, S},
  title = {Functional Additive Mixed Models},
  year = {2015},
  abstract = {We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, for example, spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well, and also scales to larger datasets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach. {\textcopyright} 2015, {\textcopyright} American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.},
  journal = {Journal of Computational and Graphical Statistics},
  volume = {24},
  number = {2},
  pages = {477-501},
  doi = {10.1080/10618600.2014.901914},
  issn = {10618600}
}





Brockhaus, S., Scheipl, F., Hothorn, T., Greven, S. The Functional Linear Array Model and an Application to Viscosity Curves 2014 Proceedings of the 29th International Workshop on Statistical Modelling
63-68
inproceedings
BibTeX:
@Inproceedings{Brockhaus2014,
  author = {Brockhaus, Sarah and Scheipl, Fabian and Hothorn, Torsten and Greven, Sonja},
  title = {The Functional Linear Array Model and an Application to Viscosity Curves},
  year = {2014},
  booktitle = {Proceedings of the 29th International Workshop on Statistical Modelling},
  editor = {Kneib, Thomas and Sobotka, Fabian and Fahrenholz, Jan and Irmer, Henriette},
  pages = {63-68}
}





Cederbaum, J., Greven, S., Pouplier, M., Hoole, P. Functional linear mixed model for irregularly spaced phonetics data 2014 Proceedings of the 29th International Workshop on Statistical Modelling

inproceedings
BibTeX:
@Inproceedings{sparseFLMMPhonetics,
  author = {Cederbaum, Jona and Greven, Sonja and Pouplier, Marianne and Hoole, Phil},
  title = {Functional linear mixed model for irregularly spaced phonetics data},
  year = {2014},
  booktitle = {Proceedings of the 29th International Workshop on Statistical Modelling},
  editor = {Kneib, Thomas and Sobotka, Fabian and Fahrenholz, Jan and Irmer, Henriette}
}





Pouplier, M., Hoole, P., Cederbaum, J., Greven, S., Pastätter, M. Perceptual and articulatory factors in German fricative assimilation 2014 Proceedings of the 10th International Seminar on Speech Production (ISSP)
332-335
inproceedings URL
Abstract:
We present data from an EPG experiment on German fricative assimilation. It has been claimed that fricative sequences other than sibilant clusters do not assimilate due to perceptual constraints. We demonstrate that f#sibilant sequences show in principle the same kind of temporal overlap as sibilant clusters do, but due to the labial constriction dominating the acoustics, this temporal overlap is acoustically and perceptually less salient. Our data further reveal an order asymmetry: sibilant#f clusters behave differently from f#sibilant clusters; there is no evidence in sibilant#f clusters for the labio-dental constriction overlapping the sibilant in our data. We consider perceptual and articulatory accounts of this asymmetry. We also investigate whether lexical stress affects assimilation patterns. To that effect we discuss a new statistical method for analyzing functional data with a mixed model allowing for multiple covariates and crossed random effects. We find that primarily stress of the word-final but not the word-initial syllable interacts with assimilation.
BibTeX:
@Inproceedings{Pouplier2014,
  author = {Pouplier, Marianne and Hoole, Philip and Cederbaum, Jona and Greven, Sonja and Past{\"{a}}tter, Manfred},
  title = {Perceptual and articulatory factors in German fricative assimilation},
  year = {2014},
  URL = {https://www.phonetik.uni-muenchen.de/\~hoole/pdf/Pouplieretal\_issp2014.pdf},
  abstract = {We present data from an EPG experiment on German fricative assimilation. It has been claimed that fricative sequences other than sibilant clusters do not assimilate due to perceptual constraints. We demonstrate that f\#sibilant sequences show in principle the same kind of temporal overlap as sibilant clusters do, but due to the labial constriction dominating the acoustics, this temporal overlap is acoustically and perceptually less salient. Our data further reveal an order asymmetry: sibilant\#f clusters behave differently from f\#sibilant clusters; there is no evidence in sibilant\#f clusters for the labio-dental constriction overlapping the sibilant in our data. We consider perceptual and articulatory accounts of this asymmetry. We also investigate whether lexical stress affects assimilation patterns. To that effect we discuss a new statistical method for analyzing functional data with a mixed model allowing for multiple covariates and crossed random effects. We find that primarily stress of the word-final but not the word-initial syllable interacts with assimilation},
  booktitle = {Proceedings of the 10th International Seminar on Speech Production (ISSP)},
  editor = {Fuchs, Susanne and Grice, Martine  and Hermes, Anne and Lancia,  Leonardo and M{\"{u}}cke, Doris},
  pages = {332-335}
}





Saefken, B., Ruegamer, D., Greven, S., Kneib, T. cAIC4: Conditional Akaike information criterion for lme4 2014
R package version 0.2, available at http://CRAN.R-project.org/package=cAIC4
manual URL
BibTeX:
@Manual{cAIC4,
  author = {Saefken, Benjamin and Ruegamer, David and Greven, Sonja and Kneib, Thomas},
  title = {cAIC4: Conditional Akaike information criterion for lme4},
  year = {2014},
  URL = {http://CRAN.R-project.org/package=cAIC4},
  note = {R package version 0.2, available at http://CRAN.R-project.org/package=cAIC4}
}





Goldsmith, J., Greven, S., Crainiceanu, C. Corrected Confidence Bands for Functional Data Using Principal Components 2013 Biometrics
69 (1):41-51
article DOI
Abstract:
Functional principal components (FPC) analysis is widely used to decompose and express functional observations. Curve estimates implicitly condition on basis functions and other quantities derived from FPC decompositions; however these objects are unknown in practice. In this article, we propose a method for obtaining correct curve estimates by accounting for uncertainty in FPC decompositions. Additionally, pointwise and simultaneous confidence intervals that account for both model- and decomposition-based variability are constructed. Standard mixed model representations of functional expansions are used to construct curve estimates and variances conditional on a specific decomposition. Iterated expectation and variance formulas combine model-based conditional estimates across the distribution of decompositions. A bootstrap procedure is implemented to understand the uncertainty in principal component decomposition quantities. Our method compares favorably to competing approaches in simulation studies that include both densely and sparsely observed functions. We apply our method to sparse observations of CD4 cell counts and to dense white-matter tract profiles. Code for the analyses and simulations is publicly available, and our method is implemented in the R package refund on CRAN. © 2013, The International Biometric Society.
BibTeX:
@Article{Goldsmith201341,
  author = {Goldsmith, J and Greven, S and Crainiceanu, C},
  title = {Corrected Confidence Bands for Functional Data Using Principal Components},
  year = {2013},
  abstract = {Functional principal components (FPC) analysis is widely used to decompose and express functional observations. Curve estimates implicitly condition on basis functions and other quantities derived from FPC decompositions; however these objects are unknown in practice. In this article, we propose a method for obtaining correct curve estimates by accounting for uncertainty in FPC decompositions. Additionally, pointwise and simultaneous confidence intervals that account for both model- and decomposition-based variability are constructed. Standard mixed model representations of functional expansions are used to construct curve estimates and variances conditional on a specific decomposition. Iterated expectation and variance formulas combine model-based conditional estimates across the distribution of decompositions. A bootstrap procedure is implemented to understand the uncertainty in principal component decomposition quantities. Our method compares favorably to competing approaches in simulation studies that include both densely and sparsely observed functions. We apply our method to sparse observations of CD4 cell counts and to dense white-matter tract profiles. Code for the analyses and simulations is publicly available, and our method is implemented in the R package refund on CRAN. {\textcopyright} 2013, The International Biometric Society.},
  journal = {Biometrics},
  volume = {69},
  number = {1},
  pages = {41-51},
  keywords = {bootstrapping;  decomposition;  principal component analysis;  variance analysis, article;  brain;  CD4 lymphocyte count;  comparative study;  computer simulation;  confidence interval;  growth, development and aging;  human;  Human immunodeficiency virus;  Human immunodeficiency virus infection;  methodology;  multiple sclerosis;  nuclear magnetic resonance imaging;  pathology;  principal component analysis;  statistical model, Brain;  CD4 Lymphocyte Count;  Computer Simulation;  Confidence Intervals;  HIV;  HIV Infections;  Humans;  Magnetic Resonance Imaging;  Models, Statistical;  Multiple Sclerosis;  Principal Component Analysis},
  doi = {10.1111/j.1541-0420.2012.01808.x},
  issn = {0006341X}
}
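The entry above is about confidence bands that account for uncertainty in the FPC decomposition itself. For orientation only, a generic pointwise bootstrap band for a mean curve, the simpler baseline such corrected bands improve on, can be sketched as follows (a hedged illustration with made-up data; it does not implement the paper's decomposition-aware method, which is available in the R package refund):

```python
# Pointwise percentile bootstrap band for the mean of a sample of curves.
# Whole curves are resampled with replacement; this is a simplified baseline,
# NOT the corrected bands of Goldsmith, Greven & Crainiceanu (2013).
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 30)
X = np.sin(np.pi * t) + 0.3 * rng.normal(size=(40, len(t)))  # 40 noisy curves

B = 500
boot_means = np.empty((B, len(t)))
for b in range(B):
    idx = rng.integers(0, len(X), size=len(X))  # resample curve indices
    boot_means[b] = X[idx].mean(axis=0)

lower = np.quantile(boot_means, 0.025, axis=0)  # pointwise 95% band
upper = np.quantile(boot_means, 0.975, axis=0)
# Fraction of grid points where the band covers the true mean curve:
coverage = np.mean((lower <= np.sin(np.pi * t)) & (np.sin(np.pi * t) <= upper))
```

Because this resamples raw curves, it captures sampling variability of the mean but, unlike the paper's approach, nothing of the variability induced by estimating eigenfunctions and scores.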





Baumert, J., Karakas, M., Greven, S., Rückerl, R., Peters, A., Koenig, W. Variability of fibrinogen measurements in post-myocardial infarction patients: Results from the AIRGENE study center Augsburg 2012 Thrombosis and Haemostasis
107 (5):895-902
article DOI
Abstract:
Elevated fibrinogen levels are strongly and consistently associated with incident coronary heart disease (CHD). A possible causal contribution of fibrinogen in the pathway leading to atherothrombotic cardiovascular disease complications has been suggested. However, for implementation in clinical practice, data on validity and reliability are needed, which are still scarce, especially in subjects with a history of CHD. For the present study, levels of plasma fibrinogen were measured in 200 post-myocardial infarction (post-MI) patients aged 39-76 years, with approximately six blood samples collected at monthly intervals between May 2003 and March 2004, giving a total of 1,144 samples. Inter-individual variability (between-subject variance component, VC b and coefficient of variation, CV b), intra-individual and analytical variability (VC w+a and CV w+a), intraclass correlation coefficient (ICC) and the number of measurements required for an ICC of 0.75 were estimated to assess the reliability of serial fibrinogen measurements. Mean fibrinogen concentration of all subjects over all samples was 3.34 g/l (standard deviation 0.67). Between-subject variation for fibrinogen was VC b = 0.34 (CV b = 17.5%) whereas within-subject and analytical variation was estimated as VC w+a = 0.14 (CV w+a = 11.0%). The variation was mainly explained by between-subject variability, shown by the proportion of total variance of 71.3%. Two different measurements were required to reach sufficient reliability, if subjects with extreme values were not excluded. The present study indicates a fairly good reproducibility of serial individual fibrinogen measurements in post-MI subjects. © Schattauer 2012.
BibTeX:
@Article{Baumert2012895,
  author = {Baumert, J and Karakas, M and Greven, S and R{\"{u}}ckerl, R and Peters, A and Koenig, W},
  title = {Variability of fibrinogen measurements in post-myocardial infarction patients: Results from the AIRGENE study center Augsburg},
  year = {2012},
  abstract = {Elevated fibrinogen levels are strongly and consistently associated with incident coronary heart disease (CHD). A possible causal contribution of fibrinogen in the pathway leading to atherothrombotic cardiovascular disease complications has been suggested. However, for implementation in clinical practice, data on validity and reliability are needed, which are still scarce, especially in subjects with a history of CHD. For the present study, levels of plasma fibrinogen were measured in 200 post-myocardial infarction (post-MI) patients aged 39-76 years, with approximately six blood samples collected at monthly intervals between May 2003 and March 2004, giving a total of 1,144 samples. Inter-individual variability (between-subject variance component, VC b and coefficient of variation, CV b), intra-individual and analytical variability (VC w+a and CV w+a), intraclass correlation coefficient (ICC) and the number of measurements required for an ICC of 0.75 were estimated to assess the reliability of serial fibrinogen measurements. Mean fibrinogen concentration of all subjects over all samples was 3.34 g/l (standard deviation 0.67). Between-subject variation for fibrinogen was VC b = 0.34 (CV b = 17.5\%) whereas within-subject and analytical variation was estimated as VC w+a = 0.14 (CV w+a = 11.0\%). The variation was mainly explained by between-subject variability, shown by the proportion of total variance of 71.3\%. Two different measurements were required to reach sufficient reliability, if subjects with extreme values were not excluded. The present study indicates a fairly good reproducibility of serial individual fibrinogen measurements in post-MI subjects. {\textcopyright} Schattauer 2012.},
  journal = {Thrombosis and Haemostasis},
  volume = {107},
  number = {5},
  pages = {895-902},
  keywords = {creatine kinase;  fibrinogen;  high density lipoprotein cholesterol, adult;  aged;  article;  blood pressure measurement;  blood sampling;  body mass;  cardiovascular risk;  cholesterol blood level;  female;  fibrinogen blood level;  heart infarction;  human;  ischemic heart disease;  major clinical study;  male;  priority journal, Adult;  Aged;  Analysis of Variance;  Biological Markers;  Europe;  Female;  Fibrinogen;  Humans;  Longitudinal Studies;  Male;  Middle Aged;  Myocardial Infarction;  Nephelometry and Turbidimetry;  Predictive Value of Tests;  Reproducibility of Results;  Risk Assessment;  Risk Factors;  Time Factors},
  doi = {10.1160/TH11-10-0703},
  issn = {03406245}
}
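The reliability quantities in this abstract are related by ICC = VC_b / (VC_b + VC_w+a), here 0.34 / (0.34 + 0.14) ≈ 0.71, matching the reported "proportion of total variance of 71.3%". As a hedged sketch only, the one-way balanced ANOVA estimator of these variance components can be illustrated on simulated data mimicking the study's design (200 subjects, ~6 repeats); the study itself used mixed-model estimation:

```python
# One-way random-effects ANOVA variance components and ICC, on simulated
# data matching the reported components (illustrative, not the study data).
import numpy as np

rng = np.random.default_rng(3)
n_subj, n_rep = 200, 6
subj_eff = rng.normal(0, np.sqrt(0.34), size=n_subj)  # between-subject VC_b = 0.34
# 3.34 g/l grand mean, within-subject + analytical variance VC_w+a = 0.14:
y = 3.34 + subj_eff[:, None] + rng.normal(0, np.sqrt(0.14), size=(n_subj, n_rep))

grand = y.mean()
# Mean squares for a balanced one-way layout:
msb = n_rep * ((y.mean(axis=1) - grand) ** 2).sum() / (n_subj - 1)
msw = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (n_subj * (n_rep - 1))
vc_b = (msb - msw) / n_rep          # method-of-moments between-subject component
icc = vc_b / (vc_b + msw)           # should land near 0.34 / 0.48 ≈ 0.71
```

An ICC near 0.71 from a single measurement also explains the paper's conclusion that two measurements suffice: by the Spearman-Brown relation, averaging k repeats raises reliability to k·ICC / (1 + (k-1)·ICC), which exceeds 0.75 already at k = 2.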




Brüske, I., Hampel, R., Baumgärtner, Z., Rückerl, R., Greven, S., Koenig, W., Peters, A., Schneider, A. Ambient air pollution and lipoprotein-associated phospholipase A2 in survivors of myocardial infarction 2011 Environmental Health Perspectives
119 (7):921-926
article DOI
Abstract:
Background: Increasing evidence suggests a proatherogenic role for lipoprotein-associated phospholipase A2 (Lp-PLA2). A meta-analysis of published cohorts has shown that Lp-PLA2 is an independent predictor of coronary heart disease events and stroke. Objective: In this study, we investigated whether the association between air pollution and cardiovascular disease might be partly explained by increased Lp-PLA2 mass in response to exposure. Methods: A prospective longitudinal study of 200 patients who had had a myocardial infarction was performed in Augsburg, Germany. Up to six repeated clinical examinations were scheduled every 4-6 weeks between May 2003 and March 2004. Supplementary to the multicenter AIRGENE protocol, we assessed repeated plasma Lp-PLA2 concentrations. Air pollution data from a fixed monitoring site representing urban background concentrations were collected. We measured hourly means of particle mass [particulate matter (PM) < 10 μm (PM10) and PM < 2.5 μm (PM2.5) in aerodynamic diameter] and particle number concentrations (PNCs), as well as the gaseous air pollutants carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), nitric oxide (NO), and nitrogen dioxide (NO2). Data were analyzed using mixed models with random patient effects. Results: Lp-PLA2 showed a positive association with PM10, PM2.5, and PNCs, as well as with CO, NO2, NO, and SO2 4-5 days before blood withdrawal (lag 4-5). A positive association with O3 was much more immediate (lag 0). However, inverse associations with some pollutants were evident at shorter time lags. Conclusion: These preliminary findings should be replicated in other study populations because they suggest that the accumulation of acute and subacute effects or the chronic exposure to ambient particulate and gaseous air pollution may result in the promotion of atherosclerosis, mediated, at least in part, by increased levels of Lp-PLA2.
BibTeX:
@Article{Brske2011921,
  author = {Br{\"{u}}ske, I and Hampel, R and Baumg{\"{a}}rtner, Z and R{\"{u}}ckerl, R and Greven, S and Koenig, W and Peters, A and Schneider, A},
  title = {Ambient air pollution and lipoprotein-associated phospholipase A2 in survivors of myocardial infarction},
  year = {2011},
  abstract = {Background: Increasing evidence suggests a proatherogenic role for lipoprotein-associated phospholipase A2 (Lp-PLA2). A meta-analysis of published cohorts has shown that Lp-PLA2 is an independent predictor of coronary heart disease events and stroke. Objective: In this study, we investigated whether the association between air pollution and cardiovascular disease might be partly explained by increased Lp-PLA2 mass in response to exposure. Methods: A prospective longitudinal study of 200 patients who had had a myocardial infarction was performed in Augsburg, Germany. Up to six repeated clinical examinations were scheduled every 4-6 weeks between May 2003 and March 2004. Supplementary to the multicenter AIRGENE protocol, we assessed repeated plasma Lp-PLA2 concentrations. Air pollution data from a fixed monitoring site representing urban background concentrations were collected. We measured hourly means of particle mass [particulate matter (PM) \< 10 $\mu$m (PM10) and PM \< 2.5 $\mu$m (PM2.5) in aerodynamic diameter] and particle number concentrations (PNCs), as well as the gaseous air pollutants carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), nitric oxide (NO), and nitrogen dioxide (NO2). Data were analyzed using mixed models with random patient effects. Results: Lp-PLA2 showed a positive association with PM10, PM2.5, and PNCs, as well as with CO, NO2, NO, and SO2 4-5 days before blood withdrawal (lag 4-5). A positive association with O3 was much more immediate (lag 0). However, inverse associations with some pollutants were evident at shorter time lags. Conclusion: These preliminary findings should be replicated in other study populations because they suggest that the accumulation of acute and subacute effects or the chronic exposure to ambient particulate and gaseous air pollution may result in the promotion of atherosclerosis, mediated, at least in part, by increased levels of Lp-PLA2.},
  journal = {Environmental Health Perspectives},
  volume = {119},
  number = {7},
  pages = {921-926},
  keywords = {1 alkyl 2 acetylglycerophosphocholine esterase;  beta adrenergic receptor blocking agent;  carbon monoxide;  dipeptidyl carboxypeptidase inhibitor;  diuretic agent;  fibric acid derivative;  hydroxymethylglutaryl coenzyme A reductase inhibitor;  nitric oxide;  nitrogen dioxide;  ozone;  sulfur dioxide, 1 alkyl 2 acetylglycerophosphocholine esterase blood level;  adult;  aged;  air pollutant;  air pollution;  ambient air;  article;  cardiovascular risk;  concentration (parameters);  disease association;  environmental monitoring;  enzyme blood level;  female;  Germany;  heart infarction;  human;  longitudinal study;  major clinical study;  male;  priority journal;  prospective study;  survivor;  suspended particulate matter},
  doi = {10.1289/ehp.1002681},
  issn = {00916765}
}





Greven, S., Dominici, F., Zeger, S. An approach to the estimation of chronic air pollution effects using spatio-temporal information 2011 Journal of the American Statistical Association
106 (494):396-406
article DOI
Abstract:
There is substantial observational evidence that long-term exposure to particulate air pollution is associated with premature death in urban populations. Estimates of the magnitude of these effects derive largely from cross sectional comparisons of adjusted mortality rates among cities with varying pollution levels. Such estimates are potentially confounded by other differences among the populations correlated with air pollution, for example, socioeconomic factors. An alternative approach is to study covariation of particulate matter and mortality across time within a city, as has been done in investigations of short-term exposures. In either event, observational studies like these are subject to confounding by unmeasured variables. Therefore the ability to detect such confounding and to derive estimates less affected by confounding are a high priority. In this article, we describe and apply a method of decomposing the exposure variable into components with variation at distinct temporal, spatial, and time by space scales, here focusing on the components involving time. Starting from a proportional hazard model, we derive a Poisson regression model and estimate two regression coefficients: the "global" coefficient that measures the association between national trends in pollution and mortality; and the "local" coefficient, derived from space by time variation, that measures the association between location-specific trends in pollution and mortality adjusted by the national trends. Absent unmeasured confounders and given valid model assumptions, the scale-specific coefficients should be similar; substantial differences in these coefficients constitute a basis for questioning the model. We derive a backfitting algorithm to fit our model to very large spatio-temporal datasets. 
We apply our methods to the Medicare Cohort Air Pollution Study (MCAPS), which includes individual-level information on time of death and age on a population of 18.2 million for the period 2000-2006. Results based on the global coefficient indicate a large increase in the national life expectancy for reductions in the yearly national average of PM2.5. However, this coefficient based on national trends in PM2.5 and mortality is likely to be confounded by other variables trending on the national level. Confounding of the local coefficient by unmeasured factors is less likely, although it cannot be ruled out. Based on the local coefficient alone, we are not able to demonstrate any change in life expectancy for a reduction in PM2.5. We use additional survey data available for a subset of the data to investigate sensitivity of results to the inclusion of additional covariates, but both coefficients remain largely unchanged. © 2011 American Statistical Association.
BibTeX:
@Article{Greven2011396,
  author = {Greven, S and Dominici, F and Zeger, S},
  title = {An approach to the estimation of chronic air pollution effects using spatio-temporal information},
  year = {2011},
  abstract = {There is substantial observational evidence that long-term exposure to particulate air pollution is associated with premature death in urban populations. Estimates of the magnitude of these effects derive largely from cross sectional comparisons of adjusted mortality rates among cities with varying pollution levels. Such estimates are potentially confounded by other differences among the populations correlated with air pollution, for example, socioeconomic factors. An alternative approach is to study covariation of particulate matter and mortality across time within a city, as has been done in investigations of short-term exposures. In either event, observational studies like these are subject to confounding by unmeasured variables. Therefore the ability to detect such confounding and to derive estimates less affected by confounding are a high priority. In this article, we describe and apply a method of decomposing the exposure variable into components with variation at distinct temporal, spatial, and time by space scales, here focusing on the components involving time. Starting from a proportional hazard model, we derive a Poisson regression model and estimate two regression coefficients: the "global" coefficient that measures the association between national trends in pollution and mortality; and the "local" coefficient, derived from space by time variation, that measures the association between location-specific trends in pollution and mortality adjusted by the national trends. Absent unmeasured confounders and given valid model assumptions, the scale-specific coefficients should be similar; substantial differences in these coefficients constitute a basis for questioning the model. We derive a backfitting algorithm to fit our model to very large spatio-temporal datasets. 
We apply our methods to the Medicare Cohort Air Pollution Study (MCAPS), which includes individual-level information on time of death and age on a population of 18.2 million for the period 2000-2006. Results based on the global coefficient indicate a large increase in the national life expectancy for reductions in the yearly national average of PM2.5. However, this coefficient based on national trends in PM2.5 and mortality is likely to be confounded by other variables trending on the national level. Confounding of the local coefficient by unmeasured factors is less likely, although it cannot be ruled out. Based on the local coefficient alone, we are not able to demonstrate any change in life expectancy for a reduction in PM2.5. We use additional survey data available for a subset of the data to investigate sensitivity of results to the inclusion of additional covariates, but both coefficients remain largely unchanged. {\textcopyright} 2011 American Statistical Association.},
  journal = {Journal of the American Statistical Association},
  volume = {106},
  number = {494},
  pages = {396-406},
  doi = {10.1198/jasa.2011.ap09392},
  issn = {01621459}
}





Wilbert-Lampen, U., Nickel, T., Scheipl, F., Greven, S., Küchenhoff, H., Kääb, S., Steinbeck, G. Mortality due to myocardial infarction in the Bavarian population during World Cup Soccer 2006 2011 Clinical Research in Cardiology
100 (9):731-736
article DOI
Abstract:
Background: Previously, we had demonstrated that the World Cup Soccer 2006 provoked levels of emotional stress sufficient to increase the incidence of acute cardiovascular events. We sought to assess whether mortality was also increased as a result. Method: We analyzed daily data on mortality due to myocardial infarction (MI) and total mortality using data from the Bavarian State Office for Statistics. We retrospectively assessed study periods from 2006, 2005 and 2003. Quasi-Poisson regression with a log link to model the number of daily deaths was used. To be able to account for a possible delay, we also fitted a cubic distributed lag quasi-Poisson model for both 1 and 2 weeks post-exposure. Results: A total of 6,699 deaths due to MI were investigated. No increase in death was found on days of World Cup matches either with or without German participation compared to the matched control periods. In addition, none of the analyses showed a significant effect of the (lagged) exposure to the risk period. Likewise, total mortality rates remained unchanged over the entire period of our analysis. Conclusion: During World Cup Soccer, the number of deaths due to myocardial infarction was not measurably increased compared to a matched control period. Thus, we could not demonstrate a translation of a stress-induced increase of cardiac morbidity into a noticeable increase in mortality. However, our findings are based on a public mortality registry, which may be flawed in many ways, regarding ascertainment of causes of death, in particular. © 2011 Springer-Verlag.
BibTeX:
@Article{WilbertLampen2011731,
  author = {Wilbert-Lampen, U and Nickel, T and Scheipl, F and Greven, S and K{\"{u}}chenhoff, H and K{\"{a}}{\"{a}}b, S and Steinbeck, G},
  title = {Mortality due to myocardial infarction in the Bavarian population during World Cup Soccer 2006},
  year = {2011},
  abstract = {Background: Previously, we had demonstrated that the World Cup Soccer 2006 provoked levels of emotional stress sufficient to increase the incidence of acute cardiovascular events. We sought to assess whether mortality was also increased as a result. Method: We analyzed daily data on mortality due to myocardial infarction (MI) and total mortality using data from the Bavarian State Office for Statistics. We retrospectively assessed study periods from 2006, 2005 and 2003. Quasi-Poisson regression with a log link to model the number of daily deaths was used. To be able to account for a possible delay, we also fitted a cubic distributed lag quasi-Poisson model for both 1 and 2 weeks post-exposure. Results: A total of 6,699 deaths due to MI were investigated. No increase in death was found on days of World Cup matches either with or without German participation compared to the matched control periods. In addition, none of the analyses showed a significant effect of the (lagged) exposure to the risk period. Likewise, total mortality rates remained unchanged over the entire period of our analysis. Conclusion: During World Cup Soccer, the number of deaths due to myocardial infarction was not measurably increased compared to a matched control period. Thus, we could not demonstrate a translation of a stress-induced increase of cardiac morbidity into a noticeable increase in mortality. However, our findings are based on a public mortality registry, which may be flawed in many ways, regarding ascertainment of causes of death, in particular. {\textcopyright} 2011 Springer-Verlag.},
  journal = {Clinical Research in Cardiology},
  volume = {100},
  number = {9},
  pages = {731-736},
  keywords = {article;  cause of death;  controlled study;  emotional stress;  female;  Germany;  heart infarction;  human;  major clinical study;  male;  mortality;  sport, Female;  Germany;  Humans;  Male;  Myocardial Infarction;  Poisson Distribution;  Registries;  Regression Analysis;  Retrospective Studies;  Soccer;  Stress, Psychological},
  doi = {10.1007/s00392-011-0302-7},
  issn = {18610684}
}




Greven, S., Crainiceanu, C., Caffo, B., Reich, D. Longitudinal functional principal component analysis 2010 Electronic Journal of Statistics
4:1022-1054
article DOI
Abstract:
We introduce models for the analysis of functional data observed at multiple time points. The dynamic behavior of functional data is decomposed into a time-dependent population average, baseline (or static) subject-specific variability, longitudinal (or dynamic) subject-specific variability, subject-visit-specific variability and measurement error. The model can be viewed as the functional analog of the classical longitudinal mixed effects model where random effects are replaced by random processes. Methods have wide applicability and are computationally feasible for moderate and large data sets. Computational feasibility is assured by using principal component bases for the functional processes. The methodology is motivated by and applied to a diffusion tensor imaging (DTI) study designed to analyze differences and changes in brain connectivity in healthy volunteers and multiple sclerosis (MS) patients. An R implementation is provided. © 2010, Institute of Mathematical Statistics. All rights reserved.
BibTeX:
@Article{Greven20101022,
  author = {Greven, S and Crainiceanu, C and Caffo, B and Reich, D},
  title = {Longitudinal functional principal component analysis},
  year = {2010},
  abstract = {We introduce models for the analysis of functional data observed at multiple time points. The dynamic behavior of functional data is decomposed into a time-dependent population average, baseline (or static) subject-specific variability, longitudinal (or dynamic) subject-specific variability, subject-visit-specific variability and measurement error. The model can be viewed as the functional analog of the classical longitudinal mixed effects model where random effects are replaced by random processes. Methods have wide applicability and are computationally feasible for moderate and large data sets. Computational feasibility is assured by using principal component bases for the functional processes. The methodology is motivated by and applied to a diffusion tensor imaging (DTI) study designed to analyze differences and changes in brain connectivity in healthy volunteers and multiple sclerosis (MS) patients. An R implementation is provided. {\textcopyright} 2010, Institute of Mathematical Statistics. All rights reserved.},
  journal = {Electronic Journal of Statistics},
  volume = {4},
  pages = {1022-1054},
  doi = {10.1214/10-EJS575},
  issn = {19357524}
}





Karakas, M., Baumert, J., Greven, S., Rückerl, R., Peters, A., Koenig, W. Reproducibility in serial C-reactive protein and interleukin-6 measurements in post-myocardial infarction patients: Results from the AIRGENE study 2010 Clinical Chemistry
56 (5):861-864
article DOI
Abstract:
BACKGROUND: Among the numerous emerging biomarkers, high-sensitivity C-reactive protein (hsCRP) and interleukin-6 (IL-6) have received widespread interest, and a large database has been accumulated on their potential role as predictors of cardiovascular risk. The concentrations of inflammatory biomarkers, however, are influenced, among other things, by physiological variation, which is the natural within-individual variation occurring over time. Implementation of hsCRP and IL-6 measurement into clinical practice requires data on the reliability of such measurements. METHODS: We serially measured hsCRP and IL-6 concentrations in up to 6 blood samples taken at monthly intervals from 200 post-myocardial infarction patients who participated in the AIRGENE study. RESULTS: The mean (SD) of the ln-transformed plasma concentrations (in milligrams per liter for hsCRP and nanograms per liter for IL-6) for all participants over all samples was 0.16 (1.04) for hsCRP and 0.76 (0.57) for IL-6, with no significant differences between men and women. The within-individual and analytical variance component for the ln-transformed hsCRP data was 0.37, and the between-individual variance component was 0.73. For the ln-transformed IL-6 data, these values were 0.11 and 0.22, respectively. A substantial part of the total variation in plasma hsCRP and IL-6 concentrations was explained by the between-individual variation (as a percentage of the total variance, 66.1% for the ln-transformed hsCRP data and 66.2% for the ln-transformed IL-6 data). For both markers, 2 measurements were needed to reach a sufficient reliability. CONCLUSIONS: Our results demonstrate considerable stability and good reproducibility for serial hsCRP and IL-6 measurements. Thus, there should be no major concern about misclassification in clinical practice if at least 2 subsequent measurements are taken.
BibTeX:
@Article{Karakas2010861,
  author = {Karakas, M and Baumert, J and Greven, S and R{\"{u}}ckerl, R and Peters, A and Koenig, W},
  title = {Reproducibility in serial C-reactive protein and interleukin-6 measurements in post-myocardial infarction patients: Results from the AIRGENE study},
  year = {2010},
  abstract = {BACKGROUND: Among the numerous emerging biomarkers, high-sensitivity C-reactive protein (hsCRP) and interleukin-6 (IL-6) have received widespread interest, and a large database has been accumulated on their potential role as predictors of cardiovascular risk. The concentrations of inflammatory biomarkers, however, are influenced, among other things, by physiological variation, which is the natural within-individual variation occurring over time. Implementation of hsCRP and IL-6 measurement into clinical practice requires data on the reliability of such measurements. METHODS: We serially measured hsCRP and IL-6 concentrations in up to 6 blood samples taken at monthly intervals from 200 post-myocardial infarction patients who participated in the AIRGENE study. RESULTS: The mean (SD) of the ln-transformed plasma concentrations (in milligrams per liter for hsCRP and nanograms per liter for IL-6) for all participants over all samples was 0.16 (1.04) for hsCRP and 0.76 (0.57) for IL-6, with no significant differences between men and women. The within-individual and analytical variance component for the ln-transformed hsCRP data was 0.37, and the between-individual variance component was 0.73. For the ln-transformed IL-6 data, these values were 0.11 and 0.22, respectively. A substantial part of the total variation in plasma hsCRP and IL-6 concentrations was explained by the between-individual variation (as a percentage of the total variance, 66.1\% for the ln-transformed hsCRP data and 66.2\% for the ln-transformed IL-6 data). For both markers, 2 measurements were needed to reach a sufficient reliability. CONCLUSIONS: Our results demonstrate considerable stability and good reproducibility for serial hsCRP and IL-6 measurements. Thus, there should be no major concern about misclassification in clinical practice if at least 2 subsequent measurements are taken.},
  journal = {Clinical Chemistry},
  volume = {56},
  number = {5},
  pages = {861-864},
  keywords = {C reactive protein;  hydroxymethylglutaryl coenzyme A reductase inhibitor;  interleukin 6, adult;  aged;  article;  blood sampling;  clinical practice;  controlled study;  Dressler syndrome;  female;  human;  male;  measurement;  protein blood level;  reliability;  reproducibility, Adult;  Aged;  Biological Markers;  C-Reactive Protein;  Cohort Studies;  Female;  Humans;  Interleukin-6;  Male;  Middle Aged;  Myocardial Infarction;  Reproducibility of Results},
  doi = {10.1373/clinchem.2010.143719},
  issn = {00099147}
}


Greven, S., Crainiceanu, C., Küchenhoff, H., Peters, A. Restricted likelihood ratio testing for zero variance components in linear mixed models 2009 Journal of Computational and Graphical Statistics
17 (4):870-891
article DOI
Abstract:
The goal of our article is to provide a transparent, robust, and computationally feasible statistical platform for restricted likelihood ratio testing (RLRT) for zero variance components in linear mixed models. This problem is nonstandard because under the null hypothesis the parameter is on the boundary of the parameter space. Our proposed approach is different from the asymptotic results of Stram and Lee who assumed that the outcome vector can be partitioned into many independent subvectors. Thus, our methodology applies to a wider class of mixed models, which includes models with a moderate number of clusters or nonparametric smoothing components. We propose two approximations to the finite sample null distribution of the RLRT statistic. Both approximations converge weakly to the asymptotic distribution obtained by Stram and Lee when their assumptions hold. When their assumptions do not hold, we show in extensive simulation studies that both approximations outperform the Stram and Lee approximation and the parametric bootstrap. We also identify and address numerical problems associated with standard mixed model software. Our methods are motivated by and applied to a large longitudinal study on air pollution health effects in a highly susceptible cohort. Relevant software is posted as an online supplement. © 2008 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
BibTeX:
@Article{Greven2009870,
  author = {Greven, S and Crainiceanu, C.M and K{\"{u}}chenhoff, H and Peters, A},
  title = {Restricted likelihood ratio testing for zero variance components in linear mixed models},
  year = {2009},
  abstract = {The goal of our article is to provide a transparent, robust, and computationally feasible statistical platform for restricted likelihood ratio testing (RLRT) for zero variance components in linear mixed models. This problem is nonstandard because under the null hypothesis the parameter is on the boundary of the parameter space. Our proposed approach is different from the asymptotic results of Stram and Lee who assumed that the outcome vector can be partitioned into many independent subvectors. Thus, our methodology applies to a wider class of mixed models, which includes models with a moderate number of clusters or nonparametric smoothing components. We propose two approximations to the finite sample null distribution of the RLRT statistic. Both approximations converge weakly to the asymptotic distribution obtained by Stram and Lee when their assumptions hold. When their assumptions do not hold, we show in extensive simulation studies that both approximations outperform the Stram and Lee approximation and the parametric bootstrap. We also identify and address numerical problems associated with standard mixed model software. Our methods are motivated by and applied to a large longitudinal study on air pollution health effects in a highly susceptible cohort. Relevant software is posted as an online supplement. {\textcopyright} 2008 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.},
  journal = {Journal of Computational and Graphical Statistics},
  volume = {17},
  number = {4},
  pages = {870-891},
  doi = {10.1198/106186008X386599},
  issn = {10618600}
}





Panagiotakos, D., Dimakopoulou, K., Katsouyanni, K., Bellander, T., Grau, M., Koenig, W., Lanki, T., Pistelli, R., Schneider, A., Peters, A., Brueske-Hohfeld, I., Chavez, H., Cyrys, J., Geruschkat, U., Grallert, H., Greven, S., Ibald-Mulli, A., Illig, T., et al. Mediterranean diet and inflammatory response in myocardial infarction survivors 2009 International Journal of Epidemiology
38 (3):856-866
article DOI
Abstract:
Background: Within the framework of the multi-centre AIRGENE project we studied the association of the Mediterranean diet on plasma levels of various inflammatory markers, in myocardial infarction (MI) survivors from six geographic areas in Europe. Methods: From 2003 to 2004, 1003 patients were repeatedly clinically examined. On every clinical visit (on average 5.8 times), blood EDTA-plasma samples were collected. High sensitivity C-reactive protein (CRP), interleukin (IL)-6 and fibrinogen concentrations were measured based on standardized procedures. Dietary habits were evaluated through a semi-quantitative Food Frequency Questionnaire (FFQ), whereas adherence to the Mediterranean diet was assessed by a diet score. Results: A protective effect of adherence to the Mediterranean diet was found. For each unit of increasing adherence to the Mediterranean diet score there was a reduction of 3.1% in the average CRP levels (95% CI 0.5-5.7%) and of 1.9% in the average IL-6 levels (95% CI 0.5-3.4%) after adjusting for centre, age, sex, body mass index, physical activity, smoking status, diabetes and medication intake. No significant association was observed between the diet score and fibrinogen levels. Moderate intake of red wine (1-12 wine glasses per month) was associated with lower levels of CRP, IL-6 and fibrinogen. Conclusions: Adherence to the traditional Mediterranean diet was associated with a reduction of the concentrations of inflammatory markers in MI survivors. This may, in part, explain the beneficial effects of this diet on various chronic diseases such as atherosclerosis and cancer, and expands its role to secondary prevention level. © Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2009; all rights reserved.
BibTeX:
@Article{Panagiotakos2009856,
  author = {Panagiotakos, D.B and Dimakopoulou, K and Katsouyanni, K and Bellander, T and Grau, M and Koenig, W and Lanki, T and Pistelli, R and Schneider, A and Peters, A and Brueske-Hohfeld, I and Chavez, H and Cyrys, J and Geruschkat, U and Grallert, H and Greven, S and Ibald-Mulli, A and Illig, T and Kirchmair, H and von Klot, S and Kolz, M and Marowsky-Koeppl, M and Mueller, M and Rueckerl, R and Schaffrath Rosario, A and Schneider, A and Wichmann, H.-E and Holle, R and Nagl, H and Fabricius, I and Greschik, C and G{\"{u}}ther, F and Haensel, M and Hah, U and Kuch, U and Meisinger, C and Pietsch, M and Rempfer, E and Schaich, G and Schwarzw{\"{a}}lder, I and Zeitler, B and Loewel, H and Koenig, W and Khuseyinova, N and Trischler, G and Forastiere, F and Compagnucci, P and Di Carlo, F and Ferri, M and Montanari, A and Perucci, C and Picciotto, S and Romeo, E and Stafoggia, M and Pistelli, R and Altamura, L and Andreani, M.R and Baldari, F and Infusino, F and Santarelli, P and Jesi, A.P and Cattani, G and Marconi, A and Pekkaen, J and Alanne, M and Alastalo, H and Eerola, T and Eriksson, J and Kauppila, T and Lanki, T and Nyholm, P and Perola, M and Salomaa, V and Tiittanen, P and Luotola, K and Bellader, T and Berglind, N and Bohm, K and H{\"{a}}rden, R and Lampa, E and Ljungman, P and Nyberg, F and Ohlander, B and Pershagen, G and Rosenqvist, M and Larsdotter Svensson, T and Thunberg, E and Wedeen, G and Sunyer, J and Covas, M and Fit{\'{o}}, M and Grau, M and Jacquemin, B and Marrugat, J and Mu{\~{n}}z, L and Perell{\'{o}}, M and Plana, E and Rebato, C and Schroeder, H and Soler, C and Katsouyanni, K and Chalamandaris, A and Dimakopoulou, K and Panaggiotakos, D and Stefanadis, C and Pitasavos, C and Antoiades, C and Chrysohoou, C and Mitropoulos, J and Kulmala, M and Aalto, P and Paatero, P},
  title = {Mediterranean diet and inflammatory response in myocardial infarction survivors},
  year = {2009},
  abstract = {Background: Within the framework of the multi-centre AIRGENE project we studied the association of the Mediterranean diet on plasma levels of various inflammatory markers, in myocardial infarction (MI) survivors from six geographic areas in Europe. Methods: From 2003 to 2004, 1003 patients were repeatedly clinically examined. On every clinical visit (on average 5.8 times), blood EDTA-plasma samples were collected. High sensitivity C-reactive protein (CRP), interleukin (IL)-6 and fibrinogen concentrations were measured based on standardized procedures. Dietary habits were evaluated through a semi-quantitative Food Frequency Questionnaire (FFQ), whereas adherence to the Mediterranean diet was assessed by a diet score. Results: A protective effect of adherence to the Mediterranean diet was found. For each unit of increasing adherence to the Mediterranean diet score there was a reduction of 3.1\% in the average CRP levels (95\% CI 0.5-5.7\%) and of 1.9\% in the average IL-6 levels (95\% CI 0.5-3.4\%) after adjusting for centre, age, sex, body mass index, physical activity, smoking status, diabetes and medication intake. No significant association was observed between the diet score and fibrinogen levels. Moderate intake of red wine (1-12 wine glasses per month) was associated with lower levels of CRP, IL-6 and fibrinogen. Conclusions: Adherence to the traditional Mediterranean diet was associated with a reduction of the concentrations of inflammatory markers in MI survivors. This may, in part, explain the beneficial effects of this diet on various chronic diseases such as atherosclerosis and cancer, and expands its role to secondary prevention level. {\textcopyright} Published by Oxford University Press on behalf of the International Epidemiological Association {\textcopyright} The Author 2009; all rights reserved.},
  journal = {International Journal of Epidemiology},
  volume = {38},
  number = {3},
  pages = {856-866},
  keywords = {C reactive protein;  edetic acid;  fibrinogen;  interleukin 6, body mass;  cardiovascular disease;  diet;  food quality;  lifestyle;  survivorship, adult;  aged;  alcohol consumption;  article;  clinical trial;  female;  fibrinogen blood level;  food frequency questionnaire;  heart infarction;  human;  major clinical study;  male;  Mediterranean diet;  multicenter study;  patient compliance;  priority journal;  protein blood level;  red wine;  smoking, Aged;  Biological Markers;  Coronary Disease;  Diet Records;  Diet, Mediterranean;  Europe;  Female;  Humans;  Inflammation;  Longitudinal Studies;  Male;  Middle Aged;  Myocardial Infarction;  Patient Compliance;  Questionnaires;  Secondary Prevention;  Survivors, Eurasia;  Europe},
  doi = {10.1093/ije/dyp142},
  issn = {03005771}
}


Peters, A., Greven, S., Heid, I., Baldari, F., Breitner, S., Bellander, T., Chrysohoou, C., Illig, T., Jacquemin, B., Koenig, W., Lanki, T., Nyberg, F., Pekkanen, J., Pistelli, R., Rückerl, R., Stefanadis, C., Schneider, A., Sunyer, J., Wichmann, H. Fibrinogen genes modify the fibrinogen response to ambient particulate matter 2009 American Journal of Respiratory and Critical Care Medicine
179 (6):484-491
article DOI
Abstract:
Rationale: Ambient particulate matter has been associated with systemic inflammation indicated by blood markers such as fibrinogen, implicated in promoting atherothrombosis. Objectives: This study evaluated whether single-nucleotide polymorphisms (SNPs) within the fibrinogen genes modified the relationship between ambient particles and plasma fibrinogen. Methods: In 854 myocardial infarction survivors from five European cities, plasma fibrinogen levels were determined repeatedly (n = 5,082). City-specific analyses were conducted to assess the impact of particulate matter on fibrinogen levels, applying additive mixed models adjusting for patient characteristics, time trend, and weather. City-specific estimates were pooled by meta-analysis methodology. Measurements and Main Results: Seven SNPs in the FGA and FGB genes shown to be associated with differences in fibrinogen levels were selected. Promoter SNPs within FGA and FGB were associated with modifications of the relationship between 5-day averages of particulate matter with an aerodynamic diameter below 10 μm (PM10) and plasma fibrinogen levels. The PM10-fibrinogen relationship for subjects with the homozygous minor allele genotype of FGB rs1800790 compared with subjects homozygous for the major allele was eightfold higher (P value for the interaction, 0.037). Conclusions: The data suggest that susceptibility to ambient particulate matter may be partly genetically determined by polymorphisms that alter early physiological responses such as transcription of fibrinogen. Subjects with variants of these frequent SNPs may have increased risks not only due to constitutionally higher fibrinogen concentrations, but also due to an augmented response to environmental inflammatory stimuli such as ambient particulate matter.
BibTeX:
@Article{Peters2009484,
  author = {Peters, A and Greven, S and Heid, I.M and Baldari, F and Breitner, S and Bellander, T and Chrysohoou, C and Illig, T and Jacquemin, B and Koenig, W and Lanki, T and Nyberg, F and Pekkanen, J and Pistelli, R and R{\"{u}}ckerl, R and Stefanadis, C and Schneider, A and Sunyer, J and Wichmann, H.E},
  title = {Fibrinogen genes modify the fibrinogen response to ambient particulate matter},
  year = {2009},
  abstract = {Rationale: Ambient particulate matter has been associated with systemic inflammation indicated by blood markers such as fibrinogen, implicated in promoting atherothrombosis. Objectives: This study evaluated whether single-nucleotide polymorphisms (SNPs) within the fibrinogen genes modified the relationship between ambient particles and plasma fibrinogen. Methods: In 854 myocardial infarction survivors from five European cities, plasma fibrinogen levels were determined repeatedly (n = 5,082). City-specific analyses were conducted to assess the impact of particulate matter on fibrinogen levels, applying additive mixed models adjusting for patient characteristics, time trend, and weather. City-specific estimates were pooled by meta-analysis methodology. Measurements and Main Results: Seven SNPs in the FGA and FGB genes shown to be associated with differences in fibrinogen levels were selected. Promoter SNPs within FGA and FGB were associated with modifications of the relationship between 5-day averages of particulate matter with an aerodynamic diameter below 10 $\mu$m (PM10) and plasma fibrinogen levels. The PM10-fibrinogen relationship for subjects with the homozygous minor allele genotype of FGB rs1800790 compared with subjects homozygous for the major allele was eightfold higher (P value for the interaction, 0.037). Conclusions: The data suggest that susceptibility to ambient particulate matter may be partly genetically determined by polymorphisms that alter early physiological responses such as transcription of fibrinogen. Subjects with variants of these frequent SNPs may have increased risks not only due to constitutionally higher fibrinogen concentrations, but also due to an augmented response to environmental inflammatory stimuli such as ambient particulate matter.},
  journal = {American Journal of Respiratory and Critical Care Medicine},
  volume = {179},
  number = {6},
  pages = {484-491},
  keywords = {fibrinogen;  fibrinogen, adult;  aged;  air pollution;  allele;  article;  atherosclerosis;  Europe;  female;  FGA gene;  FGB gene;  gene;  gene frequency;  genetic transcription;  genetic variability;  genotype;  heart infarction;  high risk population;  homozygosity;  human;  inflammation;  major clinical study;  male;  particle size;  particulate matter;  priority journal;  promoter region;  single nucleotide polymorphism;  stimulus response;  thrombosis;  weather;  clinical trial;  environmental exposure;  Europe;  genetic predisposition;  genetics;  heart infarction;  homozygote;  longitudinal study;  middle aged;  multicenter study;  single nucleotide polymorphism;  urban population, Adult;  Aged;  Aged, 80 and over;  Environmental Exposure;  Europe;  Female;  Fibrinogen;  Gene Frequency;  Genetic Predisposition to Disease;  Genotype;  Homozygote;  Humans;  Longitudinal Studies;  Male;  Middle Aged;  Myocardial Infarction;  Particulate Matter;  Polymorphism, Single Nucleotide;  Promoter Regions, Genetic;  Urban Population},
  doi = {10.1164/rccm.200805-751OC},
  issn = {1073449X}
}





Jacquemin, B., Antoniades, C., Nyberg, F., Plana, E., Müller, M., Greven, S., Salomaa, V., Sunyer, J., Bellander, T., Chalamandaris, A., Pistelli, R., Koenig, W., Peters, A. Common Genetic Polymorphisms and Haplotypes of Fibrinogen Alpha, Beta, and Gamma Chains Affect Fibrinogen Levels and the Response to Proinflammatory Stimulation in Myocardial Infarction Survivors. The AIRGENE Study 2008 Journal of the American College of Cardiology
52 (11):941-952
article DOI
Abstract:
Objectives: This study was designed to investigate whether single nucleotide polymorphisms (SNPs) and haplotypes of the fibrinogen gene-cluster (fibrinogen chains alpha [FGA], beta [FGB], and gamma [FGG]) could explain the inter- and intraindividual variability of fibrinogen levels in patients with atherosclerosis. We also searched for genetic determinants affecting the responses of fibrinogen genes to proinflammatory stimulation. Background: The mechanisms regulating fibrinogen levels are not fully understood, and they are likely to be regulated by complex gene-environment interactions. Methods: In the AIRGENE study, 895 survivors of myocardial infarction from 5 European cities were followed prospectively for 6 to 8 months, and plasma fibrinogen, interleukin (IL)-6, and C-reactive protein levels were determined monthly. We analyzed 21 SNPs and the corresponding haplotypes in the 3 fibrinogen genes. Results: Eight SNPs in FGA and FGB were significantly associated with fibrinogen levels. Similarly, 2 different haplotypes in FGA and 3 in FGB were also associated with mean fibrinogen levels. The IL-6 levels had a significant impact on the associations between SNPs/haplotypes in FGA/FGB and fibrinogen levels. We also identified SNPs and haplotypes in FGA and FGB with strong impact on the intraindividual variability of fibrinogen during the follow-up period. Conclusions: We identified common SNPs and haplotypes on FGA/FGB genes, explaining the interindividual and intraindividual variability of fibrinogen levels, in patients with a history of myocardial infarction. We have also identified for the first time, SNPs/haplotypes on FGA/FGB whose effects on fibrinogen expression are modified by the underlying IL-6 levels. These findings may have an impact on risk stratification and the design of genetically guided therapeutic approaches in patients with advanced atherosclerosis. © 2008 American College of Cardiology Foundation.
BibTeX:
@Article{Jacquemin2008941,
  author = {Jacquemin, B and Antoniades, C and Nyberg, F and Plana, E and M{\"{u}}ller, M and Greven, S and Salomaa, V and Sunyer, J and Bellander, T and Chalamandaris, A.-G and Pistelli, R and Koenig, W and Peters, A},
  title = {Common Genetic Polymorphisms and Haplotypes of Fibrinogen Alpha, Beta, and Gamma Chains Affect Fibrinogen Levels and the Response to Proinflammatory Stimulation in Myocardial Infarction Survivors. The AIRGENE Study},
  year = {2008},
  abstract = {Objectives: This study was designed to investigate whether single nucleotide polymorphisms (SNPs) and haplotypes of the fibrinogen gene-cluster (fibrinogen chains alpha [FGA], beta [FGB], and gamma [FGG]) could explain the inter- and intraindividual variability of fibrinogen levels in patients with atherosclerosis. We also searched for genetic determinants affecting the responses of fibrinogen genes to proinflammatory stimulation. Background: The mechanisms regulating fibrinogen levels are not fully understood, and they are likely to be regulated by complex gene-environment interactions. Methods: In the AIRGENE study, 895 survivors of myocardial infarction from 5 European cities were followed prospectively for 6 to 8 months, and plasma fibrinogen, interleukin (IL)-6, and C-reactive protein levels were determined monthly. We analyzed 21 SNPs and the corresponding haplotypes in the 3 fibrinogen genes. Results: Eight SNPs in FGA and FGB were significantly associated with fibrinogen levels. Similarly, 2 different haplotypes in FGA and 3 in FGB were also associated with mean fibrinogen levels. The IL-6 levels had a significant impact on the associations between SNPs/haplotypes in FGA/FGB and fibrinogen levels. We also identified SNPs and haplotypes in FGA and FGB with strong impact on the intraindividual variability of fibrinogen during the follow-up period. Conclusions: We identified common SNPs and haplotypes on FGA/FGB genes, explaining the interindividual and intraindividual variability of fibrinogen levels, in patients with a history of myocardial infarction. We have also identified for the first time, SNPs/haplotypes on FGA/FGB whose effects on fibrinogen expression are modified by the underlying IL-6 levels. These findings may have an impact on risk stratification and the design of genetically guided therapeutic approaches in patients with advanced atherosclerosis. {\textcopyright} 2008 American College of Cardiology Foundation.},
  journal = {Journal of the American College of Cardiology},
  volume = {52},
  number = {11},
  pages = {941-952},
  keywords = {C reactive protein;  fibrinogen;  fibrinogen alpha;  fibrinogen beta;  fibrinogen gamma;  interleukin 6;  unclassified drug, adult;  aged;  article;  atherosclerosis;  Europe;  female;  fibrinogen blood level;  follow up;  gene cluster;  genetic analysis;  genetic variability;  haplotype;  heart infarction;  human;  inflammation;  major clinical study;  male;  priority journal;  prospective study;  regulatory mechanism;  single nucleotide polymorphism;  survivor, Aged;  C-Reactive Protein;  European Continental Ancestry Group;  Fibrinogen;  Genotype;  Haplotypes;  Humans;  Interleukin-6;  Male;  Middle Aged;  Myocardial Infarction;  Polymorphism, Single Nucleotide},
  doi = {10.1016/j.jacc.2008.06.016},
  issn = {07351097}
}





Khuseyinova, N., Greven, S., Rückerl, R., Trischler, G., Loewel, H., Peters, A., Koenig, W. Variability of serial lipoprotein-associated phospholipase A2 measurements in post-myocardial infarction patients: Results from the AIRGENE Study Center Augsburg 2008 Clinical Chemistry
54 (1):124-130
article DOI
Abstract:
BACKGROUND: Of the numerous emerging biomarkers for coronary heart disease (CHD), lipoprotein-associated phospholipase A2 (Lp-PLA2), an enzyme involved in lipid metabolism and inflammatory pathways, seems to be a promising candidate. Implementation of Lp-PLA2 measurement into clinical practice, however, requires data on the reliability of such measurements. METHODS: We measured Lp-PLA2 concentrations by ELISA in blood samples drawn from 200 post-myocardial infarction patients (39-76 years) at 6 monthly intervals between May 2003 and February 2004, for a total of 1143 samples. We estimated analytical, within-individual, and between-individual variation, the critical difference, and the intraclass correlation coefficient of reliability (ICC) to assess the reliability of serial Lp-PLA2 measurements. RESULTS: The mean (SD) plasma Lp-PLA2 concentration for the study participants was 188.7 (41.8) μg/L, with no significant difference between men and women. The analytical CV for Lp-PLA2 was 4.4%, the within-individual biological CV was 15%, and the between-individual CV was 22%. The ICC was 0.66. An important part of the total variation in plasma Lp-PLA2 concentration was explained by the between-individual variation (as a percentage of the total variance, 66.1%), whereas the within-individual variance was 31.3%. The analytical variance was as low as 2.6%. CONCLUSIONS: Between-individual variation in Lp-PLA2 concentration was substantially greater than within-individual variation. In general, our data demonstrate considerable stability and good reproducibility of serial Lp-PLA2 measurements, results that compared favorably with those for the more commonly measured lipid markers. © 2007 American Association for Clinical Chemistry.
BibTeX:
@Article{Khuseyinova2008124,
  author = {Khuseyinova, N and Greven, S and R{\"{u}}ckerl, R and Trischler, G and Loewel, H and Peters, A and Koenig, W},
  title = {Variability of serial lipoprotein-associated phospholipase A2 measurements in post-myocardial infarction patients: Results from the AIRGENE Study Center Augsburg},
  year = {2008},
  abstract = {BACKGROUND: Of the numerous emerging biomarkers for coronary heart disease (CHD), lipoprotein-associated phospholipase A2 (Lp-PLA2), an enzyme involved in lipid metabolism and inflammatory pathways, seems to be a promising candidate. Implementation of Lp-PLA2 measurement into clinical practice, however, requires data on the reliability of such measurements. METHODS: We measured Lp-PLA2 concentrations by ELISA in blood samples drawn from 200 post-myocardial infarction patients (39-76 years) at 6 monthly intervals between May 2003 and February 2004, for a total of 1143 samples. We estimated analytical, within-individual, and between-individual variation, the critical difference, and the intraclass correlation coefficient of reliability (ICC) to assess the reliability of serial Lp-PLA2 measurements. RESULTS: The mean (SD) plasma Lp-PLA2 concentration for the study participants was 188.7 (41.8) $\mu$g/L, with no significant difference between men and women. The analytical CV for Lp-PLA2 was 4.4\%, the within-individual biological CV was 15\%, and the between-individual CV was 22\%. The ICC was 0.66. An important part of the total variation in plasma Lp-PLA2 concentration was explained by the between-individual variation (as a percentage of the total variance, 66.1\%), whereas the within-individual variance was 31.3\%. The analytical variance was as low as 2.6\%. CONCLUSIONS: Between-individual variation in Lp-PLA2 concentration was substantially greater than within-individual variation. In general, our data demonstrate considerable stability and good reproducibility of serial Lp-PLA2 measurements, results that compared favorably with those for the more commonly measured lipid markers. {\textcopyright} 2007 American Association for Clinical Chemistry.},
  journal = {Clinical Chemistry},
  volume = {54},
  number = {1},
  pages = {124-130},
  keywords = {1 alkyl 2 acetylglycerophosphocholine esterase;  lipid, adult;  aged;  analysis of variance;  analytic method;  article;  blood sampling;  controlled study;  correlation coefficient;  enzyme linked immunosorbent assay;  female;  heart infarction;  human;  major clinical study;  male;  protein variant;  reliability;  reproducibility, 1-Alkyl-2-acetylglycerophosphocholine Esterase;  Adult;  Aged;  Case-Control Studies;  Cohort Studies;  Enzyme-Linked Immunosorbent Assay;  Female;  Humans;  Male;  Middle Aged;  Myocardial Infarction;  Reproducibility of Results},
  doi = {10.1373/clinchem.2007.093468},
  issn = {00099147}
}




Kolz, M., Koenig, W., Müller, M., Andreani, M., Greven, S., Illig, T., Khuseyinova, N., Panagiotakos, D., Pershagen, G., Salomaa, V., Sunyer, J., Peters, A. DNA variants, plasma levels and variability of C-reactive protein in myocardial infarction survivors: Results from the AIRGENE study 2008 European Heart Journal
29 (10):1250-1258
article DOI
Abstract:
Aims: C-reactive protein represents the classical acute-phase protein produced in the liver in response to inflammatory stimuli. This study evaluated the association of gene polymorphisms with differences in C-reactive protein concentrations and assessed its intra-individual variability as a marker of individual response. Methods and results: One thousand and three myocardial infarction (MI) survivors were recruited in six European cities, and C-reactive protein concentrations were measured repeatedly during a 6-month period. We investigated 114 polymorphisms in 13 genes, all involved in the innate inflammatory pathway. We found two polymorphisms within the C-reactive protein (CRP) gene, rs1800947 and rs1205, of which the minor alleles were strongly associated with lower levels of C-reactive protein (P < 10^-6). A haplotype, identified by those two polymorphisms, was associated with the lowest C-reactive protein concentrations (P < 10^-6). Additionally, the minor alleles of several variants were significantly associated with greater individual variability of C-reactive protein concentrations (P < 10^-3). Conclusion: The present study investigated the association of polymorphisms with inter- and intra-individual variability of C-reactive protein levels. Two minor alleles of C-reactive protein variants were associated with lower C-reactive protein concentrations. Regarding intra-individual variability, we observed associations with the minor alleles of several variants in selected candidate genes, including the CRP gene itself. © The Author 2007.
BibTeX:
@Article{Kolz20081250,
  author = {Kolz, M and Koenig, W and M{\"{u}}ller, M and Andreani, M and Greven, S and Illig, T and Khuseyinova, N and Panagiotakos, D and Pershagen, G and Salomaa, V and Sunyer, J and Peters, A},
  title = {DNA variants, plasma levels and variability of C-reactive protein in myocardial infarction survivors: Results from the AIRGENE study},
  year = {2008},
  abstract = {Aims: C-reactive protein represents the classical acute-phase protein produced in the liver in response to inflammatory stimuli. This study evaluated the association of gene polymorphisms with differences in C-reactive protein concentrations and assessed its intra-individual variability as a marker of individual response. Methods and results: One thousand and three myocardial infarction (MI) survivors were recruited in six European cities, and C-reactive protein concentrations were measured repeatedly during a 6-month period. We investigated 114 polymorphisms in 13 genes, all involved in the innate inflammatory pathway. We found two polymorphisms within the C-reactive protein (CRP) gene, rs1800947 and rs1205, of which the minor alleles were strongly associated with lower levels of C-reactive protein ($P < 10^{-6}$). A haplotype, identified by those two polymorphisms, was associated with the lowest C-reactive protein concentrations ($P < 10^{-6}$). Additionally, the minor alleles of several variants were significantly associated with greater individual variability of C-reactive protein concentrations ($P < 10^{-3}$). Conclusion: The present study investigated the association of polymorphisms with inter- and intra-individual variability of C-reactive protein levels. Two minor alleles of C-reactive protein variants were associated with lower C-reactive protein concentrations. Regarding intra-individual variability, we observed associations with the minor alleles of several variants in selected candidate genes, including the CRP gene itself. {\textcopyright} The Author 2007.},
  journal = {European Heart Journal},
  volume = {29},
  number = {10},
  pages = {1250-1258},
  keywords = {C reactive protein;  DNA, adult;  aged;  allele;  article;  c reactive protein gene;  DNA polymorphism;  female;  gene;  heart infarction;  human;  major clinical study;  male;  priority journal, Aged;  C-Reactive Protein;  DNA;  Female;  Genotype;  Humans;  Male;  Middle Aged;  Myocardial Infarction;  Polymorphism, Single Nucleotide},
  doi = {10.1093/eurheartj/ehm442},
  issn = {0195668X}
}





Scheipl, F., Greven, S., Küchenhoff, H. Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models 2008 Computational Statistics and Data Analysis
52 (7):3283-3299
article DOI
Abstract:
Several tests for a zero random effect variance in linear mixed models are compared. This testing problem is non-regular because the tested parameter is on the boundary of the parameter space. Size and power of the different tests are investigated in an extensive simulation study that covers a variety of important settings. These include testing for polynomial regression versus a general smooth alternative using penalized splines. Among the test procedures considered, three are based on the restricted likelihood ratio test statistic (RLRT), while six are different extensions of the linear model F-test to the linear mixed model. Four of the tests with unknown null distributions are based on a parametric bootstrap, the other tests rely on approximate or asymptotic distributions. The parametric bootstrap-based tests all have a similar performance. Tests based on approximate F-distributions are usually the least powerful among the tests under consideration. The chi-square mixture approximation for the RLRT is confirmed to be conservative, with corresponding loss in power. A recently developed approximation to the distribution of the RLRT is identified as a rapid, powerful and reliable alternative to computationally intensive parametric bootstrap procedures. This novel method extends the exact distribution available for models with one random effect to models with several random effects. © 2007 Elsevier Ltd. All rights reserved.
BibTeX:
@Article{Scheipl20083283,
  author = {Scheipl, F and Greven, S and K{\"{u}}chenhoff, H},
  title = {Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models},
  year = {2008},
  abstract = {Several tests for a zero random effect variance in linear mixed models are compared. This testing problem is non-regular because the tested parameter is on the boundary of the parameter space. Size and power of the different tests are investigated in an extensive simulation study that covers a variety of important settings. These include testing for polynomial regression versus a general smooth alternative using penalized splines. Among the test procedures considered, three are based on the restricted likelihood ratio test statistic (RLRT), while six are different extensions of the linear model F-test to the linear mixed model. Four of the tests with unknown null distributions are based on a parametric bootstrap, the other tests rely on approximate or asymptotic distributions. The parametric bootstrap-based tests all have a similar performance. Tests based on approximate F-distributions are usually the least powerful among the tests under consideration. The chi-square mixture approximation for the RLRT is confirmed to be conservative, with corresponding loss in power. A recently developed approximation to the distribution of the RLRT is identified as a rapid, powerful and reliable alternative to computationally intensive parametric bootstrap procedures. This novel method extends the exact distribution available for models with one random effect to models with several random effects. {\textcopyright} 2007 Elsevier Ltd. All rights reserved.},
  journal = {Computational Statistics and Data Analysis},
  volume = {52},
  number = {7},
  pages = {3283-3299},
  keywords = {Asymptotic analysis;  Parameter estimation;  Polynomials;  Regression analysis;  Stochastic models, Null distributions;  Polynomial regressions;  Tested parameters;  Zero random effects, Statistical tests},
  doi = {10.1016/j.csda.2007.10.022},
  issn = {01679473}
}





Wilbert-Lampen, U., Leistner, D., Greven, S., Pohl, T., Sper, S., Völker, C., Güthlin, D., Plasse, A., Knez, A., Küchenhoff, H., Steinbeck, G. Cardiovascular events during World Cup Soccer 2008 New England Journal of Medicine
358 (5):475-483
article DOI
Abstract:
BACKGROUND: The Fédération Internationale de Football Association (FIFA) World Cup, held in Germany from June 9 to July 9, 2006, provided an opportunity to examine the relation between emotional stress and the incidence of cardiovascular events. METHODS: Cardiovascular events occurring in patients in the greater Munich area were prospectively assessed by emergency physicians during the World Cup. We compared those events with events that occurred during the control period: May 1 to June 8 and July 10 to July 31, 2006, and May 1 to July 31 in 2003 and 2005. RESULTS: Acute cardiovascular events were assessed in 4279 patients. On days of matches involving the German team, the incidence of cardiac emergencies was 2.66 times that during the control period (95% confidence interval [CI], 2.33 to 3.04; P<0.001); for men, the incidence was 3.26 times that during the control period (95% CI, 2.78 to 3.84; P<0.001), and for women, it was 1.82 times that during the control period (95% CI, 1.44 to 2.31; P<0.001). Among patients with coronary events on days when the German team played, the proportion with known coronary heart disease was 47.0%, as compared with 29.1% of patients with events during the control period. On those days, the highest average incidence of events was observed during the first 2 hours after the beginning of each match. A subanalysis of serious events during that period, as compared with the control period, showed an increase in the incidence of myocardial infarction with ST-segment elevation by a factor of 2.49 (95% CI, 1.47 to 4.23), of myocardial infarction without ST-segment elevation or unstable angina by a factor of 2.61 (95% CI, 2.22 to 3.08), and of cardiac arrhythmia causing major symptoms by a factor of 3.07 (95% CI, 2.32 to 4.06) (P<0.001 for all comparisons). CONCLUSIONS: Viewing a stressful soccer match more than doubles the risk of an acute cardiovascular event. In view of this excess risk, particularly in men with known coronary heart disease, preventive measures are urgently needed. Copyright © 2008 Massachusetts Medical Society.
BibTeX:
@Article{WilbertLampen2008475,
  author = {Wilbert-Lampen, U and Leistner, D and Greven, S and Pohl, T and Sper, S and V{\"{o}}lker, C and G{\"{u}}thlin, D and Plasse, A and Knez, A and K{\"{u}}chenhoff, H and Steinbeck, G},
  title = {Cardiovascular events during World Cup Soccer},
  year = {2008},
  abstract = {BACKGROUND: The F{\'{e}}d{\'{e}}ration Internationale de Football Association (FIFA) World Cup, held in Germany from June 9 to July 9, 2006, provided an opportunity to examine the relation between emotional stress and the incidence of cardiovascular events. METHODS: Cardiovascular events occurring in patients in the greater Munich area were prospectively assessed by emergency physicians during the World Cup. We compared those events with events that occurred during the control period: May 1 to June 8 and July 10 to July 31, 2006, and May 1 to July 31 in 2003 and 2005. RESULTS: Acute cardiovascular events were assessed in 4279 patients. On days of matches involving the German team, the incidence of cardiac emergencies was 2.66 times that during the control period (95\% confidence interval [CI], 2.33 to 3.04; P<0.001); for men, the incidence was 3.26 times that during the control period (95\% CI, 2.78 to 3.84; P<0.001), and for women, it was 1.82 times that during the control period (95\% CI, 1.44 to 2.31; P<0.001). Among patients with coronary events on days when the German team played, the proportion with known coronary heart disease was 47.0\%, as compared with 29.1\% of patients with events during the control period. On those days, the highest average incidence of events was observed during the first 2 hours after the beginning of each match. A subanalysis of serious events during that period, as compared with the control period, showed an increase in the incidence of myocardial infarction with ST-segment elevation by a factor of 2.49 (95\% CI, 1.47 to 4.23), of myocardial infarction without ST-segment elevation or unstable angina by a factor of 2.61 (95\% CI, 2.22 to 3.08), and of cardiac arrhythmia causing major symptoms by a factor of 3.07 (95\% CI, 2.32 to 4.06) (P<0.001 for all comparisons). CONCLUSIONS: Viewing a stressful soccer match more than doubles the risk of an acute cardiovascular event. In view of this excess risk, particularly in men with known coronary heart disease, preventive measures are urgently needed. Copyright {\textcopyright} 2008 Massachusetts Medical Society.},
  journal = {New England Journal of Medicine},
  volume = {358},
  number = {5},
  pages = {475-483},
  keywords = {article;  emotional stress;  female;  heart arrhythmia;  heart infarction;  human;  incidence;  ischemic heart disease;  major clinical study;  male;  patient assessment;  priority journal;  sporting event;  ST segment elevation;  unstable angina pectoris},
  doi = {10.1056/NEJMoa0707427},
  issn = {00284793}
}





Peters, A., Schneider, A., Greven, S., Bellander, T., Forastiere, F., Ibald-Mulli, A., Illig, T., Jacquemin, B., Katsouyanni, K., Koenig, W., Lanki, T., Pekkanen, J., Pershagen, G., Picciotto, S., Rückerl, R., Rosario, A., Stefanadis, C., Sunyer, J. Air pollution and inflammatory response in myocardial infarction survivors: Gene-environment interactions in a high-risk group 2007 Inhalation Toxicology
19 (SUPPL. 1):161-175
article DOI
Abstract:
Ambient air pollution has been associated with an increased risk of hospital admission and mortality in potentially susceptible subpopulations, including myocardial infarction (MI) survivors. The multicenter epidemiological study described in this report was set up to study the role of air pollution in eliciting inflammation in MI survivors in six European cities, Helsinki, Stockholm, Augsburg, Rome, Barcelona, and Athens. Outcomes of interest are plasma concentrations of the proinflammatory cytokine interleukin 6 (IL-6) and the acute-phase proteins C-reactive protein (CRP) and fibrinogen. In addition, the study was designed to assess the role of candidate gene polymorphisms hypothesized to lead to a modification of the short-term effects of ambient air pollution. In total, 1003 MI survivors were recruited and assessed with at least 2 repeated clinic visits without any signs of infections. In total, 5813 blood samples were collected, equivalent to an average of 5.8 repeated clinic visits per subject (97% of the scheduled 6 repeated visits). Subjects across the six cities varied with respect to risk factor profiles. Most of the subjects were nonsmokers, but light smokers were included in Rome, Barcelona, and Athens. Substantial inter- and intraindividual variability was observed for IL-6 and CRP. The study will permit assessing the role of cardiovascular disease risk factors, including ambient air pollution and genetic polymorphisms in candidate genes, in determining the inter- and the intraindividual variability in plasma IL-6, CRP, and fibrinogen concentrations in MI survivors. Copyright © Informa Healthcare USA, Inc.
BibTeX:
@Article{Peters2007161,
  author = {Peters, A and Schneider, A and Greven, S and Bellander, T and Forastiere, F and Ibald-Mulli, A and Illig, T and Jacquemin, B and Katsouyanni, K and Koenig, W and Lanki, T and Pekkanen, J and Pershagen, G and Picciotto, S and R{\"{u}}ckerl, R and Rosario, A.S and Stefanadis, C and Sunyer, J},
  title = {Air pollution and inflammatory response in myocardial infarction survivors: Gene-environment interactions in a high-risk group},
  year = {2007},
  abstract = {Ambient air pollution has been associated with an increased risk of hospital admission and mortality in potentially susceptible subpopulations, including myocardial infarction (MI) survivors. The multicenter epidemiological study described in this report was set up to study the role of air pollution in eliciting inflammation in MI survivors in six European cities, Helsinki, Stockholm, Augsburg, Rome, Barcelona, and Athens. Outcomes of interest are plasma concentrations of the proinflammatory cytokine interleukin 6 (IL-6) and the acute-phase proteins C-reactive protein (CRP) and fibrinogen. In addition, the study was designed to assess the role of candidate gene polymorphisms hypothesized to lead to a modification of the short-term effects of ambient air pollution. In total, 1003 MI survivors were recruited and assessed with at least 2 repeated clinic visits without any signs of infections. In total, 5813 blood samples were collected, equivalent to an average of 5.8 repeated clinic visits per subject (97\% of the scheduled 6 repeated visits). Subjects across the six cities varied with respect to risk factor profiles. Most of the subjects were nonsmokers, but light smokers were included in Rome, Barcelona, and Athens. Substantial inter- and intraindividual variability was observed for IL-6 and CRP. The study will permit assessing the role of cardiovascular disease risk factors, including ambient air pollution and genetic polymorphisms in candidate genes, in determining the inter- and the intraindividual variability in plasma IL-6, CRP, and fibrinogen concentrations in MI survivors. Copyright {\textcopyright} Informa Healthcare USA, Inc.},
  journal = {Inhalation Toxicology},
  volume = {19},
  number = {SUPPL. 1},
  pages = {161-175},
  keywords = {C reactive protein;  fibrinogen;  interleukin 6, adult;  air pollution;  ambient air;  blood sampling;  cardiovascular risk;  conference paper;  controlled study;  DNA modification;  Europe;  female;  gene interaction;  genetic polymorphism;  heart infarction;  high risk population;  human;  inflammation;  major clinical study;  male;  outcome assessment;  patient assessment;  priority journal;  protein blood level;  single nucleotide polymorphism;  smoking;  survivor, Adult;  Aged;  Aged, 80 and over;  Air Pollutants;  Air Pollution;  Cohort Studies;  Environmental Exposure;  Female;  Gene Expression Regulation;  Genotype;  Humans;  Inflammation;  Longitudinal Studies;  Male;  Middle Aged;  Myocardial Infarction;  Risk Factors;  Survivors},
  doi = {10.1080/08958370701496129},
  issn = {08958378}
}





Rückerl, R., Greven, S., Ljungman, P., Aalto, P., Antoniades, C., Bellander, T., Berglind, N., Chrysohoou, C., Forastiere, F., Jacquemin, B., von Klot, S., Koenig, W., Küchenhoff, H., Lanki, T., Pekkanen, J., Perucci, C., Schneider, A., Sunyer, J., Peters, A. Air pollution and inflammation (Interleukin-6, C-reactive protein, fibrinogen) in myocardial infarction survivors 2007 Environmental Health Perspectives
115 (7):1072-1080
article DOI
Abstract:
Background: Numerous studies have found that ambient air pollution is associated with cardiovascular disease exacerbation. Objectives: Given previous findings, we hypothesized that particulate air pollution might induce systemic inflammation in myocardial infarction (MI) survivors, contributing to an increased vulnerability to elevated concentrations of ambient particles. Methods: A prospective longitudinal study of 1,003 MI survivors was performed in six European cities between May 2003 and July 2004. We compared repeated measurements of interleukin 6 (IL-6), fibrinogen, and C-reactive protein (CRP) with concurrent levels of air pollution. We collected hourly data on particle number concentrations (PNC), mass concentrations of particulate matter (PM) < 10 μm (PM10) and < 2.5 μm (PM2.5), gaseous pollutants, and meteorologic data at central monitoring sites in each city. City-specific confounder models were built for each blood marker separately, adjusting for meteorology and time-varying and time-invariant covariates. Data were analyzed with mixed-effects models. Results: Pooled results show an increase in IL-6 when concentrations of PNC were elevated 12-17 hr before blood withdrawal [percent change of geometric mean, 2.7; 95% confidence interval (CI), 1.0-4.6]. Five-day cumulative exposure to PM10 was associated with increased fibrinogen concentrations (percent change of arithmetic mean, 0.6; 95% CI, 0.1-1). Results remained stable for smokers, diabetics, and patients with heart failure. No consistent associations were found for CRP. Conclusions: Results indicate an immediate response to PNC on the IL-6 level, possibly leading to the production of acute-phase proteins, as seen in increased fibrinogen levels. This might provide a link between air pollution and adverse cardiac events.
BibTeX:
@Article{Rckerl20071072,
  author = {R{\"{u}}ckerl, R and Greven, S and Ljungman, P and Aalto, P and Antoniades, C and Bellander, T and Berglind, N and Chrysohoou, C and Forastiere, F and Jacquemin, B and von Klot, S and Koenig, W and K{\"{u}}chenhoff, H and Lanki, T and Pekkanen, J and Perucci, C.A and Schneider, A and Sunyer, J and Peters, A},
  title = {Air pollution and inflammation (Interleukin-6, C-reactive protein, fibrinogen) in myocardial infarction survivors},
  year = {2007},
  abstract = {Background: Numerous studies have found that ambient air pollution is associated with cardiovascular disease exacerbation. Objectives: Given previous findings, we hypothesized that particulate air pollution might induce systemic inflammation in myocardial infarction (MI) survivors, contributing to an increased vulnerability to elevated concentrations of ambient particles. Methods: A prospective longitudinal study of 1,003 MI survivors was performed in six European cities between May 2003 and July 2004. We compared repeated measurements of interleukin 6 (IL-6), fibrinogen, and C-reactive protein (CRP) with concurrent levels of air pollution. We collected hourly data on particle number concentrations (PNC), mass concentrations of particulate matter (PM) \< 10 $\mu$m (PM10) and \< 2.5 $\mu$m (PM2.5), gaseous pollutants, and meteorologic data at central monitoring sites in each city. City-specific confounder models were built for each blood marker separately, adjusting for meteorology and time-varying and time-invariant covariates. Data were analyzed with mixed-effects models. Results: Pooled results show an increase in IL-6 when concentrations of PNC were elevated 12-17 hr before blood withdrawal [percent change of geometric mean, 2.7; 95\% confidence interval (CI), 1.0-4.6]. Five-day cumulative exposure to PM10 was associated with increased fibrinogen concentrations (percent change of arithmetic mean, 0.6; 95\% CI, 0.1-1). Results remained stable for smokers, diabetics, and patients with heart failure. No consistent associations were found for CRP. Conclusions: Results indicate an immediate response to PNC on the IL-6 level, possibly leading to the production of acute-phase proteins, as seen in increased fibrinogen levels. This might provide a link between air pollution and adverse cardiac events.},
  journal = {Environmental Health Perspectives},
  volume = {115},
  number = {7},
  pages = {1072-1080},
  keywords = {acute phase protein;  C reactive protein;  fibrinogen;  interleukin 6, adult;  aged;  air pollution;  ambient air;  article;  cigarette smoking;  confidence interval;  controlled study;  diabetes mellitus;  disease association;  disease exacerbation;  exhaust gas;  female;  fibrinogen blood level;  Finland;  Germany;  Greece;  heart failure;  heart infarction;  human;  hypothesis;  inflammation;  Italy;  longitudinal study;  major clinical study;  male;  meteorological phenomena;  particulate matter;  pollution monitoring;  priority journal;  protein blood level;  Spain;  survivor;  Sweden;  time series analysis, Air Pollution;  C-Reactive Protein;  Fibrinogen;  Humans;  Inflammation;  Interleukin-6;  Myocardial Infarction},
  doi = {10.1289/ehp.10021},
  issn = {00916765}
}
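The effect estimates in the abstract above are reported as "percent change of geometric mean", which is the standard back-transformation when a mixed-effects model is fit to log-transformed IL-6. A minimal sketch of that conversion (the coefficient `beta` and standard error `se` below are made-up illustrative values, not the study's estimates):

```python
import math

def percent_change_geometric_mean(beta, se, z=1.96):
    # On the log scale, a coefficient beta corresponds to a multiplicative
    # change exp(beta) in the geometric mean of the outcome; convert it to
    # a percent change, with a normal-approximation 95% confidence interval.
    point = (math.exp(beta) - 1.0) * 100.0
    lower = (math.exp(beta - z * se) - 1.0) * 100.0
    upper = (math.exp(beta + z * se) - 1.0) * 100.0
    return point, lower, upper

# Hypothetical log-scale estimate per unit increase in PNC (illustration only).
beta, se = 0.027, 0.009
point, lower, upper = percent_change_geometric_mean(beta, se)
```

Because the interval is computed on the log scale and then exponentiated, it is asymmetric around the point estimate, matching how such results are typically reported.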





Greven, S., Bailer, A., Kupper, L., Muller, K., Craft, J. A parametric model for studying organism fitness using step-stress experiments 2004 Biometrics
60 (3):793-799
article DOI
Abstract:
We propose a method based on parametric survival analysis to analyze step-stress data. Step-stress studies are failure time studies in which the experimental stressor is increased at specified time intervals. While this protocol has been frequently employed in industrial reliability studies, it is less common in the life sciences. Possible biological applications include experiments on swimming performance of fish using a step function defining increasing water velocity over time, and treadmill tests on humans. A likelihood-ratio test is developed for comparing the failure times in two groups based on a piecewise constant hazard assumption. The test can be extended to other piecewise distributions and to include covariates. An example data set is used to illustrate the method and highlight experimental design issues. A small simulation study compares this analysis procedure to currently used methods with regard to type I error rate and power.
BibTeX:
@Article{Greven2004793,
  author = {Greven, S and Bailer, A.J and Kupper, L.L and Muller, K.E and Craft, J.L},
  title = {A parametric model for studying organism fitness using step-stress experiments},
  year = {2004},
  abstract = {We propose a method based on parametric survival analysis to analyze step-stress data. Step-stress studies are failure time studies in which the experimental stressor is increased at specified time intervals. While this protocol has been frequently employed in industrial reliability studies, it is less common in the life sciences. Possible biological applications include experiments on swimming performance of fish using a step function defining increasing water velocity over time, and treadmill tests on humans. A likelihood-ratio test is developed for comparing the failure times in two groups based on a piecewise constant hazard assumption. The test can be extended to other piecewise distributions and to include covariates. An example data set is used to illustrate the method and highlight experimental design issues. A small simulation study compares this analysis procedure to currently used methods with regard to type I error rate and power.},
  journal = {Biometrics},
  volume = {60},
  number = {3},
  pages = {793-799},
  keywords = {hazard;  modeling;  survival, biometry;  environmental impact assessment;  fish;  nonhuman;  review;  risk assessment;  swimming;  water contamination, Analysis of Variance;  Animals;  Biometry;  Exercise Test;  Fishes;  Humans;  Likelihood Functions;  Models, Statistical;  Physical Fitness;  Proportional Hazards Models;  Survival Analysis;  Swimming;  Water Pollution},
  doi = {10.1111/j.0006-341X.2004.00230.x},
  issn = {0006341X}
}
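The likelihood-ratio test for step-stress data described in the Greven et al. (2004) abstract can be sketched under its piecewise-constant-hazard assumption. This is a minimal illustration, not the paper's implementation: the stress intervals are taken as fixed, and the event counts and times at risk below are made up.

```python
import math

def piecewise_exp_loglik(events, exposure):
    # Maximized log-likelihood of a piecewise-constant-hazard (piecewise
    # exponential) model: in interval j the MLE is lambda_j = d_j / T_j,
    # with d_j events and T_j total time at risk in that interval.
    ll = 0.0
    for d, t in zip(events, exposure):
        if d > 0:
            lam = d / t
            ll += d * math.log(lam) - lam * t
        # Intervals with no events contribute 0 at the MLE (lambda_j -> 0).
    return ll

def lr_test(events_a, exp_a, events_b, exp_b):
    # Null model: the two groups share the same interval-specific hazards,
    # so events and exposure are pooled before maximizing.
    pooled_events = [a + b for a, b in zip(events_a, events_b)]
    pooled_exp = [a + b for a, b in zip(exp_a, exp_b)]
    ll0 = piecewise_exp_loglik(pooled_events, pooled_exp)
    # Alternative: separate hazards per group.
    ll1 = piecewise_exp_loglik(events_a, exp_a) + piecewise_exp_loglik(events_b, exp_b)
    # Statistic is asymptotically chi-square with df = number of intervals.
    return 2.0 * (ll1 - ll0)

# Toy step-stress data: two intervals (before/after the stressor increases),
# two groups of organisms; values are purely illustrative.
stat = lr_test([3, 10], [100.0, 40.0], [3, 4], [100.0, 60.0])
```

The statistic would then be compared with a chi-square critical value with degrees of freedom equal to the number of intervals (e.g., 5.99 for 2 df at the 5% level).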