The 2022 Northern European Stata Conference
Oslo, Norway, Wednesday October 12, 2022
The 2022 Northern European Stata Conference was held in Oslo, Norway, at the Oslo Cancer Cluster Innovation Park on Wednesday, October 12, 2022. Below you will find the program, the abstracts, and the PowerPoint presentations.
09:00–09:05 A welcome address by Bjarte Aagnes, Cancer Registry of Norway
Chair: Ronnie Babigumira, Cancer Registry of Norway.
09:05–09:30 Frida Lundberg, Karolinska Institutet: Application of stpm2 to estimate relative survival for cancer patients in the Nordic countries
In this presentation, I will describe the benefits and challenges of comparing population-based survival across the Nordic countries using the relative survival framework. I used the NORDCAN database, which includes patients diagnosed with cancer from 1990 to 2016 in Denmark, Finland, Iceland, Norway and Sweden. We adopted a model-based approach using flexible parametric survival models and compared the results with non-parametric estimates. The commands stpm2 and standsurv were used to obtain parametric estimates, and strs to obtain non-parametric estimates. I will discuss issues such as age-standardization, model stability, winsorizing and conditional survival. Presentation
09:30–10:15 Paul Lambert, University of Leicester and Karolinska Institutet: Improving fitting and predictions for flexible parametric survival models
Flexible parametric survival models have been available in Stata since 2000 with Patrick Royston’s stpm command. I developed stpm2 in 2008, which added various extensions. However, the command is old and does not take advantage of some of the features Stata has added over the years. I will introduce stpm3, which has been completely rewritten and adds a number of useful features, including:
- Full support for factor variables (including for time-dependent effects).
- Use of extended functions within a varlist. Incorporate various functions (splines, fractional polynomial functions, etc.) directly within a varlist. These also work when including interactions and time-dependent effects.
- Easier and more intuitive predictions. These fully synchronize with the extended functions making predictions for complex models with multiple interactions/nonlinear effects incredibly simple. Make predictions for specific covariate patterns and perform various types of contrasts.
- Directly save predictions to one or more frames. This separates the data used for analysis from the data used for predictions.
- Obtain various marginal estimates using standsurv. This synchronizes with stpm3 factor variables and extended functions, making marginal estimates much easier and less prone to user mistakes for complex models.
- Model on the log(hazard) scale. Do all the above for standard survival models, competing-risks models, multistate models, and relative survival models all within the same framework.
10:15–10:40 Caroline Weibull, Karolinska Institutet · War on Cancer: Survival by first-line treatment type and timing of progression among follicular lymphoma patients
In follicular lymphoma (FL), progression of disease within 24 months has emerged as a popular prognostic marker for overall survival (OS). While it has considerable clinical relevance, there are also inherent limitations in relation to the fixed time point of 24 months, and potential variation by treatment type and choice of comparison group. In this talk, I will highlight some of the methodological limitations, and present the first results from a large population-based cohort of FL patients. National register-based information has been combined with detailed medical record data to create a unique cohort with detailed treatment and follow-up information. We allow progression to be time-varying and estimate relative rates, as well as OS by first-line treatment and timing of progression using an illness-death modelling approach. Stata packages merlin and multistate were applied, and example code will be presented. Our findings show that progression is associated with worse survival beyond the 24-month time point, illustrating the need for individualized management by timing of progression for optimal care of patients with FL.
Based on joint work by Weibull CE, Wästerlid T, Wahling BE, Andersson PO, Ekberg S, Lockmer S, Enblad G, Crowther MJ, Kimby E, Smedby KE. Presentation
11:00–11:15 Enoch Yi-Tung Chen, Karolinska Institutet: The Devil Is In The Details … And The Data – Tutorial On Preparing Data for Multi-state Modelling
Data preparation for multi-state models is a foundational step before estimation with parametric models or non-parametric approaches. In Stata, for example, the msset command from the multistate package is a data-preparation tool that transforms data from wide format into long format. However, msset is not optimal for multi-state settings with reversible transitions, i.e., transitions that allow recovery from one state to another. In this case, correctly defining each transition's risk time and event (status) by hand, without a dedicated command, may be preferred. This presentation aims to guide users through preparing data for a reversible multi-state model, from wide format to long format, without using a data-preparation command, and will provide a step-by-step tutorial with an example dataset. Presentation
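The wide-to-long step described in this abstract can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the code from the tutorial, which uses Stata. The state coding, the `TRANSITIONS` table, and the `to_long` helper are all assumptions made for this example: each stay in a state produces one row per transition that is possible from that state, with the risk time given by `start` and `stop` and `status` set to 1 only for the transition that actually occurred.

```python
# Hypothetical illness-death model with recovery (a reversible transition):
# 1 = healthy, 2 = ill, 3 = dead (absorbing, so no transitions out of it).
TRANSITIONS = {1: [2, 3], 2: [1, 3], 3: []}


def to_long(subject_id, path, end_of_followup):
    """Expand one subject's state history into long format.

    path: list of (entry_time, state) tuples in time order.
    Returns one row per (stay in a state, possible transition out of it).
    """
    rows = []
    for i, (start, state) in enumerate(path):
        if not TRANSITIONS[state]:
            break  # absorbing state: nothing further is at risk
        if i + 1 < len(path):
            stop, next_state = path[i + 1]
        else:
            # Last recorded state: censored at the end of follow-up.
            stop, next_state = end_of_followup, None
        if stop <= start:
            continue  # no risk time accrued in this state
        for target in TRANSITIONS[state]:
            rows.append({
                "id": subject_id,
                "from": state,
                "to": target,
                "start": start,
                "stop": stop,
                "status": int(target == next_state),
            })
    return rows


# A subject who falls ill at t=2, recovers at t=3.5, and dies at t=6.
rows = to_long(1, [(0.0, 1), (2.0, 2), (3.5, 1), (6.0, 3)], end_of_followup=6.0)
for row in rows:
    print(row)
```

Because the subject enters state 1 twice, the long data contain two separate at-risk intervals for the transitions out of state 1, which is the situation a reversible model has to represent.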
11:15–11:40 Niels Henrik Bruun, Aalborg University Hospital: Establishing upper reference limits for left-censored and contaminated data
When establishing reference interval limits, measurements can sometimes be characterized by either A) being left-censored or B) being contaminated in the upper end. Although solutions to each characteristic have been described separately, to our knowledge no one has addressed the case in which both are present. Left-censored data (A) are often wrongly handled simply by using the limit of detection (LOD), which leads to mean estimates that are too high and standard deviation estimates that are too low, and hence to incorrect cut-offs. Ignoring characteristic (B), researchers often use transformations to handle the observed non-Gaussianity, which leads to cut-offs that are too high and an increased proportion of false negatives (type II errors). We propose a method based on normal quantile plots and OLS regressions to find the upper limit of a reference interval for measurements in a case characterized by both A) and B). We also demonstrate how our method can be used to identify whether B) is present in a dataset. We demonstrate our proposed method using real data in two cases.
Based on joint work by Niels Henrik Bruun, Stine Linding Andersen, Nanna Maria Uldall Torp, and Peter Astrup Christensen. Presentation
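The general idea of combining a normal quantile plot with an OLS fit can be sketched as follows. This is an illustrative Python sketch in the style of regression on order statistics, not the authors' actual procedure, which the abstract does not spell out. The function name `upper_reference_limit`, the Hazen plotting positions, the fixed `trim_upper` fraction, and the simulated data are all assumptions: the line is fitted only to observations above the LOD and below the suspected contaminated tail, so that the intercept and slope estimate the mean and standard deviation of the uncontaminated distribution.

```python
from statistics import NormalDist


def upper_reference_limit(observed, n_censored, trim_upper=0.10, quantile=0.975):
    """Fit value = intercept + slope * z on the central part of a normal quantile plot.

    observed:   measurements at or above the limit of detection
    n_censored: number of measurements below the limit of detection
    trim_upper: assumed fraction of the upper tail excluded as possibly contaminated
    """
    n = len(observed) + n_censored
    std_normal = NormalDist()
    z, y = [], []
    for j, value in enumerate(sorted(observed), start=1):
        # Rank within the full sample: censored values occupy the lowest ranks.
        p = (n_censored + j - 0.5) / n
        if p <= 1.0 - trim_upper:
            z.append(std_normal.inv_cdf(p))
            y.append(value)
    if len(z) < 2:
        raise ValueError("need at least two points to fit a line")
    z_bar = sum(z) / len(z)
    y_bar = sum(y) / len(y)
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    slope = sxy / sxx                    # estimate of the standard deviation
    intercept = y_bar - slope * z_bar    # estimate of the mean
    limit = intercept + slope * std_normal.inv_cdf(quantile)
    return limit, intercept, slope


# Simulated data (hypothetical): exact quantiles of N(10, 2), with the lowest
# 40 of 200 values censored and the highest 10 shifted upward by 5.
_nd = NormalDist()
full = [10 + 2 * _nd.inv_cdf((i - 0.5) / 200) for i in range(1, 201)]
observed = full[40:190] + [v + 5 for v in full[190:]]
limit, mean_hat, sd_hat = upper_reference_limit(observed, n_censored=40)
print(round(limit, 3), round(mean_hat, 3), round(sd_hat, 3))
```

In this sketch, points in the upper tail that lie clearly above the fitted line on the quantile plot would be one informal sign of contamination.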
Chair: Tor Åge Myklebust, Cancer Registry of Norway.
13:15–13:40 Nicolai T. Borgen, University of Oslo: Flexible and fast estimation of quantile treatment effects: The rqr and rqrplot commands.
Using quantile regression models to estimate quantile treatment effects is becoming increasingly popular. This paper introduces the rqr command, which can be used to estimate residualized quantile regression (RQR) coefficients, and the rqrplot postestimation command, which can be used to effortlessly plot the coefficients. The main advantages of the rqr command compared to other Stata commands that estimate (unconditional) quantile treatment effects are that it can include high-dimensional fixed effects and that it is considerably faster than the other commands. Presentation
13:40–14:05 Nicola Orsini, Karolinska Institutet: Visualisations of marginal and conditional quantiles based on weighted mixed effects models
Dose-response meta-analysis is widely used in a variety of fields to answer research questions based on multiple studies. A challenge in such applications is presenting the magnitude of uncertainty emerging from the data in light of the assumed statistical model.
The aim of this talk is to illustrate a visualisation tool that follows the command -drmeta- to graph marginal and conditional quantiles of the predicted dose-response relationships based on weighted mixed-effects models estimated on tables of aggregated data.
The developed post-estimation command works with different study designs, dose transformations, and outcome measures; it allows the investigator to derive any quantile (0.01 to 0.99) of the point-wise dose-response relationship; it allows the investigator to define a fine grid of dose values and to choose a referent; it shades quantiles to help distinguish common from extreme quantiles; it allows the user to overlay the study-specific BLUPs; it returns both static images for research articles and interactive HTML visualizations for web dissemination; it is based on the Plotly graphing library, taking advantage of the Stata/Python integration.
Real and simulated data will be used to illustrate the use of the post-estimation command. Presentation
14:05–15:05 Enrique Pinzon, StataCorp: Econometrics strikes back: GMM and two-way fixed effect
Two-way fixed effects is not a broken methodology. As Wooldridge (2021) shows, the estimator can be used to obtain heterogeneous treatment effects. I illustrate how to obtain these treatment effects using GMM. Additionally, I show how some other proposed estimators for heterogeneous treatment effects can be fit using GMM. Presentation
Coffee break 15:05–15:20
15:20–15:45 Mustafa Coban, Institute for Employment Research (IAB): Recursive bivariate copula estimation and decomposition of marginal effects
This article describes a new Stata command, -rbicopula-, for copula-based maximum-likelihood estimation of recursive bivariate models. These models allow a flexible residual distribution and differ from bivariate copula or probit models in allowing the first dependent variable to appear on the right-hand side of the equation for the second dependent variable. The new command provides various copulas, allowing the user to choose the copula that best captures the dependence features of the data caused by the presence of common unobserved heterogeneity. Although the estimation of model parameters does not differ from the bivariate case, the existing user-written command -bicop- does not consider the structural model's recursive nature for predictions and does not enable -margins- as a postestimation command. -rbicopula- estimates the model parameters, computes treatment effects of the first dependent variable, and gives the marginal effects of independent variables. In addition, marginal effects can be decomposed into direct and indirect effects if covariates appear in both equations. Moreover, the postestimation commands incorporate two goodness-of-fit tests. Dependent variables of the recursive bivariate model may be binary, ordinal, or a mixture of both. I present and explain the -rbicopula- command and the available postestimation commands using simulated data and data from the Stata website. Presentation
15:45–16:10 Jan Ditzen, Free University of Bozen-Bolzano: Illuminating the factor and dependence structure in large panel models
In panel models, a precise understanding of the number of common factors and of the dependence across the cross-sectional dimension is key for any applied work. This talk will give an overview of how to estimate the number of common factors and how to test for cross-sectional dependence. It does so by presenting two community-contributed commands: xtnumfac and xtcd2. xtnumfac implements 10 different methods to estimate the number of factors, among them the popular methods by Bai & Ng (2002) and Ahn & Horenstein (2013). The degree of cross-section dependence is investigated using xtcd2, which implements three different tests for cross-section dependence, based on Pesaran (2015), Juodis & Reese (2021) and Pesaran & Xie (2021). The talk includes a review of the theory, a discussion of the commands and empirical examples. Presentation
16:10–16:35 Ben Jann, University of Bern: sttex - a new dynamic document command for Stata and LaTeX
In this talk, I will introduce a new command for processing a dynamic LaTeX document in Stata, i.e., a document containing both LaTeX paragraphs and Stata code. A key feature of the new command is that it tracks changes in the Stata code and executes the code only when needed, allowing for an efficient workflow. The command is useful for creating automated statistical reports, writing articles with data analysis, preparing slides for a methods course or a conference talk, or even writing a complete textbook with examples of applications. Presentation
Coffee break 16:35–16:50
16:50–17:15 Anna Johansson, Karolinska Institutet and Cancer Registry of Norway: Estimating adjusted absolute risks in a cross-sectional register-based study with logit and margins
In epidemiology, we are often interested in quantifying effects of exposures not only on the relative scale (e.g. risk ratios and odds ratios), but also on the absolute scale (e.g. risks and risk differences). In a cross-sectional study using data from the Swedish Medical Birth Register, we used logit to estimate odds ratios of adverse obstetric outcomes for different exposures. Adjusted absolute risks and absolute risk differences were estimated using the post-estimation command margins. We will also discuss possibilities to extend these methods to matched cross-sectional data. Presentation
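The step from a fitted logit model to adjusted absolute risks can be sketched as marginal standardization. That is the calculation `margins` performs for a factor variable: every subject is set to each exposure level in turn and the predicted probabilities are averaged. The Python sketch below is for illustration only and is not the study's code; the coefficients, the covariate `age`, and the ages are hypothetical values, not results from the study.

```python
import math


def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients, as if taken from a fitted logit model
# with a binary exposure and one adjustment covariate (age).
coef = {"_cons": -2.0, "exposed": 0.7, "age": 0.03}

# Hypothetical covariate values for the subjects in the sample.
ages = [22, 25, 28, 31, 34, 37, 40]


def adjusted_risk(exposed, ages, coef):
    """Average predicted risk if every subject had the given exposure level."""
    predictions = [
        expit(coef["_cons"] + coef["exposed"] * exposed + coef["age"] * age)
        for age in ages
    ]
    return sum(predictions) / len(predictions)


risk_unexposed = adjusted_risk(0, ages, coef)
risk_exposed = adjusted_risk(1, ages, coef)
risk_difference = risk_exposed - risk_unexposed

print(f"adjusted risk, unexposed: {risk_unexposed:.3f}")
print(f"adjusted risk, exposed:   {risk_exposed:.3f}")
print(f"adjusted risk difference: {risk_difference:.3f}")
```

The odds ratio from this model is constant across ages, whereas the risk difference depends on the covariate distribution being averaged over, which is why the absolute measures need the averaging step.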
17:15–17:25 Bjarte Aagnes, Cancer Registry of Norway: Securing Stata in a secure environment. Data access and logging.
At the Cancer Registry of Norway (CRN), we are developing a secure environment for using Stata. I give a short description of this work, covering data access and the logging of data extraction (JDBC + Java plugins) and of Stata commands. Presentation
17:25–18:00 Open panel discussion with Stata developers
- Kristin MacDonald, Executive director of statistical services
- Enrique Pinzon, Associate director of econometrics
Chair: Tor Åge Myklebust, Ph.D., Statistician, Cancer Registry of Norway, Oslo, Norway.
Anna L.V. Johansson, Ph.D., Assistant Professor, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden, and Cancer Registry of Norway, Oslo, Norway.
Arne Risa Hole, Ph.D., Professor of Economics, Department of Economics, Universitat Jaume I, Spain.
Morten W. Fagerland, Ph.D., Head of Section for Biostatistics, Epidemiology, and Health Economics at Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Norway.
Peter Hedström, Ph.D., Professor of Analytical Sociology, Linköping University.
Committee email: StataConferenceOslo@kreftregisteret.no
Bjarte Aagnes, Cancer Registry of Norway.