The 2022 Northern European Stata Conference

Oslo, Norway, Wednesday October 12, 2022




The 2022 Northern European Stata Conference will be held in Oslo, Norway at the Oslo Cancer Cluster Innovation Park on Wednesday the 12th of October 2022. The conference will start at 09.00, with registration from 08.30, and end at 17.00 (CEST). 



09:00–09:05 A welcome address by Bjarte Aagnes, Cancer Registry of Norway


Session 1 

Chair: Ronnie Babigumira, Cancer Registry of Norway.

09:05–09:30  Frida Lundberg, Karolinska Institutet: Application of stpm2 to estimate relative survival for cancer patients in the Nordic countries

In this presentation, I will describe the benefits and challenges of comparing population-based survival across the Nordic countries using the relative survival framework. I used the NORDCAN database, which includes patients diagnosed with cancer from 1990 to 2016 in Denmark, Finland, Iceland, Norway and Sweden. We adopted a model-based approach using flexible parametric survival models and compared the results with non-parametric estimates. The commands stpm2 and standsurv were used to obtain parametric estimates, and strs to obtain non-parametric estimates. I will discuss issues such as age-standardization, model stability, winsorizing and conditional survival.


09:30–10:15  Paul Lambert, University of Leicester and Karolinska Institutet: Improving fitting and predictions for flexible parametric survival models

Flexible parametric survival models have been available in Stata since 2000 with Patrick Royston’s stpm command. I developed stpm2 in 2008, which added various extensions. However, the command is old and does not take advantage of some of the features Stata has added over the years. I will introduce stpm3, which has been completely rewritten and adds a number of useful features, including:
- Full support for factor variables (including for time-dependent effects).
- Use of extended functions within a varlist. Incorporate various functions (splines, fractional polynomial functions, etc.) directly within a varlist. These also work when including interactions and time-dependent effects.
- Easier and more intuitive predictions. These fully synchronize with the extended functions making predictions for complex models with multiple interactions/nonlinear effects incredibly simple. Make predictions for specific covariate patterns and perform various types of contrasts.
- Directly save predictions to one or more frames. This separates the data used for analysis from the data used for predictions.
- Obtain various marginal estimates using standsurv. This synchronizes with stpm3 factor variables and extended functions, making marginal estimates much easier and less prone to user mistakes for complex models.
- Model on the log(hazard) scale. Do all the above for standard survival models, competing-risks models, multistate models, and relative survival models all within the same framework.


10:15–10:40  Caroline Weibull, Karolinska Institutet · War on Cancer: Survival by first-line treatment type and timing of progression among follicular lymphoma patients

In follicular lymphoma (FL), progression of disease within 24 months has emerged as a popular prognostic marker for overall survival (OS). While it has considerable clinical relevance, there are also inherent limitations in relation to the fixed time point of 24 months, and potential variation by treatment type and choice of comparison group. In this talk, I will highlight some of the methodological limitations, and present the first results from a large population-based cohort of FL patients. National register-based information has been combined with detailed medical record data to create a unique cohort with detailed treatment and follow-up information. We allow progression to be time-varying and estimate relative rates, as well as OS by first-line treatment and timing of progression using an illness-death modelling approach. Stata packages merlin and multistate were applied, and example code will be presented. Our findings show that progression is associated with worse survival beyond the 24-month time point, illustrating the need for individualized management by timing of progression for optimal care of patients with FL.

Based on joint work by Weibull CE, Wästerlid T, Wahling BE, Andersson PO, Ekberg S, Lockmer S, Enblad G, Crowther MJ, Kimby E, Smedby KE.


Break 10:40–11:00


11:00–11:15  Enoch Yi-Tung Chen, Karolinska Institutet: The Devil Is In The Details … And The Data – Tutorial On Preparing Data for Multi-state Modelling

Data preparation is the foundation of multi-state modelling and must precede estimation, whether by parametric models or non-parametric approaches. In Stata, for example, the msset command from the multistate package is a data preparation tool that transforms data from wide format into long format. However, msset is not optimal for multi-state settings with reversible transitions, i.e., transitions which allow recovery from one state to another. In this case, it may be preferable to define each transition's risk time and event (status) manually, without using a dedicated command. This presentation aims to guide users through preparing data for a reversible multi-state model, from wide format to long format, without using a data preparation command, and will provide a step-by-step tutorial with an example dataset.
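As a rough illustration of the manual wide-to-long step described above, here is a minimal sketch in Python rather than Stata. It is not the presenter's tutorial code: the three-state reversible model, the ALLOWED transition table, and the to_long function are hypothetical names chosen for this example. Each sojourn in a state yields one row per transition the subject is at risk of, with status set to 1 only for the transition that actually occurred.

```python
# Hypothetical reversible model: 1 = healthy, 2 = ill, 3 = dead (absorbing).
# Allowed transitions out of each state; 2 -> 1 is the reversible (recovery) one.
ALLOWED = {1: [2, 3], 2: [1, 3]}


def to_long(subject_id, path, end_of_followup):
    """Expand one subject's observed path into long (transition-level) rows.

    path: list of (entry_time, state) in time order, starting with the
    initial state. Returns one row per transition the subject was at risk
    of during each sojourn.
    """
    rows = []
    for i, (start, state) in enumerate(path):
        if state not in ALLOWED:      # absorbing state: no further risk time
            break
        if i + 1 < len(path):
            stop, next_state = path[i + 1]
        else:                         # still in this state at end of follow-up
            stop, next_state = end_of_followup, None
        for target in ALLOWED[state]:
            rows.append({
                "id": subject_id,
                "from": state,
                "to": target,
                "start": start,
                "stop": stop,
                "status": int(target == next_state),
            })
    return rows


# Made-up subject: becomes ill at t=2, recovers at t=5, dies at t=9.
rows = to_long(1, [(0, 1), (2, 2), (5, 1), (9, 3)], end_of_followup=9)
for r in rows:
    print(r)
```

Because the subject re-enters state 1 after recovery, the same pair of transitions (1 to 2 and 1 to 3) appears twice with different risk intervals, which is the situation the abstract describes as awkward for an automated tool.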


11:15–11:40 Niels Henrik Bruun, Aalborg University Hospital: Establishing upper reference limits for left-censored and contaminated data

When establishing reference interval limits, measurements can sometimes be characterized by either A) being left-censored or B) being contaminated in the upper end. Although solutions to each characteristic have been described separately, no one, to our knowledge, has handled the case where both are present. Left-censored data (A) are often handled incorrectly by simply using the limit of detection (LOD), which leads to mean estimates that are too high and standard deviation estimates that are too low, and hence to incorrect cut-offs. Ignoring characteristic (B), researchers often use transformations to handle the observed non-Gaussianity, which leads to cut-offs that are too high and an increased proportion of false negatives (type 2 error). We propose a method based on normal quantile plots and OLS regressions to find the upper limit of a reference interval for measurements characterized by both A) and B). We also demonstrate how our method can be used to identify whether B) is present in a dataset. We demonstrate our proposed method using real data in two cases.
Based on joint work by Niels Henrik Bruun, Stine Linding Andersen, Nanna Maria Uldall Torp, and Peter Astrup Christensen.
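The abstract does not spell out the proposed method, so the following is only a generic sketch of the underlying idea in Python: regress the ordered measurements on normal quantiles by OLS, using only points above the LOD and below the suspected contamination. It should not be read as the authors' procedure. The function name, the upper_cut threshold, the (rank - 0.5)/n plotting position, and the 97.5% coverage are all assumptions made for this example, and the check data are synthetic and noise-free.

```python
from statistics import NormalDist


def upper_reference_limit(values, lod, upper_cut, coverage=0.975):
    """Fit value = mu + sigma * z by OLS on the uncensored, uncontaminated part.

    values: all measurements, with results below the LOD recorded as lod.
    lod: limit of detection; points at or below it keep their rank but
         are not used in the fit.
    upper_cut: hypothetical threshold above which points are treated as
         contaminated and left out of the fit.
    """
    n = len(values)
    nd = NormalDist()
    pts = []
    for rank, v in enumerate(sorted(values), start=1):
        z = nd.inv_cdf((rank - 0.5) / n)   # normal quantile for this rank
        if lod < v < upper_cut:
            pts.append((z, v))
    zbar = sum(z for z, _ in pts) / len(pts)
    vbar = sum(v for _, v in pts) / len(pts)
    sxx = sum((z - zbar) ** 2 for z, _ in pts)
    sxy = sum((z - zbar) * (v - vbar) for z, v in pts)
    sigma = sxy / sxx                      # OLS slope estimates the SD
    mu = vbar - sigma * zbar               # OLS intercept estimates the mean
    return mu, sigma, mu + sigma * nd.inv_cdf(coverage)


# Synthetic, noise-free check data (not real measurements): mu = 10, sigma = 2.
n = 200
nd = NormalDist()
clean = [10 + 2 * nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
data = [max(v, 7.0) for v in clean]         # left-censor at a LOD of 7
data[-10:] = [v + 5 for v in data[-10:]]    # contaminate the 10 largest values
mu, sigma, limit = upper_reference_limit(data, lod=7.0, upper_cut=14.0)
print(round(mu, 3), round(sigma, 3), round(limit, 3))
```

Censored and contaminated points still count towards the ranks, so the retained points keep their correct positions on the normal quantile scale even though they are the only ones entering the regression.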


11:40–11:55 Bjarte Aagnes, Cancer Registry of Norway: Securing Stata in a secure environment. Data access and logging

At the Cancer Registry of Norway, we are currently developing a secure environment for using Stata. I will give a short description of this work, covering data access, logging of data extraction (Java plugins + JDBC), and logging of Stata commands.


Lunch 12:00–13:00


Session 2

Chair: Tor Åge Myklebust, Cancer Registry of Norway.

13:00–13:25 Morten Wang Fagerland, Oslo University Hospital: Quantile regression in Stata: Performance, precision, and power

Quantile regression (command qreg) estimates quantiles of the outcome variable, conditional on the values of the independent variables, with median regression as the default form. Quantile regression can be used for several purposes: to estimate medians instead of means as a measure of central tendency, for instance when data are markedly skewed; to estimate a particular quantile that may be of interest, such as the 10th percentile of birthweight to find predictors of low birthweight; or to study how the effects of independent variables vary over different quantiles of the dependent variable. Specifying the variance–covariance estimator for quantile regression is not straightforward. qreg offers both independent and identically distributed (i.i.d.) and robust estimators. The density estimation technique (DET) can be fitted, residual (i.i.d. only), or kernel. Three different bandwidth methods are available with the fitted and residual DETs, and eight kernel functions are available for the kernel DET. There is also a bootstrap option, which puts the total number of methods at 26. A natural question arises: which one to use? The aim of this presentation is to explore the performance of the methods and to arrive at some overall recommendations for which methods to use.
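One way to arrive at the figure of 26 is to enumerate the combinations listed above. The short Python sketch below does so under an assumed reading of the abstract: three bandwidths for each of the fitted and residual DETs, one method per kernel function, residual available for i.i.d. only, plus a single bootstrap option. The bandwidth and kernel names are given as recalled from the qreg documentation and should be checked against it.

```python
# Assumed option names; verify against the qreg documentation before relying on them.
BANDWIDTHS = ["hsheather", "bofinger", "chamberlain"]
KERNELS = ["epanechnikov", "epan2", "biweight", "cosine",
           "gaussian", "parzen", "rectangle", "triangle"]


def methods():
    """List every (estimator, DET, bandwidth-or-kernel) combination."""
    out = []
    for vce in ("iid", "robust"):
        out += [(vce, "fitted", bw) for bw in BANDWIDTHS]
        if vce == "iid":                       # residual DET is i.i.d. only
            out += [(vce, "residual", bw) for bw in BANDWIDTHS]
        out += [(vce, "kernel", k) for k in KERNELS]
    out.append(("bootstrap", None, None))      # the bootstrap counts once
    return out


all_methods = methods()
n_iid = sum(1 for m in all_methods if m[0] == "iid")
n_robust = sum(1 for m in all_methods if m[0] == "robust")
print(n_iid, n_robust, len(all_methods))
```

Under this reading, the i.i.d. estimator contributes 3 + 3 + 8 = 14 methods, the robust estimator 3 + 8 = 11, and the bootstrap 1, for a total of 26.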

13:25–13:50 Nicola Orsini, Karolinska Institutet: Visualisations of marginal and conditional quantiles based on weighted mixed effects models

Dose-response meta-analysis is widely used in a variety of fields to answer research questions based on multiple studies. A challenge in such applications is presenting the magnitude of uncertainty emerging from the data in light of the assumed statistical model.
The aim of this talk is to illustrate a visualisation tool that follows the command -drmeta- to graph marginal and conditional quantiles of the predicted dose-response relationships based on weighted mixed-effects models estimated on tables of aggregated data.
The developed post-estimation command works with different study designs, dose transformations, and outcome measures; it allows the investigator to derive any quantile (0.01 to 0.99) of the point-wise dose-response relationship; it allows the investigator to define a fine grid of dose values and to choose a referent; it shades quantiles to help distinguish common from extreme quantiles; it allows the user to overlay the study-specific BLUPs; it returns both static images for research articles and interactive HTML visualizations for web dissemination; and it is based on the Plotly graphing library, taking advantage of the Stata/Python integration.
Real and simulated data will be used to illustrate the use of the post-estimation command.


13:50–14:50 Enrique Pinzon, StataCorp: Econometrics strikes back: GMM and two-way fixed effect

Two-way fixed effects is not a broken methodology. As Wooldridge (2021) shows, the estimator can be used to obtain heterogeneous treatment effects. I illustrate how to obtain these treatment effects using GMM. Additionally, I show how some other proposed estimators for heterogeneous treatment effects can be fit using GMM.


Coffee break 14:55–15:15


15:15–15:40 Mustafa Coban, Institute for Employment Research (IAB): Recursive bivariate copula estimation and decomposition of marginal effects

This article describes a new Stata command -rbicopula- for fitting copula-based maximum-likelihood estimation of recursive bivariate models that enable a flexible residual distribution and differ from bivariate copula or probit models in allowing the first dependent variable to appear on the right-hand side of the second dependent variable. The new command provides various copulas allowing the user to choose a copula which best captures the dependence features of the data caused by the presence of common unobserved heterogeneity. Although the estimation of model parameters does not differ from the bivariate case, the existing user-written command -bicop- does not consider the structural model's recursive nature for predictions and doesn't enable -margins- as a postestimation command. -rbicopula- estimates the model parameters, computes treatment effects of the first dependent variable and gives the marginal effects of independent variables. In addition, marginal effects can be decomposed into direct and indirect effects if covariates appear in both equations. Moreover, the postestimation commands incorporate two goodness-of-fit tests. Dependent variables of the recursive bivariate model may be binary, ordinal, or a mixture of both. I present and explain the -rbicopula- command and the available postestimation commands using simulated data and data from the Stata website.


15:40–16:05 Jan Ditzen, Free University of Bozen-Bolzano: Illuminating the factor and dependence structure in large panel models

In panel models, a precise understanding of the number of common factors and of dependence across the cross-sectional dimension is key for any applied work. This talk will give an overview of how to estimate the number of common factors and how to test for cross-sectional dependence. It does so by presenting two community-contributed commands: xtnumfac and xtcd2. xtnumfac implements 10 different methods to estimate the number of factors, among them the popular methods by Bai & Ng (2002) and Ahn & Horenstein (2013). The degree of cross-section dependence is investigated using xtcd2, which implements three different tests for cross-section dependence, based on Pesaran (2015), Juodis & Reese (2021) and Pesaran & Xie (2021). The talk includes a review of the theory, a discussion of the commands and empirical examples.


16:05–16:30 Ben Jann, University of Bern: sttex - a new dynamic document command for Stata and LaTeX

In this talk, I will introduce a new command for processing a dynamic LaTeX document in Stata, i.e., a document containing both LaTeX paragraphs and Stata code. A key feature of the new command is that it tracks changes in the Stata code and executes the code only when needed, allowing for an efficient workflow. The command is useful for creating automated statistical reports, writing articles with data analysis, preparing slides for a methods course or a conference talk, or even writing a complete textbook with examples of applications.


16:30–16:55 Anna Johansson, Karolinska Institutet and Cancer Registry of Norway: Estimating adjusted absolute risks in a cross-sectional register-based study with logit and margins

In epidemiology, we are often interested in quantifying effects of exposures not only on the relative scale (e.g. risk ratios and odds ratios), but also on the absolute scale (e.g. risks and risk differences). In a cross-sectional study using data from the Swedish Medical Birth Register, we used logit to estimate odds ratios of adverse obstetric outcomes for different exposures. Adjusted absolute risks and absolute risk differences were estimated using the post-estimation command margins. We will also discuss possibilities to extend these methods to matched cross-sectional data.
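To make the step from odds ratios to adjusted absolute risks concrete, the Python sketch below averages predicted probabilities from a logistic model over a cohort, once with everyone set to unexposed and once with everyone set to exposed. This is the standardization idea behind predictive margins, not the study's analysis: the coefficients, the centred age covariate, and the cohort values are made up for illustration.

```python
from math import exp


def expit(x):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + exp(-x))


# Hypothetical fitted log-odds model: intercept, exposure, age centred at 30.
COEF = {"cons": -2.0, "exposed": 0.7, "age_c": 0.05}


def adjusted_risk(cohort, exposed):
    """Average predicted risk if everyone in the cohort had this exposure."""
    preds = [
        expit(COEF["cons"] + COEF["exposed"] * exposed
              + COEF["age_c"] * row["age_c"])
        for row in cohort
    ]
    return sum(preds) / len(preds)


# Made-up covariate values standing in for the study population.
cohort = [{"age_c": a} for a in (-8, -3, 0, 2, 5, 10)]
risk0 = adjusted_risk(cohort, 0)
risk1 = adjusted_risk(cohort, 1)
print(risk0, risk1, risk1 - risk0)
```

The difference risk1 - risk0 is an adjusted absolute risk difference: both averages are taken over the same covariate distribution, so the contrast reflects the exposure term alone.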


16:55–17:30 Open panel discussion with Stata developers
- Kristin MacDonald, Executive director of statistical services
- Enrique Pinzon, Associate director of econometrics



Scientific committee

Chair Tor Åge Myklebust, Ph.D., Statistician, Cancer Registry of Norway, Oslo, Norway. 

Anna L.V. Johansson, Ph.D., Assistant Professor, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden, and Cancer Registry of Norway, Oslo, Norway.

Arne Risa Hole, Ph.D., Professor of Economics, Department of Economics, Universitat Jaume I, Spain.

Morten W. Fagerland, Ph.D., Head of Section for Biostatistics, Epidemiology, and Health Economics at Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Norway.

Peter Hedström, Ph.D., Professor of Analytical Sociology, Linköping University.

Committee email:


General Chair 

Bjarte Aagnes, Cancer Registry of Norway.




Logistics organizers

The conference is jointly organized by The Cancer Registry of Norway – Institute of Population-based Cancer Research, and Metrika Consulting AB, the official Stata distributor for Russia and the Nordic and Baltic countries.



To register for the meeting, please send an email to containing your name, affiliation, and contact details.




Post-conference short course: 

Thursday October 13, 2022

Multilevel Mixed Effects Survival Analysis

Michael J. Crowther, PhD
Biostatistician, Karolinska Institutet
Honorary Senior Lecturer, University of Bristol
Founder and CEO, Red Door Analytics


Target audience: Statisticians and researchers with a good working knowledge of the principles and practice of survival analysis, including modelling of survival data.


· Provide an overview of multilevel mixed effects survival analysis including recurrent event analysis and joint recurrent-terminal event models
· Introduce and illustrate tools in Stata for conducting multilevel survival analysis, including both modelling and prediction, with a focus on calculating clinically useful predictions.

Web :