Stata 19 highlights
Stata 19 contains a wealth of new and interesting features such as:
Machine learning via H2O: Ensemble decision trees
Machine learning methods are often used to solve research and business problems focused on prediction when the problems require more advanced modeling than linear or generalized linear models. Ensemble decision tree methods, which combine multiple trees for better predictions, are popular for such tasks. H2O is a scalable machine learning platform that supports data analysis and machine learning, including ensemble decision tree methods such as random forest and gradient boosting machine (GBM).
Conditional average treatment effects (CATE)
Treatment effects measure the causal effect of a treatment on an outcome. This effect may be constant, or it may vary across subpopulations. CATE estimates tell you whether and how treatment effects differ across those subpopulations.
High-dimensional fixed effects (HDFE)
You can now absorb not just one, but multiple high-dimensional categorical variables in your linear regression, with or without fixed effects, and in linear models accounting for endogeneity using two-stage least squares. This is useful when you want your model to adjust for these variables but estimating their effects is not of interest and would be computationally expensive.
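As a rough illustration of what absorbing several categorical variables means, the sketch below uses Python with a simulated panel. The firm and year identifiers, the function names, and the data-generating values are all hypothetical. It sweeps out two sets of group effects by alternately subtracting group means and then estimates the slope on the demeaned data. This is a generic alternating-demeaning sketch, not necessarily the algorithm Stata uses.

```python
import random
from collections import defaultdict

def demean_by(values, groups):
    """Subtract each group's mean from its members."""
    total, count = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        total[g] += v
        count[g] += 1
    return [v - total[g] / count[g] for v, g in zip(values, groups)]

def absorb(values, group_lists, tol=1e-10, max_iter=1000):
    """Alternately demean by every grouping variable until values stop changing."""
    current = list(values)
    for _ in range(max_iter):
        updated = current
        for groups in group_lists:
            updated = demean_by(updated, groups)
        change = max(abs(a - b) for a, b in zip(updated, current))
        current = updated
        if change < tol:
            break
    return current

def absorbed_slope(y, x, group_lists):
    """Slope of y on x after absorbing all grouping variables from both."""
    y_t = absorb(y, group_lists)
    x_t = absorb(x, group_lists)
    return sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)

def simulate(n_firms=200, n_years=10, seed=1):
    """Hypothetical panel: y = 2*x + firm effect + year effect + noise."""
    rng = random.Random(seed)
    firm_fx = [rng.gauss(0, 2) for _ in range(n_firms)]
    year_fx = [rng.gauss(0, 1) for _ in range(n_years)]
    y, x, firm, year = [], [], [], []
    for f in range(n_firms):
        for t in range(n_years):
            xi = firm_fx[f] + rng.gauss(0, 1)  # x is correlated with the firm effect
            y.append(2.0 * xi + firm_fx[f] + year_fx[t] + rng.gauss(0, 1))
            x.append(xi)
            firm.append(f)
            year.append(t)
    return y, x, firm, year

if __name__ == "__main__":
    y, x, firm, year = simulate()
    print("slope with firm and year absorbed:", absorbed_slope(y, x, [firm, year]))
```

The firm and year effects are never estimated. They are removed from both the outcome and the covariate before the slope is computed, which is what keeps the approach cheap when the categorical variables have many levels.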
Bayesian variable selection for linear regression
The new bayesselect command provides a flexible Bayesian approach to identify the subset of predictors that are most relevant to your outcome. It accounts for model uncertainty when estimating model parameters and performs Bayesian inference for regression coefficients.
Marginal Cox PH models for interval-censored multiple-event data
Interval-censored multiple-event data commonly arise in longitudinal studies because each study subject may experience several types of events, and those events are not observed directly but are known only to occur within some time interval. For example, an epidemiologist studying chronic diseases might collect data on patients with multiple conditions, such as heart disease and metabolic disease, during different doctor visits. Similarly, a sociologist might conduct surveys to record major life events, such as job changes and marriages, at regular intervals.
Meta-analysis for correlations
The meta suite now supports meta-analysis of correlation coefficients, allowing investigation of the strength and direction of relationships between variables across multiple studies. For instance, you may have studies reporting the correlation between education and income levels or between physical activity and improvements in mental health and wish to perform a meta-analysis.
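To give a sense of how correlations from several studies can be combined, here is a minimal Python sketch of one standard approach: transform each correlation with Fisher's z, pool with inverse-variance weights under a common-effect model, and transform back. The function name and the study values are hypothetical, and this is not a statement of what the meta suite does by default.

```python
import math

def pool_correlations(studies):
    """Common-effect pooled correlation from (r, n) pairs via Fisher's z."""
    weights, z_values = [], []
    for r, n in studies:
        z_values.append(math.atanh(r))  # Fisher's z transformation
        weights.append(n - 3)           # inverse of Var(z) = 1 / (n - 3)
    total = sum(weights)
    z_bar = sum(w * z for w, z in zip(weights, z_values)) / total
    se = math.sqrt(1.0 / total)
    lo, hi = z_bar - 1.96 * se, z_bar + 1.96 * se
    # Back-transform the pooled estimate and its 95% interval to the r scale
    return math.tanh(z_bar), (math.tanh(lo), math.tanh(hi))

if __name__ == "__main__":
    # Hypothetical studies: (reported correlation, sample size)
    studies = [(0.30, 50), (0.45, 120), (0.38, 80)]
    r_pooled, ci = pool_correlations(studies)
    print(f"pooled r = {r_pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Working on the z scale is a common choice because the sampling variance of z depends only on the study's sample size, not on the unknown correlation itself.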
Correlated random-effects (CRE) model
Easily fit CRE models to panel data with the new cre option of the xtreg command.
Panel-data vector autoregressive (VAR) model
Fit vector autoregressive (VAR) models to panel data. Compute impulse–response functions, perform Granger causality tests and stability tests, include additional covariates, and much more. The new xtvar command has syntax and postestimation procedures similar to those of var, but it is appropriate for panel data rather than time-series data.
Bayesian bootstrap and replicate weights
You can use the new bayesboot prefix to perform Bayesian bootstrap of statistics produced by official and community-contributed commands.
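For readers unfamiliar with the idea, the following Python sketch shows the generic Bayesian bootstrap for a sample mean: instead of resampling observations with replacement, each replication draws continuous weights from a Dirichlet(1, ..., 1) distribution and recomputes the weighted statistic. The function names and data are hypothetical, and nothing here describes how bayesboot is implemented.

```python
import random

def bayesian_bootstrap_mean(data, draws=2000, seed=0):
    """Posterior draws of the mean under the Bayesian bootstrap."""
    rng = random.Random(seed)
    results = []
    for _ in range(draws):
        # Normalized Exponential(1) draws are Dirichlet(1, ..., 1) weights
        g = [rng.expovariate(1.0) for _ in data]
        total = sum(g)
        results.append(sum(gi * xi for gi, xi in zip(g, data)) / total)
    return results

def percentile_interval(draws, level=0.95):
    """Equal-tailed interval read off the sorted posterior draws."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi

if __name__ == "__main__":
    # Hypothetical sample
    data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
    post = bayesian_bootstrap_mean(data)
    print("posterior mean:", sum(post) / len(post))
    print("95% interval:", percentile_interval(post))
```

Because the weights are continuous, every observation contributes to every replication, which is the main practical difference from the classical bootstrap.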
Control-function linear and probit models
Fit control-function linear and probit models with the new cfregress and cfprobit commands. Control-function models offer a more flexible alternative to traditional instrumental-variables (IV) methods by including the endogenous variable itself and its first-stage residual in the main regression; the residual term is called a control function.
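The two-step idea described above can be sketched in plain Python on simulated data. Everything in this example is an assumption made for illustration: the variable names (z for the instrument, w for the endogenous regressor), the data-generating values, and the small ols helper. It is not how cfregress is implemented, and it omits the standard-error adjustment that the estimated residual requires.

```python
import random

def ols(columns, y):
    """Least-squares coefficients of y on the columns (intercept first)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations A b = c, with A = X'X and c = X'y
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)] for r in range(k)]
    c = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        c[p], c[pivot] = c[pivot], c[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
                c[r] -= f * c[p]
    return [c[r] / A[r][r] for r in range(k)]

def control_function_demo(n=5000, seed=42):
    """Simulate an endogenous regressor and compare naive OLS with the CF fit."""
    rng = random.Random(seed)
    z, w, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                 # instrument
        v = rng.gauss(0, 1)                  # first-stage error
        u = 0.8 * v + 0.6 * rng.gauss(0, 1)  # outcome error, correlated with v
        wi = 1.0 + zi + v                    # endogenous regressor
        z.append(zi)
        w.append(wi)
        y.append(2.0 + 1.5 * wi + u)         # true coefficient on w is 1.5
    naive = ols([w], y)
    # Step 1: first-stage regression of w on z; keep the residual
    a0, a1 = ols([z], w)
    v_hat = [wi - a0 - a1 * zi for wi, zi in zip(w, z)]
    # Step 2: main regression includes w and the residual (the control function)
    cf = ols([w, v_hat], y)
    return naive, cf

if __name__ == "__main__":
    naive, cf = control_function_demo()
    print("naive OLS coefficient on w:       ", naive[1])
    print("control-function coefficient on w:", cf[1])
    print("coefficient on first-stage residual:", cf[2])
```

In this simulation, a coefficient on the residual that is clearly different from zero signals that w is endogenous, and including the residual removes the bias that the naive regression shows.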
Latent class model-comparison statistics
When you perform latent class analysis or finite mixture modeling, it is fundamental to determine the number of latent classes that best fits your data. With the new lcstats command, you can use statistics such as entropy and a variety of information criteria, as well as the Lo–Mendell–Rubin (LMR) adjusted likelihood-ratio test and Vuong–Lo–Mendell–Rubin (VLMR) likelihood-ratio test, to help you determine the appropriate number of classes.