Description
Several arguments of a fit function typically refer to dataframe variables that are not regressors. Some examples: variables used to compute standard errors, weight variables, variables used to specify the rows on which to estimate the model, high-dimensional fixed effects, mixed models, etc.
It would be nice to think about the best syntax for referring to these variables. I have thought of three potential syntaxes:
- Define a macro for each argument that needs to capture variable names
fit(df, @formula(y ~ x1), @weight(x2), @vcov(cluster(x3+x4)), @where(x5 >= 0), maxiter = 100)
- Define a model macro that accepts multiple arguments, i.e. @model(expr, args...)
fit(df, @model(y ~ x1, weight = x2, vcov = cluster(x3+x4), where = (x5 >= 0)), maxiter = 100)
- Define a fit macro that accepts multiple arguments (syntax closest to reg in Stata and to @with in DataFramesMeta.jl), i.e. @fit(expr1, expr2, args...)
@fit df y ~ x1 weight = x2 vcov = cluster(x3+x4) where = (x5 >= 0) maxiter = 100
# or (either would work)
@fit(df, y ~ x1, weight = x2, vcov = cluster(x3+x4), where = (x5 >= 0), maxiter = 100)
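To make the second option more concrete, here is a rough sketch of how a model macro might capture variable names without evaluating them. Everything in it is an assumption for illustration: the `ModelSpec` type, its field names, and this particular `@model` implementation are hypothetical and do not come from any existing package. It only stores the expressions; it does not fit anything, and only the weight and vcov arguments are shown.

```julia
# Hypothetical container: the formula plus the unevaluated keyword expressions.
struct ModelSpec
    formula::Expr
    options::Dict{Symbol,Any}
end

# Hypothetical @model macro. The first argument is the formula;
# every remaining argument must have the form `name = expression`.
macro model(formula, args...)
    pairs = Expr[]
    for arg in args
        if Meta.isexpr(arg, :(=)) && arg.args[1] isa Symbol
            name, value = arg.args[1], arg.args[2]
            # QuoteNode keeps names like `x2` as symbols instead of looking them up.
            push!(pairs, :($(QuoteNode(name)) => $(QuoteNode(value))))
        else
            error("expected `name = expression`, got: $arg")
        end
    end
    return :(ModelSpec($(QuoteNode(formula)), Dict{Symbol,Any}($(pairs...))))
end

# None of y, x1, ..., x4 or cluster needs to be defined at this point.
spec = @model(y ~ x1, weight = x2, vcov = cluster(x3 + x4))
```

A fit method could then look up `spec.options[:weight]` in the dataframe's columns and dispatch on the `vcov` expression. The same capture logic would apply to the third option, with the dataframe as an extra leading argument.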
An additional benefit is that agreeing on a syntax would help standardize the names of commonly used arguments such as "weights", "vcov", and "where" across packages that do statistical estimation. Enforcing these keyword arguments across different estimation routines, as Stata does, could do a lot to improve the user experience.