
Multipart Formula #21

@matthieugomez


Several arguments of a fit function typically refer to dataframe variables that are not regressors. Some examples: variables used to compute standard errors, weight variables, variables that select the rows on which to estimate the model, high-dimensional fixed effects, mixed models, etc.

It would be nice to think about the best syntax for referring to these variables. I have thought of three potential syntaxes:

  1. Define a macro for each argument that needs to capture variable names
    fit(df, @formula(y ~ x1), @weight(x2), @vcov(cluster(x3+x4)), @where(x5 >= 0), maxiter = 100)
  2. Define a model macro that accepts multiple arguments, i.e. @model(expr, args...).
    fit(df, @model(y ~ x1, weight = x2, vcov = cluster(x3+x4), where = (x5 >= 0)), maxiter = 100)
  3. Define a fit macro that accepts multiple arguments (syntax closest to reg in Stata and to @with in DataFramesMeta.jl), i.e. @fit(expr1, expr2, args...)
    @fit df y ~ x1 weight = x2 vcov = cluster(x3+x4) where = (x5 >= 0) maxiter = 100
    # or (either would work)
    @fit(df, y ~ x1, weight = x2, vcov = cluster(x3+x4), where = (x5 >= 0), maxiter = 100)
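
To make option 2 more concrete, here is a minimal sketch of how a model macro might capture the formula and the named arguments as unevaluated expressions. The names `ModelSpec` and `@model` are hypothetical and do not belong to any existing package. The sketch only stores the expressions; resolving them against a dataframe would be left to fit.

```julia
# Hypothetical container: a formula plus named, unevaluated expressions.
struct ModelSpec
    formula::Expr
    options::Dict{Symbol,Any}
end

# Hypothetical macro for option 2: @model(formula, name = expr, ...)
macro model(formula, args...)
    pairs = Expr[]
    for arg in args
        # Accept `name = expr`; depending on context the parser may tag it as :(=) or :kw.
        if arg isa Expr && arg.head in (:(=), :kw) && arg.args[1] isa Symbol
            name, ex = arg.args[1], arg.args[2]
            # QuoteNode keeps both the name and the expression unevaluated.
            push!(pairs, :($(QuoteNode(name)) => $(QuoteNode(ex))))
        else
            throw(ArgumentError("expected `name = expression`, got: $arg"))
        end
    end
    return :(ModelSpec($(QuoteNode(formula)), Dict{Symbol,Any}($(pairs...))))
end

# None of y, x1, ..., x4 needs to exist as a Julia variable: they are only captured.
spec = @model(y ~ x1, weight = x2, vcov = cluster(x3 + x4))
spec.formula            # the expression y ~ x1
spec.options[:weight]   # the symbol :x2
spec.options[:vcov]     # the expression cluster(x3 + x4)
```

Under this design, a function such as fit(df, spec; maxiter = 100) could look up each captured name in df, which keeps ordinary keyword arguments like maxiter outside the macro.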

An additional benefit is that agreeing on a syntax would help standardize the names of commonly used arguments, such as "weights", "vcov", and "where", across packages that perform statistical estimation. Enforcing these keyword arguments across estimation commands, as Stata does, could do a lot to improve the user experience.
