I have a predicament that keeps recurring. I have a large dataset with a categorical variable, and I want to fit a regression model and write its predicted values to a single output column. Currently, I can do this by subsetting the data on each level of the categorical variable in turn, fitting the model to each subset, and then mapping the results back into the output column:
| inputlookup test_generic.csv
| stats values(x1) as x1
| mvexpand x1
| map search="inputlookup test_generic.csv | search x1=$x1$ | fit LinearRegression response from x2"
I would attach the data I prepared for this question, but I don't have enough karma. My question is this:
Q: Is there a way to do this with a single model, just by how the | fit LinearRegression ... is specified?
I have to think there's a better way.
If it helps, this would be fit in R as:
dat <- read.csv("test_generic.csv", header = TRUE)
mod <- lm(response ~ -1 + x1*x2, data = dat)
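For reference, with R's default treatment coding (the first level alphabetically, Alt-Control, acts as the baseline), the formula `response ~ -1 + x1*x2` expands into one intercept per level, a baseline slope on x2, and a slope adjustment for every other level. Writing $D_{ik}$ for the indicator that row $i$ belongs to level $k$ of x1 (my notation, not R's):

```latex
\hat{y}_i = \sum_{k=1}^{K} \alpha_k D_{ik} + \beta\, x_{2i} + \sum_{k=2}^{K} \delta_k D_{ik}\, x_{2i},
\qquad
D_{ik} = \begin{cases} 1 & \text{if } x_{1i} \text{ is level } k \\ 0 & \text{otherwise} \end{cases}

\text{so for a row in level } k:\quad \hat{y} = \alpha_k + (\beta + \delta_k)\, x_2, \qquad \delta_1 = 0.
```

Because every level gets its own intercept and its own slope, minimizing the pooled sum of squared residuals decomposes into one independent least-squares problem per level. The coefficient estimates, and therefore the fitted values, should match the per-level fits from the map search; only the residual variance (and hence the standard errors) is pooled across levels.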
It could also be fit in Python as:
import pandas
import statsmodels.formula.api as sm
dat = pandas.read_csv('test_generic.csv')
mod = sm.ols(formula="response ~ -1 + x1*x2", data=dat).fit()
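As a sanity check that the one-model formulation gives the same single prediction column as the per-level loop, here is a minimal sketch that builds the `-1 + x1*x2` design matrix by hand with NumPy and solves it with one least-squares call. It is an assumption-laden stand-in for the statsmodels call above: it uses only NumPy (not statsmodels or patsy), mirrors the column layout of treatment coding without reproducing R's or patsy's column names, and takes `sorted()` order as the level order (Alt-Control, Clinical, Control, which matches R's default for these names).

```python
import numpy as np

# Sample rows from test_generic.csv in this post: (response, x1, x2).
DATA = [
    (3084, "Alt-Control", 221), (5623, "Alt-Control", 237.8),
    (4957, "Alt-Control", 381.5), (4019, "Alt-Control", 196.8),
    (3283, "Alt-Control", 356.45), (7365, "Clinical", 381.5),
    (3099, "Clinical", 483.9), (6144, "Clinical", 162.6),
    (5499, "Clinical", 277.06), (3211, "Clinical", 422.1),
    (8448, "Control", 319.2), (14243, "Control", 242.5),
    (15917, "Control", 229.6), (11399, "Control", 335.5),
    (6960, "Control", 196.9),
]

response = np.array([r[0] for r in DATA], dtype=float)
x1 = np.array([r[1] for r in DATA])
x2 = np.array([r[2] for r in DATA], dtype=float)
levels = sorted(set(x1))  # first level is the baseline
K = len(levels)

# Columns: one indicator per level, then x2, then x2 * indicator for
# every non-baseline level (treatment-coded interaction).
dummies = [(x1 == lvl).astype(float) for lvl in levels]
X = np.column_stack(dummies + [x2] + [d * x2 for d in dummies[1:]])

# One least-squares fit for all levels at once.
coef, *_ = np.linalg.lstsq(X, response, rcond=None)
pooled = X @ coef  # the single column of predicted values

# Reference: a separate straight-line fit per level.
separate = np.empty_like(response)
for lvl in levels:
    mask = x1 == lvl
    slope, intercept = np.polyfit(x2[mask], response[mask], 1)
    separate[mask] = intercept + slope * x2[mask]

# Per-level intercept and slope recovered from the pooled coefficients.
for k, lvl in enumerate(levels):
    slope_k = coef[K] + (coef[K + k] if k > 0 else 0.0)
    print(f"{lvl}: intercept={coef[k]:.2f}, slope={slope_k:.4f}")

print(np.allclose(pooled, separate))  # the two approaches agree
```

The last loop shows how to read the interaction model's coefficients: each level's slope is the baseline slope plus that level's interaction term.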
Thanks in advance!
PS: Here's some data for the test_generic.csv lookup:
"response","x1","x2"
3084,"Alt-Control",221
5623,"Alt-Control",237.8
4957,"Alt-Control",381.5
4019,"Alt-Control",196.8
3283,"Alt-Control",356.45
7365,"Clinical",381.5
3099,"Clinical",483.9
6144,"Clinical",162.6
5499,"Clinical",277.06
3211,"Clinical",422.1
8448,"Control",319.2
14243,"Control",242.5
15917,"Control",229.6
11399,"Control",335.5
6960,"Control",196.9
...