Add LinearModel and CompactLinearModel class objects along with introduction of fitlm and stepwiselm #424

Open
Pasta-coder wants to merge 26 commits into gnu-octave:main from Pasta-coder:grp2idx

Conversation

Pasta-coder (Contributor) commented Apr 6, 2026

This PR implements the LinearModel and CompactLinearModel class objects and introduces fitlm. It also introduces stepwiselm, which is not yet usable.

Texinfo documentation, demos, and BISTs are added, but they do not yet meet the statistics package conventions.

Related to #266. (If this is not mergeable, it can still serve as a reference PR.)

A few successful manual tests follow; the full test suite I am testing against is in the file final_test.md:

rng(0);
n = 100;
x1 = randn(n, 1);
x2 = randn(n, 1);
x3 = randn(n, 1);
catVar = randi([1, 3], n, 1);  % 3 levels
y = 2 + 1.5*x1 - 0.7*x2 + 0.3*(catVar==2) + 0.6*(catVar==3) + 0.5*randn(n,1);
tbl = table(x1, x2, x3, categorical(catVar, 1:3, {'A','B','C'}), y, ...
            'VariableNames', {'x1','x2','x3','Color','y'});

[g, gn] = grp2idx({'Red','Blue','Red','Green'})

[g, gn] = grp2idx(categorical({'Red','Blue','Red','Green'}))

[g, gn] = grp2idx([1, 2, NaN, 1, 2])

[g, gn] = grp2idx([])

[g, gn] = grp2idx(ones(10,1))


D = dummyvar({'Red','Blue','Red','Green'})

D = dummyvar(categorical({'Red','Blue','Red','Green'}))

D = dummyvar([1, 2, 1, 3])

D = dummyvar([1,2; 1,3; 2,1; 2,3])

D_ref = dummyvar([1,2,1,3])
D_full = dummyvar([1,2,1,3], 'full')




mdl = fitlm(tbl, 'y ~ x1 + x2')

mdl = fitlm(tbl, 'y ~ 1')

mdl = fitlm(tbl, 'y ~ x1 + x2 - 1')

mdl = fitlm(tbl, 'y ~ x1*x2')

mdl = fitlm(tbl, 'y ~ x1:x2')

mdl = fitlm(tbl, 'y ~ x1/Color')   % Color is categorical

mdl = fitlm(tbl, 'y ~ (x1 + x2)^2')

mdl = fitlm(tbl, 'y ~ x1^3')

mdl = fitlm(tbl, 'y ~ x1^2 - x1')

mdl = fitlm(tbl, 'y ~ x1 + C(Color)')

mdl = fitlm(tbl, 'y ~ x1 + x2 + x3')

mdl = fitlm(tbl, 'y ~ Color')

mdl = fitlm(tbl, 'y ~ Color * x1')

mdl = fitlm(tbl, 'y ~ x1 + Color')

mdl = fitlm(tbl, 'y ~ x1 + catVar', 'CategoricalVars', 'catVar')

mdl = fitlm(tbl, 'y ~ Color', 'DummyVarCoding', 'full')

mdl = fitlm(tbl, 'y ~ x1 + x2 + Color');




compactMdl = compact(mdl)

mdl.Coefficients

mdl.Rsquared

mdl.RMSE

mdl.DFE

mdl.Formula

mdl.NumCoefficients

mdl.NumEstimatedCoefficients

mdl.MSE

mdl.SSE
mdl.SSR
mdl.SST

mdl.ModelCriterion

mdl.CoefficientCovariance




mdl = fitlm(tbl, 'y ~ x1 + x2 + Color');

Xnew = [1.2, 0.5, categorical(2, 1:3, {'A','B','C'})];
ypred = predict(mdl, Xnew)

[ypred, yci] = predict(mdl, Xnew)

[ypred, yci] = predict(mdl, Xnew, 'Alpha', 0.01)

[ypred, yci] = predict(mdl, Xnew, 'Prediction', 'observation')
[ypred, yci] = predict(mdl, Xnew, 'Prediction', 'curve')

[ypred, yci] = predict(mdl, Xnew, 'Simultaneous', true)

Xnew_tbl = table(1.2, 0.5, categorical(2, 1:3, {'A','B','C'}), ...
                 'VariableNames', {'x1','x2','Color'});
ypred = predict(mdl, Xnew_tbl)

ypred = feval(mdl, Xnew)




mdl = fitlm(tbl, 'y ~ x1 + x2 + Color')
disp(mdl)

mdl = fitlm(tbl, 'y ~ x1 + x2 - 1')
disp(mdl)

X_rd = randn(100,2);
X_rd(:,3) = X_rd(:,1) + X_rd(:,2);
y_rd = X_rd * [1; -1; 0.5] + 0.1*randn(100,1);
mdl_rd = fitlm(X_rd, y_rd)
disp(mdl_rd)

mdl_rob = fitlm(tbl, 'y ~ x1 + x2', 'RobustOpts', 'on')
disp(mdl_rob)






mdl = fitlm(tbl, 'y ~ x1 + x2 + Color');

ci = coefCI(mdl)

p = coefTest(mdl)
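
The fit-statistic properties exercised above (SSE, SSR, SST, MSE, RMSE, DFE, Rsquared) can be cross-checked by hand. The following is a minimal sketch in core Octave on made-up data. It is not the PR's implementation, the variable names are illustrative, and it assumes an ordinary least-squares fit with an intercept.

```octave
% Made-up data for illustration only.
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
n = numel (y);

X = [ones(n, 1), x];               % design matrix: intercept column plus x
b = X \ y;                         % least-squares coefficient estimates
yhat = X * b;                      % fitted values

SSE = sum ((y - yhat) .^ 2);       % error (residual) sum of squares
SST = sum ((y - mean (y)) .^ 2);   % total sum of squares about the mean
SSR = SST - SSE;                   % regression sum of squares (intercept model)

DFE  = n - rank (X);               % error degrees of freedom
MSE  = SSE / DFE;                  % mean squared error
RMSE = sqrt (MSE);                 % root mean squared error

R2_ordinary = 1 - SSE / SST;
R2_adjusted = 1 - (SSE / DFE) / (SST / (n - 1));

printf ("SSE=%g SSR=%g SST=%g RMSE=%g DFE=%d\n", SSE, SSR, SST, RMSE, DFE);
```

Using rank (X) instead of the column count for DFE is a choice made in this sketch so that it also behaves sensibly for a rank-deficient design such as the X_rd test above; whether the PR computes DFE the same way is not stated here.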

Added an explicit check in  to throw a fatal error when passed
an empty array, matching MATLAB's standard behavior. This prevents the
silent propagation of empty variables into dependent architectures.

Added explicit validation to match MATLAB parity. dummyvar now correctly
throws a dimension error for non-column categorical variables, preventing
the silent collapse of the design matrix. It also correctly rejects raw
cell arrays to prevent unintended dynamic type conversions.
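
The kind of up-front validation described in the two notes above could look roughly like the sketch below. The helper name check_grouping_input and the error identifiers are hypothetical and do not come from the PR. For simplicity the column check is applied to any input here, whereas the note above describes it for categorical variables only.

```octave
1;  % mark this file as a script so the function below can be defined inline

% Hypothetical validator: raises an error instead of letting bad input
% propagate silently into later steps.
function check_grouping_input (g)
  if (isempty (g))
    error ("sketch:emptyInput", "grouping variable must not be empty");
  elseif (iscell (g))
    error ("sketch:cellInput", "raw cell arrays are not accepted");
  elseif (! iscolumn (g))
    error ("sketch:notColumn", "grouping variable must be a column vector");
  endif
endfunction

% Record the error identifier raised for each input ("" means accepted).
inputs = {[], {'Red', 'Blue'}, [1, 2, 1, 3], [1; 2; 1; 3]};
ids = cell (size (inputs));
for k = 1:numel (inputs)
  try
    check_grouping_input (inputs{k});
    ids{k} = "";
  catch err
    ids{k} = err.identifier;
  end_try_catch
endfor
disp (ids);
```

Failing early with a specific identifier makes it easier for BISTs to assert on the exact failure mode rather than on a downstream symptom.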
- Throw error when formula references unrecognized variable names (F1.4.5)
- Route DummyVarCoding NV pair through fitlm pipeline to design builder
- Implement 'full' dummy coding (all levels, no reference drop) for categoricals (F1.4.6)
- Default remains 'reference' coding for backward compatibility
- Convert Coefficients property from struct to table with RowNames (F2.1.2)
- Create LinearFormula class with clean disp for Formula encapsulation (F2.1.6)
- Update CompactLinearModel disp/predict to handle LinearFormula objects
- Fix addTerms/removeTerms dimension mismatch when Terms matrix includes response column
- Implement table-to-design-matrix conversion in predict with categorical dummy expansion
- Only extract predictor columns active in the model via InModel flag
- Add feval method as predict alias (MATLAB parity)
- Add feval dispatch in subsref
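
The difference between 'reference' and 'full' dummy coding mentioned in the list above can be sketched in core Octave as follows. This is not the PR's design builder: integer codes stand in for a categorical variable, all names are illustrative, and dropping the first level as the reference is an assumption of this sketch.

```octave
% Integer level codes standing in for a categorical (1 = 'A', 2 = 'B', 3 = 'C').
codes = [1; 2; 1; 3];
nlev  = max (codes);
n     = numel (codes);

% Full coding: one indicator column per level (broadcast 4x1 against 1x3).
D_full = double (codes == (1:nlev));

% Reference coding: drop the first level's column; with an intercept in the
% model, that level becomes the baseline the other levels are compared to.
D_ref = D_full(:, 2:end);

X_full = [ones(n, 1), D_full];   % intercept plus all levels
X_ref  = [ones(n, 1), D_ref];    % intercept plus non-reference levels

printf ("full: %d columns, rank %d\n", columns (X_full), rank (X_full));
printf ("ref:  %d columns, rank %d\n", columns (X_ref), rank (X_ref));
```

With an intercept present, the full-coded indicator columns sum to the intercept column, so X_full has more columns than its rank, while X_ref has full column rank. That is the usual reason for keeping reference coding as the default.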
Pasta-coder changed the title from "Fix categorical variable mapping in grp2idx" to "implementation LinearModel and CompactLinearModel class objects along with introduction of fitlm and stepwiselm" on Apr 19, 2026
Pasta-coder changed the title from "implementation LinearModel and CompactLinearModel class objects along with introduction of fitlm and stepwiselm" to "Add LinearModel and CompactLinearModel class objects along with introduction of fitlm and stepwiselm" on Apr 19, 2026