|
|
Multivariate Adaptive Regression Spline Modeling
Using
|
|
Menu Item |
Description |
|
Specification Frame |
This frame organizes various controls that you may use to specify MARSPLINE model components including the dependent variable, independent variables, and lag coefficients. If a categorical variable is specified for an independent variable, the MARSPLINE routine will automatically identify it as categorical when it is processed. |
|
|
|
|
Dependent Variable |
Use this drop-down list to specify the series that you wish to analyze. |
|
|
|
|
Probit Checkbox |
Specifies that the dependent variable is a 0-1 variable. When specified, the MARSPLINE model estimates the probability of success/failure based on the independent variables in the model. |
|
|
|
|
Categorical Checkbox |
Specifies that the independent variable is a categorical variable. When specified, the application automatically determines the number of categories (which must be coded as integers) and expands the categorical variable into multiple binary (0-1) variables. |
|
|
|
|
Independent Variable |
Use this drop-down list to specify a random or categorical regressor variable component in the model. |
|
|
|
|
Lags |
Specifies the lag parameters associated with a random or categorical variable. A categorical variable may contain more than one lag parameter; however, only one lag specification may be added to the model at a time. For random variables, multiple lag parameters may be added to the model as a group. Contiguous lags may be specified using the “TO” keyword, and individual lags may be separated by commas. For example, the user could specify lags 0, 1, and 3 as “0, 1, 3” or as “0 TO 1, 3”. |
|
|
|
|
Add |
Clicking on Add appends a new component to the MARSPLINE model, which is displayed in the model component grid. Multiple instances of the same independent variable may be added to the model as long as the lag operators are unique. For example, in the above form, the user could have chosen to specify the TEMPERTR{0} and TEMPERTR{1} components separately. |
|
|
|
|
Model Components Frame |
The model components frame organizes form controls to display the MARSPLINE model components in a grid format, as well as to edit and delete model components. |
|
|
|
|
Model Component Grid |
The components of the MARSPLINE model and their attributes are displayed in this grid. The first column displays the independent variable name, the second column displays the individual or grouped lag operators within braces, and the third column indicates whether the independent variable is predetermined as random or categorical. The fourth and fifth columns are not used at this time. |
|
|
|
|
Edit |
The user can modify a model component by first placing the mouse cursor on the grid row of interest and then clicking on the Edit button. The Specification Frame will reflect the current attributes of the model component and the Add button will be replaced by the Mod button. Make the necessary changes in the Specification Frame and then click on the Mod button to complete the changes. |
|
|
|
|
|
The user can delete a model component by placing the mouse cursor on the grid row of interest and then clicking on the |
|
|
|
|
Clear |
Clears all model components from the model component grid. |
|
|
|
|
Save |
Saves the information in the model component grid to a specified tab-delimited file. |
|
|
|
|
Recall |
Recalls the model component grid information from a specified tab-delimited file created (see Save option above). |
|
|
|
|
|
This frame organizes form controls related to how the data is indexed (by date or none), and what data span is modeled and analyzed. |
|
|
|
|
Date Variable |
Use this drop-down list to specify the date variable associated with your series. If your SCA Data Macro contains a variable named "DATE", it is automatically assigned by SCA WorkBench. If you have an alternative index variable or date variable, you may select it from the drop-down list. If your SCA Data Macro does not contain a DATE variable, leave the drop-down list empty. WorkBench will then use the observation number as a date index. If your time series has more than 10,000 observations, WorkBench will not use your DATE variable for indexing. Instead, the observation number will be used. |
|
|
|
|
Begin Span |
Use the Begin drop-down list to omit observations from the beginning of a time series being analyzed. |
|
|
|
|
End Span |
Use the End drop-down list to omit observations from the end of a time series being analyzed. |
|
|
|
|
Back |
Depending on the tab you are currently working in, clicking on the Back button will move you one tab to the left. If you are in the Model tab, you will move to the MARSPLINE Data Viewer dialog box where you may choose a new SCA data macro or leave the MARSPLINE Modeling Environment. |
|
|
|
|
Exit |
Exits the MARSPLINE modeling environment. |
|
|
|
|
Execute |
Executes MARSPLINE model estimation, validation, linear model comparison, diagnostics, and graphs by submitting a dynamically created program script to SCAB34S SPLINES. When completed, you will automatically be placed in the Results tab. |
|
|
|
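The lag syntax described for the Lags control above (comma-separated lags, with “TO” marking a contiguous range) can be illustrated with a minimal Python sketch. The function name `parse_lags` and its error handling are hypothetical and are not part of WorkBench; the sketch only shows how a specification such as “0 TO 1, 3” expands into individual lags.

```python
def parse_lags(spec: str) -> list[int]:
    """Expand a lag specification such as "0 TO 1, 3" into a sorted list of lags.

    Hypothetical helper for illustration only; WorkBench does its own parsing.
    """
    lags = set()
    for part in spec.split(","):
        tokens = part.split()
        if not tokens:
            continue  # tolerate stray commas
        if len(tokens) == 3 and tokens[1].upper() == "TO":
            start, end = int(tokens[0]), int(tokens[2])
            if end < start:
                raise ValueError(f"descending range: {part.strip()!r}")
            lags.update(range(start, end + 1))  # "TO" ranges are inclusive
        elif len(tokens) == 1:
            lags.add(int(tokens[0]))
        else:
            raise ValueError(f"cannot parse lag term: {part.strip()!r}")
    return sorted(lags)


# Both spellings from the Lags description expand to the same lags.
print(parse_lags("0, 1, 3"))
print(parse_lags("0 TO 1, 3"))
```

Both calls print the same list, which is why the two specifications in the Lags description are interchangeable.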
The Options tab sets the estimation limits placed on a MARSPLINE model, controls the detail of the output and graphics that are produced, and allocates the workspace size of the SCAB34S SPLINES product. Additional estimation options are available in the MARSPLINE matrix subroutine but are not exposed in this MARSPLINE Modeling Environment interface. The user may employ these other options by directly editing the B34S script generated by WorkBench.

|
Menu Item |
Description |
|
MARSPLINE Estimation Limits Frame |
This frame organizes various controls that set options in MARSPLINE model estimation. Here, the user specifies the maximum number of basis functions (or spline knots) that may be included in the MARSPLINE model, the degrees of freedom (or penalty) imposed for knot optimization, and the maximum number of interactions that may occur between regressor variables. The minimum span between knot placements is automatically determined by MARSPLINE. |
|
|
|
|
Maximum Number of Knots |
This setting limits the number of knots that may be included in the MARSPLINE model during its selection process. Increasing this setting allows a greater number of basis functions to be evaluated before pruning and sometimes results in a better performing model. The default is 5. |
|
|
|
|
DF (knot optimization) |
Sets the number of degrees of freedom charged for unrestricted knot optimization. |
|
|
|
|
Maximum Interactions |
Sets the maximum number of interactions between variables for any given basis function. |
|
|
|
|
Minimum Span between Knots |
The minimum span allowed between spline knots is automatically determined by the MARSPLINE subroutine when this option is set to 0. The number of regressor variables and the number of observations in the series determine the minimum span between knots. This setting currently cannot be modified by the user. |
|
|
|
|
Diagnostics and Graphics Frame |
This frame organizes controls related to the amount of output produced for MARSPLINE estimation and diagnostics. The math form option displays the model in a form that is easily transferred into a standard programming language. The alternative is a model displayed in summation form. The contribution chart option produces contribution (or leverage) charts for all variables that are used in the final model. If the variable is additive (no interactions with other variables), the companion variables are set to their median values. If interactions exist, three charts are produced that set the companion variables to their minimum, median, and maximum. The vertical axis displays the predicted values when the target variable takes on a range between its minimum and maximum while all other variables are held constant at their median values. |
|
|
|
|
Display Output for Model |
Typically, you want to see the MARSPLINE model summary and the OLS model summary. |
|
|
|
|
Display Forecast Table |
The forecast table displays the original series and the predicted series for both the MARSPLINE model and OLS models. |
|
|
|
|
Show Diagnostic Tables |
Several diagnostics are available for the dependent variable and the residuals from the estimated models. Among the diagnostics are statistical description tables, sample autocorrelation tables, and Hinich nonlinear testing. The Hinich test will only be displayed for residual series with more than 50 cases. |
|
|
|
|
Show Graphics |
Several graphics are created, including a time plot of the dependent variable, an Actual vs. Predicted plot, ACF and PACF plots, and a modified Q-Statistic plot. |
|
|
|
|
Workspace Size |
The SCAB34S SPLINES product requires its workspace size to be set when the program is initiated. The default workspace of 2000000 is adequate to handle moderate-size datasets. The user may increase the workspace size if needed. Please note that the workspace limit is imposed by the amount of available RAM on the computer. |
|
|
|
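As background to the knot and interaction limits above, the following Python sketch shows the usual form of a MARS basis function: a truncated linear ("hinge") term anchored at a knot, with an interaction formed as the product of hinge terms on different variables. The variable names, knot values, and data are hypothetical, and this is not the code SCAB34S SPLINES runs; it only illustrates what the Maximum Number of Knots and Maximum Interactions settings are limiting.

```python
import numpy as np


def hinge(x, knot, sign=+1):
    """Truncated linear basis: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return np.maximum(0.0, sign * (np.asarray(x, dtype=float) - knot))


# Hypothetical regressor values, for illustration only.
temperature = np.array([40.0, 55.0, 65.0, 80.0])
humidity = np.array([30.0, 50.0, 70.0, 90.0])

b_above = hinge(temperature, knot=60.0)           # nonzero only above the knot
b_below = hinge(temperature, knot=60.0, sign=-1)  # nonzero only below the knot

# A two-way interaction is the product of hinges on two different variables.
b_interaction = b_above * hinge(humidity, knot=60.0)

print(b_above, b_below, b_interaction)
```

A fitted model is a weighted sum of such terms, so allowing more knots or higher-order interactions enlarges the set of candidate terms the selection process can evaluate.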
This tab allows you to evaluate the performance of MARSPLINE model prediction and validate the MARSPLINE model against a linear regression model estimated by simple OLS, MINIMAX, or L1 estimation. A common problem with most nonlinear modeling methods is over-fitting. Models that over-fit the data often perform well within the sample, but do substantially worse when predicting out of sample. The MARSPLINE routine addresses this problem by penalizing overly complex models, controlling the minimum span between spline knots, and allowing the user to set the number of degrees of freedom charged for unrestricted knot optimization, among other measures.


Although the MARSPLINE modeling approach is well suited to cross-sectional data, it can also be employed successfully on time series data. Lewis and Stevens (1991) discuss the application of MARSPLINE models (ASTAR) on lagged values of Y as an alternative to Threshold Autoregressive (SETAR) models (see Tong, 1983). The MARSPLINE user interface offered in WorkBench leverages its utility in time series applications by allowing the dependent variable to be lagged in the model thus addressing some issues related to serial correlation in the data.
The default validation setting compares the in-sample fit of the estimated MARSPLINE model against the in-sample fit of a simple OLS regression model. All available observations are used to evaluate fit using root mean squared error (RMSE) and mean absolute percentage error (MAPE) criteria.
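The two fit criteria named above can be written out in a short Python sketch. The formulas are the standard definitions of RMSE and MAPE; whether WorkBench uses exactly these divisors or conventions is an assumption, and the sample numbers are made up for illustration.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is zero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs((actual - predicted) / actual)))


# Hypothetical actual and fitted values.
actual = [100.0, 200.0, 400.0]
fitted = [110.0, 190.0, 400.0]

print("RMSE:", rmse(actual, fitted))
print("MAPE (%):", mape(actual, fitted))
```

RMSE is expressed in the units of the series and weights large errors more heavily, while MAPE is scale-free, so the two criteria can rank competing models differently.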
Other options are available to validate the MARSPLINE model. For example, if the user is primarily interested in evaluating the fit of the model in the later part of the series, a holdout sample can be specified by typing the number of observations (or percentage) to be marked from the end of the series. After specifying the holdout, the user can evaluate in-sample fit for the “holdout period” only by setting the option “Include holdout in estimation (compare holdout only)”. The user also has two choices to evaluate the prediction performance of the model where the holdout period is not used in training the model. Those options are to estimate the model up to the first forecast origin only or to re-estimate the model at each forecast origin. Note that re-estimating the model at each forecast origin is computationally intensive, and the spline knot placements and model structure may change across the sample period since knot placement and variable selection are performed automatically during MARSPLINE estimation.
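The difference between the two out-of-sample choices can be sketched in Python. An ordinary least squares fit stands in for the MARSPLINE estimation purely to keep the example small, and the function names and data are hypothetical; the point is only the control flow of estimating once at the first forecast origin versus re-estimating at every origin.

```python
import numpy as np


def fit_ols(X, y):
    """Least squares fit with an intercept (stand-in for the real model estimation)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def holdout_forecasts(X, y, n_holdout, reestimate=False):
    """Forecast the last n_holdout observations without using them for the first fit.

    reestimate=False: estimate once, using data up to the first forecast origin.
    reestimate=True:  re-estimate at each origin, using all data before it.
    """
    n_train = len(y) - n_holdout
    coef = fit_ols(X[:n_train], y[:n_train])
    preds = []
    for t in range(n_train, len(y)):
        if reestimate:
            coef = fit_ols(X[:t], y[:t])  # one extra estimation per origin
        preds.append(predict_ols(coef, X[t:t + 1])[0])
    return np.array(preds)


# Hypothetical, exactly linear data so both schemes give the same forecasts.
X = np.arange(20.0).reshape(-1, 1)
y = 2.0 + 3.0 * X[:, 0]

fixed = holdout_forecasts(X, y, n_holdout=5)
rolling = holdout_forecasts(X, y, n_holdout=5, reestimate=True)
print(fixed, rolling)
```

With re-estimation the model is fitted once per holdout observation, which is why that option costs more time and, for a model that selects its own knots, why the structure can differ from one origin to the next.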
The MARSPLINE model can be visually examined as a multi-dimensional surface. The boundaries of this estimated surface are defined by the response of the dependent variable to a range of levels in the regressor variables. Problems arise from out of sample forecasting when the values of the regressor variables fall outside their minimum and maximum range in model estimation. When this occurs, the dependent variable’s responses to these new levels are not known. Consequently, it is important that the MARSPLINE model be re-examined occasionally and the model retrained if new maximums or minimums are present in the updated data.
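A simple way to detect the situation described above is to compare new regressor values against the minimum and maximum seen during estimation. The Python sketch below is an illustration under assumed names and made-up data, not a WorkBench feature.

```python
import numpy as np


def out_of_range_mask(X_train, X_new):
    """Flag values in X_new that fall outside the per-column range of X_train."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return (X_new < lo) | (X_new > hi)


# Hypothetical estimation data (columns are two regressors) and new observations.
X_train = [[10.0, 1.0], [20.0, 5.0], [30.0, 3.0]]
X_new = [[15.0, 2.0], [35.0, 4.0], [25.0, 0.0]]

mask = out_of_range_mask(X_train, X_new)
rows_to_review = mask.any(axis=1)  # True where any regressor extrapolates
print(mask)
print(rows_to_review)
```

Rows flagged here are forecasts made outside the estimated surface, which is the signal that the model should be re-examined and possibly retrained.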
As another validation criterion, the user can compare the improvement of a MARSPLINE model versus a regression model with the same right-hand side variables. Diagnostics are produced for both the MARSPLINE and regression models. If the dependent variable is nonlinear in its response to levels of the regressor variables, the MARSPLINE model should show a clear improvement in model fit and in out-of-sample forecasting performance criteria.
A confusion matrix is produced for the MARSPLINE-Probit model and the comparison linear model for evaluating the classification power of the models. The user has a choice for determining the probability cut-off value for classification of positive and negative cases for the final confusion matrix. The user can allow the system to set the probability cut-off automatically using the maximum G-MEAN value as the criterion, or can supply specific cut-off values. If GMEAN1 is used, the cut-off will slightly favor True-Positive classifications, and if GMEAN2 is used, the cut-off will weigh True-Positive and True-Negative classifications equally. Since the determination of cut-off probability thresholds is subjective, a table of ratio statistics for a range of cut-off probability values is also provided in the output.
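The idea of choosing a cut-off by maximizing a G-mean can be sketched in Python. This text does not give the formulas WorkBench uses for GMEAN1 and GMEAN2, so the sketch assumes the common balanced definition, the square root of the true-positive rate times the true-negative rate, which matches the description of a criterion that weighs both classes equally. The function names and data are hypothetical.

```python
import numpy as np


def confusion_counts(y_true, prob, cutoff):
    """Return (TP, FP, FN, TN) when cases with prob >= cutoff are classified positive."""
    y_true = np.asarray(y_true)
    pred = np.asarray(prob, dtype=float) >= cutoff
    pos = y_true == 1
    tp = int(np.sum(pred & pos))
    fp = int(np.sum(pred & ~pos))
    fn = int(np.sum(~pred & pos))
    tn = int(np.sum(~pred & ~pos))
    return tp, fp, fn, tn


def gmean_balanced(tp, fp, fn, tn):
    """Assumed definition: sqrt(true-positive rate * true-negative rate)."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr * tnr) ** 0.5


def best_cutoff(y_true, prob, cutoffs):
    """Pick the candidate cut-off with the largest balanced G-mean."""
    scores = [gmean_balanced(*confusion_counts(y_true, prob, c)) for c in cutoffs]
    return cutoffs[int(np.argmax(scores))]


# Hypothetical outcomes and predicted probabilities.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
prob = [0.1, 0.2, 0.4, 0.35, 0.8, 0.9, 0.6, 0.7]
cutoffs = [0.25, 0.5, 0.75]

for c in cutoffs:
    print(c, confusion_counts(y_true, prob, c))
print("selected cut-off:", best_cutoff(y_true, prob, cutoffs))
```

Printing the counts for every candidate mirrors the table of ratio statistics mentioned above: the automatic choice is one point on that table, and the user can still prefer a different trade-off.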
|
Menu Item |
Description |
|
Validation Settings Frame |
This frame organizes controls for specifying a holdout sample for forecast performance and model validation. It also provides controls for the user to specify the type of validation for in-sample or out-of-sample forecasting. |
|
|
|
|
# to holdout |
Specifies the number of observations that are to be reserved from the back of the dependent variable for evaluating forecast performance. The percentage of the holdout sample relative to the series length is computed and is displayed in % to holdout. |
|
|
|
|
% to holdout |
Specifies the size of the holdout sample as a percentage of the length of the dataset. The actual number of observations reserved from the back of the series is computed and displayed in # to holdout. |
|
|
|
|
Compare all obs |
Evaluate the in-sample fit of the model for all observations. |
|
|
|
|
Compare holdout only for in-sample fit |
Evaluate the in-sample fit of the model for the defined holdout sample only. |
|
|
|
|
Compare holdout for out-of-sample fit |
Evaluate the out-of-sample forecasts defined by the holdout sample. The model is estimated using observations up to the first forecast origin only. |
|
|
|
|
OLS Method Comparison Frame |
This frame organizes controls to validate the MARSPLINE model against a regression model with the same right-hand-side variables used in the MARSPLINE model. |
|
|
|
|
Logistic Method Comparison Frame |
This frame organizes controls to validate the MARS-Probit model against a Logit or Probit model with the same right-hand side variables used in the MARSPLINE model. |
|
|
|
|
Perform comparison |
By default, the MARSPLINE model is compared with a simple OLS regression model if the dependent variable is random. A comparison is not automatically performed if the dependent variable is specified as a binary (0-1) variable. |
|
|
|
|
OLS model |
Estimates a regression model using the ordinary least squares (OLS) method. |
|
|
|
|
MINIMAX model |
Estimates a regression model using the MINIMAX method, which minimizes the maximum absolute residual. |
|
|
|
|
L1 model |
Estimates a regression model using the L1 method, which minimizes the sum of the absolute residuals. |
|
|
|
|
Logistic model |
Estimates a logistic regression model for comparison with the MARS-Probit model. |
|
|
|
|
Probit model |
Estimates a probit regression model for comparison with the MARS-Probit model. |
|
|
|
|
Probability thresholds |
The threshold values for classifying a predicted case as a positive or negative instance. |
|
|
|
The Results tab provides a convenient facility to view output from MARSPLINE model estimation. It also allows you to view the input commands for SCAB34S SPLINES execution. If there are errors during estimation, you can view the log file for a detailed account of all commands executed and error messages.
After the user executes the MARSPLINE model application by clicking on the Execute button, SCAB34S SPLINES will display a graph of the actual versus fitted data. This indicates that the MARSPLINE procedure has completed. The user should click anywhere on the graph to close it.
After the graph disappears, the user will be placed on the Results tab of the MARSPLINE Modeling environment where the output is listed.

|
Menu Item |
Description |
|
View MARSPLINE Output File |
Displays the MARSPLINE modeling results and tabulated diagnostics. |
|
|
|
|
View MARSPLINE Input Commands |
Displays the input commands submitted to SCAB34S SPLINES. You can modify the commands directly in this window and submit the modified command file by clicking on the Execute button. |
|
|
|
|
View MARSPLINE Log File |
Displays a detailed command and error log for jobs submitted to SCAB34S SPLINES. |
|
|
|
|
Print |
Sends the information displayed in the viewer to the printer. |
|
|
|
|
Save |
Saves the information in the viewer to a file. You may want to use this feature to save the modeling script so that you can execute it later from the System -> Run SCA with Macro menu, or the System -> Run SCAB34S Program File menu. |
|
|
|
|
Execute |
While you are in the Results tab, if you click on Execute, you will send the information in the viewer to SCAB34S SPLINES for processing. |
|
|
|
The Graphics tab provides a facility to view high-resolution plots that were generated. If you previously selected the Show/Create Graphs option, the individual graphs will initially be displayed on screen. When you click on the graph, the next generated graph will appear until all graphics have been created. As the graphs are displayed, they are also being saved as Windows Meta Files using fixed names such as “yvar.wmf” or “acfa.wmf”.

You can review all created graphic files by selecting the graph from the set of radio buttons provided in the small tabbed area to the left of the viewer control. In the example below, we are viewing the sample autocorrelations of the MARSPLINE model residuals. The name of the graphic file (acfa.wmf) is displayed for reference. Because the graphs are saved to fixed file names, they are overwritten each time you generate a new set of graphs from the MARSPLINE modeling environment. If you wish to keep a graphic file for future reference, use the Save button on this tab to copy the file to a new name. Do not change the file extension: the Save button only changes the file name and does not convert the file to a new format. You can view the renamed files by using the Load Graph from File facility. You may send the graph to the printer by clicking on the Print command button. If you double-click on the graph image, it will load in the external program that is associated with WMF files on your computer (e.g., Windows FAX/Picture Viewer).
If you elected to create contribution charts, at least one contribution (leverage) chart will be generated for each variable used in the final model. Since a variable number of charts may be created based on number of explanatory variables and interactions, the file names are sequenced from CChart01 – CChart## and may be viewed by selecting the file name from the listbox provided. An example of a contribution chart is displayed below:

The contribution chart reveals how the predicted value (vertical axis) is influenced by a particular explanatory variable (horizontal axis) in the model when all other variables are held constant. In the above graph, we are viewing the leverage of temperature at lag=0 when all other variables in the model are set to their median values. The title in the chart includes information on the lag, number of interaction terms, and companion variable value setting. The CChart*.wmf files are overwritten each time new charts are generated, so a file should be renamed or moved to another location if it is to be kept for future reference.
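Because the graph files use fixed names and are overwritten, one option is to copy them to a dated folder after each run. The Python sketch below is one way to do that outside WorkBench; the function name, the folder layout, and the location of the generated files are all assumptions, and the demonstration runs on empty placeholder files in a temporary directory.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def archive_graphs(source_dir, archive_root, pattern="*.wmf"):
    """Copy matching graph files into a new time-stamped folder and return the copies."""
    source_dir = Path(source_dir)
    target = Path(archive_root) / datetime.now().strftime("%Y%m%d_%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source_dir.glob(pattern)):
        shutil.copy2(path, target / path.name)  # extension kept, so the format is unchanged
        copied.append(target / path.name)
    return copied


# Demonstration with placeholder files; real runs would point at the folder
# where the .wmf files are written (that location is not stated in this text).
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    work.mkdir()
    for name in ("acfa.wmf", "CChart01.wmf", "notes.txt"):
        (work / name).write_bytes(b"")
    copied = archive_graphs(work, Path(tmp) / "archive")
    copied_names = [p.name for p in copied]

print(copied_names)
```

Copying rather than renaming leaves the originals in place for the Graphics tab, and keeping the .wmf extension respects the note above that the files are not converted to another format.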