|
|
Generalized Additive Modeling Using
|
|
Menu Item |
Description |
|
Specification Frame |
This frame organizes various controls that you may use to specify GAM model components including the dependent variable, independent variables, and lag coefficients. If a categorical variable is specified for an independent variable, the GAMFIT routine will automatically identify it as categorical when it is processed and expand it into 0-1 binary variables. |
|
|
|
|
Dependent Variable |
Use this drop-down list to specify the series that you wish to analyze. |
|
|
|
|
Logit Checkbox |
Specifies that the dependent variable is a 0-1 variable. When checked, the GAM model estimates the probability of success or failure from the independent variables in the model using the logit link function. |
|
|
|
|
Categorical Checkbox |
Specifies that the dependent variable is a categorical variable. When checked, the application automatically determines the number of categories (which must be coded as integers) and expands the categorical variable into binary (0-1) variables. |
|
|
|
|
Independent Variable |
Use this drop-down list to specify a predictor or categorical variable component of the model. |
|
|
|
|
Lags |
Specifies the lag parameters associated with a random variable or categorical variable. A categorical variable may have more than one lag parameter; however, only one lag specification may be added to the model at a time. For random variables, multiple lag parameters may be added to the model as a group. Contiguous lags may be specified as a range using the “TO” keyword, and individual lags may be separated by commas. For example, the user could specify lags 0, 1, and 3 as “0, 1, 3” or as “0 TO 1, 3”. |
|
|
|
|
D.F. (NL fit) |
Specifies the number of degrees of freedom used to smooth the variable. Setting the degrees of freedom to 1 restricts the variable to a linear term. The default is 3 (cubic). |
|
|
|
|
Add |
Clicking on Add appends a new component to the GAM model which is displayed in the model component grid. Multiple instances of the same independent variable may be added to the model as long as the lag operators are unique. For example, in the above form, the user could specify TEMPERTR{0} and TEMPERTR{1} components separately. |
|
|
|
|
Model Components Frame |
The model components frame organizes form controls to display the GAM model components in a grid format, as well as to edit and delete model components. |
|
|
|
|
Model Component Grid |
The components of the GAM model and their attributes are displayed in this grid. The first column displays the independent variable name. The second column displays the individual or grouped lag operators within braces. The third column indicates whether the independent variable is a predictor or a categorical variable. The fourth column displays the number of degrees of freedom for smoothing. The fifth column indicates that a categorical variable is specified and that the number of unique categories will be determined by the program. |
|
|
|
|
Edit |
The user can modify a model component by first placing the mouse cursor on the grid row of interest and then clicking on the Edit button. The Specification Frame will reflect the current attributes of the model component and the Add button will be replaced by the Mod button. Make the necessary changes in the Specification Frame and then click on the Mod button to complete the changes. |
|
|
|
|
|
The user can delete a model component by placing the mouse cursor on the grid row of interest and then clicking on the |
|
|
|
|
Clear |
Clears all model components from the model component grid. |
|
|
|
|
Save |
Saves the information in the model component grid to a specified tab-delimited file. |
|
|
|
|
Recall |
Recalls the model component grid information from a specified tab-delimited file that was created with the Save option (see above). |
|
|
|
|
|
This frame organizes form controls related to how the data is indexed (by date or none), and what data span is modeled and analyzed. |
|
|
|
|
Date Variable |
Use this drop-down list to specify the date variable associated with your series. If your SCA Data Macro contains a variable named "DATE", it is automatically assigned by SCA WorkBench. If you have an alternative index variable or date variable, you may select it from the drop-down list. If your SCA Data Macro does not contain a DATE variable, leave the drop-down list empty. WorkBench will then use the observation number as a date index. If your time series has more than 10,000 observations, WorkBench will not use your DATE variable for indexing; the observation number will be used instead. |
|
|
|
|
Begin Span |
Use the Begin drop-down list to omit observations from the beginning of a time series being analyzed. |
|
|
|
|
End Span |
Use the End drop-down list to omit observations from the end of a time series being analyzed. |
|
|
|
|
Back |
Depending on the tab you are currently working in, clicking on the Back button will move you one tab to the left. If you are in the Model tab, you will move to the GAM Data Viewer dialog box where you may choose a new SCA data macro or leave the GAM Modeling Environment. |
|
|
|
|
Exit |
Exits the GAM modeling environment. |
|
|
|
|
Execute |
Executes GAM model estimation, validation, linear model comparison, diagnostics, and graphs by submitting a dynamically created program script to SCAB34S SPLINES. When completed, you will automatically be placed in the Results tab. |
|
|
|
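The lag notation described under Lags (comma-separated values, with “TO” marking a contiguous range) can be illustrated with a short Python sketch. This is not the parser used by WorkBench; the function name `parse_lags` and its error handling are assumptions made only for illustration.

```python
def parse_lags(spec):
    """Expand a lag specification such as "0 TO 1, 3" into a sorted list of lags.

    Hypothetical helper: it mirrors the notation described in the text,
    not the actual WorkBench implementation.
    """
    lags = []
    for part in spec.split(","):
        tokens = part.split()
        if len(tokens) == 1:
            # A single lag, e.g. "3"
            lags.append(int(tokens[0]))
        elif len(tokens) == 3 and tokens[1].upper() == "TO":
            # A contiguous range, e.g. "0 TO 1"
            start, end = int(tokens[0]), int(tokens[2])
            if end < start:
                raise ValueError(f"descending range: {part.strip()!r}")
            lags.extend(range(start, end + 1))
        else:
            raise ValueError(f"cannot parse lag item: {part.strip()!r}")
    # Remove duplicates and return the lags in ascending order
    return sorted(set(lags))
```

Under these assumptions, both `parse_lags("0, 1, 3")` and `parse_lags("0 TO 1, 3")` expand to the same three lags, which matches the equivalence given in the Lags description.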
The Options tab sets the estimation limits placed on a GAM model, controls the detail of the output and graphics that are produced, and allocates the workspace size of the SCAB34S SPLINES product. Additional estimation options are available in the GAMFIT matrix subroutines but are not exposed in this GAM Modeling Environment interface. The user may employ these other options by directly editing the B34S script generated by WorkBench.
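To give a feel for what a convergence tolerance and a back-fitting iteration limit control, the following Python sketch runs a generic back-fitting loop. It is not the GAMFIT algorithm: the smoother is a plain straight-line fit (the 1 degree of freedom case), the stopping rule (sum of squared changes in the fitted components) is an assumption, and the names `linear_smooth` and `backfit` are hypothetical. The defaults mirror the documented values of 0.1D-8 and 1000.

```python
def linear_smooth(x, r):
    """Least-squares straight-line fit of r on x, returned as centered fitted values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_r = sum(r) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        # A constant predictor carries no information beyond the intercept
        return [0.0] * n
    slope = sum((xi - mean_x) * (ri - mean_r) for xi, ri in zip(x, r)) / sxx
    return [slope * (xi - mean_x) for xi in x]


def backfit(y, columns, tol=1e-9, max_iter=1000):
    """Fit y = alpha + f_1(x_1) + ... + f_p(x_p) by cycling over the predictors.

    Returns (alpha, list of fitted components, iterations used).
    """
    n = len(y)
    p = len(columns)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for iteration in range(1, max_iter + 1):
        change = 0.0
        for j, x in enumerate(columns):
            # Partial residuals: remove the intercept and every other component
            partial = [
                y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            updated = linear_smooth(x, partial)
            change += sum((a - b) ** 2 for a, b in zip(updated, f[j]))
            f[j] = updated
        if change < tol:
            # Components stopped moving: converged within the tolerance
            return alpha, f, iteration
    raise RuntimeError("back-fitting did not converge within max_iter iterations")
```

A tighter tolerance makes the loop run longer before it declares convergence, while the iteration limit bounds the work when the components keep changing, for example with highly correlated predictors.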

|
Menu Item |
Description |
|
GAM Estimation Limits Frame |
This frame organizes various controls that set options in GAM model estimation. Here, the user specifies the convergence tolerance for inner and outer looping, and the maximum number of iterations for back-fitting and local scoring. |
|
|
|
|
Convergence Tolerance (Inner Loop) |
Sets the convergence tolerance for inner looping in the GAM smoothing algorithm. The default value is 0.1D-8 (Fortran double-precision notation for 1.0e-9). |
|
|
|
|
Convergence Tolerance (Outer Loop) |
Sets the convergence tolerance for outer looping in the GAM smoothing algorithm. The default value is 0.1D-8 (Fortran double-precision notation for 1.0e-9). |
|
|
|
|
Max. Iterations (Back-fitting) |
Set the maximum number of iterations for back-fitting. The default is 1000. |
|
|
|
|
Max. Iterations (Local scoring) |
Set the maximum number of iterations for local scoring. The default is 1000. |
|
|
|
|
Diagnostics and Graphics Frame |
This frame organizes controls related to the amount of output produced for GAM estimation and diagnostics. The diagnostic charts option produces surface (or leverage) charts for all variables that are used in the final model. |
|
|
|
|
Display Output for Model |
Typically, you want to see the GAM model summary and the OLS model summary. |
|
|
|
|
Display Forecast Table |
The forecast table displays the original series and the predicted series for both the GAM and OLS models. This can slow down the display of output for larger datasets. |
|
|
|
|
Show Diagnostic Tables |
Several diagnostics are available for the dependent variable and the residuals from the estimated models. Among the diagnostics are statistical description tables, sample autocorrelation tables, and Hinich nonlinear testing. The Hinich test will only be displayed for residual series with more than 50 cases. |
|
|
|
|
Show Graphics |
Several graphics are created, including a time plot of the dependent variable, an Actual vs. Predicted plot, ACF and PACF plots, and a modified Q-Statistic plot. |
|
|
|
|
Workspace Size |
The SCAB34S SPLINES product requires its workspace size to be set when the program is initiated. The default workspace of 2000000 is adequate to handle moderate-size datasets. The user may increase the workspace size if needed. Please note that the workspace limit is imposed by the amount of available RAM on the computer. |
|
|
|
|
GAM Linking Function Frame |
The GAM model requires the specification of a nonlinear link function to declare how the mean of the dependent variable depends on the additive predictor. The error distribution can also be specified. |
|
|
|
|
GAM Linking Function |
Specify the nonlinear link function between the mean of the dependent variable and the additive predictor. The available options are identity, inverse, logit, logarithm, and Cox. The default is identity. |
|
|
|
|
GAM Error Distribution |
Specify the assumed error distribution for fitting. The available options are Gaussian, Binomial, Poisson, Gamma, and Cox. The default is Gaussian. |
|
|
|
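The role of the link function can be illustrated with the standard textbook forms of four of the listed options. This Python sketch is not taken from the GAMFIT source: the dictionary `LINKS` and the helper names are hypothetical, and the Cox option is omitted because it is not a simple transformation of the mean.

```python
import math

# Each entry maps a link name to (link, inverse link):
#   link:         mean mu            -> additive predictor eta
#   inverse link: additive predictor -> mean
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "logarithm": (math.log, math.exp),
}


def to_predictor(link, mu):
    """Transform a mean value to the additive-predictor scale."""
    return LINKS[link][0](mu)


def to_mean(link, eta):
    """Transform an additive-predictor value back to the mean scale."""
    return LINKS[link][1](eta)
```

With the logit link, for example, an additive predictor of 0 corresponds to a success probability of 0.5, which is why the logit link pairs naturally with a 0-1 dependent variable and a Binomial error distribution.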
This tab allows you to evaluate the prediction performance of the GAM model and to validate it against a linear model estimated by simple OLS, MINIMAX, L1, Logit, or Probit. A common problem with most nonlinear modeling methods is over-fitting. Models that over-fit the data often perform well within the sample but do substantially worse when predicting out of sample. Comparing in-sample fit with out-of-sample prediction performance allows the user to detect over-fitting. If over-fitting is suspected, the number of degrees of freedom for GAM smoothing should be reduced for one or more variables of concern. Also, since out-of-sample GAM prediction is accomplished using a polynomial regression to approximate the smoothing splines, the setting for the number of D.F. for polynomial regression may also affect out-of-sample prediction performance. A low setting may not adequately approximate the curvature, whereas a high setting may cause an estimation error. A setting between 3 and 9 is reasonable for most situations.


The GAM modeling approach can be used effectively for both cross-sectional data and time series data. The GAM user interface offered in WorkBench leverages its utility in time series applications by allowing the dependent variable and predictor variables to be lagged.
The default validation setting compares the in-sample fit of the estimated GAM model against the in-sample fit of a simple OLS regression model. All available observations are used to evaluate fit using root mean squared error (RMSE) and mean absolute percentage error (MAPE) criteria.
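For reference, the two fit criteria can be written out in a few lines of Python. These are the usual textbook definitions; the exact conventions used by WorkBench (for example, the divisor, or how zero values are handled in MAPE) are not stated in this document, so treat the details as assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined (raises ZeroDivisionError) when any actual value is zero.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n
```

RMSE is expressed in the units of the dependent variable and penalizes large misses more heavily, while MAPE is scale-free, which makes it easier to compare across series.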
Other options are available to validate the GAM model. For example, if the user is primarily interested in evaluating the fit of the model in the later part of the series, a holdout sample can be specified by typing the number of observations (or percentage) to be reserved from the end of the series. After specifying the holdout, the user can evaluate in-sample fit for the “holdout period” only by setting the option “Include holdout in estimation (compare holdout only)”. The user also has two choices to evaluate the prediction performance of the model where the holdout period is not used in training the model.
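The relationship between a holdout count, a holdout percentage, and the resulting estimation and holdout spans can be sketched as follows. The function names are hypothetical, and the rounding rule used to turn a percentage into a count is an assumption, since this document does not say how WorkBench rounds.

```python
def holdout_from_percent(n_obs, percent):
    """Number of observations to hold out for a given percentage of the series.

    Assumption: rounds to the nearest integer with Python's round().
    """
    return round(n_obs * percent / 100.0)


def percent_from_holdout(n_obs, n_holdout):
    """Holdout size expressed as a percentage of the series length."""
    return 100.0 * n_holdout / n_obs


def split_holdout(series, n_holdout):
    """Split a series into (estimation span, holdout span), holding out the end."""
    if not 0 <= n_holdout < len(series):
        raise ValueError("holdout must be non-negative and shorter than the series")
    cut = len(series) - n_holdout
    return series[:cut], series[cut:]
```

For an out-of-sample comparison, only the first part returned by `split_holdout` would be used for estimation and the second part for scoring; for the in-sample holdout comparison, the full series is used for estimation and only the second part is scored.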
As another validation criterion, the user can compare the improvement of a GAM model versus a regression model with the same right-hand side variables. Diagnostics are produced for both the GAM and regression models. If the dependent variable is nonlinear in its response to the transformed (smoothed) regressor variables, the GAM model should reveal significant improvement in model fit and out-of-sample forecasting performance.
A confusion matrix is produced for the GAM-Logit model and the comparison linear model to evaluate the classification power of the models. The user chooses how the probability cut-off value is determined for classifying positive and negative cases in the final confusion matrix. The system can set the probability cut-off automatically, using the maximum G-MEAN value as the criterion, or the user can supply specific cut-off values. If GMEAN1 is used, the cut-off will slightly favor True-Positive classifications; if GMEAN2 is used, the cut-off will weigh True-Positive and True-Negative classifications equally. Since the determination of cut-off probability thresholds is subjective, a table of ratio statistics for a range of cut-off probability values is also provided in the output.
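The idea of choosing a cut-off by maximizing a G-mean can be sketched in Python. The formulas below follow one common convention from the classification literature (GMEAN1 as the geometric mean of recall and precision, GMEAN2 as the geometric mean of the true-positive and true-negative rates); this document does not give the formulas used by WorkBench, so both definitions, the cut-off grid, and all function names are assumptions.

```python
import math


def confusion(actual, prob, cutoff):
    """Return (tp, fp, tn, fn), classifying a case as positive when prob >= cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(actual, prob):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def gmean1(tp, fp, tn, fn):
    """Assumed GMEAN1: sqrt(recall * precision); ignores true negatives."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return math.sqrt(recall * precision)


def gmean2(tp, fp, tn, fn):
    """Assumed GMEAN2: sqrt(true-positive rate * true-negative rate)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def best_cutoff(actual, prob, score=gmean2, grid=None):
    """Return the first cut-off on the grid that maximizes the chosen score."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99
    return max(grid, key=lambda c: score(*confusion(actual, prob, c)))
```

Under these definitions, GMEAN1 rewards correct positive classifications only, which is consistent with the description of it as favoring True-Positives, whereas GMEAN2 balances the two classes.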
|
Menu Item |
Description |
|
Validation Settings Frame |
This frame organizes controls for specifying a holdout sample for forecast performance and model validation. It also provides controls for the user to specify the type of validation for in-sample or out-of-sample forecasting. |
|
|
|
|
# to holdout |
Specifies the number of observations that are to be reserved from the back of the dependent variable for evaluating forecast performance. The percentage of the holdout sample relative to the series length is computed and is displayed in % to holdout. |
|
|
|
|
% to holdout |
Specifies the size of the holdout sample as a percentage of the length of the dataset. The actual number of observations reserved from the back of the series is computed and displayed in # to holdout. |
|
|
|
|
Compare all obs |
Evaluate the in-sample fit of the model for all observations. |
|
|
|
|
Compare holdout only for in-sample fit |
Evaluate the in-sample fit of the model for the defined holdout sample only. |
|
|
|
|
Compare holdout for out-of-sample fit |
Evaluate the out-of-sample forecasts defined by the holdout sample. The model is estimated using observations up to the first forecast origin only. |
|
|
|
|
OLS Method Comparison Frame |
This frame organizes controls to validate the GAM model against a regression model with the same right-hand-side variables used in the GAM model. |
|
|
|
|
Logistic Method Comparison Frame |
This frame organizes controls to validate the logistic GAM model against a Logit or Probit model with the same right-hand side variables used in the logistic GAM model. |
|
|
|
|
Perform comparison |
By default, the GAM model is compared with a simple OLS regression if the dependent variable is random. A comparison is not performed automatically if the dependent variable is specified as a logistic (0-1) variable. |
|
|
|
|
OLS model |
Estimates a regression model using the ordinary least squares (OLS) method. |
|
|
|
|
MINIMAX model |
Estimates a regression model using the MINIMAX method, which minimizes the maximum absolute residual. |
|
|
|
|
L1 model |
Estimates a regression model using the L1 method, which minimizes the sum of the absolute residuals. |
|
|
|
|
Logistic model |
Estimates a logistic regression model in comparison to a logistic GAM model. |
|
|
|
|
Probit model |
Estimates a probit regression model in comparison to a logistic GAM model. |
|
|
|
|
Probability thresholds |
The threshold values for classifying a predicted case as a positive or negative instance. |
|
|
|
Results Tab
The results tab provides a convenient facility to view output from GAM model estimation. It also allows you to view the input commands for SCAB34S SPLINES execution. If there are errors during estimation, you can view the log file for a detailed account of all commands executed and error messages.
After the user executes the GAM model application by clicking on the Execute button, SCAB34S SPLINES will display a graph of the actual versus fitted data. This indicates that the GAMFIT procedure has completed. The user should click anywhere on the graph (an example is shown below) to close it.

After the graph disappears, the user will be placed on the Results tab of the GAM Modeling environment where the output is listed.

|
Menu Item |
Description |
|
View GAM Output File |
Displays the GAM modeling results and tabulated diagnostics. |
|
|
|
|
View GAM Input Commands |
Displays the input commands submitted to SCAB34S SPLINES. You can modify the commands directly in this window and submit the modified command file by clicking on the Execute button. |
|
|
|
|
View GAM Log File |
Displays a detailed command and error log for jobs submitted to SCAB34S SPLINES. |
|
|
|
|
Print |
Send information displayed in the viewer to the printer. |
|
|
|
|
Save |
Saves the information in the viewer to a file. You may want to use this feature to save the modeling script so that you can execute it later from the System -> Run SCA with Macro menu or the System -> Run SCAB34S Program File menu. |
|
|
|
|
Execute |
While you are in the Results tab, if you click on Execute, you will send the information in the viewer to SCAB34S SPLINES for processing. |
|
|
|
The Graphics tab provides a facility to view high-resolution plots that were generated. If you previously selected the Show/Create Graphs option, the individual graphs will initially be displayed on screen. When you click on the graph, the next generated graph will appear until all graphics have been created. As the graphs are displayed, they are also being saved as Windows Meta Files using fixed names such as “yvar.wmf” or “acfa.wmf”.

You can review all created graphic files by selecting the graph from the set of radio buttons provided in the small tabbed area to the left of the viewer control. In the example above, we are viewing the OLS and GAM residuals overlaid on each other. The name of the graphic file (gam_res.wmf) is displayed for reference. Since the graphs are saved to fixed file names, they are overwritten each time you generate a new set of graphs from the GAM modeling environment. If you wish to keep a graphic file for future reference, use the Save button on this tab to copy the file to a new name. Do not change the file extension: the Save button only copies the file under a new name and does not convert it to a new format. You can view the renamed files by using the Load Graph from File facility. You may send the graph to the printer by clicking on the Print command button. If you double-click on the graph image, it will open in the external program that is associated with WMF files on your computer (e.g., Windows FAX/Picture Viewer).
If you elected to create diagnostic charts, curvature plots of the transformed predictor variables in the GAM model are displayed relative to the dependent variable. Since the number of charts created depends on the number of explanatory variables, the file names are sequenced from SCOEF___1 – SCOEF__## and may be viewed by selecting the file name from the list box provided. An example of a curvature chart is displayed below:

In the above graph, we are viewing the curvature of smoothed temperature. The SCOEF*.wmf files are overwritten; therefore the file should be renamed or moved to another location if the graph is to be saved for future reference.
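Because the graph files are written to fixed names and overwritten on the next run, it can be convenient to copy a whole set aside outside of WorkBench. The Python sketch below is one way to do that; it is not part of WorkBench, and the function name, the folder arguments, and the prefix naming scheme are all hypothetical. The folder in which WorkBench writes its .wmf files is not stated in this document, so it must be supplied by the user.

```python
import shutil
from pathlib import Path


def archive_graphs(source_dir, target_dir, prefix):
    """Copy every .wmf file in source_dir to target_dir as "<prefix>_<name>".

    The .wmf extension is kept unchanged, since copying does not convert
    the file to another format. Returns the list of copied paths.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    # Note: the pattern match is case-sensitive on some file systems
    for wmf in sorted(source.glob("*.wmf")):
        destination = target / f"{prefix}_{wmf.name}"
        shutil.copy2(wmf, destination)
        copied.append(destination)
    return copied
```

Running this after each modeling session, with a different prefix per run, keeps the residual plots and the SCOEF curvature charts from one model separate from those of the next.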
[1] The text Specifying and Diagnostically Testing Econometric Models, by Houston H. Stokes (Greenwood Press, 1997), documents the basic B34S capability. A comprehensive document covering the B34S matrix command facilities is under preparation.