Alternating Conditional Expectations Analysis Using
SCAB34S SPLINES and SCA WorkBench

Alternating Conditional Expectations (ACE) analysis is provided by the B34S® ProSeries Econometric System and SCAB34S SPLINES software products.  SCA WorkBench provides the user interface to an ACE modeling and validation environment built on the B34S program suite.

SCAB34S SPLINES provides a subset of the capabilities of the B34S® ProSeries Econometric System, and we refer to these products interchangeably within this document.  SCAB34S SPLINES runs conveniently as an integrated component of SCA WorkBench.  The WorkBench product is a companion to the SCA Statistical System and SCAB34S software, providing a graphical user interface for ACE modeling and analysis.  For ACE model validation, predictive performance may be assessed by comparing in-sample and out-of-sample predictions with those of linear regression models estimated by OLS, MINIMAX, or L1 methods.  When ACE models are used for pseudo-logistic analysis, classification performance may be assessed by comparing the confusion matrices and lift-gain tables of the ACE model with those of a linear regression, probit, or logistic model.

The SCAB34S SPLINES product provides a number of procedures to perform common data manipulation tasks, organizational tasks, and statistical/econometric analysis tasks.  It also contains a comprehensive matrix programming language that may be used to customize procedures for specialized use.  No attempt will be made to cover all features of the SCAB34S product in this document or the full range of applications that may be solved using the B34S matrix programming facilities.[1]  Instead, we shall exclusively use the graphical user interface of SCA WorkBench to specify, estimate, and diagnostically test ACE models in SCAB34S SPLINES.  SCA WorkBench automatically specifies the command script executed in the SCAB34S SPLINES product based on menu selections.  A command file is then executed in the SCAB34S engine and the results are read back into WorkBench for examination.  The user may save the program file and modify the command script to address additional analysis requirements that may arise.

A major assumption of any linear model is that the coefficients are stable across all levels of the explanatory variables and, in the case of a time series model, across all time periods.  The ACE model is a useful method of analysis when it is suspected that the relationship between certain predictor variables and the dependent variable is nonlinear.  There are theoretical reasons to expect such nonlinearity in many applications, including energy, finance, economics, medicine, social science, and manufacturing.

ACE models can be used as a diagnostic tool for detecting potential nonlinear relationships between transformed predictor variables and the transformed dependent variable.  Here, the user can investigate variable relationships without imposing a priori information and uncover complex relationships that classical multiple linear regression methods would miss.  Since ACE models are not limited by an imposed functional form, the data itself suggests the functional form of the variable relationships.  ACEFIT uses nonparametric fitting based on a scatter plot smoother to fit a smooth relationship between two or more variables.  The smoother summarizes the trend of the transformed response variable as a function of the transformed predictor variables by iteratively smoothing partial residuals in a process known as back-fitting.  By examining the surface plots of the transformations employed by ACE, the user can evaluate and interpret the functional form of each variable and see how much of the variation in the dependent variable is explained by the predictor variables.

 

ACE models belong to the class of nonparametric regression methods.  Generalized Additive Models (GAM) and Multivariate Adaptive Regression Splines (MARSpline) are also classified as nonparametric regression methods.  The novel idea of ACE is to transform the dependent variable (response variable) as well as the independent variables (predictor variables) to define the regression surface.  As with all nonparametric methods, the regression surface is defined in a data-driven manner as opposed to a model-driven manner.

 

ACE MODELING USING SCAB34S SPLINES AND WORKBENCH

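Assume a nonlinear model of the form

$$ y = f(x_1, x_2, \ldots, x_k) + e $$

where $x_i$ and $y$ are one-dimensional vectors.  An ACE model (see Breiman-Friedman (1985)) can be written as

$$ \theta(y) = \sum_{i=1}^{k} \phi_i(x_i) + e' $$

If $\theta$ is invertible, the estimated model can be written as

$$ \hat{y} = \theta^{-1}\left( \sum_{i=1}^{k} \phi_i(x_i) \right) $$

The ACE algorithm minimizes the squared error $E\{\theta(y) - \sum_{i=1}^{k} \phi_i(x_i)\}^2$ subject to $\mathrm{var}\{\theta(y)\} = 1$.  The steps of the ACE algorithm[2] are:

(i)           Initialize by setting $\theta(y) = \{y - E(y)\} / \{\mathrm{var}(y)\}^{1/2}$

(ii)         Fit an additive model to $\theta(y)$ that will obtain new functions $\phi_1(x_1), \ldots, \phi_k(x_k)$

(iii)       Compute $\tilde{\theta}(y) = E\{\sum_{i=1}^{k} \phi_i(x_i) \mid y\}$ and update the left hand side by forming $\theta(y) = \tilde{\theta}(y) / \{\mathrm{var}\,\tilde{\theta}(y)\}^{1/2}$

(iv)       Alternate steps (ii) and (iii) until $E\{\theta(y) - \sum_{i=1}^{k} \phi_i(x_i)\}^2$ does not change

Step (ii) can be thought of as follows: for a fixed $\theta$, the minimizing $\phi_i$ is $\phi_i(x_i) = E\{\theta(y) - \sum_{j \ne i} \phi_j(x_j) \mid x_i\}$.  Step (iii) can be thought of as follows: for fixed $\phi_i$, the minimizing $\theta$ is $\theta(y) = E\{\sum_{i=1}^{k} \phi_i(x_i) \mid y\} / \lVert E\{\sum_{i=1}^{k} \phi_i(x_i) \mid y\} \rVert$.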
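Steps (i) to (iv) can be sketched in a few lines of Python.  This is an illustration only, not the ACEFIT implementation: the running-mean smoother is a simple stand-in for whatever smoother ACEFIT actually uses, and the names smooth, ace, span, and inner_passes are hypothetical choices made for this sketch.

```python
import numpy as np

def smooth(target, by, span=0.2):
    """Estimate E[target | by] with a running mean over the sorted values of `by`.
    (Stand-in smoother; the smoother used by ACEFIT is not reproduced here.)"""
    n = len(by)
    order = np.argsort(by, kind="stable")
    half = max(1, int(span * n) // 2)           # half-width of the window, in observations
    csum = np.concatenate(([0.0], np.cumsum(target[order])))
    idx = np.arange(n)
    lo = np.maximum(idx - half, 0)
    hi = np.minimum(idx + half + 1, n)
    fitted = np.empty(n)
    fitted[order] = (csum[hi] - csum[lo]) / (hi - lo)
    return fitted

def ace(X, y, tol=1e-4, max_iter=50, inner_passes=5):
    """Return (theta, phi, r2) for predictors X (n x k) and response y (length n)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    theta = (y - y.mean()) / y.std()            # step (i): standardize y
    phi = np.zeros((n, k))
    prev_err = np.inf
    for _ in range(max_iter):
        # step (ii): back-fit an additive model to theta by smoothing partial residuals
        for _pass in range(inner_passes):
            for j in range(k):
                partial = theta - phi.sum(axis=1) + phi[:, j]
                phi[:, j] = smooth(partial, X[:, j])
                phi[:, j] -= phi[:, j].mean()
        # step (iii): smooth the additive fit on y, then rescale to unit variance
        theta = smooth(phi.sum(axis=1), y)
        theta = (theta - theta.mean()) / theta.std()
        # step (iv): stop once the squared error no longer changes
        err = np.mean((theta - phi.sum(axis=1)) ** 2)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return theta, phi, 1.0 - err                # var(theta) = 1, so R-squared = 1 - err
```

Because the transformed response is rescaled to unit variance at every pass, the final squared error converts directly into an R-squared value, which is the quantity a convergence tolerance would naturally monitor.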

 

SCA WorkBench: A Graphical User Interface

 

SCA WorkBench provides a convenient graphical user interface to SCAB34S SPLINES for ACE modeling.  The WorkBench interface builds the data loading steps and commands based on the user’s menu selections.  The associated commands are then organized as an SCAB34S program file and submitted to the SCAB34S engine. 

The ACE modeling environment in WorkBench is organized into the tabs described below.

 

The Model tab is used to specify the variables, variable types, and lagged components of the ACE model.  The Options tab sets the estimation limits placed on an ACE model and controls the detail of the output and graphics that are produced.  The Validation tab provides settings to evaluate the performance of ACE model prediction and to compare the results with a linear regression model (OLS, MINIMAX, L1, LOGIT, or PROBIT estimation).  The Results tab displays the input/output from the model estimation, diagnostics, and forecasting.  The Graphs tab displays a variety of high-resolution graphics such as time series plots, residual plots, autocorrelation plots, surface plots, and others.

Once the SCAB34S program file is created by SCA WorkBench, you may save the file for future reference or make changes directly to the commands and re-execute the script from SCA WorkBench.

 

Model Specification Tab

 

This tab is central to specifying the variables and lagged components of the ACE model.  Use the dropdown combo boxes to select your dependent variable and predictor variables.  Click on the Add button to add a predictor variable component to the model.  A categorical variable can be added by putting a check in the Categorical checkbox before clicking on the Add button.  To allow an ACE model to be compared with a linear model, a categorical variable is automatically expanded into 0-1 binary variables, which are then substituted in both the ACE model and the linear comparison model.  Each component appears in the Model Components grid as it is added.  In the example below, DAYLOAD is selected as the dependent variable.  TEMPERTR is selected as an independent variable with a contemporaneous effect and a lag 1 effect.  The lags are specified in the Lags textbox.  Multiple lags for explanatory variable components can be specified using the word “TO” to separate contiguous lags (e.g., 0 TO 1) or commas to separate non-contiguous lags (e.g., 0, 1, 3).
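The lag notation and the 0-1 expansion of categorical variables can be illustrated with a short Python sketch.  It is not the code WorkBench or ACEFIT runs; the function names parse_lags, lag_matrix, and expand_categorical are hypothetical, and whether the software drops one category as a reference level is not stated in this document, so the sketch keeps one column per category.

```python
import numpy as np

def parse_lags(spec):
    """Turn a lag string such as '0 TO 1, 3' into a sorted list of unique lags."""
    lags = set()
    for part in spec.split(","):
        tokens = part.strip().upper().split()
        if len(tokens) == 1:
            lags.add(int(tokens[0]))
        elif len(tokens) == 3 and tokens[1] == "TO":
            start, stop = int(tokens[0]), int(tokens[2])
            lags.update(range(start, stop + 1))
        else:
            raise ValueError(f"cannot parse lag specification: {part!r}")
    if any(lag < 0 for lag in lags):
        raise ValueError("lags must be non-negative")
    return sorted(lags)

def lag_matrix(x, lags):
    """One column per lag; rows without enough history are filled with NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full((len(x), len(lags)), np.nan)
    for col, lag in enumerate(lags):
        if lag < len(x):
            out[lag:, col] = x[:len(x) - lag]
    return out

def expand_categorical(codes):
    """Expand integer category codes into 0-1 indicator columns (one per category)."""
    codes = np.asarray(codes)
    levels = np.unique(codes)
    dummies = (codes[:, None] == levels[None, :]).astype(int)
    return levels, dummies
```

For instance, parse_lags("0 TO 1") yields the lags 0 and 1, which corresponds to the TEMPERTR{0} and TEMPERTR{1} components of the example.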

 

A component may be deleted or modified by placing your cursor on the specific row of the Model Components grid and then by clicking on the Del or Edit buttons.  If you click on Edit, the Add button will be replaced by the Mod button.  You may make changes using the dropdown combo box for the independent variable and other components in the Specification frame.  Click on the Mod button to complete the modification.

 

 

The features of the Model specification tab are presented below.

 

Menu Item

Description

Specification Frame

This frame organizes various controls that you may use to specify ACE model components including the dependent variable, independent variables, and lag coefficients.

 

If a categorical variable is specified for an independent variable, the ACEFIT routine will automatically identify it as categorical when it is processed and expand it into 0-1 binary variables.

 

 

Dependent Variable

Use this drop-down list to specify the series that you wish to analyze.

 

 

Variable Type

The dependent and independent variables can be specified as Order, Linear, Categorical, or Logit.

 

Order – Specifies that the variable can be freely transformed as a continuous variable without constraint.

 

Linear – Specifies that the variable is not to be transformed.

 

Categorical – Specifies that the variable is coded as categorical (1, 2, 3, 4, ...).

 

Logit – Specifies that the dependent variable is a 0-1 variable.  When specified, the ACE model treats the dependent variable as categorical and displays the confusion matrix and lift-gain tables as diagnostics.  It also allows comparison of the pseudo-logistic ACE model with a linear Logit, Probit, or OLS model.

 

 

Independent Variable

Use this drop-down list to specify a predictor or categorical variable component in the model.

 

 

Lags

Specifies the lags associated with a random variable or categorical variable.  A categorical variable may have more than one lag; however, only one lag specification may be added to the model at a time.  For random variables, multiple lags may be added to the model as a group.  Multiple lags may be specified using the “TO” keyword to separate contiguous lags.  Individual lags may be separated by commas.  For example, the user could specify lags 0, 1, and 3 as “0, 1, 3” or as “0 TO 1, 3”.

 

 

Add

Clicking on Add appends a new component to the ACE model which is displayed in the model component grid.  Multiple instances of the same independent variable may be added to the model as long as the lag operators are unique.  For example, in the above form, the user could specify TEMPERTR{0} and TEMPERTR{1} components separately.

 

 

Model Components Frame

The model components frame organizes form controls to display the ACE model components in a grid format, as well as to edit and delete model components.

 

 

Model Component Grid

The components of the ACE model and their attributes are displayed in this grid.  The first column displays the independent variable name.  The second column displays the individual or grouped lag operators within braces.  The third column indicates whether the independent variable is defined as Ordered, Linear, or Categorical.  The fourth column indicates that a categorical variable is specified and that the number of unique categories will be determined by the program.

 

 

Edit

The user can modify a model component by first placing the mouse cursor on the grid row of interest and then clicking on the Edit button.  The Specification Frame will reflect the current attributes of the model component and the Add button will be replaced by the Mod button.  Make the necessary changes in the Specification Frame and then click on the Mod button to complete the changes.

 

 

Del

The user can delete a model component by placing the mouse cursor on the grid row of interest and then clicking on the Del button.

 

 

Clear

Clears all model components from the model component grid.

 

 

Save

Saves the information in the model component grid to a specified tab-delimited file.

 

 

Recall

Recalls the model component grid information from a specified tab-delimited file created with the Save option above.

 

 

Set Data Range Frame

This frame organizes form controls related to how the data is indexed (by a date variable or by observation number) and what data span is modeled and analyzed.

 

 

Date Variable

Use this drop-down list to specify the date variable associated with your series.  If your SCA Data Macro contains a variable named "DATE", it is automatically assigned by SCA WorkBench.

 

If you have an alternative index variable or date variable, you may select it from the drop-down list.  If your SCA Data Macro does not contain a DATE variable, leave the dropdown list empty.  WorkBench will then use the observation number as a date index.

 

If your time series has more than 10,000 observations, WorkBench will not use your DATE variable for indexing.  Instead, the observation number will be used.

 

 

Begin Span

Use the Begin drop-down list to omit observations from the beginning of a time series being analyzed. 

 

 

End Span

Use the End drop-down list to omit observations from the back of a time series being analyzed.

 

 

Back

Depending on the tab you are currently working in, clicking on the Back button will move you one tab to the left.  If you are in the Model tab, you will move to the ACE Data Viewer dialog box where you may choose a new SCA data macro or leave the ACE Modeling Environment.

 

 

Exit

Exits the ACE modeling environment.

 

 

Execute

Executes ACE model estimation, validation, linear model comparison, diagnostics, and graphs by submitting a dynamically created program script to SCAB34S SPLINES.  When completed, you will automatically be placed in the Results tab.

 

 

 

Options Tab

 

The Options tab sets the estimation limits placed on an ACE model, controls the detail of the output and graphics that are produced, and allocates the workspace size of the SCAB34S SPLINES product.  The ACEFIT matrix subroutine offers further estimation options that are not exposed in this ACE Modeling Environment interface.  The user may employ these other options by directly editing the B34S script generated by WorkBench.
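The three estimation limits on this tab (termination tolerance, maximum iterations, and maximum terminal iterations) plausibly combine into a stopping rule like the one sketched below in Python.  This is an assumption about how the limits interact, not a description of ACEFIT internals: the function name stopping_point is hypothetical, the defaults for max_iter and max_terminal are placeholders, and only the tolerance default of 0.0001 comes from this document.

```python
def stopping_point(r2_sequence, tol=1e-4, max_iter=20, max_terminal=3):
    """Walk through successive R-squared values and report where estimation would stop.

    A "terminal" iteration is taken here to mean one whose change in R-squared
    is below `tol`; estimation stops after `max_terminal` such iterations in a
    row, or after `max_iter` iterations in total.  (Assumed interpretation.)
    """
    terminal = 0
    previous = None
    iteration = 0
    for iteration, r2 in enumerate(r2_sequence, start=1):
        if previous is not None and abs(r2 - previous) < tol:
            terminal += 1
            if terminal >= max_terminal:
                return iteration, "converged"
        else:
            terminal = 0                      # a large change resets the count
        previous = r2
        if iteration >= max_iter:
            return iteration, "max_iter"
    return iteration, "exhausted"
```

Requiring several consecutive small changes, rather than a single one, guards against stopping on an iteration where the fit merely pauses before improving again.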

 

 

 

Menu Item

Description

ACE Estimation Limits Frame

This frame organizes various controls that set options in ACE model estimation.  Here, the user specifies the termination tolerance and the maximum number of total iterations and terminal iterations.

 

 

Termination Threshold Tolerance

Set the minimum threshold for evaluating changes in the R-squared value.  Default = 0.0001

 

 

Max. Iterations

Set the maximum number of iterations for the ACE model estimation.

 

 

Max. Terminal Iterations

Set the maximum number of terminal iterations, that is, iterations in which the change in the R-squared value is less than the termination threshold.

 

 

Diagnostics and Graphics Frame

This frame organizes controls related to the amount of output produced for ACE estimation and diagnostics.  The diagnostic charts option produces surface (or leverage) charts for all variables that are used in the final model. 

 

 

Display Output for Model

Typically, you want to see the ACE model summary and the OLS model summary. 

 

 

Display Forecast Table

The forecast table displays the original series and the predicted series for both the ACE and OLS models.  This can slow down the display of output for larger datasets.

 

 

Show Diagnostic Tables

Several diagnostics are available for the dependent variable and the residuals from the estimated models.  Among the diagnostics are statistical description tables, sample autocorrelation tables, and Hinich nonlinear testing.  The Hinich test will only be displayed for residual series with more than 50 cases.

 

 

Show Graphics

Several graphics are created, including a time plot of the dependent variable, an Actual vs. Predicted plot, ACF and PACF plots, and a modified Q-Statistic plot.

 

 

Workspace Size

The SCAB34S SPLINES product requires its workspace size to be set when the program is initiated.  The default workspace of 2000000 is adequate for moderately sized datasets.  The user may increase the workspace size if needed.  Please note that the workspace limit is imposed by the amount of available RAM on the computer.

 

 

 

Validation Tab

 

This tab allows you to evaluate the predictive performance of the ACE model and validate it against a linear regression model estimated by the OLS, MINIMAX, L1, Logit, or Probit method.  A common problem with most nonlinear modeling methods is over-fitting.  Models that over-fit the data often perform well within the sample, but do substantially worse when predicting out of sample.  Comparing in-sample fit and out-of-sample prediction performance allows the user to evaluate problems related to over-fitting.  If over-fitting is suspected, the number of degrees of freedom for ACE smoothing should be reduced for one or more variables of concern.

 

 

 

The ACE modeling approach can be used effectively for both cross-sectional data and time series data.  The ACE user interface offered in WorkBench leverages its utility in time series applications by allowing the dependent variable and predictor variables to be lagged. 

 

The default validation setting compares the in-sample fit of the estimated ACE model against the in-sample fit of a simple OLS regression model.  All available observations are used to evaluate fit using root mean squared error (RMSE) and mean absolute percentage error (MAPE) criteria. 
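For reference, the two fit criteria are conventionally defined as shown below.  These are the standard textbook forms; the symbols $y_t$ (actual value), $\hat{y}_t$ (model prediction), and $n$ (number of observations compared) are notation introduced here, and the exact scaling used in the program output is an assumption.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2},
\qquad
\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|
```

RMSE is expressed in the units of the dependent variable, while MAPE is a percentage and is undefined when any $y_t$ equals zero.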

 

Other options are available to validate the ACE model.  For example, if the user is primarily interested in evaluating the fit of the model in the later part of the series, a holdout sample can be specified by typing the number of observations (or percentage) to be marked from the back of the series.  After specifying the holdout, the user can evaluate in-sample fit for the “holdout period” only by setting the option “Include holdout in estimation (compare holdout only)”.  The user also has two choices to evaluate the prediction performance of the model where the holdout period is not used in training the model. 
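The holdout logic can be sketched in Python as follows.  This is a hedged illustration rather than the WorkBench procedure: the function names are hypothetical, a straight-line OLS fit from NumPy stands in for whichever model is being validated, and the observations are assumed to be in time order so that the holdout is taken from the back of the series.

```python
import numpy as np

def holdout_size(n_obs, count=None, percent=None):
    """Convert between a holdout count and a holdout percentage of the series length."""
    if count is None:
        count = int(round(n_obs * percent / 100.0))
    return count, 100.0 * count / n_obs

def rmse(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(100.0 * np.mean(np.abs((actual - predicted) / actual)))

def out_of_sample_report(x, y, n_holdout):
    """Fit on the observations before the holdout, then score both segments."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    split = len(y) - n_holdout                 # first observation of the holdout period
    slope, intercept = np.polyfit(x[:split], y[:split], 1)   # stand-in model (OLS line)
    fitted = intercept + slope * x
    return {
        "in_sample_rmse": rmse(y[:split], fitted[:split]),
        "holdout_rmse": rmse(y[split:], fitted[split:]),
        "holdout_mape": mape(y[split:], fitted[split:]),
    }
```

A holdout RMSE that is much larger than the in-sample RMSE is the pattern that would suggest over-fitting.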

 

As another validation criterion, the user can compare the improvement of an ACE model over a regression model with the same right-hand side variables.  Diagnostics are produced for both the ACE and regression models.  If the dependent variable is nonlinear in its response to the transformed (smoothed) regressor variables, the ACE model should reveal significant improvement in model fit and out-of-sample forecasting performance.

 

A confusion matrix is produced for the pseudo-Logistic ACE model and the comparison linear model to evaluate the classification power of the models.  The user chooses how the probability cut-off value is determined for classifying positive and negative cases in the final confusion matrix.  The user can allow the system to set the probability cut-off automatically using the maximum G-MEAN value as the criterion, or can supply specific cut-off values.  If GMEAN1 is used, the cut-off will slightly favor True-Positive classifications, and if GMEAN2 is used, the cut-off will weigh True-Positive and True-Negative classifications equally.  Since the determination of cut-off probability thresholds is subjective, a table of ratio statistics for a range of cut-off probability values is also provided in the output.
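The cut-off search can be illustrated with the Python sketch below.  It assumes the definitions commonly used in the classification literature, GMEAN1 = sqrt(recall × precision) and GMEAN2 = sqrt(recall × specificity); this document does not state the formulas the software uses, and the function names and the grid of candidate cut-offs are hypothetical.

```python
import numpy as np

def confusion_counts(actual, prob, cutoff):
    """Return (TP, FP, FN, TN) when cases with prob >= cutoff are classed as positive."""
    actual = np.asarray(actual).astype(bool)
    predicted = np.asarray(prob, dtype=float) >= cutoff
    tp = int(np.sum(predicted & actual))
    fp = int(np.sum(predicted & ~actual))
    fn = int(np.sum(~predicted & actual))
    tn = int(np.sum(~predicted & ~actual))
    return tp, fp, fn, tn

def g_means(tp, fp, fn, tn):
    """Assumed definitions: (sqrt(recall * precision), sqrt(recall * specificity))."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (recall * precision) ** 0.5, (recall * specificity) ** 0.5

def best_cutoff(actual, prob, criterion="gmean2", grid=None):
    """Pick the cut-off on the grid that maximizes the chosen G-mean."""
    if grid is None:
        grid = np.round(np.linspace(0.05, 0.95, 19), 2)   # 0.05, 0.10, ..., 0.95
    index = 0 if criterion == "gmean1" else 1
    scores = [g_means(*confusion_counts(actual, prob, c))[index] for c in grid]
    return float(grid[int(np.argmax(scores))])
```

Under these assumed definitions, GMEAN1 ignores true negatives entirely, which is consistent with the statement that it leans toward True-Positive classifications.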

 

 

Menu Item

Description

Validation Settings Frame

This frame organizes controls for specifying a holdout sample for forecast performance and model validation.  It also provides controls for the user to specify the type of validation for in-sample or out-of-sample forecasting.

 

 

# to holdout

Specifies the number of observations that are to be reserved from the back of the dependent variable for evaluating forecast performance.  The percentage of the holdout sample relative to the series length is computed and is displayed in % to holdout.

 

 

% to holdout

Specifies the size of the holdout sample as a percentage of the length of the dataset.  The actual number of observations reserved from the back of the series is computed and displayed in # to holdout.

 

 

Compare all obs

Evaluate the in-sample fit of the model for all observations.

 

 

Compare holdout only for in-sample fit

Evaluate the in-sample fit of the model for the defined holdout sample only.

 

 

Compare holdout for out-of-sample fit

Evaluate the out-of-sample forecasts defined by the holdout sample.  The model is estimated using observations up to the first forecast origin only.

 

 

OLS Method Comparison Frame

This frame organizes controls to validate the ACE model against a regression model with the same right-hand-side variables used in the ACE model. 

 

 

Logistic Method Comparison Frame

This frame organizes controls to validate the logistic ACE model against a Logit or Probit model with the same right-hand side variables used in the Pseudo-logistic ACE model. 

 

 

Perform comparison

By default, the ACE model is compared with a simple OLS regression model if the dependent variable is random.  A comparison is not automatically performed if the dependent variable is specified as a Logit variable.

 

 

OLS model

Estimates a regression model using the ordinary least squares (OLS) method.

 

 

MINIMAX model

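Estimates a regression model using the MINIMAX method, which minimizes the largest absolute residual, $\max_i |e_i|$, where $e_i$ is the residual for observation $i$.  This estimation method is more sensitive to outliers than OLS.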

 

 

L1 model

Estimates a regression model using the L1 method, which minimizes the sum of absolute residuals, $\sum_i |e_i|$, where $e_i$ is the residual for observation $i$.  This estimation method is not as sensitive to outliers as OLS or MINIMAX.

 

 

Logistic model

Estimates a logistic regression model in comparison to a Pseudo-logistic ACE model. 

 

 

Probit model

Estimates a probit regression model in comparison to a Pseudo-logistic ACE model.

 

 

Probability thresholds

The threshold values for classifying a predicted case as a positive or negative instance. 

 

 

 

Results Tab

 

The Results tab provides a convenient facility to view output from ACE model estimation.  It also allows you to view the input commands for SCAB34S SPLINES execution.  If there are errors during estimation, you can view the log file for a detailed account of all commands executed and error messages.

 

After the user executes the ACE model application by clicking on the Execute button, SCAB34S SPLINES will display a graph of the actual versus fitted data.  This indicates that the ACEFIT procedure has completed.  The user should click anywhere on the graph (an example is shown below) to close it. 

 

[Figure: actual versus fitted values (yfit)]

 

After the graph disappears, the user will be placed on the Results tab of the ACE Modeling environment where the output is listed.

 

 

 

Menu Item

Description

View ACE Output File

Displays the ACE modeling results and tabulated diagnostics.

 

 

View ACE Input Commands

Displays the input commands submitted to SCAB34S SPLINES.  You can modify the commands directly in this window and submit the modified command file by clicking on the Execute button.

 

 

View ACE Log File

Displays a detailed command and error log for jobs submitted to SCAB34S SPLINES.

 

 

Print

Sends the information displayed in the viewer to the printer.

 

 

Save

Saves the information in the viewer to a file.  You may want to use this feature to save the modeling script with intentions of executing it later from the System -> Run SCA with Macro menu, or the System -> Run SCAB34S Program File menu.

 

 

Execute

While you are in the Results tab, if you click on Execute, you will send the information in the viewer to SCAB34S SPLINES for processing. 

 

 

 

Graphs Tab

 

The Graphs tab provides a facility to view the high-resolution plots that were generated.  If you previously selected the Show/Create Graphs option, the individual graphs will initially be displayed on screen.  When you click on a graph, the next generated graph will appear until all graphs have been displayed.  As the graphs are displayed, they are also saved as Windows Metafiles (WMF) using fixed names such as “yvar.wmf” or “acfa.wmf”.

 

 

You can review all created graphic files by selecting the graph from the set of radio buttons provided in the small tabbed area to the left of the viewer control.  In the example above, we are viewing the time plot of the ACE residuals.  The name of the graphic file (resa.wmf) is displayed for reference.  Since the graphs are saved to fixed file names, they are overwritten each time you generate a new set of graphs from the ACE modeling environment.  If you wish to keep a graphic file for future reference, use the Save button on this tab to copy the file to a new name.  Do not change the file extension: the Save button only copies the file under a new name and does not convert it to a new format.  You can view the copied files by using the Load Graph from File facility.  You may send the graph to the printer by clicking on the Print command button.  If you double-click on the graph image, it will load in the external program that is associated with WMF files on your computer (e.g., Windows FAX/Picture Viewer).

 

If you elected to create diagnostic charts, surface plots of the transformed predictor variables in the ACE model are displayed relative to the independent variable.  Since the number of charts created depends on the number of predictor variables, the file names are sequenced from ACE___1 – ACE__## and may be viewed by selecting the file name from the list box provided.  An example of a surface chart is displayed below:

 

 

In the above graph, we are viewing the surface of transformed temperature.  The ACE*.wmf files are overwritten on each run, so a file should be renamed or moved to another location if the graph is to be saved for future reference.
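Because every run overwrites the fixed-name graph files, it can be convenient to copy the whole set to a separate folder after each run.  The Python sketch below is one way to do that outside WorkBench; the function name archive_graphs, the folder arguments, and the prefix naming scheme are all hypothetical, and the location where WorkBench writes its WMF files must be supplied by the user.

```python
from pathlib import Path
import shutil

def archive_graphs(work_dir, archive_dir, prefix):
    """Copy every .wmf file in work_dir to archive_dir as '<prefix>_<original name>'.

    The extension is left untouched, since the files are only copied,
    not converted to another format.
    """
    work_dir, archive_dir = Path(work_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(work_dir.iterdir()):
        if path.is_file() and path.suffix.lower() == ".wmf":
            target = archive_dir / f"{prefix}_{path.name}"
            shutil.copy2(path, target)        # keeps the original file in place
            copied.append(target)
    return copied
```

Using a distinct prefix for each model run (for example a run label or date) keeps earlier sets of graphs from being overwritten in the archive folder as well.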

 

 



[1] The text Specifying and Diagnostically Testing Econometric Models, by Houston H. Stokes (Greenwood Press, 1997), documents the basic B34S capability.  A comprehensive document covering the B34S matrix command facilities is under preparation.

[2] See Hastie-Tibshirani (1990, 176) for details. The discussion of ACE has been taken from this key reference with minor modifications.