Time series analysis and its applications have become increasingly important in various fields of research, such as business, economics, engineering, medicine, environometrics, social sciences, politics, and others. Since Box and Jenkins (1970, 1976) published the seminal book Time Series Analysis: Forecasting and Control, a number of books and a vast number of research papers have been published in this area. The goal of this book is to distill and integrate these research results into cohesive and comprehensible methodologies, and to provide a streamlined approach to time series analysis and forecasting.
The use of computers and computer software is essential in any modern quantitative analysis, even more so in time series analysis where complex algorithms and extensive computations are often required. With the speed and capacity of modern computers, in many situations it is preferable to adopt a methodology that simplifies the means of conducting an analysis even if it is at the expense of computation time. Using such an approach, we are able to provide simplified and effective methodologies for complex subjects in time series analysis and forecasting, as will be discussed in this book.
In this chapter, we shall first examine examples of time series data and introduce terminology in time series analysis. We then discuss applications and general principles of time series analysis.
In this chapter, we shall discuss a class of time series models known as autoregressive integrated moving average (ARIMA) models. This class of models has proved to be useful in representing both stationary and nonstationary time series. We first discuss the properties of ARIMA models and learn how to use these properties to build ARIMA models empirically. An ARIMA model may contain only an autoregressive (AR) term, only a moving average (MA) term, or both. We begin by examining useful characteristics of pure AR and pure MA models. Then the more complex mixed autoregressive moving average (ARMA) and ARIMA models will be discussed.
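As a rough illustration of how pure AR and pure MA models differ, the following Python sketch simulates an AR(1) series and an MA(1) series and computes their sample autocorrelations. The function names, the parameter values (0.7 and 0.6), the series length, and the sign convention used for the MA term are assumptions made for this illustration only.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate z_t = phi * z_{t-1} + a_t with standard normal shocks a_t."""
    rng = random.Random(seed)
    z, series = 0.0, []
    for _ in range(n):
        z = phi * z + rng.gauss(0.0, 1.0)
        series.append(z)
    return series


def simulate_ma1(theta, n, seed=0):
    """Simulate z_t = a_t - theta * a_{t-1} (sign convention assumed here)."""
    rng = random.Random(seed)
    previous_shock, series = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        shock = rng.gauss(0.0, 1.0)
        series.append(shock - theta * previous_shock)
        previous_shock = shock
    return series


def sample_acf(series, max_lag):
    """Return the sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


ar_acf = sample_acf(simulate_ar1(0.7, 5000), 3)
ma_acf = sample_acf(simulate_ma1(0.6, 5000), 3)
```

Under these assumptions, the AR(1) autocorrelations should decay roughly geometrically (theoretical values 0.7, 0.49, 0.343), whereas the MA(1) autocorrelations should be close to -0.6/(1 + 0.36), about -0.44, at lag 1 and close to zero at higher lags.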
An important consideration when modeling time series is the principle of parsimony. This principle refers to representing the systematic structure of the series with as few parameters as possible. Essentially, this means simpler representations of a time series process are more desirable than more complex ones if both are adequate. This principle leads to the use of mixed ARMA models, rather than just pure AR or pure MA models. The principle of parsimony will be further appreciated when the common occurrence of outliers in time series is taken into consideration.
While a large number of interesting topics may be presented regarding the building and application of ARIMA models, we shall be concise and focus on topics that are more related to practical uses of such models.
In Chapter 2, we examined the properties of stationary and nonstationary ARIMA models, and showed how these models can be used for forecasting. While this class of ARIMA models encompasses a wide variety of time series, it does not include time series that display repetitive behavior or periodic patterns. This repetitive nature, such as the change of temperature from season to season or the increase in retail sales at Christmas time year after year, is the essence of seasonal time series. This chapter examines seasonal time series models that are useful for such data.
The basic approach and methodology for the identification, estimation, diagnostic checking, and forecasting for seasonal time series models are similar to those developed in Chapter 2 for nonseasonal time series models. The primary difference is that for a seasonal time series, the model needs a seasonal ARIMA component in addition to a nonseasonal ARIMA component. This extension of ARIMA models, largely attributable to Box and Jenkins (1976), greatly increases the flexibility and usefulness of the models, but it also makes the identification of seasonal ARIMA models more complicated. A simplified model identification procedure that is effective for both manual and automatic approaches shall be presented in Chapter 4. In this chapter, we use traditional identification methods for modeling seasonal time series. We shall present theory and methods as simply as possible, since the more complex issues can be easily addressed using the procedures to be discussed in Chapter 4.
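One common way to handle such repetitive behavior is seasonal differencing, in which each observation is compared with the observation one full season earlier. The Python sketch below is only an illustration under assumed data: the monthly pattern, the trend slope of 0.5, and the function name are hypothetical.

```python
def seasonal_difference(series, period):
    """Return z_t - z_{t-period} for t = period, ..., len(series) - 1."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical monthly series: a linear trend plus a fixed 12-month pattern.
monthly_pattern = [3, 1, 0, -2, -4, -5, -3, -1, 2, 4, 6, 9]
series = [0.5 * t + monthly_pattern[t % 12] for t in range(48)]

differenced = seasonal_difference(series, period=12)
```

In this artificial example the repeating pattern cancels exactly, and every differenced value equals 0.5 * 12 = 6, the trend increase over one year.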
In forecasting and analysis of time series data, it is well demonstrated that ARIMA and transfer function models (see Chapter 5) are very effective in handling practical applications. Vast advancements in both theory and methods in this area of research have been accomplished over the last several decades. Unfortunately, these methods are not as widely used as they should be, given the great advantages they offer. It seems that the complexity and often time-consuming nature of the model building process has imposed a barrier between the methodology and its use in mainstream business and industrial applications. In this chapter, we introduce methodologies that may facilitate automatic or expert system modeling of univariate time series. These automatic ARIMA modeling capabilities can also be used in conjunction with transfer function models to accomplish automatic modeling when input or explanatory variables are included.
In addition to their own usefulness, ARIMA models play a very important role in forecasting since the forecasts based on ARIMA models may be regarded as baseline values for forecasting comparison. When forecasts are obtained using a more complicated model (such as a multi-variable or non-linear time series model), they are often compared with those generated by an ARIMA model. If the forecasts generated under a more complicated model are less accurate than those under an ARIMA model, it often signifies mis-specification in the more complicated model, or the existence of outliers in the series (see Chapter 7). With this in mind, an effective automatic ARIMA modeling capability plays an important role in forecasting no matter what methodology is eventually adopted.
In almost all time series books using ARIMA models, the AR operator(s) are typically placed on the left-hand side of the model. Such a model expression makes it difficult to provide an interpretable meaning to the constant term when it is present in the model. For a number of reasons, it is more advantageous to place the AR operator(s) on the right-hand side of the model (thus in a rational form) as discussed in this chapter. Beginning with this chapter, all ARIMA models will be expressed in a rational form in this book.
In this chapter, we first discuss algorithms and procedures that are useful to facilitate automatic identification of ARIMA models. The U.S. GNP series presented in Chapter 3 is used to illustrate these algorithms and procedures in more detail.
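The point about the constant term can be sketched for a stationary ARMA model. The notation below is an assumption made for illustration: \phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p is the AR operator, \theta(B) is the MA operator, a_t is the random shock, and C_0 and C denote the constants in the two forms.

```latex
% Conventional form: AR operator on the left-hand side
\phi(B) Z_t = C_0 + \theta(B) a_t,
\qquad E[Z_t] = \frac{C_0}{1 - \phi_1 - \cdots - \phi_p}

% Rational form: AR operator moved to the right-hand side
Z_t = C + \frac{\theta(B)}{\phi(B)} a_t,
\qquad E[Z_t] = C

% The two constants are related by
C_0 = \phi(1)\, C = (1 - \phi_1 - \cdots - \phi_p)\, C
```

In the rational form the constant is simply the mean of the stationary series, whereas in the conventional form the constant depends on the AR parameters as well as the mean.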
Univariate ARIMA models are useful for analysis and forecasting of a single time series. In such situations, we can only relate the series to its own past and do not explicitly use the information contained in other pertinent time series. In many cases, however, a time series is not only related to its own past, but may also be influenced by the present and past values of other time series. In this chapter, we discuss models that can accommodate such situations. This class of models is referred to as transfer function models by Box and Jenkins (1976).
Transfer function models, which are extensions of familiar linear regression models, have been widely used in various fields of research. In macroeconomics, transfer function models can be used to study the dynamic interrelationships among the variables in an economic system. In marketing, these models are used to determine the factors, such as advertisement, competition, or economic conditions, that may affect the sale of certain products. Transfer function models are also frequently used in environmental studies where we may be interested in how air and water pollution are affected by various environmental factors. We can then determine, among other things, the effectiveness of pollution control policy by using such models. A special case of transfer function models is the class of intervention models. This class of models is typically used as a means to assess the impact of a discrete intervention on a time series. Intervention analysis will be discussed in Chapter 7. Because of their close relationship with regression models, transfer function models are also referred to as dynamic regression models (see, e.g., Pankratz, 1991) or simply time series regression models.
The most complicated task in transfer function modeling is the identification of the transfer function form for each input series, particularly if the transfer function model includes multiple input variables. In this book, we employ the linear transfer function (LTF) method originated by Liu and Hanssens (1982) and further enhanced by Liu et al. (1983) and Liu and Hudak (1992). The LTF identification method can be used in the same manner regardless of whether the transfer function model has a single input variable or multiple input variables. This method is more practical and easier to use than the cross correlation function (CCF) method discussed in Box and Jenkins (1976) and Box, Jenkins, and Reinsel (1994).
Transfer function models can be used to model single-output and multiple-output systems. In the case of a single-output model, only one equation is required to describe the system. It is referred to as a single-equation transfer function model. A multiple-output transfer function model is referred to as a multi-equation transfer function model or a simultaneous transfer function (STF) model (Wall, 1976; Liu et al., 1983; Liu and Hudak, 1985; Liu, 1987). A more complete description of modeling and forecasting using multi-equation models can be found in Liu (1997). In this chapter, we shall only discuss identification, estimation, diagnostic checking, and applications of single-equation transfer function models.
Many economic and business data are compiled each month and are available as monthly time series. Such time series may be subject to two kinds of calendar effects. First, the levels of economic or business activities may change depending on the day in a week. Since the composition of days of the week varies from month to month and year to year, the observed series may be affected by such variation. Such effects, particularly due to the composition of trading days (or work days) in each month, are referred to as trading day effects (Hillmer, Bell and Tiao, 1981; Hillmer, 1982; and Bell and Hillmer, 1983). In addition to trading day variation, some traditional festivals or holidays (e.g., Easter, Chinese New Year and Jewish Passover) are set according to lunar calendars and the dates of such holidays may vary between two adjacent months in the Gregorian calendar from year to year. Since business activities and consumer behavior patterns may be greatly affected by such holidays, the observed time series may vary substantially depending on whether a particular month contains such holidays or not. Such effects are referred to as holiday effects (Liu, 1980, 1986).
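To make the two kinds of calendar effects concrete, the following Python sketch counts how often each day of the week occurs in a given month and computes the Gregorian date of Easter. It is a minimal illustration, not the adjustment method of any of the cited papers: the function names are hypothetical, and the Easter calculation uses the widely known anonymous Gregorian algorithm, which is an addition not drawn from the text.

```python
import calendar


def weekday_counts(year, month):
    """Return the number of Mondays, ..., Sundays in the month as 7 integers."""
    counts = [0] * 7
    _, days_in_month = calendar.monthrange(year, month)
    for day in range(1, days_in_month + 1):
        # calendar.weekday returns 0 for Monday through 6 for Sunday.
        counts[calendar.weekday(year, month, day)] += 1
    return counts


def easter_month_day(year):
    """Return (month, day) of Easter Sunday in the Gregorian calendar."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day_minus_one = divmod(h + l - 7 * m + 114, 31)
    return month, day_minus_one + 1


march_2021 = weekday_counts(2021, 3)    # [5, 5, 5, 4, 4, 4, 4]
easter_2019 = easter_month_day(2019)    # (4, 21)
easter_2024 = easter_month_day(2024)    # (3, 31)
```

March 2021 contains five Mondays, Tuesdays, and Wednesdays but only four of each other day, which is the kind of month-to-month variation behind trading day effects. Easter falls in April in 2019 and in March in 2024, which is the kind of shift behind holiday effects.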
Cleveland and Devlin (1980, 1982) point to the existence of calendar effects in economic data and propose some adjustment methods. Liu (1980) studies the effect of holiday variation on the identification and estimation of ARIMA models and suggests modifications of ARIMA models by including holiday information as deterministic input variables. Hillmer, Bell and Tiao (1981), Hillmer (1982), Cleveland and Grupe (1981), and Bell and Hillmer (1983) propose models to handle trading day variation in ARIMA modeling. The background for much of this work is contained in Young (1965), which describes the relevant properties of the calendar and gives the formulas for the calendar adjustment in the X-11 computer program for seasonal adjustment (Shiskin, Young and Musgrave, 1967).
The simplest way to handle time series with calendar effects is to treat them as a special case of the transfer function models discussed in Chapter 5. The major difference is that, for calendar effects, the input series are deterministic and known in advance. Since model identification is the most difficult task in modeling such time series, we shall address this issue in this chapter. The LTF identification method (Liu and Hanssens, 1982) presented in Chapter 5 shall be employed here.
Time series are frequently affected by external events such as strikes, sales promotions, advertising, policy changes, new laws or regulations, and so forth. When these external events are known and are of interest for study, they are commonly referred to as intervention events. When the events or the timings of the events are unknown, they are often referred to as outliers if the events have a large impact on the time series.
In this chapter we first describe the method of intervention analysis (also referred to as impact study), which can be used to evaluate the effect of the external events, or to incorporate the interventions into a time series model to possibly improve parameter estimates or forecasts. We shall also discuss how to detect outliers and adjust their effects in time series modeling and forecasting. It is worth noting that appropriate intervention analysis almost always requires adjustment of outlier effects, rendering outlier detection and estimation an integral part of intervention analysis.
There are many possible ways to forecast a time series. The main emphasis of forecasting techniques presented thus far is on the methods explicitly based on time series models such as ARIMA and transfer function models. Various ad hoc methods, including those using moving averages and weighted smoothing, had been in use long before model-based forecasting methods were widely accepted. We defer the presentation of these traditional forecasting methods until this chapter in order to better relate the traditional forecasting methods to the model-based methods. With the understanding of their relationships, we can better understand the strengths and limitations of traditional forecasting methods, and consider the direction of improvement for forecasting in using such methodology.
Some traditional forecasting methods were developed based on statistical theory, while most others were developed mainly based on empirical experience. These methods share a similar characteristic: the forecasts are based essentially on smoothing (averaging) past values of a time series using some type of weighting scheme. This chapter will describe three types of forecasting methods: naive, averaging, and smoothing. Naive methods are employed assuming that recent periods are the best predictors of the future. Averaging methods are developed based on an average of weighted observations. Smoothing methods are based on averaging past values of a series with exponentially decreasing weights. The last of these will be the primary focus of this chapter.
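The three types of methods can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions (a one-step-ahead forecast, an equally weighted moving average, and simple exponential smoothing initialized at the first observation); the function names and the example data are hypothetical.

```python
def naive_forecast(series):
    """Naive method: the next value is forecast by the most recent value."""
    return series[-1]


def moving_average_forecast(series, window):
    """Averaging method: equally weighted mean of the last `window` values."""
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha):
    """Smoothing method: weights on past values decay geometrically in (1 - alpha)."""
    level = series[0]  # assumed initialization at the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


data = [10.0, 12.0, 14.0, 16.0]
print(naive_forecast(data))                       # 16.0
print(moving_average_forecast(data, 2))           # 15.0
print(exponential_smoothing_forecast(data, 0.5))  # 14.25
```

All three forecasts are weighted averages of past values; they differ only in how the weights are assigned.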
The modern economy has become more and more information-based. This has profoundly altered the environment in which businesses and other organizations operate. Hence it has also altered the way in which business operations and business data are collected and analyzed. Given the widespread use of information technology, a large amount of data is collected in on-line, real-time environments, which results in massive data sets. Such time-ordered data typically can be aggregated over an appropriate time interval, yielding a large volume of equally spaced time series data. This kind of data can be explored and analyzed using many useful tools and methodologies developed in modern time series analysis. Such practice, known as time series data mining, will be discussed in this chapter. As retail scanning systems, point-of-sale (POS) systems, and more recently on-line transactions through electronic commerce become indispensable in business operations, mining such time series data will also become an integral part of effective business operation.
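The aggregation step can be sketched as follows: time-stamped transactions are summed into consecutive intervals of equal length. The hourly interval, the function name, and the sample transactions are assumptions made for illustration only.

```python
from datetime import datetime


def aggregate_hourly(transactions, start, n_hours):
    """Sum transaction amounts into n_hours consecutive hourly bins from start."""
    totals = [0.0] * n_hours
    for timestamp, amount in transactions:
        index = int((timestamp - start).total_seconds() // 3600)
        if 0 <= index < n_hours:  # ignore records outside the window
            totals[index] += amount
    return totals


# Hypothetical point-of-sale records: (timestamp, sale amount).
sales = [
    (datetime(2024, 1, 2, 9, 5), 10.0),
    (datetime(2024, 1, 2, 9, 50), 5.0),
    (datetime(2024, 1, 2, 11, 30), 7.5),
]
hourly = aggregate_hourly(sales, datetime(2024, 1, 2, 9, 0), 3)
print(hourly)  # [15.0, 0.0, 7.5]
```

Hours with no transactions remain in the result as zeros, so the output is an equally spaced series even when the raw records arrive irregularly.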
The methodologies of time series analysis and forecasting using ARIMA and transfer function models developed in the previous chapters are all useful for time series data mining, particularly automatic time series modeling and outlier detection.
The efficacy of statistical models is often enhanced through the use of data transformation; analysis and forecasting using time series models are no exception. With an appropriate transformation on a time series, the model for the series may be simplified; the intervention effects may be better estimated; and the forecasts of future values may be improved.
Most statistical methods assume that the variables are normally distributed. A data transformation is a useful tool to achieve normality for the variables under study. However, the mathematical modification of the data in this manner raises issues not only for the interpretation of the modeling results, but also for the usefulness of forecasts based on the transformed data. In this chapter, we are particularly interested in the application of power transformations to improve forecasting accuracy when forecasts are retransformed back into the original metric.
There are two primary issues in the application of power transformation. The first is the selection of an appropriate lambda value that will either improve the efficacy of the model or improve the accuracy of the forecasts. The second issue involves the correction of biases induced by the forecasts of the transformed series. Both issues are addressed in this chapter.
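Both issues can be illustrated with a short Python sketch. It assumes the Box-Cox form of the power transformation, which the text does not name explicitly, and it shows the retransformation bias only for the logarithmic case (lambda = 0) under an assumed normal forecast distribution on the log scale. All function names are hypothetical.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x (assumed form)."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Map a transformed value back to the original metric."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


def lognormal_mean_forecast(log_forecast, log_forecast_variance):
    """Mean forecast in the original metric when the log forecast is normal."""
    return math.exp(log_forecast + 0.5 * log_forecast_variance)


naive = inverse_box_cox(1.0, 0)                # exp(1.0): the median, not the mean
corrected = lognormal_mean_forecast(1.0, 0.2)  # exp(1.0 + 0.1)
```

Simply inverting the transformation yields the median of the forecast distribution in the original metric. In the logarithmic case the mean is larger by the factor exp(variance / 2), which is one form of the bias correction mentioned above.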
In conventional time series and econometric models, the variance of random shocks (also referred to as innovations) is typically assumed to be constant. However, many economic and financial time series often exhibit periods of unusually high volatility followed by periods of relative tranquility. In such situations, the assumption of a constant variance is inappropriate. Engle (1982, 1995), Bollerslev (1986), Bollerslev and Ghysels (1996) and others developed a class of models that address such concerns and allow for modeling both the level (the first moment) and the variance (the second moment) of a process. This class of models is referred to as conditional heteroscedastic models. In this chapter, the term volatility refers to a measure associated with either the conditional variance or the conditional standard deviation of a process.
Volatility is an important concept not just in theory, but also in practice in financial markets. It is one of the primary factors in the determination of option prices for stocks and stock indexes. Since the combined option markets are larger than the combined stock markets in the United States, it is easy to understand the interest in and the importance of modeling volatility in financial markets. In addition to its use in option pricing, volatility is very important in financial risk management and asset allocation. Finally, modeling the volatility of a time series may improve the efficiency of the estimates of model parameters as well as the accuracy of interval forecasts.
The conditional heteroscedastic models discussed in this chapter include the autoregressive conditional heteroscedastic (ARCH) model of Engle (1982), the generalized ARCH (GARCH) model of Bollerslev (1986), the GARCH-M model of Engle, Lilien, and Robins (1987), the exponential GARCH (EGARCH) model of Nelson (1991), and a variety of threshold GARCH models. Useful literature in this area of research can be found in the review articles by Bollerslev, Chou, and Kroner (1992), and Bollerslev, Engle, and Nelson (1999). We shall discuss the strengths and weaknesses of each model and show applications of the models.
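As a minimal sketch of how such models let the variance change over time, the following Python code simulates a GARCH(1,1) process in its standard form, h_t = omega + alpha * e_{t-1}^2 + beta * h_{t-1}. The parameter values, the normal shocks, and the function name are assumptions for illustration; setting beta to zero gives an ARCH(1) process.

```python
import math
import random


def simulate_garch11(omega, alpha, beta, n, seed=0):
    """Simulate shocks e_t = sqrt(h_t) * z_t with GARCH(1,1) variance h_t."""
    rng = random.Random(seed)
    # Start at the unconditional variance, which requires alpha + beta < 1.
    h = omega / (1.0 - alpha - beta)
    shocks, variances = [], []
    for _ in range(n):
        shock = math.sqrt(h) * rng.gauss(0.0, 1.0)
        shocks.append(shock)
        variances.append(h)
        # A large shock today raises the conditional variance tomorrow.
        h = omega + alpha * shock * shock + beta * h
    return shocks, variances


shocks, variances = simulate_garch11(omega=0.1, alpha=0.1, beta=0.8, n=500)
```

Plotting the simulated shocks would typically show clusters of large movements followed by calmer stretches, the pattern of volatility described above.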
Statistical models are used to capture homogeneous patterns or relationships that may exist in the data. For univariate time series data, certain periodic groupings may behave more similarly or homogeneously than others. Examples of periodic groupings may be defined by months of the year, days of the week, or hours of the day, where the behavior of the data follows similar autocorrelation and mean (or trend) patterns. In such a situation, we may develop a model that optimizes its efficacy for data in such periodic groupings. Similarly for multi-variable time series data, a time series may be influenced by another time series differently depending upon the values of one or more explanatory series. For example, housing markets may be influenced by interest rates. However, the influence of interest rates on the housing markets may be different depending upon whether the interest rates are increased or decreased. In such a case, it may be more appropriate to have separate models to represent such potential asymmetric behavior. In regression analysis, we can accomplish this by simply deleting unnecessary portions of the data during model estimation or by employing a piecewise regression model. In time series modeling, however, data cannot be arbitrarily deleted during model estimation due to the existence of serial correlation or seasonality. In this chapter, we introduce a weighted estimation method to facilitate the practice of omitting (or discounting) data in ARIMA and transfer function modeling. A special application of the weighted estimation method is to facilitate a more general approach for the modeling and forecasting of threshold autoregressive (TAR) models.
In time series analysis, it is not uncommon for the pattern or the relationship of the time series to be temporarily disrupted by outliers or structural changes. If the disruptions are isolated and not exceedingly large, outlier detection and adjustment methods discussed in Chapter 7 are sufficient to correct for the biases caused by such disruptions. However, if the disruptions are clustered together or if their atypical effects are persistent over a period of time, it may be more appropriate to discount or disregard those portions of the data in time series modeling. The weighted estimation method can be useful in such situations as well.
In Chapters 1 through 10, we mainly focus on how to represent a univariate time series using a linear ARIMA model. Time series encountered in practice, however, may not always exhibit characteristics of a linear process. Thus, in this book we explore various nonlinear models that may enhance the efficacy and usefulness of a time series model.
In Chapter 11, we study ARCH and GARCH models for conditional heteroscedastic time series. The ARCH/GARCH models are a special class of nonlinear time series models that are particularly useful for economic and financial time series whose conditional variances (volatility) depend on the past innovations of the series. Except for the GARCH-M models, in which the mean level depends on the conditional variance, ARCH/GARCH models are nonlinear in variance, but still linear in mean.
In Chapter 12, we discuss time series analysis using linear ARIMA models and the weighted estimation method. The threshold autoregressive (TAR) model is a special case under such an application. A TAR model can be viewed as a piecewise linear approximation of a nonlinear AR model for a time series. This class of nonlinear models has been discussed extensively in the literature (see, e.g., Tong, 1978, 1983, 1990; Tong and Lim, 1980; Maravall, 1983; Tsay, 1986, 1989; Tiao and Tsay, 1994; Chen and Tsay, 1991, 1993) and has been shown to be useful in time series analysis and related applications. In Chapter 12, we addressed the issue of TAR model estimation by using the weighted estimation method. In this chapter, we shall explore the nonlinearity test of a time series and the identification of a TAR model for a time series.
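The piecewise linear idea can be sketched with a two-regime TAR model of order one, in which the AR coefficient switches according to whether a lagged value of the series lies at or below a threshold. The coefficients, the threshold, the delay, and the function names in this Python sketch are hypothetical choices for illustration.

```python
import random


def tar_one_step_mean(history, threshold, phi_low, phi_high, delay=1):
    """Conditional mean of the next value in a two-regime TAR model of order 1."""
    regime_value = history[-delay]  # the lagged value that selects the regime
    phi = phi_low if regime_value <= threshold else phi_high
    return phi * history[-1]


def simulate_tar(n, threshold, phi_low, phi_high, delay=1, seed=0):
    """Simulate n values with standard normal shocks, starting from zeros."""
    rng = random.Random(seed)
    series = [0.0] * delay
    for _ in range(n):
        mean = tar_one_step_mean(series, threshold, phi_low, phi_high, delay)
        series.append(mean + rng.gauss(0.0, 1.0))
    return series[delay:]


print(tar_one_step_mean([1.0, -2.0], 0.0, 0.5, -0.4))  # -1.0 (lower regime)
print(tar_one_step_mean([1.0, 2.0], 0.0, 0.5, -0.4))   # -0.8 (upper regime)
```

Within each regime the model is an ordinary linear AR model; the nonlinearity comes only from the switch between regimes.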
Analyses of business, economic, environmental or industrial time series often require that we consider modeling several time series jointly. In some situations, it is appropriate to assume that the relationship(s) between the input variable(s) and the output variable are unidirectional. That is, we may be able to assume that there is no feedback relationship. For example, it is reasonable to assume that crude oil prices affect the prices of gasoline, but gasoline prices should not affect crude oil prices (at least in the short term). Similarly, the temperature recorded in an attic is affected by the outside temperature, but not conversely. When the assumption of unidirectional relationships can be justified, it may be appropriate to employ transfer function models for time series analyses.
However, in many applications a unidirectional assumption may not be appropriate. Physical laws may dictate the consideration of interrelationships in the case of industrial or environmental data. In the case of business and economic data, there may not be sufficient theoretical understanding to establish any a priori causality. For example, it is difficult to postulate the dynamic relationships between major economic variables such as money supply, interest rate, inflation, producer price and industrial production using only economic theory. In fact, when studying such variables, a primary objective of a time series analysis may be to understand the causal relationships among the variables of the system.
In this chapter, we consider vector autoregressive-moving average models, also referred to as vector ARMA or VARMA models. This class of models allows for general dynamic relationships among variables in a system. Therefore vector ARMA models may be more adequate to represent the dynamic relationships among series of interest and provide more accurate forecasts than those obtained from univariate ARIMA models. As will be shown, the vector ARMA model is a multivariate analogue of the univariate ARIMA model.
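As a small illustration of feedback between series, the Python sketch below computes forecasts from a zero-mean vector AR(1) model, a simple special case of the vector ARMA class. The coefficient matrix, the starting values, and the function name are hypothetical.

```python
def var1_forecast(phi, last_obs, steps):
    """Forecasts from a zero-mean VAR(1): each step multiplies by phi."""
    forecasts = []
    current = list(last_obs)
    for _ in range(steps):
        current = [
            sum(phi[i][j] * current[j] for j in range(len(current)))
            for i in range(len(phi))
        ]
        forecasts.append(current)
    return forecasts


# Both off-diagonal entries are nonzero, so each series feeds back into the other.
phi = [[0.5, 0.2],
       [0.1, 0.4]]
forecasts = var1_forecast(phi, last_obs=[1.0, 2.0], steps=2)
# One step ahead is approximately [0.9, 0.9]; two steps ahead, [0.63, 0.45].
```

If the lower-left entry of the matrix were zero, the second series would not depend on the first, and the relationship would reduce to the unidirectional case handled by transfer function models.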
Simultaneous transfer function models are natural extensions of classical econometric models using simultaneous equation systems. The primary difference is that the disturbance terms for the classical econometric models are often assumed to be white noise, while STF models allow the disturbance terms to follow an ARMA process. Unlike vector ARMA models, STF models allow for both reduced form and structural form, and also allow for explicit incorporation of exogenous variables. The latter is an important flexibility. In many situations in practice, certain variables are clearly exogenous and the inclusion of such variables in a vector ARMA model will either result in an overly complicated model, or even worse, an unattainable model. For example, as discussed in Chapter 6, trading-day and moving-holiday (e.g., Easter holiday) effects are important in the development of a forecasting model. These effects can be readily incorporated into an STF model, whereas this cannot be accomplished easily using a vector ARMA model. Similarly, intervention effects can easily be accommodated in an STF model, but not in a vector ARMA model.
Since classical econometric models using simultaneous equation systems are a special case of STF models, many issues related to identification, estimation, and forecasting using simultaneous equation systems are also present in the application of STF models. We shall address some of these issues in this chapter, particularly those issues more relevant to forecasting.
Knowledge of the dynamic relationships among economic variables is essential to economic and business research. Traditionally, the structure of economic relationships is specified according to economic theories while the parameters are estimated by statistical methods. Testing the existence of such relationships consists of (a) choosing an alternative hypothesis or theory, and (b) comparing the maintained hypothesis (often specified by a prior theory) versus the alternative. As suggested by Sims (1980) and Leamer (1983, 1985), when a model is specified according to incorrect theories, the estimation will be biased and the empirical results will be useless for economic applications. Moreover, competing economic theories that imply an antithetical structure of economic relationships may be nonetheless plausible on logical grounds. Choosing among competing theories is therefore an empirical issue. Thus it is necessary to develop statistical procedures for detecting dynamic relationships when there are several hypotheses being considered.
In this chapter, we examine the dynamic relationships between two economic variables following the causality concept proposed by Granger (1969). Most early empirical work on Granger causality applied Sims' test methodology (Sims, 1972). Haugh (1976) suggested an alternative method for such empirical inquiries. Chen and Lee (1990) considered a vector ARMA (VARMA) test. They found that many controversies associated with Granger causality are at least partially attributable to the differences in the choice of the alternative hypotheses and the specific test statistics. The traditional hypothesis testing procedure provides a framework to contrast a null hypothesis versus an alternative hypothesis. Without imposing a priori restrictions, an empirical study of dynamic relationships often involves several non-nested hypotheses. Consequently, a more systematic approach is required to examine the multiple hypotheses so that the test conclusion is not affected by the a priori choice of the alternative. The VARMA test developed by Chen and Lee (1990) addresses these issues and is the focus of causality testing discussed in this chapter.
This chapter is written based on the research results provided by Chung Chen and other collaborators.