Chapter 1. Introduction to Time Series Analysis and Forecasting
Time series analysis and its applications have become increasingly important in various fields
of research, such as business, economics, engineering, medicine, environometrics, social sciences,
politics, and others. Since Box and Jenkins (1970, 1976) published the seminal book
Time Series Analysis: Forecasting and Control, a number of books and a vast number of research
papers have been published in this area. The goal of this book is to distill and integrate these
research results into cohesive and comprehensible methodologies, and to provide a streamlined
approach to time series analysis and forecasting.
The use of computers and computer software is essential in any modern quantitative analysis,
even more so in time series analysis where complex algorithms and extensive computations are
often required. With the speed and capacity of modern computers, in many situations it is
preferable to adopt a methodology that simplifies the means of conducting an analysis even if
it is at the expense of computation time. Using such an approach, we are able to provide
simplified and effective methodologies for complex subjects in time series analysis and
forecasting, as will be discussed in this book.
In this chapter, we shall first examine examples of time series data and
introduce terminology in time series analysis. We then discuss applications and general
principles of time series analysis.
Chapter 2. Autoregressive Integrated Moving Average Models
In this chapter, we shall discuss a class of time series models known as autoregressive integrated
moving average (ARIMA) models. This class of models has proved to be useful in representing both
stationary and nonstationary time series. We first discuss the properties of ARIMA models and
learn how to use these properties to build ARIMA models empirically. An ARIMA model may contain
only an autoregressive (AR) term, only a moving average (MA) term, or both. We begin
by examining useful characteristics of pure AR and pure MA models. Then the more complex mixed
autoregressive moving average (ARMA) and ARIMA models will be discussed.
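The contrast between pure AR and pure MA behavior can be sketched with a small simulation. This is only an illustration under stated assumptions: the function names, the coefficient 0.7, the sample size, and the sign convention for the MA term (a minus sign in front of theta) are all choices made here, not taken from the chapter.

```python
import random

def simulate_ar1(phi, n, seed=0):
    """Simulate Z_t = phi * Z_{t-1} + a_t with standard normal shocks a_t."""
    rng = random.Random(seed)
    z, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        z.append(prev)
    return z

def simulate_ma1(theta, n, seed=0):
    """Simulate Z_t = a_t - theta * a_{t-1} with standard normal shocks a_t."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [shocks[t] - theta * shocks[t - 1] for t in range(1, n + 1)]

def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

# Hypothetical settings chosen only for illustration.
ar_acf = sample_acf(simulate_ar1(0.7, 5000), 3)
ma_acf = sample_acf(simulate_ma1(0.7, 5000), 3)
```

In theory the autocorrelations of the AR(1) series decay geometrically (0.7, 0.49, ...), while those of the MA(1) series are nonzero only at lag 1. The sample values should be close to this pattern, which is the kind of characteristic used to build models empirically.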
An important consideration when modeling time series is the principle of parsimony.
This principle refers to representing the systematic structure of the series with as few parameters
as possible. Essentially, this means simpler representations of a time series process are more desirable
than more complex ones if both are adequate. This principle leads to the use of mixed ARMA models,
rather than just pure AR or pure MA models. The principle of parsimony will be further appreciated
when the common occurrence of outliers in time series is taken into consideration.
While a large number of interesting topics may be presented regarding the building and application
of ARIMA models, we shall be concise and focus on topics that are more related to practical uses of
such models.
Chapter 3. Seasonal ARIMA Models
In Chapter 2, we have examined the properties of stationary and nonstationary ARIMA models, and
shown how these models can be used for forecasting. While this class of ARIMA models encompasses
a wide variety of time series, it does not include time series which display repetitive behavior
or periodic patterns. This repetitive nature, such as the change of temperature from season to
season or the increase in retail sales at Christmas time year after year, is the essence of
seasonal time series. This chapter examines seasonal time series models that are useful for
such data.
The basic approach and methodology for the identification, estimation, diagnostic checking, and
forecasting for seasonal time series models are similar to those developed in Chapter 2 for
nonseasonal time series models. The primary difference is that for a seasonal time series, the
model needs a seasonal ARIMA component in addition to a nonseasonal ARIMA component. This extension
of ARIMA models, largely attributable to Box and Jenkins (1976), greatly increases the flexibility
and usefulness of the models, but it also makes the identification of seasonal ARIMA models more
complicated. A simplified model identification procedure that is effective for both manual and
automatic approaches shall be presented in Chapter 4. In this chapter, we use traditional
identification methods for modeling seasonal time series. We shall present theory and methods as
simply as possible since the more complex issues can be easily addressed using the procedures to be
discussed in Chapter 4.
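One way to see why a seasonal component is needed alongside a nonseasonal one is through differencing at two different lags. The sketch below is a minimal illustration, assuming a made-up quarterly series (period 4) and a helper name, `difference`, introduced here for the example.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): return series[t] - series[t - lag] for t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical quarterly series: linear trend plus a repeating seasonal pattern.
pattern = [5.0, -2.0, -4.0, 1.0]
series = [0.5 * t + pattern[t % 4] for t in range(24)]

seasonal_diff = difference(series, lag=4)     # removes the repeating pattern
both_diff = difference(seasonal_diff, lag=1)  # removes the remaining constant drift
```

For this artificial series, the seasonal difference leaves only the constant yearly increase, and the additional regular difference reduces it to zero. Real series also contain a stochastic part, which is what the seasonal and nonseasonal ARIMA components describe.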
Chapter 4. ARIMA Modeling Using Expert Systems
In forecasting and analysis of time series data, it is well demonstrated that ARIMA and transfer
function models (see Chapter 5) are very effective in handling practical applications. Vast
advancements in both theory and methods in this area of research have been accomplished over the
last several decades. Unfortunately, these methods are not as widely used as they should be, given
the great advantage they offer. It seems that the complexity and often time-consuming nature of
the model building process has imposed a barrier between the methodology and its use in mainstream
business and industrial applications. In this chapter, we introduce methodologies that may
facilitate automatic or expert system modeling of univariate time series. These automatic ARIMA
modeling capabilities can also be used in conjunction with transfer function models to accomplish
automatic modeling when input or explanatory variables are included.
In addition to their own usefulness, ARIMA models play a very important role in forecasting since
the forecasts based on ARIMA models may be regarded as baseline values for forecasting comparison.
When forecasts are obtained using a more complicated model (such as a multi-variable or non-linear
time series model), they are often compared with those generated by an ARIMA model. If the forecasts
generated under a more complicated model are less accurate than those under an ARIMA model, it often
signifies mis-specification in the more complicated model, or the existence of outliers in the series
(see Chapter 7). With this in mind, an effective automatic ARIMA modeling capability plays an
important role in forecasting no matter what methodology is eventually adopted.
In almost all time series books using ARIMA models, the AR operator(s) are typically placed on the
left-hand side of the model. Such a model expression makes it difficult to provide an interpretable
meaning to the constant term when it is present in the model. For a number of reasons, it is more
advantageous to place the AR operator(s) on the right-hand side of the model (thus in a rational form)
as discussed in this chapter. Beginning from this chapter, all ARIMA models will be expressed in a
rational form in this book.
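The point about the constant term can be illustrated with a stationary AR(1) model. The notation below (Z_t for the series, a_t for the random shock, B for the backshift operator, phi for the AR coefficient, C for the constant) is assumed here for illustration and may differ from the notation used in the chapter.

```latex
% AR operator on the left-hand side: the constant is not the mean of the series
(1 - \phi B) Z_t = C + a_t, \qquad E(Z_t) = \frac{C}{1 - \phi}

% AR operator on the right-hand side (rational form): the constant is the mean
Z_t = C + \frac{1}{1 - \phi B}\, a_t, \qquad E(Z_t) = C
```

With |phi| < 1, the constant in the rational form can be read directly as the mean level of the series, whereas in the first form it must be rescaled by 1/(1 - phi) before it can be interpreted.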
In this chapter, we first discuss algorithms and procedures that are useful to facilitate automatic
identification of ARIMA models. The U.S. GNP series presented in Chapter 3 is used to illustrate
these algorithms and procedures in more detail.
Chapter 5. Transfer Function Models
Univariate ARIMA models are useful for analysis and forecasting of a single time series. In such
situations, we can only relate the series to its own past and do not explicitly use the information
contained in other pertinent time series. In many cases, however, a time series is not only related
to its own past, but may also be influenced by the present and past values of other time series.
In this chapter, we discuss models that can accommodate such situations. This class of models is
referred to as transfer function models by Box and Jenkins (1976).
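How present and past values of an input series can feed into an output series may be sketched with a first-order rational lag structure. This is a minimal illustration, not the chapter's procedure: the function name and the parameters `omega`, `delta`, and `delay` are assumptions introduced here, and the noise component of the model is omitted.

```python
def rational_lag_response(x, omega, delta, delay=0):
    """Response v_t of the filter omega * B^delay / (1 - delta * B) applied to x.

    Computed recursively as v_t = delta * v_{t-1} + omega * x_{t-delay}.
    """
    v, prev = [], 0.0
    for t in range(len(x)):
        lagged = x[t - delay] if t >= delay else 0.0
        prev = delta * prev + omega * lagged
        v.append(prev)
    return v

# A unit impulse in the input at t = 0 reveals the implied lag weights.
impulse = [1.0] + [0.0] * 5
weights = rational_lag_response(impulse, omega=2.0, delta=0.5, delay=1)
```

Here the input has no effect in the same period, an effect of 2.0 one period later, and a geometrically decaying effect after that (1.0, 0.5, 0.25, ...). A single pair of parameters thus describes an influence that extends over many lags.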
Transfer function models, which are extensions of familiar linear regression models, have been
widely used in various fields of research. In macroeconomics, transfer function models can be used
to study the dynamic interrelationships among the variables in an economic system. In marketing,
these models are used to determine the factors, such as advertisement, competition, or economic
conditions, that may affect the sale of certain products. Transfer function models are also frequently
used in environmental studies where we may be interested in how air and water pollution are affected
by various environmental factors. We can then determine, among other things, the effectiveness of
pollution control policy by using such models. A special case of transfer function models is called
intervention models. This class of models is typically used as a means to assess the impact of a
discrete intervention on a time series. Intervention analysis will be discussed in Chapter 7.
Because of their close relationship with regression models, transfer function models are also referred
to as dynamic regression models (see e.g., Pankratz, 1991) or simply
time series regression models.
The most complicated task in transfer function modeling is the identification of the transfer
function form for each input series, particularly if the transfer function model includes
multiple-input variables. In this book, we employ the linear transfer function (LTF) method
originated by Liu and Hanssens (1982) and further enhanced by Liu et al. (1983) and Liu and
Hudak (1992). The LTF identification method can be used in the same manner regardless of whether the
transfer function model has single-input or multiple-input variables. This method is more
practical and easier to use than the cross correlation function (CCF) method discussed in
Box and Jenkins (1976) and Box, Jenkins, and Reinsel (1994).
Transfer function models can be used to model single-output and multiple-output systems.
In the case of a single-output model, only one equation is required to describe the system.
It is referred to as a single-equation transfer function model. A multiple-output transfer function
model is referred to as a multi-equation transfer function model or a simultaneous transfer
function (STF) model (Wall, 1976; Liu et al., 1983; Liu and Hudak, 1985; and Liu, 1987).
A more complete description of modeling and forecasting using multi-equation models can be found
in Liu (1997). In this chapter, we shall only discuss identification, estimation, diagnostic
checking, and applications of single-equation transfer function models.
Chapter 6. Analysis of Time Series with Calendar Effects
Many economic and business data are compiled each month and are available as monthly
time series. Such time series may be subject to two kinds of calendar effects.
First, the levels of economic or business activities may change depending on the day in
a week. Since the composition of days of the week varies from month to month and year
to year, the observed series may be affected by such variation. Such effects, particularly
due to the composition of trading days (or work days) in each month, are referred to as
trading day effects (Hillmer, Bell and Tiao, 1981; Hillmer, 1982; and Bell and Hillmer, 1983).
In addition to trading day variation, some traditional festivals or holidays (e.g., Easter,
Chinese New Year and Jewish Passover) are set according to lunar calendars and the dates of
such holidays may vary between two adjacent months in the Gregorian calendar from year to
year. Since business activities and consumer behavior patterns may be greatly affected
by such holidays, the observed time series may vary substantially depending on whether a
particular month contains such holidays or not. Such effects are referred to as holiday
effects (Liu, 1980, 1986).
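The varying composition of days of the week is easy to verify directly. The following sketch counts how many times each weekday occurs in a given month using only the Python standard library. The function name is introduced here for illustration, and how such counts enter a model is left to the chapter.

```python
import calendar
from datetime import date

def weekday_counts(year, month):
    """Return a list of 7 counts, Monday first, of each weekday in the month."""
    n_days = calendar.monthrange(year, month)[1]
    counts = [0] * 7
    for day in range(1, n_days + 1):
        counts[date(year, month, day).weekday()] += 1
    return counts

jan_2024 = weekday_counts(2024, 1)  # 31 days, starting on a Monday
feb_2023 = weekday_counts(2023, 2)  # 28 days, exactly four of each weekday
```

January 2024 contains five Mondays, Tuesdays, and Wednesdays but only four of every other day, while February 2023 contains exactly four of each. Because these counts are fixed by the calendar, they are known in advance for any future month.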
Cleveland and Devlin (1980, 1982) point to the existence of calendar effects in economic
data and propose some adjustment methods. Liu (1980) studies the effect of holiday variation
on the identification and estimation of ARIMA models and suggests modifications of ARIMA
models by including holiday information as deterministic input variables. Hillmer, Bell
and Tiao (1981), Hillmer (1982), Cleveland and Grupe (1981), and Bell and Hillmer (1983)
propose models to handle trading day variation in ARIMA modeling. The background for
much of this work is contained in Young (1965), which describes the relevant properties of
the calendar and gives the formulas for the calendar adjustment in the X-11 computer program
for seasonal adjustment (Shiskin, Young and Musgrave, 1967).
The simplest way to handle time series with calendar effects is to model them using a special
case of the transfer function models discussed in Chapter 5. The major difference is that, for
calendar effects, the input series are deterministic and known in advance. Since model
identification is the most difficult task in modeling such time series, we shall address
this issue in this chapter. The LTF identification method (Liu and Hanssens, 1982) presented
in Chapter 5 shall be employed here.
Chapter 7. Intervention Analysis and Outlier Detection
Time series are frequently affected by external events such as strikes, sales promotions,
advertising, policy changes, new laws or regulations, and so forth. When these external
events are known and are of interest for study, they are commonly referred to as
intervention events. When the events or the timings of the events are unknown, they are
often referred to as outliers if the events have large impact on the time series.
In this chapter we first describe the method of intervention analysis (also referred
to as impact study) which can be used to evaluate the effect of the external events,
or to incorporate the interventions into a time series model to possibly improve parameter
estimates or forecasts. We shall also discuss how to detect outliers and adjust their
effects in time series modeling and forecasting. It is worth noting that appropriate
intervention analysis almost always requires adjustment of outlier effects, rendering outlier
detection and estimation an integral part of intervention analysis.
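A known external event is commonly represented by an indicator variable that enters the model as an input series. The sketch below shows two widely used forms, a pulse and a step. These names and the helper functions are conventions assumed here for illustration and are not taken from the text above.

```python
def pulse_indicator(n, t0):
    """1 at time index t0 and 0 elsewhere (a one-period event)."""
    return [1 if t == t0 else 0 for t in range(n)]

def step_indicator(n, t0):
    """0 before time index t0 and 1 from t0 onward (a lasting change)."""
    return [1 if t >= t0 else 0 for t in range(n)]

# Hypothetical event at time index 3 in a series of length 8.
pulse = pulse_indicator(8, 3)
step = step_indicator(8, 3)
```

A strike lasting one period might be coded as a pulse, while a new regulation that stays in force might be coded as a step. The estimated coefficient on the indicator then measures the size of the effect.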
Chapter 8. Forecasting Using Exponential Smoothing Methods
There are many possible ways to forecast a time series. The main emphasis of forecasting
techniques presented thus far is on the methods explicitly based on time series models such
as ARIMA and transfer function models. Various ad hoc methods, including those using moving
averages and weighted smoothing, had been in use long before model-based forecasting methods
were widely accepted. We defer the presentation of these traditional forecasting methods
until this chapter in order to better relate the traditional forecasting methods with the
model-based methods. With the understanding of their relationships, we can better understand
the strengths and limitations of traditional forecasting methods, and consider the direction of
improvement for forecasting in using such methodology.
Some traditional forecasting methods were developed based on statistical theory, while most
others were developed mainly based on empirical experience. These methods share a similar
characteristic. That is, the forecasts are based essentially on smoothing (averaging) past
values of a time series using some type of weighting scheme. This chapter will describe three
types of forecasting methods: naive, averaging, and smoothing. Naive methods are employed
assuming that recent periods are the best predictors of the future. Averaging methods are
developed based on an average of weighted observations. Smoothing methods are based on averaging
past values of a series in a decreasing (exponential) manner. The last method will be the
primary focus of this chapter.
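The idea of averaging past values with exponentially decreasing weights can be sketched with simple exponential smoothing. This is a minimal version under stated assumptions: the smoothing constant `alpha` and the choice to initialize the level at the first observation are illustrative, and other initializations are possible.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after the last observation.

    The level is updated as level = alpha * z_t + (1 - alpha) * level,
    starting from the first observation (an initialization assumption).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for z in series[1:]:
        level = alpha * z + (1.0 - alpha) * level
    return level

# Hypothetical data for illustration.
forecast = simple_exponential_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5)
```

Unrolling the recursion shows that an observation j periods back receives weight alpha * (1 - alpha)^j, so recent values count the most. Setting alpha to 1 reproduces the naive forecast that uses only the last observation.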
Chapter 9. Time Series Data Mining
The modern economy has become more and more information-based. This has profoundly altered
the environment in which businesses and other organizations operate. Hence it has also
altered the way in which business operations and business data are collected and analyzed.
Given the widespread use of information technology, data are now routinely collected
in on-line, real-time environments, which results in massive amounts of data. Such
time-ordered data typically can be aggregated with an appropriate time interval,
yielding a large volume of equally spaced time series data. This kind of data can be
explored and analyzed using many useful tools and methodologies developed in modern
time series analysis. Such practice, known as time series data mining, will be discussed
in this chapter. As retail scanning systems, point-of-sale (POS) systems, and more
recently on-line transactions through electronic commerce, become indispensable in
business operations, mining such time series data will also become an integral part
of effective business operation.
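The aggregation step described above, turning irregular time-stamped records into an equally spaced series, can be sketched as follows. The record layout (timestamp, amount), the hourly interval, and the function name are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate_hourly(transactions, start, n_hours):
    """Sum (timestamp, amount) pairs into n_hours equally spaced hourly totals.

    Hours without transactions get 0.0 so the result has no gaps.
    """
    totals = defaultdict(float)
    for stamp, amount in transactions:
        bucket = stamp.replace(minute=0, second=0, microsecond=0)
        totals[bucket] += amount
    return [totals.get(start + timedelta(hours=h), 0.0) for h in range(n_hours)]

# Hypothetical point-of-sale records.
sales = [
    (datetime(2024, 1, 1, 9, 5), 20.0),
    (datetime(2024, 1, 1, 9, 40), 5.0),
    (datetime(2024, 1, 1, 11, 15), 12.5),
]
hourly = aggregate_hourly(sales, datetime(2024, 1, 1, 9), 3)
```

The choice of interval matters: a finer interval keeps more detail but produces more empty periods, while a coarser one smooths the series. The result is an ordinary equally spaced series to which the earlier modeling tools can be applied.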
The methodologies of time series analysis and forecasting using ARIMA and transfer
function models developed in the previous chapters are all useful for time series data
mining, particularly automatic time series modeling and outlier detection.
Chapter 10. Power Transformations and Forecasting
The efficacy of statistical models is often enhanced through the use of data transformation,
and analysis and forecasting using time series models are no exception. With an appropriate
transformation on a time series, the model for the series may be simplified; the
intervention effects may be better estimated; and the forecasts of future values may
be improved.
Most statistical methods assume that the variables are normally distributed. A
data transformation is a useful tool to achieve normality for the variables under study.
However, the mathematical modification of the data in this manner raises issues not only
for the interpretation of the modeling results, but also for the usefulness of forecasts based
on the transformed data. In this chapter, we are particularly interested in the application
of power transformations to improve forecasting accuracy when forecasts are retransformed
back into the original metric.
There are two primary issues in the application of power transformation. The first is the selection
of an appropriate lambda value that will either improve the efficacy of the model or improve the
accuracy of the forecasts. The second issue involves the correction of biases induced by the
forecasts of the transformed series. Both issues are addressed in this chapter.
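A power transformation indexed by lambda and its inverse can be sketched as below. The Box-Cox form is used here as an assumption, since it is a common way to define the family, and the chapter may parameterize it differently. The function names are introduced for illustration.

```python
import math

def power_transform(z, lam):
    """Box-Cox form: (z**lam - 1) / lam, or log(z) when lam == 0. Requires z > 0."""
    if z <= 0:
        raise ValueError("z must be positive")
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def inverse_power_transform(y, lam):
    """Map a value on the transformed scale back to the original metric."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)

y = power_transform(25.0, 0.5)            # square-root-type transformation
z_back = inverse_power_transform(y, 0.5)  # recovers the original value
```

The round trip is exact for a single value, but applying the inverse to a forecast made on the transformed scale does not in general give the mean forecast on the original scale, because the inverse is nonlinear. That gap is the source of the retransformation bias.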
Chapter 11. Time Series Models with Heteroscedasticity
In conventional time series and econometric models, the variance of random shocks (also referred to as
innovations) is typically assumed to be constant. However, many economic and financial time series
often exhibit periods of unusually high volatility followed by periods of relative tranquility. In such
situations, the assumption of a constant variance is inappropriate. Engle (1982, 1995), Bollerslev (1986),
Bollerslev and Ghysels (1996) and others developed a class of models that address such concerns and allow
for modeling both the level (the first moment) and the variance (the second moment) of a process. This
class of models is referred to as conditional heteroscedastic models. In this chapter, the term volatility
refers to a measure associated with either the conditional variance or the conditional standard deviation of
a process.
Volatility is an important concept not just in theory, but also in practice in financial markets. It is one
of the primary factors in the determination of option prices for stocks and stock indexes. Since the combined
option markets are larger than the combined stock markets in the United States, it is easy to understand the interest
in and the importance of modeling volatility in financial markets. In addition to its use in option pricing, volatility
is very important in financial risk management and asset allocation. Finally, modeling the volatility of a time
series may improve the efficiency of the estimates of model parameters as well as the accuracy of interval forecasts.
The conditional heteroscedastic models discussed in this chapter include the autoregressive conditional
heteroscedastic (ARCH) model of Engle (1982), the generalized ARCH (GARCH) model of Bollerslev (1986), the
GARCH-M model of Engle, Lilien, and Robins (1987), the exponential GARCH (EGARCH) model of Nelson (1991), and a
variety of threshold GARCH models. Useful literature in this area of research can be found in the review articles
by Bollerslev, Chou, and Kroner (1992), and Bollerslev, Engle, and Nelson (1999). We shall discuss the strengths and
weaknesses of each model and show applications of the models.
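How a conditional variance can respond to past shocks may be sketched with the GARCH(1,1) recursion. The parameter names (`omega`, `alpha`, `beta`) and the choice to start the recursion at the unconditional variance are assumptions made here for illustration, and the shocks in the example are made up.

```python
def garch11_variance(shocks, omega, alpha, beta):
    """Conditional variances h_t = omega + alpha * a_{t-1}**2 + beta * h_{t-1}.

    h_0 is set to the unconditional variance omega / (1 - alpha - beta),
    which requires alpha + beta < 1 (an initialization assumption).
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be less than 1")
    h = [omega / (1.0 - alpha - beta)]
    for t in range(1, len(shocks)):
        h.append(omega + alpha * shocks[t - 1] ** 2 + beta * h[-1])
    return h

# A large shock at t = 0 followed by quiet periods.
variances = garch11_variance([2.0, 0.0, 0.0], omega=0.1, alpha=0.1, beta=0.8)
```

The large first shock raises the next period's variance above its long-run level, after which the variance decays back gradually. Setting `beta` to zero reduces the recursion to an ARCH(1) model.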
Chapter 12. Segmented Time Series Modeling and Forecasting
Statistical models are used to capture homogeneous patterns or relationships that may exist
in the data. For univariate time series data, certain periodic groupings may behave more
similarly or homogeneously than others. Examples of periodic groupings may be defined by
months of the year, days of the week, or hours of the day, where behavior of the data
follows similar autocorrelation and mean (or trend) patterns. In such a situation, we may
develop a model that optimizes its efficacy for data in such periodic groupings. Similarly
for multi-variable time series data, a time series may be influenced by another time series
differently depending upon the values of one or more explanatory series. For example, housing
markets may be influenced by interest rates. However, the influence of interest rates on the
housing markets may be different depending upon whether the interest rates are increased or
decreased. In such a case, it may be more appropriate to have separate models to represent
such potential asymmetric behavior. In regression analysis, we can accomplish this by simply
deleting unnecessary portions of the data during model estimation or by employing a piecewise
regression model. In time series modeling, however, data cannot be arbitrarily deleted during
model estimation due to the existence of serial correlation or seasonality. In this chapter,
we introduce a weighted estimation method to facilitate the practice of omitting (or discounting)
data in ARIMA and transfer function modeling. A special application of the weighted estimation
method is to facilitate a more general approach for the modeling and forecasting of threshold
autoregressive (TAR) models.
In time series analysis, it is not uncommon for the pattern or the relationship of the time
series to be temporarily disrupted by outliers or structural changes. If the disruptions are
isolated and not exceedingly large, outlier detection and adjustment methods discussed in Chapter 7
are sufficient to correct for the biases caused by such disruptions. However, if the disruptions are
clustered together or if their atypical effects are persistent over a period of time, it may be more
appropriate to discount or disregard those portions of the data in time series modeling. The weighted
estimation method can be useful in such situations as well.
Chapter 13. Nonlinear Time Series Models
In Chapter 1 through Chapter 10, we mainly focus on how to represent a univariate time series using a linear
ARIMA model. Time series encountered in practice, however, may not always exhibit characteristics of a
linear process. Thus, in this book we explore various nonlinear models that may enhance the efficacy and
usefulness of a time series model.
In Chapter 11, we study ARCH and GARCH models for conditional heteroscedastic time series. The ARCH/GARCH
models are a special class of nonlinear time series models which are particularly useful for economic
and financial time series whose conditional variances (volatility) depend on the past innovations
of the series. Except for the GARCH-M models, in which the mean level depends on the conditional variance,
other ARCH/GARCH models are nonlinear in variance, but still linear in mean.
In Chapter 12, we discuss time series analysis using linear ARIMA models and the weighted estimation method.
The threshold autoregressive (TAR) model is a special case under such an application. A TAR model can be
viewed as a piecewise linear approximation of a nonlinear AR model for a time series. This class of nonlinear
models has been discussed extensively in the literature (see, e.g., Tong (1978, 1983, 1990), Tong and
Lim (1980), Maravall (1983), Tsay (1986, 1989), Tiao and Tsay (1994), and Chen and Tsay (1991, 1993)) and has been
shown to be useful in time series analysis and related applications. In Chapter 12, we have addressed the issue
of TAR model estimation by using the weighted estimation method. In this chapter, we shall explore the
nonlinearity test of a time series and the identification of a TAR model for a time series.
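The piecewise linear idea behind a TAR model can be sketched with a two-regime first-order example. The threshold, the two AR coefficients, the delay of one period, and the function names are all hypothetical choices made for this illustration.

```python
import random

def tar_conditional_mean(z_prev, z_switch, threshold, phi_low, phi_high):
    """One-step conditional mean of a two-regime TAR(1) model.

    The AR coefficient is phi_low when z_switch <= threshold, else phi_high.
    """
    phi = phi_low if z_switch <= threshold else phi_high
    return phi * z_prev

def simulate_tar(n, threshold, phi_low, phi_high, seed=0):
    """Simulate the model with delay 1, so the switching value is Z_{t-1}."""
    rng = random.Random(seed)
    z, prev = [], 0.0
    for _ in range(n):
        mean = tar_conditional_mean(prev, prev, threshold, phi_low, phi_high)
        prev = mean + rng.gauss(0.0, 1.0)
        z.append(prev)
    return z

series = simulate_tar(200, threshold=0.0, phi_low=0.8, phi_high=-0.3)
```

Within each regime the model is an ordinary linear AR(1), but the series as a whole is nonlinear because the coefficient in force depends on where the previous value fell relative to the threshold.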
Chapter 14. Multivariate Time Series Analysis and Forecasting Using Vector ARMA Models
Analyses of business, economic, environmental or industrial time series often require that we consider
modeling several time series jointly. In some situations, it is appropriate to assume that the relationship(s)
between the input variable(s) and the output variable are unidirectional. That is, we may be able to assume
that there is no feedback relationship. For example, it is reasonable to assume that crude oil prices affect
the prices of gasoline, but gasoline prices should not affect crude oil prices (at least in the short term).
Similarly, the temperature recorded in an attic is affected by the outside temperature, but not conversely.
When the assumption of unidirectional relationships can be justified, it may be appropriate to employ transfer
function models for time series analyses.
However, in many applications a unidirectional assumption may not be appropriate. Physical laws may dictate
the consideration of interrelationships in the case of industrial or environmental data. In the case of business
and economic data, there may not be sufficient theoretical understanding to establish any a priori causality.
For example, it is difficult to postulate the dynamic relationships between major economic variables such as
money supply, interest rate, inflation, producer price and industrial production using only economic theory.
In fact, when studying such variables, a primary objective of a time series analysis may be to understand the
causal relationships among the variables of the system.
In this chapter, we consider vector autoregressive-moving average models, also referred to as vector ARMA
or VARMA models. This class of models allows for general dynamic relationships among variables in a system.
Therefore, vector ARMA models may be more adequate to represent the dynamic relationships among series of
interest and provide more accurate forecasts than those obtained from univariate ARIMA models. As will be shown,
the vector ARMA model is a multivariate analogue of the univariate ARIMA model.
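The analogy with the univariate case can be seen from the general form of the model. The notation below is a common textbook convention assumed here for illustration (a k-dimensional series Z_t, a white noise vector a_t, a constant vector C, and k-by-k coefficient matrices), and the chapter's own notation may differ.

```latex
% Vector ARMA(p, q) model for a k-dimensional series Z_t
\boldsymbol{\Phi}(B)\, \mathbf{Z}_t = \mathbf{C} + \boldsymbol{\Theta}(B)\, \mathbf{a}_t,
\qquad
\boldsymbol{\Phi}(B) = \mathbf{I} - \boldsymbol{\Phi}_1 B - \cdots - \boldsymbol{\Phi}_p B^p,
\quad
\boldsymbol{\Theta}(B) = \mathbf{I} - \boldsymbol{\Theta}_1 B - \cdots - \boldsymbol{\Theta}_q B^q
```

When k = 1 the matrices reduce to scalars and the expression becomes a univariate ARMA model. The off-diagonal elements of the coefficient matrices are what allow each series to depend on the past of the others, including feedback in both directions.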
Chapter 15. Multivariate Time Series Analysis and Forecasting Using Simultaneous Transfer Function Models
Simultaneous transfer function models are natural extensions of classical econometric models using simultaneous
equation systems. The primary difference is that the disturbance terms for the classical econometric models are
often assumed to be white noise, while STF models allow the disturbance terms to follow an ARMA process. Unlike
vector ARMA models, STF models allow for both reduced form and structural form, as well as allow for explicit
incorporation of exogenous variables. The latter is an important flexibility. In many situations in practice,
certain variables are clearly exogenous and the inclusion of such variables in a vector ARMA model will either
result in an overly complicated model, or even worse, an unattainable model. For example, as discussed in
Chapter 6, trading-day and moving-holiday (e.g., Easter holiday) effects are important in the development of
a forecasting model. These effects can be readily incorporated into an STF model, while this cannot be accomplished
easily using a vector ARMA model. Similarly, intervention effects can easily be accommodated in an STF model,
but not in a vector ARMA model.
Since classical econometric models using simultaneous equation systems are a special case of STF models, many issues
related to identification, estimation, and forecasting using simultaneous equation systems are also present in the
application of STF models. We shall address some of these issues in this chapter, particularly those issues more
relevant to forecasting.