Notes on the Use of Data in Research

This handout offers some guidelines on selecting and documenting appropriate data for time series analysis.

Data selection is not the same as data mining - data should ideally be selected PRIOR to estimation, to fit the requirements of the problem at hand. A particular issue in applied time series econometrics is the relatively easy access we now have to electronic databases. These databases contain a myriad of series which seem suitable for any estimation we may need, so it is tempting to pick a series without checking that it is the right one for the problem.

In any assessment of the worth of applied work, a particular issue is the appropriateness of the data used. Attention to detail in this area is vital. This is mainly a matter of thoroughly recording your reasons for selecting a particular data series, usually in the text, and giving a proper definition of the series and its source. For example, the same data series could be described in either of two ways, as in the pairs below - in each pair the first description is not acceptable and the second is.

GDP - Gross Domestic Product, quarterly, DX database

GDP(E) - Nominal Gross Domestic Product (Expenditure basis), quarterly, TSS database from the DX database [code TSS-GDPX....E]

RUS - real 3 month US interest rate, quarterly, DX database

RUS - 3 month US Treasury bill rate, quarterly - constructed as the average of end-month data from the RBA Bulletin series, Table H1, from the DX database [code RBA - INT3MTHUS], deflated by the US inflation rate, constructed from the US CPI obtained from the IFS database on the STARS database [code US.64.A].
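One way to make this kind of documentation habitual is to keep a small record for every series alongside the data. The sketch below is only an illustration, not a prescribed format: the class name SeriesRecord, its field names and the REQUIRED list are all hypothetical, and the example entries are copied from the GDP descriptions above.

```python
from dataclasses import dataclass, asdict

# Fields assumed (for this sketch) to be needed before a description is acceptable.
REQUIRED = ("name", "definition", "frequency", "source", "code")


@dataclass(frozen=True)
class SeriesRecord:
    """Hypothetical documentation record for one data series."""
    name: str
    definition: str
    frequency: str
    source: str
    code: str = ""          # database code, copied exactly as the database gives it
    construction: str = ""  # optional note on any transformation applied

    def missing_fields(self):
        """Return the required fields that have been left empty."""
        values = asdict(self)
        return [key for key in REQUIRED if not values[key].strip()]

    def describe(self):
        """Format the record in the style of the descriptions in this handout."""
        text = f"{self.name} - {self.definition}, {self.frequency}, {self.source}"
        if self.code:
            text += f" [code {self.code}]"
        if self.construction:
            text += f"; {self.construction}"
        return text


# The unacceptable and acceptable GDP descriptions from the text.
poor = SeriesRecord(
    name="GDP",
    definition="Gross Domestic Product",
    frequency="quarterly",
    source="DX database",
)
good = SeriesRecord(
    name="GDP(E)",
    definition="Nominal Gross Domestic Product (Expenditure basis)",
    frequency="quarterly",
    source="TSS database from the DX database",
    code="TSS-GDPX....E",
)

print(poor.missing_fields())
print(good.describe())
```

A check like this only catches fields that are empty. It cannot tell that "Gross Domestic Product" fails to say whether the series is nominal or real, or on which basis it is measured, so the definition still has to be written with care.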

The reason for this seeming pedantry is that applied econometric work should be fundamentally replicable. When you complete a piece of work there are two aspects you need to consider. First, can somebody following your paper independently find the data and produce consistent results (allowing only for data revisions, which can sometimes be an enormous problem in themselves)? Second, when they contact you because the first step has failed, can you provide the data which will produce exactly the results you reported? That is, KEEP YOUR ORIGINAL DATA. This may seem an obvious point, but it is often not done: databases get updated, and nobody can then tell whether the inability to reproduce results is due to a mistake or to data revisions. For work such as theses it is vital that you keep your original data well documented, for several reasons. First, your examiners may want to see it or use it. Second, you may wish to publish the results in the future, by which time you will have forgotten how it was all constructed. Third, you, or somebody else who contacts you, may wish to use the data for some other research in the future.
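As a rough sketch of what keeping your original data can look like in practice, the Python below saves a series to a CSV file together with a small manifest holding a SHA-256 checksum, so that you can later confirm the archived file is byte-for-byte the one used for the reported results. The function names, file names and manifest layout are assumptions made for illustration, and the numbers are placeholders rather than real data.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def archive_series(rows, data_path, note):
    """Write (period, value) rows to CSV plus a JSON manifest with a checksum."""
    data_path = Path(data_path)
    with data_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["period", "value"])
        writer.writerows(rows)
    manifest = {
        "file": data_path.name,
        "sha256": sha256_of(data_path),
        "note": note,  # e.g. definition, source, database code, download date
    }
    manifest_path = data_path.with_suffix(".json")
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path


def is_unchanged(data_path):
    """Check the data file against the checksum stored in its manifest."""
    data_path = Path(data_path)
    manifest = json.loads(data_path.with_suffix(".json").read_text())
    return sha256_of(data_path) == manifest["sha256"]


# Demonstration in a temporary directory, using placeholder values only.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rus_original.csv"
    archive_series(
        [("2000Q1", 1.5), ("2000Q2", 1.7)],
        path,
        note="placeholder values for illustration, not real data",
    )
    ok_before = is_unchanged(path)   # file is exactly as archived

    # Simulate a later update to the file, as happens when a database is revised.
    with path.open("a", newline="") as fh:
        fh.write("2000Q3,1.9\n")
    ok_after = is_unchanged(path)    # checksum no longer matches

print(ok_before, ok_after)
```

If the check fails, the file is no longer the one you archived. If it passes and results still cannot be reproduced from the current database, that points to revisions in the database rather than to a changed data file.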


