Term Project
by Dan Brockman

University of California at Berkeley (Extension)
Summer 2007
Instructor: Jianmin Liu, Ph.D.
SAS-X405.5 Advanced Programming Using the SAS System

Currency Volatilities and Correlations

Dr. Liu gave us the project assignment early in the course. We were to compute volatilities of price movements, then compute and write the lower triangle of the correlation matrix to a text file.

He didn't give us the data until several weeks later, however, and I wanted to begin. For my initial inputs, I used my data from the project for the previous course, which was quite unrelated to currencies and markets.

I realized I was using one kind of data to create and test programs to process another kind of data. I decided to write programs that would process any set of time series indexed by days. The necessary generality and modularity of the resulting code would become the centerpiece of my project.

I'm very grateful to have had this opportunity to produce some SAS code demonstrably robust enough for use in several contexts.

Mark Gschwind's Currency Prices

Output text file

Note: Variable Sd is the SAS date value (a count of days since January 1, 1960) corresponding to the index date of the time series.

Log file
Main program source code (fx2.sas)
Screen output

The third and last data set I processed was a set of currency price series that my classmate Mark Gschwind obtained from a source unknown to me.

At this point, I'd worked out the (obvious) bugs in my collection of macro subprograms. Assembling them to process this data was relatively easy, due to the modularity of the design. It almost makes me wish I had 20 more volumes of input data and 30 or 40 hours to calculate their coefficients of volatility. That's how modularity pays off.

In two or three hours, I had completed the work. I set the configuration parameters in the main program, using a trading calendar (250 days per year). I included the modules to calculate and difference the logs of the prices. I debugged the few new lines that read the data from the text input file, and produced the output.

Mark's data had one bad date ("07/32/2007") in it, which the program duly reported both on the screen and in the log file.

Jianmin Liu's Logs of Currency Prices

Output text file
Log file
Main program source code (unk-1.sas)
Screen output

Before processing Mark's data, I processed Dr. Liu's.

Writing the programs had taken me so long that when I went looking for Dr. Liu's input data, I had forgotten what it looked like. I found a file named "data_file.txt" that seemed to contain logs of currency prices. Not knowing whether Dr. Liu had provided it, I processed it as "Data of Unknown Source". I've since concluded this was Dr. Liu's input.

Getting it to run took two or three days. Because it was only the second data set the complete program had seen, it exposed a number of bugs. I had to revise some of the macro modules to add a parameter or two. I also wrote code to read the text input.

I was expecting at least one, but I found no bad date.

Dan's Blood Pressure Data

Output text file
Log file
Main program source code (BP-1.sas)
Screen output

The first data I processed were blood pressure data.

Writing the code to enable the first successful execution required two or three weeks.

I used my blood pressure data from the project in SAS-X405.7. These were daily time series. Though not currency prices, or even prices, they served well enough to construct and exercise the programs.

I found one bad (missing) date.

Catalog of subprograms

I tested these macros using %include syntax in the main programs or in special-purpose test programs. When I had tested them adequately, I adjusted the main programs to invoke these macros via the autocall facility and the sasautos option.

annvol.sas - Volatility calculator.
Addresses Req 5 and 6.(1) of the assignment. Illustrates use of (a) named macro parameters ("parm="), (b) symgetn() and call symput(), and (c) numbered variables (var1-var15).
annvol calculates annualized volatilities for a data set of time series. For a given date, the volatility is the standard deviation of the observations on the latest n days (Cox & Rubinstein, "Options Markets", pp. 255-6), where n is an input parameter. The input value for a series on a given date should be a difference: the value of the variable less its value on the previous day, as calculated by cchg.sas. Annualized volatility is the calculated standard deviation multiplied by the square root of the number of days per year, also an input parameter. With appropriate parameters, annvol will calculate volatility for a trading calendar of 250 days per year and 5 days per week, for a standard calendar of 365 days per year, or for a simple series of observations assumed to occur on consecutive days.
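As a rough illustration of the calculation described above, here is a minimal sketch in Python rather than SAS. It is not the annvol macro: the function name, its arguments, and the choice of the sample (n-1) standard deviation are assumptions made for the example, and it does not handle missing values or calendars.

```python
import math
import statistics


def annualized_volatility(diffs, n, days_per_year=250):
    """Rolling annualized volatility of day-to-day differences.

    diffs         -- list of differences (today's value less yesterday's)
    n             -- window length in observations (assumed >= 2)
    days_per_year -- 250 for a trading calendar, 365 for a standard one

    Returns a list aligned with diffs; positions with fewer than n
    observations available so far hold None.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    scale = math.sqrt(days_per_year)
    result = []
    for i in range(len(diffs)):
        if i + 1 < n:
            result.append(None)  # not enough history yet
        else:
            window = diffs[i + 1 - n : i + 1]  # the latest n observations
            # Sample standard deviation (assumption), then annualize.
            result.append(statistics.stdev(window) * scale)
    return result
```

Calling `annualized_volatility(diffs, n=20)` would give a 20-day volatility on a 250-day trading calendar; passing `days_per_year=365` switches to a standard calendar.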

cchg.sas - Calculate day-to-day differences in observations.
Accommodates trading calendar or conventional calendar.

clog.sas - Computes logarithms for specified variables.
Illustrates use of %eval().
Invoked by fx2.sas and BP-1.sas, which have non-logarithmic input.

consecdays.sas - Assures no missing dates in time series.
consecdays simplifies subsequent logic by inserting dummy observations for missing dates, assuring the series contains consecutive dates without gaps, though the variables have missing values on those dummy observation dates.
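The gap-filling idea can be sketched as follows, in Python rather than SAS. This is an illustration only, not the consecdays macro: the function name and the dictionary input are hypothetical, and it assumes every calendar day between the first and last date should appear.

```python
from datetime import date, timedelta


def fill_missing_dates(series):
    """Return (date, value) pairs for every calendar day in the series' span.

    series -- dict mapping datetime.date to a value
    Days absent from the input appear in the output with value None,
    playing the role of dummy observations with missing values.
    """
    if not series:
        return []
    day = min(series)
    last = max(series)
    filled = []
    while day <= last:
        filled.append((day, series.get(day)))  # None when the day is absent
        day += timedelta(days=1)
    return filled
```

After this step, downstream logic can assume that consecutive rows are consecutive days, so a lag of one row is always a lag of one day.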

corrwndo.sas - Prints correlation matrix for time windows.
corrwndo addresses Reqs 4, 5.(2), and 6 of the assignment.
corrwndo loops through the input data set (which is the output of annvol). It accommodates a trading calendar or a conventional calendar. It extracts a time window of data on each loop, as specified by parameters, by invoking wndo2. corrwndo invokes mkvarlist2 to list and count variables. It invokes findavar to create unique variables for temporary use without chancing interference with data set variables. It invokes getdsname to create unique data set names for temporary use, and delds to delete those temporary data sets when no longer needed. It uses proc corr to calculate the correlation matrix into a temporary data set. It uses proc printto to direct text output to the screen or to a file, as specified by a parameter. It uses the "missing" option and proc print to print the lower triangle of the correlation matrix.
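For readers unfamiliar with the output format, the sketch below shows one way to compute and print the lower triangle of a correlation matrix for a single window of data. It is written in Python rather than SAS and does not reproduce corrwndo, proc corr, or the windowing logic; all names are hypothetical, and it assumes complete series of equal length with no missing values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(series):
    """Rows of the lower triangle, diagonal included.

    series -- dict mapping a variable name to its list of values
    Row i holds the correlations of variable i with variables 0..i.
    """
    names = list(series)
    rows = []
    for i, name in enumerate(names):
        rows.append([pearson(series[name], series[names[j]]) for j in range(i + 1)])
    return names, rows


def format_lower_triangle(names, rows):
    """Render the triangle as text, one variable per line."""
    lines = []
    for name, row in zip(names, rows):
        lines.append(name.ljust(8) + " ".join(f"{r:8.4f}" for r in row))
    return "\n".join(lines)
```

Writing `format_lower_triangle(*lower_triangle(window))` to a file once per time window would give a text report of the same general shape.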

cvtymdsas.sas - Converts a date stored as yyyymmdd to SAS datevalue.
BP-1.sas uses cvtymdsas to convert its input without losing information in case of invalid dates.

deabbr.sas - Converts var1-var4 to var1 var2 var3 var4, etc.
fx2.sas and unk-1.sas use deabbr to expand lists of variable names. I wrote these programs without awareness of the names of the input series, arbitrarily assigning names such as "cur1-cur15".
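The expansion itself is simple to picture. The following Python sketch (not the deabbr macro; the function name is hypothetical) expands numbered ranges in a space-separated list and passes other names through unchanged. It assumes both ends of a range share the same prefix and have no leading zeros.

```python
import re

# prefix + start number, a hyphen, the same prefix + end number
_RANGE = re.compile(r"([A-Za-z_][A-Za-z_0-9]*?)(\d+)-\1(\d+)")


def expand_var_list(spec):
    """Expand 'var1-var4' style ranges in a space-separated variable list."""
    names = []
    for token in spec.split():
        match = _RANGE.fullmatch(token)
        if match and int(match.group(2)) <= int(match.group(3)):
            prefix = match.group(1)
            first, last = int(match.group(2)), int(match.group(3))
            names.extend(f"{prefix}{k}" for k in range(first, last + 1))
        else:
            names.append(token)  # ordinary name: keep as written
    return names
```

For example, `expand_var_list("cur1-cur15")` yields the fifteen names cur1 through cur15 as a list.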

delds.sas - Deletes a data set when you don't know its name.
If you don't know its name, then you must know the name of the variable that contains the name of the data set.
Lore: I wrote delds for a previous project. Modular code is reusable.

dtchk.sas - Writes an error message to the log on detecting an invalid date.
Used in racedtck.sas to test alternate algorithms for testing date validity.

dtchkf.sas - Like dtchk, plus an exception data set and replacement of bad dates with missing.
dtchkf addresses Req 3 of the assignment.
dtchkf is an evolution of dtchk. dtchkf checks for date validity and writes exceptions to a dataset designated for that purpose. It replaces invalid dates with the missing value and writes an error message to the log file.
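The behavior described here can be sketched as below, in Python rather than SAS. This is not dtchkf: the function name, the mm/dd/yyyy input format, and the use of a list for the exceptions are assumptions for illustration, and writing to a log is reduced to returning the exception records.

```python
from datetime import datetime


def check_dates(raw_dates, fmt="%m/%d/%Y"):
    """Validate date strings, keeping a record of the bad ones.

    Returns (cleaned, exceptions):
      cleaned    -- list of datetime.date, with None where the input was invalid
      exceptions -- list of (row_number, original_text) for invalid inputs
    """
    cleaned = []
    exceptions = []
    for row_number, text in enumerate(raw_dates, start=1):
        try:
            cleaned.append(datetime.strptime(text, fmt).date())
        except ValueError:
            cleaned.append(None)                  # replace bad date with "missing"
            exceptions.append((row_number, text))  # keep the evidence
    return cleaned, exceptions
```

A date such as "07/32/2007" ends up in `exceptions` with its row number, while the corresponding entry in `cleaned` is None.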

findavar.sas - Get unused unique variable name.

getdsname.sas - Get unused unique dataset name for temporary use.
Lore: I wrote delds for a previous project. Modular code is reusable.

mkvarlist2.sas - Make list of var names in data set.
mkvarlist2 illustrates use of (a) %global variables, (b) %eval, (c) macro debugging options mprint, mlogic and symbolgen, and (d) their opposites nomprint, nomlogic and nosymbolgen. It also illustrates (e) use of %sysfunc, and (f) %put.
mkvarlist2 examines a data set and returns in global macro variables (1) the list of all the variables in the data set, (2) the list of the numeric variables and their number, and (3) the list of the character variables and their number.

trimset.sas - Cuts leading & trailing obs with all values missing.
For economy of presentation, we use trimset to remove from our datasets leading or trailing records containing all missing values. I found numerous "missing observations" leading and trailing useful data for some data sets.
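A minimal Python sketch of this trimming (not the trimset macro; the function name and row layout are hypothetical) might look like the following. It assumes each row is a tuple of the series values only, with None standing for a missing value, and it leaves all-missing rows in the interior untouched.

```python
def trim_all_missing(rows):
    """Drop leading and trailing rows whose values are all missing (None)."""

    def all_missing(row):
        return all(value is None for value in row)

    start, end = 0, len(rows)
    while start < end and all_missing(rows[start]):
        start += 1          # skip leading empty rows
    while end > start and all_missing(rows[end - 1]):
        end -= 1            # skip trailing empty rows
    return rows[start:end]
```

Interior gaps are kept on purpose, so the series stays consecutive after dummy observations have been inserted for missing dates.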

wndo2.sas - Selects time series data from time interval according to parameters.
wndo2 illustrates use of %local.

The main programs BP-1.sas, unk-1.sas and fx2.sas perform most of their actions by invoking macro subprograms.

fx2.sas uses macros written to enable unk-1.sas, which uses macros written to enable BP-1.sas.

BP-1.sas uses a few old macros written to enable the Blood Pressure Project.

The main programs illustrate the use of proc sql.

Supplemental Programs

Log file

I wrote racedtck and raceprep to test some alternate date calculations for efficiency. raceprep produces a list of 1,000,000 random dates. racedtck executes an algorithm against the list of dates.

The results were inconclusive, much like similar tests I've done with other programming languages. These races with date algorithms illustrate the value of using straightforward algorithmic devices in programming, and the value of testing hypotheses as children of the Enlightenment.

I used lemma to examine closely the behavior of the SAS lag functions. It was an early version of some algorithms eventually expressed in the project main programs.

I used the other programs at the left to unit test various macros and alternative algorithms.
