What's wrong with my macro program and how to fix it?

deleted_user · Posted 05-28-2010 07:53 AM

Hi,

I am a SAS newbie. I wrote this macro, but it doesn't work. I guess it goes wrong when reaching the 4th program, but I still can't figure out how to fix it. Can you please help? Thanks a lot...

%macro clean_rawfiles(importfile=, filelocation=, filename=, varnum=);
proc import datafile="C:\ARC DS Downloading (HoangLuong)\DataFiles\&importfile..csv"
out=&filelocation..&filename
replace;
getnames=no;
run;
data time &filename;
set &filelocation..&filename;
if var1="Code" then output time;
else output &filename;
run;
data time1 (drop=var2-var&varnum);
set time (drop=var1);
array x

var2 - var&varnum;
do i = 1 to dim(x);
time = x;
output;
end;
run;
data &filename1 (drop=var2-var&varnum); /*I guess it goes WRONG from here*/
set &filename (rename=(var1=dscode));
array x

var2 - var&varnum;
do i = 1 to dim(x);
&filename = x;
output;
end;
run;
proc sql;
drop table &filelocation..&filename;
quit;
proc sql;
create table &filelocation..&filename as
select dscode, time as year, &filename
from time1, &filename1
where time1.i = &filename1.i and dscode ^= ""
order by dscode, time;
quit;
%mend clean_rawfiles;

Message was edited by: hoangluong Message was edited by: hoangluong

Ksharp · Posted 05-28-2010 11:06 AM

Please use
[pre]
options mprint mlogic;
[/pre]
to check the error in the log.

post the log.

In addition, "where time1.i = &filename1.i and dscode ^= " looks like maybe "&filename1..i "

Ksharp

Cynthia_sas · Posted 05-28-2010 12:32 PM

Hi:
There is so much unknown about your macro program. Turning on these options:
options mprint mlogic symbolgen;

will help you debug what's happening. However, what I find very helpful in situations like this is for you to tell us, in narrative form, what you think is happening or what you WANT to have happen. For example:

1) The PROC IMPORT step is reading in a CSV file identified by &IMPORTFILE and is writing out a SAS dataset identified by &FILELOCATION (the library) and &FILENAME (the dataset).

2) The Data step program that starts:
DATA time &filename; is creating 2 SAS datasets from the SAS dataset created above. I know that if VAR1="Code", then I have ????????? and so I need that information in the TIME dataset. All the rest of the observations go into the WORK.&FILENAME dataset. The variables that I have read in from the CSV file are:
VAR1 - VAR&VARNUM (I need to supply &VARNUM in the macro program because ????????)
The TIME dataset only needs ??????;
The WORK.&FILENAME dataset needs ??????

3) The dataset that starts DATA time1; is using the ?????? dataset as input. I have an ARRAY statement and a DO loop because ??????????. I am setting the TIME variable equal to X (the array name ????) instead of
time = x(i); because ???????? Then in my output statement, I am doing an OUTPUT for each iteration through the DO loop. This means that my data will come into the program like this:
***** show what your variables are coming in *****
and after the program is over, then WORK.TIME1 will look like this:
**** show what you WANT your variables to look like coming out of the program

4) Then I create DATA WORK.&FILENAME1 from WORK.&FILENAME. Since I do not ever PASS a value for &FILENAME1, I expect that the value for &FILENAME1 will come from ??????. I have a DO loop in this program. In the DO loop, I have the statement:
&filename=x;

Again, since X is the name of the ENTIRE array, what I think is going to happen here is ??????????.

5) Then, I have a PROC SQL step to drop the dataset that I created in the PROC IMPORT step. I do this because ????????.

6) Last, I have another PROC SQL step where I create &FILELOCATION..&FILENAME from my 2 datasets: WORK.TIME1 and &FILENAME1 (which I think I created in step 4) I think that I can use I (generally a DO loop index) here in the SQL step. However, the SQL step is NOT inside a Macro %DO loop, so I am using this syntax because I want ?????????. In this processing, I envision joining the cleaned up TIME and &FILENAME datasets together based on the WHERE condition:
where time1.i = &filename1.i and dscode ^= ""

In this program, I think the value of the variable I is going to be: ??????? and I want this PROC SQL step to execute ????? times based on ???????.

Sometimes, it is useful to have a verbal description of what your whole process is. It's also useful to see a little bit of the data. In addition, these papers are very good papers which explain some of the basics of Macro processing and SAS Array processing.
http://www2.sas.com/proceedings/sugi28/056-28.pdf
http://support.sas.com/rnd/papers/sgf07/arrays1780.pdf
http://www2.sas.com/proceedings/sugi29/070-29.pdf

There is a HUGE difference between a Macro %DO loop and a regular DATA step DO loop and I'm not convinced that you are using the DATA step DO loop in the correct way in your program. Nor am I really sure just WHAT it is your program is trying to do. That's where having some information explaining the process would be useful.

cynthia

deleted_user · Posted 05-29-2010 01:51 AM

Hi, thanks so much for the help/suggestions. I've fixed the program and now it's running without errors.

The errors occurred because I didn't put the period (.) between &filename and 1, i.e. should be &filename.1 rather than &filename1. Why so? I can't figure out. I just think &filename resolves to DY (see my testing with file DY), and 1 is a number I want attached to DY so I'll have DY1.

I have a time series dataset, with 100,000 firms and each having 13 years of data. When I download the data from the database, the data is arranged like this:

Firm 1997 1998 .... 2010
firm1 xxx xxx .... xxx
firm2 yyy yyy .... yyy
and so on.

Now I want to restructure the data so that I have the correct PANEL DATA structure, that is,

Firm Year VariableOfInterest
firm1 1997 xxxx
firm1 1998 yyyy
....
firm1 2010 zzgg

firm2 1997 zxxx
firm2 1998 gagag

and so on.

Thanks so much for the quick reply and for the very useful tutorials (esp. "Array: Construction and Usage...."...

Peter_C · Posted 05-29-2010 08:13 AM

the compiler allows you to have macro variables named like filename1 as well as file and filename. So how do you think it should tell the difference and which is referenced by &filename1
??

Cynthia_sas · Posted 05-29-2010 06:13 PM

Hi:
This is where a description and seeing the data might be useful. Given the structure that you outline, I'm not sure I understand what your macro program is actually doing. From your description, the problem sounds like a more simple transpose problem. If all you want to do is take a structure like this (showing data only...no headers)
[pre]
firm1 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113
firm2 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213
firm3 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313
firm4 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413
[/pre]

And turn it into a structure like this (only 2 firms shown):
[pre]
firm1 1998 1101
firm1 1999 1102
firm1 2000 1103
firm1 2001 1104
firm1 2002 1105
firm1 2003 1106
firm1 2004 1107
firm1 2005 1108
firm1 2006 1109
firm1 2007 1110
firm1 2008 1111
firm1 2009 1112
firm1 2010 1113
firm2 1998 2201
firm2 1999 2202
firm2 2000 2203
firm2 2001 2204
firm2 2002 2205
firm2 2003 2206
firm2 2004 2207
firm2 2005 2208
firm2 2006 2209
firm2 2007 2210
firm2 2008 2211
firm2 2009 2212
firm2 2010 2213
[/pre]

Then there are a couple of things to know. First, SAS variable names can't start with a number. So your dataset variables have to be something other than 1998, 1999. In my program below, where I read in the "original" structure, I call the variable y1998, y1999, etc

[pre]
data orig_data;
length firm $8;
infile datalines dsd dlm=',';
input firm y1998 y1999 y2000 y2001 y2002 y2003 y2004 y2005 y2006 y2007 y2008 y2009 y2010;
return;
datalines;
"firm1",1101,1102,1103,1104,1105,1106,1107,1108,1109,1110,1111,1112,1113
"firm2",2201,2202,2203,2204,2205,2206,2207,2208,2209,2210,2211,2212,2213
"firm3",3301,3302,3303,3304,3305,3306,3307,3308,3309,3310,3311,3312,3313
"firm4",4401,4402,4403,4404,4405,4406,4407,4408,4409,4410,4411,4412,4413
;
run;

proc print data=orig_data noobs;
title 'Original Data';
run;

proc sort data=orig_data;
by firm ;
run;

proc transpose data=orig_data out=panelout;
by firm ;
run;

proc print data=panelout;
title 'After Transpose';
run;
[/pre]

After the transpose step, the (partial) transposed data looks like this:
[pre]
After Transpose

Obs firm _NAME_ COL1

1 firm1 y1998 1101
2 firm1 y1999 1102
3 firm1 y2000 1103
4 firm1 y2001 1104
5 firm1 y2002 1105
6 firm1 y2003 1106
7 firm1 y2004 1107
8 firm1 y2005 1108
9 firm1 y2006 1109
10 firm1 y2007 1110
11 firm1 y2008 1111
12 firm1 y2009 1112
13 firm1 y2010 1113
14 firm2 y1998 2201
15 firm2 y1999 2202
16 firm2 y2000 2203
17 firm2 y2001 2204
18 firm2 y2002 2205
19 firm2 y2003 2206
20 firm2 y2004 2207
21 firm2 y2005 2208
22 firm2 y2006 2209
23 firm2 y2007 2210
24 firm2 y2008 2211
25 firm2 y2009 2212
26 firm2 y2010 2213
[/pre]

So from this we see that _NAME_ now holds the former variable for year and the COL1 variable holds the value for that year.

Now, all that remains is to actually make a numeric YEAR variable:
[pre]

data final_panel(keep=firm year COL1);
length year 8;
set panelout;
** get rid of 'y' in from _name_ variable to make year;
year = input(compress(_name_,'y'),4.);
run;

proc print data=final_panel noobs;
title 'After Make YEAR Variable';
run;
[/pre]

Which would result in (only partial observations shown):
[pre]
firm year COL1
firm1 1998 1101
firm1 1999 1102
firm1 2000 1103
firm1 2001 1104
firm1 2002 1105
firm1 2003 1106
firm1 2004 1107
firm1 2005 1108
firm1 2006 1109
firm1 2007 1110
firm1 2008 1111
firm1 2009 1112
firm1 2010 1113
firm2 1998 2201
firm2 1999 2202
firm2 2000 2203
firm2 2001 2204
firm2 2002 2205
firm2 2003 2206
firm2 2004 2207
firm2 2005 2208
firm2 2006 2209
firm2 2007 2210
firm2 2008 2211
firm2 2009 2212
firm2 2010 2213
[/pre]

If you wanted to rename COL1, at this point, you could use the RENAME= option in the last data set. If you do NOT already have a SAS dataset in the original structure and you need to read a dataset from some form -into- a SAS dataset, then you could also do the transpose/transform/output of one year per observation in the initial read program. But I didn't do that because you said your dataset was already in the original form.

When you are working with time series data, you frequently must transpose the entire dataset or subset in order to work with the data in your procedure of choice. There is a good section in the documentation entitled "Working with Time Series Data" that may prove very informative:
http://support.sas.com/documentation/cdl/en/etsug/63348/HTML/default/viewer.htm#/documentation/cdl/e...

cynthia

deleted_user · Posted 06-17-2010 01:07 AM

Thanks so much, Cynthia! Actually, I have tried PROC TRANSPOSE before switching to the use of arrays. The Proc Transpose takes like hours and hours on my PC and I can't figure out why, whereas the array does a much better job. Thanks a lot for suggesting the Proc Means, which I hadn't known until now :-). Thanks a lot

What's wrong with my macro program and how to fix it?

Re: What's wrong with my macro program and how to fix it?

Re: What's wrong with my macro program and how to fix it?

Re: What's wrong with my macro program and how to fix it?

Re: What's wrong with my macro program and how to fix it?

Re: What's wrong with my macro program and how to fix it?

Re: What's wrong with my macro program and how to fix it?

Classroom Training Available!