BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi,

I am a SAS newbie. I wrote this macro, but it doesn't work. I guess it goes wrong when reaching the 4th program, but I still can't figure out how to fix it. Can you please help? Thanks a lot...

%macro clean_rawfiles(importfile=, filelocation=, filename=, varnum=);
proc import datafile="C:\ARC DS Downloading (HoangLuong)\DataFiles\&importfile..csv"
out=&filelocation..&filename
replace;
getnames=no;
run;
data time &filename;
set &filelocation..&filename;
if var1="Code" then output time;
else output &filename;
run;
data time1 (drop=var2-var&varnum);
set time (drop=var1);
array x
  • var2 - var&varnum;
    do i = 1 to dim(x);
    time = x;
    output;
    end;
    run;
    data &filename1 (drop=var2-var&varnum); /*I guess it goes WRONG from here*/
    set &filename (rename=(var1=dscode));
    array x
  • var2 - var&varnum;
    do i = 1 to dim(x);
    &filename = x;
    output;
    end;
    run;
    proc sql;
    drop table &filelocation..&filename;
    quit;
    proc sql;
    create table &filelocation..&filename as
    select dscode, time as year, &filename
    from time1, &filename1
    where time1.i = &filename1.i and dscode ^= ""
    order by dscode, time;
    quit;
    %mend clean_rawfiles;



    Message was edited by: hoangluong Message was edited by: hoangluong
  • 6 REPLIES 6
    Ksharp
    Super User
    Please use
    [pre]
    options mprint mlogic;
    [/pre]
    to check the error in the log.

    post the log.


    In addition, "where time1.i = &filename1.i and dscode ^= " looks like maybe "&filename1..i "

    Ksharp
    Cynthia_sas
    SAS Super FREQ
    Hi:
    There is so much unknown about your macro program. Turning on these options:
    options mprint mlogic symbolgen;

    will help you debug what's happening. However, what I find very helpful in situations like this is for you to tell us, in narrative form, what you think is happening or what you WANT to have happen. For example:

    1) The PROC IMPORT step is reading in a CSV file identified by &IMPORTFILE and is writing out a SAS dataset identified by &FILELOCATION (the library) and &FILENAME (the dataset).

    2) The Data step program that starts:
    DATA time &filename; is creating 2 SAS datasets from the SAS dataset created above. I know that if VAR1="Code", then I have ????????? and so I need that information in the TIME dataset. All the rest of the observations go into the WORK.&FILENAME dataset. The variables that I have read in from the CSV file are:
    VAR1 - VAR&VARNUM (I need to supply &VARNUM in the macro program because ????????)
    The TIME dataset only needs ??????;
    The WORK.&FILENAME dataset needs ??????

    3) The dataset that starts DATA time1; is using the ?????? dataset as input. I have an ARRAY statement and a DO loop because ??????????. I am setting the TIME variable equal to X (the array name ????) instead of
    time = x(i); because ???????? Then in my output statement, I am doing an OUTPUT for each iteration through the DO loop. This means that my data will come into the program like this:
    ***** show what your variables are coming in *****
    and after the program is over, then WORK.TIME1 will look like this:
    **** show what you WANT your variables to look like coming out of the program

    4) Then I create DATA WORK.&FILENAME1 from WORK.&FILENAME. Since I do not ever PASS a value for &FILENAME1, I expect that the value for &FILENAME1 will come from ??????. I have a DO loop in this program. In the DO loop, I have the statement:
    &filename=x;

    Again, since X is the name of the ENTIRE array, what I think is going to happen here is ??????????.

    5) Then, I have a PROC SQL step to drop the dataset that I created in the PROC IMPORT step. I do this because ????????.

    6) Last, I have another PROC SQL step where I create &FILELOCATION..&FILENAME from my 2 datasets: WORK.TIME1 and &FILENAME1 (which I think I created in step 4) I think that I can use I (generally a DO loop index) here in the SQL step. However, the SQL step is NOT inside a Macro %DO loop, so I am using this syntax because I want ?????????. In this processing, I envision joining the cleaned up TIME and &FILENAME datasets together based on the WHERE condition:
    where time1.i = &filename1.i and dscode ^= ""

    In this program, I think the value of the variable I is going to be: ??????? and I want this PROC SQL step to execute ????? times based on ???????.

    Sometimes, it is useful to have a verbal description of what your whole process is. It's also useful to see a little bit of the data. In addition, these papers are very good papers which explain some of the basics of Macro processing and SAS Array processing.
    http://www2.sas.com/proceedings/sugi28/056-28.pdf
    http://support.sas.com/rnd/papers/sgf07/arrays1780.pdf
    http://www2.sas.com/proceedings/sugi29/070-29.pdf

    There is a HUGE difference between a Macro %DO loop and a regular DATA step DO loop and I'm not convinced that you are using the DATA step DO loop in the correct way in your program. Nor am I really sure just WHAT it is your program is trying to do. That's where having some information explaining the process would be useful.

    cynthia
    deleted_user
    Not applicable
    Hi, thanks so much for the help/suggestions. I've fixed the program and now it's running without errors.

    The errors occurred because I didn't put the period (.) between &filename and 1, i.e. should be &filename.1 rather than &filename1. Why so? I can't figure out. I just think &filename resolves to DY (see my testing with file DY), and 1 is a number I want attached to DY so I'll have DY1.

    I have a time series dataset, with 100,000 firms and each having 13 years of data. When I download the data from the database, the data is arranged like this:

    Firm 1997 1998 .... 2010
    firm1 xxx xxx .... xxx
    firm2 yyy yyy .... yyy
    and so on.

    Now I want to restructure the data so that I have the correct PANEL DATA structure, that is,

    Firm Year VariableOfInterest
    firm1 1997 xxxx
    firm1 1998 yyyy
    ....
    firm1 2010 zzgg

    firm2 1997 zxxx
    firm2 1998 gagag

    and so on.


    Thanks so much for the quick reply and for the very useful tutorials (esp. "Array: Construction and Usage...."...
    Peter_C
    Rhodochrosite | Level 12
    the compiler allows you to have macro variables named like filename1 as well as file and filename. So how do you think it should tell the difference and which is referenced by &filename1
    ??
    Cynthia_sas
    SAS Super FREQ
    Hi:
    This is where a description and seeing the data might be useful. Given the structure that you outline, I'm not sure I understand what your macro program is actually doing. From your description, the problem sounds like a more simple transpose problem. If all you want to do is take a structure like this (showing data only...no headers)
    [pre]
    firm1 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113
    firm2 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213
    firm3 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313
    firm4 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413
    [/pre]

    And turn it into a structure like this (only 2 firms shown):
    [pre]
    firm1 1998 1101
    firm1 1999 1102
    firm1 2000 1103
    firm1 2001 1104
    firm1 2002 1105
    firm1 2003 1106
    firm1 2004 1107
    firm1 2005 1108
    firm1 2006 1109
    firm1 2007 1110
    firm1 2008 1111
    firm1 2009 1112
    firm1 2010 1113
    firm2 1998 2201
    firm2 1999 2202
    firm2 2000 2203
    firm2 2001 2204
    firm2 2002 2205
    firm2 2003 2206
    firm2 2004 2207
    firm2 2005 2208
    firm2 2006 2209
    firm2 2007 2210
    firm2 2008 2211
    firm2 2009 2212
    firm2 2010 2213
    [/pre]

    Then there are a couple of things to know. First, SAS variable names can't start with a number. So your dataset variables have to be something other than 1998, 1999. In my program below, where I read in the "original" structure, I call the variable y1998, y1999, etc

    [pre]
    data orig_data;
    length firm $8;
    infile datalines dsd dlm=',';
    input firm y1998 y1999 y2000 y2001 y2002 y2003 y2004 y2005 y2006 y2007 y2008 y2009 y2010;
    return;
    datalines;
    "firm1",1101,1102,1103,1104,1105,1106,1107,1108,1109,1110,1111,1112,1113
    "firm2",2201,2202,2203,2204,2205,2206,2207,2208,2209,2210,2211,2212,2213
    "firm3",3301,3302,3303,3304,3305,3306,3307,3308,3309,3310,3311,3312,3313
    "firm4",4401,4402,4403,4404,4405,4406,4407,4408,4409,4410,4411,4412,4413
    ;
    run;

    proc print data=orig_data noobs;
    title 'Original Data';
    run;

    proc sort data=orig_data;
    by firm ;
    run;

    proc transpose data=orig_data out=panelout;
    by firm ;
    run;

    proc print data=panelout;
    title 'After Transpose';
    run;
    [/pre]

    After the transpose step, the (partial) transposed data looks like this:
    [pre]
    After Transpose

    Obs firm _NAME_ COL1

    1 firm1 y1998 1101
    2 firm1 y1999 1102
    3 firm1 y2000 1103
    4 firm1 y2001 1104
    5 firm1 y2002 1105
    6 firm1 y2003 1106
    7 firm1 y2004 1107
    8 firm1 y2005 1108
    9 firm1 y2006 1109
    10 firm1 y2007 1110
    11 firm1 y2008 1111
    12 firm1 y2009 1112
    13 firm1 y2010 1113
    14 firm2 y1998 2201
    15 firm2 y1999 2202
    16 firm2 y2000 2203
    17 firm2 y2001 2204
    18 firm2 y2002 2205
    19 firm2 y2003 2206
    20 firm2 y2004 2207
    21 firm2 y2005 2208
    22 firm2 y2006 2209
    23 firm2 y2007 2210
    24 firm2 y2008 2211
    25 firm2 y2009 2212
    26 firm2 y2010 2213
    [/pre]

    So from this we see that _NAME_ now holds the former variable for year and the COL1 variable holds the value for that year.

    Now, all that remains is to actually make a numeric YEAR variable:
    [pre]

    data final_panel(keep=firm year COL1);
    length year 8;
    set panelout;
    ** get rid of 'y' in from _name_ variable to make year;
    year = input(compress(_name_,'y'),4.);
    run;

    proc print data=final_panel noobs;
    title 'After Make YEAR Variable';
    run;
    [/pre]

    Which would result in (only partial observations shown):
    [pre]
    firm year COL1
    firm1 1998 1101
    firm1 1999 1102
    firm1 2000 1103
    firm1 2001 1104
    firm1 2002 1105
    firm1 2003 1106
    firm1 2004 1107
    firm1 2005 1108
    firm1 2006 1109
    firm1 2007 1110
    firm1 2008 1111
    firm1 2009 1112
    firm1 2010 1113
    firm2 1998 2201
    firm2 1999 2202
    firm2 2000 2203
    firm2 2001 2204
    firm2 2002 2205
    firm2 2003 2206
    firm2 2004 2207
    firm2 2005 2208
    firm2 2006 2209
    firm2 2007 2210
    firm2 2008 2211
    firm2 2009 2212
    firm2 2010 2213
    [/pre]

    If you wanted to rename COL1, at this point, you could use the RENAME= option in the last data set. If you do NOT already have a SAS dataset in the original structure and you need to read a dataset from some form -into- a SAS dataset, then you could also do the transpose/transform/output of one year per observation in the initial read program. But I didn't do that because you said your dataset was already in the original form.

    When you are working with time series data, you frequently must transpose the entire dataset or subset in order to work with the data in your procedure of choice. There is a good section in the documentation entitled "Working with Time Series Data" that may prove very informative:
    http://support.sas.com/documentation/cdl/en/etsug/63348/HTML/default/viewer.htm#/documentation/cdl/e...

    cynthia
    deleted_user
    Not applicable
    Thanks so much, Cynthia! Actually, I have tried PROC TRANSPOSE before switching to the use of arrays. The Proc Transpose takes like hours and hours on my PC and I can't figure out why, whereas the array does a much better job. Thanks a lot for suggesting the Proc Means, which I hadn't known until now :-). Thanks a lot

    sas-innovate-2024.png

    Don't miss out on SAS Innovate - Register now for the FREE Livestream!

    Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 6 replies
    • 1268 views
    • 0 likes
    • 4 in conversation