BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
MFraga
Quartz | Level 8

Hi,

 

I am trying tho use a do loop inside a macro to copy and create tables that have similar variables. My difficulty lies in knowing how to write the line of code of my macro do loop for a variable that is composed by letters and numbers. Here is what I have:

 

Data table1985;

input id XYZ81I1985 WXYZ81I1985;

Datalines;

1 2 2

2 3 3

3 5 5

4 9 9

5 10 10

6 17 17

7 18 19

8 20 19

9 30 30

10 24 24;

Run;

 

Data table1986;

input id XYZ86I1986 WXYZ86I1986;

Datalines;

1 2 2

2 3 3

3 5 5

4 9 9

5 10 10

6 17 17

7 18 19

8 20 19

9 30 30

10 24 24;

Run;

 

Data table1987;

input id XYZ86I1987 WXYZ86I1987;

Datalines;

1 2 2

2 3 3

3 5 5

4 9 9

5 10 10

6 17 17

7 18 19

8 20 19

9 30 30

10 24 24;

Run;

 

Data table1988;

input id XYZ86I1988 WXYZ86I1988;

Datalines;

1 2 2

2 3 3

3 5 5

4 9 9

5 10 10

6 17 17

7 18 19

8 20 19

9 30 30

10 24 24;

Run;

 

Data table1989;

input id XYZ86I1989 WXYZ86I1989;

Datalines;

1 2 2

2 3 3

3 5 5

4 9 9

5 10 10

6 17 17

7 18 19

8 20 19

9 30 30

10 24 24;

Run;

 

Data table1990;

input id XYZ86I1990 WXYZ86I1990;

Datalines;

1 2 2

2 3 3

3 5 5

4 9 9

5 10 10

6 17 17

7 18 19

8 20 19

9 30 30

10 24 24;

Run;

 

Data table1991;

input id XYZ91I1991 WXYZ91I1991;

Datalines;

1 2 2

2 3 3

3 5 5

4 9 9

5 10 10

6 17 17

7 18 19

8 20 19

9 30 30

10 24 24;

Run;

 

Data table1992;

input id XYZ91I1992 WXYZ91I1992;

Datalines;

1 2 2

2 3 3

3 5 5

4 9 9

5 10 10

6 17 17

7 18 19

8 20 19

9 30 30

10 24 24;

Run;

 

The macro I am trying to create should create the table_1985 to table_1992 copying my variable XYZ independently from the year just after (81, 86 or 91) and end with the number of the year of the table (from 1985 to 1991). For instance, I would have a column for ID and another for XYZ1985 for the table_1985 or a column ID and another for XYZ1990 for the table_1990. I do not know how I could use a macro for it without having to use several data steps to first rename the “XYZxxIyear” variable into “XYZyear”.

The macro I wrote is like this:

 

%macro XYZ;

%do i = 1985 %to 1992;

%do j = 81 %to 91 by 5;

 

data table_&i.; set table&i.;

XYZ&i.=XYZ&j.I&i.;

 

Keep id XYZ&i.;

Run;

 

End%; end%; mend%;  %XYZ; run;

 

I do not know why it is not working. Is the question clear?

Thanks!

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User
  1. Merge all tables into one dataset, by id.
  2. Transpose by id all xyz: variables. Rename col1 to xyz.
  3. Extract the year from the _name_, drop _name_
  4. Sort by id and year
data wide;
merge table:;
by id;
run;

proc transpose data=wide out=long (rename=(col1=xyz));
by id;
var xyz:;
run;

data want;
set long;
year = input(substr(_name_,length(_name)-3),4.);
drop _name_;
run;

proc sort data=want;
by id year;
run;

Untested, posted from my tablet.

 

You now have a dataset suited for all kinds of analysis, where data (years) is not hidden in structure (variable names). Maxim 19: Long Beats Wide.

View solution in original post

11 REPLIES 11
PaigeMiller
Diamond | Level 26

The first step to getting a macro work is to write code without macro variables and outside a macro, that works properly.

 

In this case, your data steps do not work outside of a macro.

 

Data table1985;
input id XYZ81I1985 WXYZ81I1985;
Datalines;
1 2 2;
2 3 3;
3 5 5;
4 9 9;
5 10 10;
6 17 17;
7 18 19;
8 20 19;
9 30 30;
10 24 24;
Run;

 

This produces errors in the log. You need to fix that before you can get a working macro.

 

A couple of other things:

 

  • Don't tell us "I do not know why it is not working" and then not explain further. If there are errors in the log, show us the ENTIRE log for your code.
  • Also, DATALINES will not work inside a macro.
  • Also, you have not explained how to generate the code you want, for example, how do we know that when year is 1985, that the code to be inserted into the variable name is 81I ? This is not explained.

 

All these things need to be addressed before we can take the next step.

--
Paige Miller
MFraga
Quartz | Level 8

Hi Paige, I have corrected the data.

 

Unfortunately, I can not share my log, I can’t share the original data, I am only reproducing a tiny sample of what I have to see if someone could help me.

 

When the year is 1985, the variable should be treated as XYZ1985 only, I do not need to use the 81I information. But I do not know either how I could copy the first three letters and the year to rename a new variable XYZ1985 without the 81I information that should be discarded. Is that clear now?

 

Thanks for your comments.

Quentin
Super User

It's not clear to me yet.  You've shown your input data.  Can you show some examples of the output data you want to create?

Meaning show examples of table_1985, table_1986, table_1987?  You don't have to show all the tables but a few would help. If something special happens every 5 years, maybe show table_1985, table_1986,table_1990, table_1991.

 

Even better would be if you could forget about the macro language for now, and show an example of the SAS code to create a few output data sets.

MFraga
Quartz | Level 8

Hi Quentin,

 

The tables that I want as a result would be this:

 

Data table_1985;

input id XYZ1985 ;

Datalines;

1 2

2 3

3 5

4 9

5 10

6 17

7 18

8 20

9 30

10 24;

Run;

 

Data table_1986;

input id XYZ1986;

Datalines;

1 2

2 3

3 5

4 9

5 10

6 17

7 18

8 20

9 30

10 24;

Run;

 

Data table_1987;

input id XYZ1987;

Datalines;

1 2

2 3

3 5

4 9

5 10

6 17

7 18

8 20

9 30

10 24;

Run;

 

Data table_1988;

input id XYZ1988:

Datalines;

1 2

2 3

3 5

4 9

5 10

6 17

7 18

8 20

9 30

10 24;

Run;

 

Data table_1989;

input id XYZ1989;

Datalines;

1 2

2 3

3 5

4 9

5 10

6 17

7 18

8 20

9 30

10 24;

Run;

 

Data table_1990;

input id XYZ1990;

Datalines;

1 2

2 3

3 5

4 9

5 10

6 17

7 18

8 20

9 30

10 24;

Run;

 

Data table_1991;

input id XYZ1991;

Datalines;

1 2

2 3

3 5

4 9

5 10

6 17

7 18

8 20

9 30

10 24;

Run;

 

Data table_1992;

input id XYZ1992;

Datalines;

1 2

2 3

3 5

4 9

5 10

6 17

7 18

8 20

9 30

10 24;

Run;

 

The data steps I am trying to automate with the macro do loop would be this:

 

Data table_1985; set table1985; XYZ1985 = XYZ81I1985; keep id XYZ1985; run;

Data table_1986; set table1986; XYZ1986 = XYZ86I1986; keep id XYZ1986; run;

Data table_1987; set table1987; XYZ1987 = XYZ86I1987; keep id XYZ1987; run;

Data table_1988; set table1988; XYZ1988 = XYZ86I1988; keep id XYZ1988; run;

Data table_1989; set table1989; XYZ1989 = XYZ86I1989; keep id XYZ1989; run;

Data table_1990; set table1990; XYZ1990 = XYZ86I1990; keep id XYZ1990; run;

Data table_1991; set table1991; XYZ1991 = XYZ91I1991; keep id XYZ1991; run;

Data table_1992; set table1992; XYZ1992 = XYZ91I1992; keep id XYZ1992; run;

 

The only special think every 5 years is that the original variable changes its name, but with no impact to my analysis if I rename it as I am suggesting.

 

Is it clear now the macro I intent to code?

 

Thanks for your comment and help! 

Quentin
Super User

This type of structure, with datasets with years in the name, and variables with years in the name, is almost never a good idea.

 

But since it's Friday, I played with it.

 

I think you only want one loop, but you need a way to map

1981 -> 81
1982 -> 81
1983 -> 81
1984 -> 81
1985 -> 81

1986 -> 86
1987 -> 86
1988 -> 86
1989 -> 86
1990 -> 86

1991 -> 91
1992 -> 91
1993 -> 91
1994 -> 91
1995 -> 91
...

For a programmer, I'm really bad at this every time I have to do it, but you can use the MOD function for this sort of mapping.

 

In a data step, you could do it like:

data foo ;
input x ;
y=x-(mod(x-1,5))-1900 ;
put x= y= ;
cards  ;
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
;
run ;

In the macro language, you could do it like:

 

%macro XYZ();
  %local x y ;
  %do x = 1985 %to 1992;
    %let y=%eval(&x-%sysfunc(mod(%eval(&x-1),5))-1900) ;
    %put &x -> &y;
  %end ;
%mend xyz ;
%xyz()

So if you like the above, then I think using your code it would look like below (untested):

 

%macro XYZ();
  %local i j ;

  %do i = 1985 %to 1992;
    %let j=%eval(&i-%sysfunc(mod(%eval(&i-1),5))-1900) ;

    data table_&i.;
      set table&i.;
      XYZ&i.=XYZ&j.I&i.    
      keep id XYZ&i.;
    run;


  %end ;
%mend xyz ;

options mprint;
%xyz()

That might be what you want.  But be sure to read the MPRINT log carefully, to look at the SAS code that was generated and make sure it's generating want you want.

 

And again, if you describe the big picture of why you are doing this, there are probably better ways to do this (like just put all the data into one dataset, with a different structure).  But it's quitting time where I live. : )

 

Note if your real data goes into the year 2001, above function won't handle it appropriately, because two-digit years are bad.

ballardw
Super User

If your real data only consists of those two variables you might consider using Proc Datasets to copy the data set then rename the variable. Using a data step for such means that each and every observation gets read into the data vector and processed one at a time. The time involved may not be noticed with a few sets of 100 or fewer observations but if any of your data sets have 1,000,000 or more observations you may sit around wondering what is taking so long.

 

Please also note that for your "macro generated code" to make sense then the input data sets, that should have created with data step code should be named table1985 and not table_1985.

MFraga
Quartz | Level 8

Hi, @ballardw , my real data has more than 1,000,000 observations. So I have to execute and wait, I have no choice. That's why I intent to minimize the mistakes using a macro that could help me copy many variables without the risk of writing many data steps. All the other variables in my dataset are fine, but this one that I used in my question (XYZYYYY) was creating me problem. Thanks a lot for your insights!

Tom
Super User Tom
Super User

So are the variables literally named XYZ.....YYYY ?

Or does the XYZ also change?

 

I find it much easier to work with data in datasets than in MACRO code.  So get the list of variable names into a dataset, say with PROC CONTENTS.  Then apply logic to generate the NEWNAME from the original NAME.  Now you have the information you might want to use to rename the variables.

 

So let's say the problem is to combine data from a series of dataset with names like TABLE...yyyy where yyyy is a four digit year.  And each dataset has exactly variables with a names like ...AAAyyyy where yyyy is the same four digit year that is in the dataset name and AAA is the 3 letters to be removed.  So a query like this should get the list of variable names that need to be renamed and the target new name.

proc sql;
create table renames as
 select libname
      , memname
      , substrn(memname,length(memname)-3) as year
      , name
      , cats(substrn(name,1,length(name)-7),calculated year) as newname
  from dictionary.columns
  where libname = 'WORK' 
    and memname like 'TABLE%' 
    and length(name) > 7
    and substrn(memname,length(memname)-3)  = substrn(name,length(name)-3)
    and 1980 <= input(substrn(name,length(name)-3),?4.) <= year(today())
;
quit;

Let's try if for some of your original datasets:

data table_1985;
  input id XYZ81I1985 WXYZ81I1985;
datalines;
1 2 2
2 3 3
3 5 5
;
 
data table_1986;
  input id XYZ86I1986 WXYZ86I1986;
datalines;
1 4 5
5 10 10
6 17 17
;
 
data table_1987;
  input id XYZ86I1987 WXYZ86I1987;
datalines;
1 6 7
8 20 19
9 30 30
;

Result

Obs    libname     memname      year       name        newname

 1      WORK      TABLE_1985    1985    XYZ81I1985     XYZ1985
 2      WORK      TABLE_1985    1985    WXYZ81I1985    WXYZ1985
 3      WORK      TABLE_1986    1986    XYZ86I1986     XYZ1986
 4      WORK      TABLE_1986    1986    WXYZ86I1986    WXYZ1986
 5      WORK      TABLE_1987    1987    XYZ86I1987     XYZ1987
 6      WORK      TABLE_1987    1987    WXYZ86I1987    WXYZ1987

Now that we have that data we can use it to write some code.

Perhaps you want code to MERGE the datasets?

So you could use code like this 

filename code temp;
data _null_;
  set renames end=eof;
  by memname;
  file code;
  if _n_=1 then put 'merge';
  if first.memname then put libname +(-1) '.' memname '(rename=(' @;
  put name '=' newname @;
  if last.memname then put '))';
  if eof then put ';';
run;

To create the MERGE statement. Which we could then use in a step like this 

data want;
%include code/source2;
  by id ;
run;

to combine the datasets to get:

Obs    id    XYZ1985    WXYZ1985    XYZ1986    WXYZ1986    XYZ1987    WXYZ1987

 1      1       2           2           2          2           2          2
 2      2       3           3           .          .           .          .
 3      3       5           5           .          .           .          .
 4      5       .           .          10         10           .          .
 5      6       .           .          17         17           .          .
 6      8       .           .           .          .          20         19
 7      9       .           .           .          .          30         30

But it will probably be more usable to SET the datasets and put the YEAR into its own variable.

proc sql;
create table renames as
 select libname
      , memname
      , catx('.',libname,nliteral(memname)) as dsname 
      , substrn(memname,length(memname)-3) as year
      , nliteral(name) as name
      , nliteral(substrn(name,1,length(name)-7)) as newname
  from dictionary.columns
  where libname = 'WORK' 
    and memname like 'TABLE%' 
    and length(name) > 7
    and substrn(memname,length(memname)-3)  = substrn(name,length(name)-3)
    and 1980 <= input(substrn(name,length(name)-3),?4.) <= year(today())
;
quit;

filename code temp;
data _null_;
  set renames end=eof;
  by memname;
  file code;
  if first.memname then put dsname '(rename=(' @;
  put name '=' newname @;
  if last.memname then put '))';
run;

data want;
  length id 8 year 8;
  set
%include code/source2;
    indsname=dsname 
  ;
  by id ;
  year = input(substrn(dsname,length(dsname)-3),4.);
run;

Result

Obs    id    year    XYZ    WXYZ

 1      1    1985      2      2
 2      1    1986      2      2
 3      1    1987      2      2
 4      2    1985      3      3
 5      3    1985      5      5
 6      5    1986     10     10
 7      6    1986     17     17
 8      8    1987     20     19
 9      9    1987     30     30

 

Kurt_Bremser
Super User
  1. Merge all tables into one dataset, by id.
  2. Transpose by id all xyz: variables. Rename col1 to xyz.
  3. Extract the year from the _name_, drop _name_
  4. Sort by id and year
data wide;
merge table:;
by id;
run;

proc transpose data=wide out=long (rename=(col1=xyz));
by id;
var xyz:;
run;

data want;
set long;
year = input(substr(_name_,length(_name)-3),4.);
drop _name_;
run;

proc sort data=want;
by id year;
run;

Untested, posted from my tablet.

 

You now have a dataset suited for all kinds of analysis, where data (years) is not hidden in structure (variable names). Maxim 19: Long Beats Wide.

MFraga
Quartz | Level 8

Thank you all for the different proposed solutions ! I came back to my dataset today and I got to execute the work that I needed.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3655 views
  • 7 likes
  • 6 in conversation