Hi,
I am trying tho use a do loop inside a macro to copy and create tables that have similar variables. My difficulty lies in knowing how to write the line of code of my macro do loop for a variable that is composed by letters and numbers. Here is what I have:
Data table1985;
input id XYZ81I1985 WXYZ81I1985;
Datalines;
1 2 2
2 3 3
3 5 5
4 9 9
5 10 10
6 17 17
7 18 19
8 20 19
9 30 30
10 24 24;
Run;
Data table1986;
input id XYZ86I1986 WXYZ86I1986;
Datalines;
1 2 2
2 3 3
3 5 5
4 9 9
5 10 10
6 17 17
7 18 19
8 20 19
9 30 30
10 24 24;
Run;
Data table1987;
input id XYZ86I1987 WXYZ86I1987;
Datalines;
1 2 2
2 3 3
3 5 5
4 9 9
5 10 10
6 17 17
7 18 19
8 20 19
9 30 30
10 24 24;
Run;
Data table1988;
input id XYZ86I1988 WXYZ86I1988;
Datalines;
1 2 2
2 3 3
3 5 5
4 9 9
5 10 10
6 17 17
7 18 19
8 20 19
9 30 30
10 24 24;
Run;
Data table1989;
input id XYZ86I1989 WXYZ86I1989;
Datalines;
1 2 2
2 3 3
3 5 5
4 9 9
5 10 10
6 17 17
7 18 19
8 20 19
9 30 30
10 24 24;
Run;
Data table1990;
input id XYZ86I1990 WXYZ86I1990;
Datalines;
1 2 2
2 3 3
3 5 5
4 9 9
5 10 10
6 17 17
7 18 19
8 20 19
9 30 30
10 24 24;
Run;
Data table1991;
input id XYZ91I1991 WXYZ91I1991;
Datalines;
1 2 2
2 3 3
3 5 5
4 9 9
5 10 10
6 17 17
7 18 19
8 20 19
9 30 30
10 24 24;
Run;
Data table1992;
input id XYZ91I1992 WXYZ91I1992;
Datalines;
1 2 2
2 3 3
3 5 5
4 9 9
5 10 10
6 17 17
7 18 19
8 20 19
9 30 30
10 24 24;
Run;
The macro I am trying to create should create the table_1985 to table_1992 copying my variable XYZ independently from the year just after (81, 86 or 91) and end with the number of the year of the table (from 1985 to 1991). For instance, I would have a column for ID and another for XYZ1985 for the table_1985 or a column ID and another for XYZ1990 for the table_1990. I do not know how I could use a macro for it without having to use several data steps to first rename the “XYZxxIyear” variable into “XYZyear”.
The macro I wrote is like this:
%macro XYZ;
%do i = 1985 %to 1992;
%do j = 81 %to 91 by 5;
data table_&i.; set table&i.;
XYZ&i.=XYZ&j.I&i.;
Keep id XYZ&i.;
Run;
End%; end%; mend%; %XYZ; run;
I do not know why it is not working. Is the question clear?
Thanks!
data wide;
merge table:;
by id;
run;
proc transpose data=wide out=long (rename=(col1=xyz));
by id;
var xyz:;
run;
data want;
set long;
year = input(substr(_name_,length(_name)-3),4.);
drop _name_;
run;
proc sort data=want;
by id year;
run;
Untested, posted from my tablet.
You now have a dataset suited for all kinds of analysis, where data (years) is not hidden in structure (variable names). Maxim 19: Long Beats Wide.
The first step to getting a macro work is to write code without macro variables and outside a macro, that works properly.
In this case, your data steps do not work outside of a macro.
Data table1985;
input id XYZ81I1985 WXYZ81I1985;
Datalines;
1 2 2;
2 3 3;
3 5 5;
4 9 9;
5 10 10;
6 17 17;
7 18 19;
8 20 19;
9 30 30;
10 24 24;
Run;
This produces errors in the log. You need to fix that before you can get a working macro.
A couple of other things:
All these things need to be addressed before we can take the next step.
Hi Paige, I have corrected the data.
Unfortunately, I can not share my log, I can’t share the original data, I am only reproducing a tiny sample of what I have to see if someone could help me.
When the year is 1985, the variable should be treated as XYZ1985 only, I do not need to use the 81I information. But I do not know either how I could copy the first three letters and the year to rename a new variable XYZ1985 without the 81I information that should be discarded. Is that clear now?
Thanks for your comments.
It's not clear to me yet. You've shown your input data. Can you show some examples of the output data you want to create?
Meaning show examples of table_1985, table_1986, table_1987? You don't have to show all the tables but a few would help. If something special happens every 5 years, maybe show table_1985, table_1986,table_1990, table_1991.
Even better would be if you could forget about the macro language for now, and show an example of the SAS code to create a few output data sets.
Hi Quentin,
The tables that I want as a result would be this:
Data table_1985;
input id XYZ1985 ;
Datalines;
1 2
2 3
3 5
4 9
5 10
6 17
7 18
8 20
9 30
10 24;
Run;
Data table_1986;
input id XYZ1986;
Datalines;
1 2
2 3
3 5
4 9
5 10
6 17
7 18
8 20
9 30
10 24;
Run;
Data table_1987;
input id XYZ1987;
Datalines;
1 2
2 3
3 5
4 9
5 10
6 17
7 18
8 20
9 30
10 24;
Run;
Data table_1988;
input id XYZ1988:
Datalines;
1 2
2 3
3 5
4 9
5 10
6 17
7 18
8 20
9 30
10 24;
Run;
Data table_1989;
input id XYZ1989;
Datalines;
1 2
2 3
3 5
4 9
5 10
6 17
7 18
8 20
9 30
10 24;
Run;
Data table_1990;
input id XYZ1990;
Datalines;
1 2
2 3
3 5
4 9
5 10
6 17
7 18
8 20
9 30
10 24;
Run;
Data table_1991;
input id XYZ1991;
Datalines;
1 2
2 3
3 5
4 9
5 10
6 17
7 18
8 20
9 30
10 24;
Run;
Data table_1992;
input id XYZ1992;
Datalines;
1 2
2 3
3 5
4 9
5 10
6 17
7 18
8 20
9 30
10 24;
Run;
The data steps I am trying to automate with the macro do loop would be this:
Data table_1985; set table1985; XYZ1985 = XYZ81I1985; keep id XYZ1985; run;
Data table_1986; set table1986; XYZ1986 = XYZ86I1986; keep id XYZ1986; run;
Data table_1987; set table1987; XYZ1987 = XYZ86I1987; keep id XYZ1987; run;
Data table_1988; set table1988; XYZ1988 = XYZ86I1988; keep id XYZ1988; run;
Data table_1989; set table1989; XYZ1989 = XYZ86I1989; keep id XYZ1989; run;
Data table_1990; set table1990; XYZ1990 = XYZ86I1990; keep id XYZ1990; run;
Data table_1991; set table1991; XYZ1991 = XYZ91I1991; keep id XYZ1991; run;
Data table_1992; set table1992; XYZ1992 = XYZ91I1992; keep id XYZ1992; run;
The only special think every 5 years is that the original variable changes its name, but with no impact to my analysis if I rename it as I am suggesting.
Is it clear now the macro I intent to code?
Thanks for your comment and help!
This type of structure, with datasets with years in the name, and variables with years in the name, is almost never a good idea.
But since it's Friday, I played with it.
I think you only want one loop, but you need a way to map
1981 -> 81 1982 -> 81 1983 -> 81 1984 -> 81 1985 -> 81 1986 -> 86 1987 -> 86 1988 -> 86 1989 -> 86 1990 -> 86 1991 -> 91 1992 -> 91 1993 -> 91 1994 -> 91 1995 -> 91 ...
For a programmer, I'm really bad at this every time I have to do it, but you can use the MOD function for this sort of mapping.
In a data step, you could do it like:
data foo ;
input x ;
y=x-(mod(x-1,5))-1900 ;
put x= y= ;
cards ;
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
;
run ;
In the macro language, you could do it like:
%macro XYZ();
%local x y ;
%do x = 1985 %to 1992;
%let y=%eval(&x-%sysfunc(mod(%eval(&x-1),5))-1900) ;
%put &x -> &y;
%end ;
%mend xyz ;
%xyz()
So if you like the above, then I think using your code it would look like below (untested):
%macro XYZ();
%local i j ;
%do i = 1985 %to 1992;
%let j=%eval(&i-%sysfunc(mod(%eval(&i-1),5))-1900) ;
data table_&i.;
set table&i.;
XYZ&i.=XYZ&j.I&i.
keep id XYZ&i.;
run;
%end ;
%mend xyz ;
options mprint;
%xyz()
That might be what you want. But be sure to read the MPRINT log carefully, to look at the SAS code that was generated and make sure it's generating want you want.
And again, if you describe the big picture of why you are doing this, there are probably better ways to do this (like just put all the data into one dataset, with a different structure). But it's quitting time where I live. : )
Note if your real data goes into the year 2001, above function won't handle it appropriately, because two-digit years are bad.
Thanks for your help!
If your real data only consists of those two variables you might consider using Proc Datasets to copy the data set then rename the variable. Using a data step for such means that each and every observation gets read into the data vector and processed one at a time. The time involved may not be noticed with a few sets of 100 or fewer observations but if any of your data sets have 1,000,000 or more observations you may sit around wondering what is taking so long.
Please also note that for your "macro generated code" to make sense then the input data sets, that should have created with data step code should be named table1985 and not table_1985.
Hi, @ballardw , my real data has more than 1,000,000 observations. So I have to execute and wait, I have no choice. That's why I intent to minimize the mistakes using a macro that could help me copy many variables without the risk of writing many data steps. All the other variables in my dataset are fine, but this one that I used in my question (XYZYYYY) was creating me problem. Thanks a lot for your insights!
So are the variables literally named XYZ.....YYYY ?
Or does the XYZ also change?
I find it much easier to work with data in datasets than in MACRO code. So get the list of variable names into a dataset, say with PROC CONTENTS. Then apply logic to generate the NEWNAME from the original NAME. Now you have the information you might want to use to rename the variables.
So let's say the problem is to combine data from a series of dataset with names like TABLE...yyyy where yyyy is a four digit year. And each dataset has exactly variables with a names like ...AAAyyyy where yyyy is the same four digit year that is in the dataset name and AAA is the 3 letters to be removed. So a query like this should get the list of variable names that need to be renamed and the target new name.
proc sql;
create table renames as
select libname
, memname
, substrn(memname,length(memname)-3) as year
, name
, cats(substrn(name,1,length(name)-7),calculated year) as newname
from dictionary.columns
where libname = 'WORK'
and memname like 'TABLE%'
and length(name) > 7
and substrn(memname,length(memname)-3) = substrn(name,length(name)-3)
and 1980 <= input(substrn(name,length(name)-3),?4.) <= year(today())
;
quit;
Let's try if for some of your original datasets:
data table_1985;
input id XYZ81I1985 WXYZ81I1985;
datalines;
1 2 2
2 3 3
3 5 5
;
data table_1986;
input id XYZ86I1986 WXYZ86I1986;
datalines;
1 4 5
5 10 10
6 17 17
;
data table_1987;
input id XYZ86I1987 WXYZ86I1987;
datalines;
1 6 7
8 20 19
9 30 30
;
Result
Obs libname memname year name newname 1 WORK TABLE_1985 1985 XYZ81I1985 XYZ1985 2 WORK TABLE_1985 1985 WXYZ81I1985 WXYZ1985 3 WORK TABLE_1986 1986 XYZ86I1986 XYZ1986 4 WORK TABLE_1986 1986 WXYZ86I1986 WXYZ1986 5 WORK TABLE_1987 1987 XYZ86I1987 XYZ1987 6 WORK TABLE_1987 1987 WXYZ86I1987 WXYZ1987
Now that we have that data we can use it to write some code.
Perhaps you want code to MERGE the datasets?
So you could use code like this
filename code temp;
data _null_;
set renames end=eof;
by memname;
file code;
if _n_=1 then put 'merge';
if first.memname then put libname +(-1) '.' memname '(rename=(' @;
put name '=' newname @;
if last.memname then put '))';
if eof then put ';';
run;
To create the MERGE statement. Which we could then use in a step like this
data want;
%include code/source2;
by id ;
run;
to combine the datasets to get:
Obs id XYZ1985 WXYZ1985 XYZ1986 WXYZ1986 XYZ1987 WXYZ1987 1 1 2 2 2 2 2 2 2 2 3 3 . . . . 3 3 5 5 . . . . 4 5 . . 10 10 . . 5 6 . . 17 17 . . 6 8 . . . . 20 19 7 9 . . . . 30 30
But it will probably be more usable to SET the datasets and put the YEAR into its own variable.
proc sql;
create table renames as
select libname
, memname
, catx('.',libname,nliteral(memname)) as dsname
, substrn(memname,length(memname)-3) as year
, nliteral(name) as name
, nliteral(substrn(name,1,length(name)-7)) as newname
from dictionary.columns
where libname = 'WORK'
and memname like 'TABLE%'
and length(name) > 7
and substrn(memname,length(memname)-3) = substrn(name,length(name)-3)
and 1980 <= input(substrn(name,length(name)-3),?4.) <= year(today())
;
quit;
filename code temp;
data _null_;
set renames end=eof;
by memname;
file code;
if first.memname then put dsname '(rename=(' @;
put name '=' newname @;
if last.memname then put '))';
run;
data want;
length id 8 year 8;
set
%include code/source2;
indsname=dsname
;
by id ;
year = input(substrn(dsname,length(dsname)-3),4.);
run;
Result
Obs id year XYZ WXYZ 1 1 1985 2 2 2 1 1986 2 2 3 1 1987 2 2 4 2 1985 3 3 5 3 1985 5 5 6 5 1986 10 10 7 6 1986 17 17 8 8 1987 20 19 9 9 1987 30 30
data wide;
merge table:;
by id;
run;
proc transpose data=wide out=long (rename=(col1=xyz));
by id;
var xyz:;
run;
data want;
set long;
year = input(substr(_name_,length(_name)-3),4.);
drop _name_;
run;
proc sort data=want;
by id year;
run;
Untested, posted from my tablet.
You now have a dataset suited for all kinds of analysis, where data (years) is not hidden in structure (variable names). Maxim 19: Long Beats Wide.
Thank you all for the different proposed solutions ! I came back to my dataset today and I got to execute the work that I needed.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.