- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a dataset with years 2000-2019 and one variable for each year with the state that reported the data (e.g., STATE_2000, STATE_2001, STATE_2002). I need to populate a variable with the reporting state for each month and year of the data instead of just by year. For example, if I have STATE_2000, I need one for each month like STATE_012000, STATE_022000, STATE_032000, etc. This will just be a copy of the STATE_2000 variable with a new name, 12 times over for each year. I am trying to find a way to succulently write the code instead of writing it all out like this because I have 20 years of data:
data do_not_want;
set have;
STATE_012000=STATE_2000;
STATE_022000=STATE_2000;
STATE_032000=STATE_2000;
run;
I tried playing around with a macro for the month but I don't think it likes the underscore.
I think I figured out a way to do it with a macro statement for the year but it's only outputting the last year. Here's an example with a couple of years of data:
%macro state_yrs;
%do yr=2000 %to 2002;
data want;
set have;
STATE_01&yr=STATE_&yr;
STATE_02&yr=STATE_&yr;
STATE_03&yr=STATE_&yr;
STATE_04&yr=STATE_&yr;
STATE_05&yr=STATE_&yr;
STATE_06&yr=STATE_&yr;
STATE_07&yr=STATE_&yr;
STATE_08&yr=STATE_&yr;
STATE_09&yr=STATE_&yr;
STATE_10&yr=STATE_&yr;
STATE_11&yr=STATE_&yr;
STATE_12&yr=STATE_&yr;
run;
%end;
%mend state_yrs;
%state_yrs;
The output is giving me the variable name I want (STATE_MMYYYY), but in this example, it's only outputting the year 2002 and not 2000 or 2001. Thank you!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You put the %DO and %END in the wrong place. Move them so they are just around the assignment statements.
Also put the month after the year so that the variable names will sort in the right order.
data want;
set have;
%do yr=2000 %to 2002;
%do month=1 to 12 ;
%let month=%sysfunc(putn(&month,Z2.));
STATE_&yr._&month = STATE_&yr;
%end;
%end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You put the %DO and %END in the wrong place. Move them so they are just around the assignment statements.
Also put the month after the year so that the variable names will sort in the right order.
data want;
set have;
%do yr=2000 %to 2002;
%do month=1 to 12 ;
%let month=%sysfunc(putn(&month,Z2.));
STATE_&yr._&month = STATE_&yr;
%end;
%end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks so much, this worked.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your task is complicated by a wide dataset structure.
After transposing to a long dataset, where the years are stored in a YEAR variable, a simple DO loop for the months will do it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The location of your loop outside the data step means you are recreating the data set with different variables.
It sounds like you would want:
%macro state_yrs; data want; set have; %do yr=2000 %to 2002; STATE_01&yr=STATE_&yr; STATE_02&yr=STATE_&yr; STATE_03&yr=STATE_&yr; STATE_04&yr=STATE_&yr; STATE_05&yr=STATE_&yr; STATE_06&yr=STATE_&yr; STATE_07&yr=STATE_&yr; STATE_08&yr=STATE_&yr; STATE_09&yr=STATE_&yr; STATE_10&yr=STATE_&yr; STATE_11&yr=STATE_&yr; STATE_12&yr=STATE_&yr; %end; run; %mend state_yrs;
However before going too far down this path you may want to reconsider whether this is really needed.
For 99% or more of most analysis, modeling, reporting or graphing in SAS the data would be be better with separate variable holding the date information. I'm not sure but I also suspect that instead of actual "STATE_" you have names like "Alaska_". Which means it may be worth in the long run having a separate variable with the state name as well.
This type of data with variable names holding values, i.e. the Year and Month, means that you have to keep adding variables and rewriting code. If you have a variable holding the year (and month) then you could subset your data with a where clause using the desired period pointing at the variable, or use BY group processing to handle the entire data set.
If you are going to attempt macro language coding you want to learn about the SAS system options MPRINT, MLOGIC and SYMBOLGEN. They will cause the LOG to show more details about the code generated.
For instance if you run the log would show that you generated 3 different data sets with different variables.
options mprint; %state_yrs; options mprint;
If you decide that you must have the many variables you might consider the advantages to using a YYYY_MM
structure instead. With first you can use variable lists such as State_2020_01 - State_2020_06 to get that sequence of 6 months for reference or State_2020: to have all the variables whose names start with State_2020. Most of them data that has this wide structure ends up involving using arrays to process in one way or another and the shortcut lists simply Array definition statements as well as variable Keep lists. The names will also sort "nicely" for processes that may involve the names of the variables.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm not convinced that a macro is the best way to do this. Instead you can take advantage of declaring 2-dimension arrays, and specifying array bounds, as in:
data want;
set have;
array sty {2000:2002} state_2000-state_2002;
array stym {2000:2002,1:12}
state_2000_01-state_2000_12
state_2001_01-state_2001_12
state_2002_01-state_2002_12 ;
do y=2000 to 2002; do m=1 to 12;
stym{y,m}=sty{y};
end; end;
run;
The appeal of this approach is that the code is almost as compact as the macro, and is inline in the data step.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------