Dear SAS experts
I have created a simple example of the problem I am experiencing.
I want to loop over some datasets. In each loop I want to find the median value for a variable and create a categorical variable for those below or at/above the median for this variable. I also for each dataset want to create a label/format for the categorical variable where the dataset-specific median is included/presented.
When I run the code below something I cannot quite understand happens. The label created in the most recent loop (based on the median computed in the most recent loop) is not only applied to the dataset in the most recent loop, but also to the dataset in first loop. I am not sure if also the categorization is performed in this way.
Can someone pinpoint the issue below? I have tried making several changes not of which changed the outcome.
*Two datasets;
data have2010;
input Name $ Age;
datalines;
Maria 30
John 32
Martin 22.5
Casper 25
Jonah 13
Ben 40
Ask 10
Harry 15
;
run;
data have2011;
input Name $ Age;
datalines;
Maria 31
John 33
Martin 23.5
Casper 26
Jonah 14
Ben 41
Ask 11
Harry 16
;
run;
*Loop over the two datasets;
*Create categorical variable (above/below median) and create appropriate label;
%macro example;
%do year=2010 %to 2011;
proc means data=have&year noprint;
output out=median&year(drop=_type_ _freq_) p50=p_50;
run;
data _null_;
set median&year;
call symputx('p_50',p_50);
run;
data have&year;
set have&year;
Age_cat=.;
if Age<&p_50 then Age_cat=1;
if Age>=&p_50 then Age_cat=2;
run;
proc format;
value age_form
1="1: <&p_50."
2="2: >=&p_50.";
run;
data have&year;
set have&year;
format Age_cat age_form.;
run;
%end;
%mend;
%example;
Thank you
Like this?
%macro example;
%local year median;
%do year=2010 %to 2011;
proc sql noprint;
select median(AGE) into: median from HAVE&year. ;
quit;
proc format;
value age&year._
1="1: < &median."
2="2: >= &median.";
run;
data HAVE&YEAR;
set HAVE&YEAR;
AGE_CAT=(AGE>=&median)+1;
format AGE_CAT age&year._.;
run;
%end;
%mend;
%example;
@ChrisNZ Thanks.
Although I did not use all of your code, I did get my code to work after being inspired by the way you define the label in the loop. After re-writing it in this way my code worked as intended.
No macro needed:
data have2010;
input Name $ Age;
datalines;
Maria 30
John 32
Martin 22.5
Casper 25
Jonah 13
Ben 40
Ask 10
Harry 15
;
data have2011;
input Name $ Age;
datalines;
Maria 31
John 33
Martin 23.5
Casper 26
Jonah 14
Ben 41
Ask 11
Harry 16
;
data have;
length dname $41.;
set
have2:
indsname=dname
;
year = input(compress(scan(dname,2,".")," ","kd"),4.);
run;
proc means data=have noprint;
by year;
output out=median (drop=_type_ _freq_) p50=p_50;
run;
data want;
merge
have
median
;
by year;
age_cat = (age >= p_50) + 1;
if age_cat = 1
then age_cat_char = catx(" ","1: <",p_50);
else age_cat_char = catx(" ","2: >=",p_50);
drop p_50;
run;
Nearly the same approach as already seen in the post of @Kurt_Bremser , but with a format to increase code length 😉
data work.have;
length Year 8;
set have2: indsname= _Source;
Year = input(compress(scan(_Source, 2, '.'), '', 'kd'), 4.);
run;
proc summary data=have nway;
by year;
var Age;
output out= work.median(drop= _type_ _freq_) p50=p50;
run;
data median_format;
set work.Median;
length
FmtName $ 32
Start End 8
Label $ 20
HLO EExcl $ 1
;
FmtName = cats('Age', Year, 'Median');
Start = .;
End = p50;
HLO = 'L';
EExcl = 'Y';
Label = cats('1: <', p50);
output;
Start = p50;
End = .;
HLO = 'H';
EExcl = 'N';
Label = cats('2: >=', p50);
output;
run;
proc format cntlin= work.median_format;
run;
data work.have_extended;
set work.have;
Age_Cat = putn(Age, cats('Age', Year, 'Median'));
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.