Hello,
I created three variables based on how certain observations represent the total and based on percentile criteria. I need to choose which of these variables has the closest value to 25% and then be able to use another variable that is associated with that variable. All observations are the same within each of these variables just because of the way I generated the results into the dataset (vs using macro variables).
The variable names represent the percentile criteria used. I need to choose which of the "AGG_" variables is closest to 25% so that I can use the associated "SITE_" variable which also uses the percentile in the variable name. A picture of the resulting dataset is attached which shows what I am doing. Also, ideally, the percentiles will change so that I would prefer to not have to hardcode the variables in (as in the PROC SQL at the end), so any advice on how to use macro variables would be helpful .
Small Sample Code:
data have ;
infile datalines dsd missover ;
input SiteID totalpt diagpts Totalpts DiagPts diagsite_pct acaSite commSite privSite diagmd_pct ;
datalines;
2958,137,3,42,1,0.02189781,,,137,0.023809524
2965,1366,13,104,1,0.009516837,,,1366,0.009615385
2999,100,2,43,2,0.02,,,100,0.046511628
3075,300,6,153,6,0.02,,,300,0.039215686
3101,429,10,95,3,0.023310023,,,429,0.031578947
3171,189,1,5,1,0.005291005,,,189,0.2
3279,37,2,2,1,0.054054054,,37,,0.5
3285,484,5,88,2,0.010330579,,,484,0.022727273
3307,2908,31,119,1,0.010660248,,,2908,0.008403361
3400,491,7,127,5,0.014256619,,,491,0.039370079
3434,1047,37,7,4,0.035339064,1047,,,0.571428571
3438,244,7,132,5,0.028688525,,244,,0.037878788
3450,2486,61,192,3,0.024537409,,,2486,0.015625
3493,434,9,66,1,0.020737327,,434,,0.015151515
3496,194,2,29,2,0.010309278,,,194,0.068965517
3530,77,23,25,9,0.298701299,77,,,0.36
3537,113,2,1,1,0.017699115,113,,,1
3547,470,6,42,1,0.012765957,,470,,0.023809524
3551,140,2,140,2,0.014285714,,140,,0.014285714
3638,914,83,54,23,0.090809628,914,,,0.425925926
;
run;
%let class=25 ; /*percentile for practice class type is set at 25 for now*/
proc univariate data=have noprint;
var acasite commsite privsite diagsite_pct diagsite_pct; /*diagsite_pct is duplicated to create the dummy site_ variables*/
output out=pctOutput pctlpre=aca_ comm_ priv_ diag_ site_ pctlpts=&class., 60 to 75 by 5;
run;
proc print; run;
data have (drop=i);
if _N_ = 1 then set pctoutput ; /*add the percentiles row generated from proc univariate to lookup table*/
set have ;
/*identify practices where number of patients with diagnosis is greater than or equal to various percentiles from proc univariate*/
array diag{*} diag_: ;
array site{*} site_: ;
do i = 1 to dim(diag) ;
Site[i]=. ;
if diagsite_PCT >= diag[i] and ( acaSite >= 0 or commSite >= comm_25 or privSite >= priv_25 ) then Site[i] = 1 ;
end ;
run;
proc print; run;
/*Determine if percentile representativeness is around 25% of total diagnosis sites */
/*NOTE: I prefer to be able to use macrovariables rather than hardcode just in case we decide to choose different percentiles later*/
proc sql ;
create table haveAgg as
Select SiteID, Site_60, Site_65, Site_70, sum(Site_60)/count(diagsite_pct) as Agg_60,
sum(Site_65)/count(diagsite_pct) as Agg_65,sum(Site_70)/count(diagsite_pct) as Agg_70
from have
Quit ;
proc print ; run;
/* Choose percentile column where diagnosis representativeness is closest to 25% */
Not sure how to do this because in this example the Agg_70 variable is closest to 25% so in subsequent code I will need to continue with this project I will need to use the SITE_70 variable. The Agg_70 is the representation of SITE_70, Agg_65 is representing the SITE_65, etc.
psuedo code that I want:
distance_n = abs(AGG_n - .25) ;
if min(distance_n) then variablewant = SITE_n ;
If Site_n = 1 and diagsite_pct < .05 then flag=1;
The gist of my problem is if AGG_70 is closest to 25% then how do I refer to the associated SITE_ variable using the suffix?
Suggestions greatly appreciated!!
Thank you! With a few tweaks (added "end" to close the if loop, added "of" within the min function), this code worked to create variablewant.
data want ;
set have ;
array aggx Agg_60 Agg_65 Agg_70 ;
array disx dis1-dis3;
array sitx $ Site_60 Site_65 Site_70 ;
do i=1 to dim(aggx);
disx(i) = abs( aggx(i) - 0.25);
end;
min_agg = min(of dis: );
do i=1 to dim(aggx);
if disx(i) = min_agg then do;
variablewant = sitx(i);
call symput('aggSite' , variablewant); /* to be used on next steps */
leave;
end;
end;
run;
proc print ; run;
You may try:
data want;
set heveAgg;
array aggx Agg_60 Agg_65 Agg70;
array disx dis1-dis3;
array sitx $ 'Site_60' 'Site_65' 'Site_70' ;
do i=1 to dim(aggx);
disx(i) = abs( aggx(i) - 0.25);
end;
min_agg = min(dis: );
do i=1 to dim(aggx);
if disx(i) = min_agg then do;
variablewant = sitx(i);
call symput('aggSite' , variablewant); /* to be used on next steps */
leave;
end;
run;
then next step should be in a macro:
%macro name_it;
%global aggSite;
.... your code using &aggSite ....
%mend;
%name_it;
Thank you! With a few tweaks (added "end" to close the if loop, added "of" within the min function), this code worked to create variablewant.
data want ;
set have ;
array aggx Agg_60 Agg_65 Agg_70 ;
array disx dis1-dis3;
array sitx $ Site_60 Site_65 Site_70 ;
do i=1 to dim(aggx);
disx(i) = abs( aggx(i) - 0.25);
end;
min_agg = min(of dis: );
do i=1 to dim(aggx);
if disx(i) = min_agg then do;
variablewant = sitx(i);
call symput('aggSite' , variablewant); /* to be used on next steps */
leave;
end;
end;
run;
proc print ; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.