Hello,
I created three variables based on how certain observations represent the total and based on percentile criteria. I need to choose which of these variables has the closest value to 25% and then be able to use another variable that is associated with that variable. All observations are the same within each of these variables just because of the way I generated the results into the dataset (vs using macro variables).
The variable names represent the percentile criteria used. I need to choose which of the "AGG_" variables is closest to 25% so that I can use the associated "SITE_" variable which also uses the percentile in the variable name. A picture of the resulting dataset is attached which shows what I am doing. Also, ideally, the percentiles will change so that I would prefer to not have to hardcode the variables in (as in the PROC SQL at the end), so any advice on how to use macro variables would be helpful .
Small Sample Code:
data have ;
infile datalines dsd missover ;
input SiteID totalpt diagpts Totalpts DiagPts diagsite_pct acaSite commSite privSite diagmd_pct ;
datalines;
2958,137,3,42,1,0.02189781,,,137,0.023809524
2965,1366,13,104,1,0.009516837,,,1366,0.009615385
2999,100,2,43,2,0.02,,,100,0.046511628
3075,300,6,153,6,0.02,,,300,0.039215686
3101,429,10,95,3,0.023310023,,,429,0.031578947
3171,189,1,5,1,0.005291005,,,189,0.2
3279,37,2,2,1,0.054054054,,37,,0.5
3285,484,5,88,2,0.010330579,,,484,0.022727273
3307,2908,31,119,1,0.010660248,,,2908,0.008403361
3400,491,7,127,5,0.014256619,,,491,0.039370079
3434,1047,37,7,4,0.035339064,1047,,,0.571428571
3438,244,7,132,5,0.028688525,,244,,0.037878788
3450,2486,61,192,3,0.024537409,,,2486,0.015625
3493,434,9,66,1,0.020737327,,434,,0.015151515
3496,194,2,29,2,0.010309278,,,194,0.068965517
3530,77,23,25,9,0.298701299,77,,,0.36
3537,113,2,1,1,0.017699115,113,,,1
3547,470,6,42,1,0.012765957,,470,,0.023809524
3551,140,2,140,2,0.014285714,,140,,0.014285714
3638,914,83,54,23,0.090809628,914,,,0.425925926
;
run;
%let class=25 ; /*percentile for practice class type is set at 25 for now*/
proc univariate data=have noprint;
var acasite commsite privsite diagsite_pct diagsite_pct; /*diagsite_pct is duplicated to create the dummy site_ variables*/
output out=pctOutput pctlpre=aca_ comm_ priv_ diag_ site_ pctlpts=&class., 60 to 75 by 5;
run;
proc print; run;
data have (drop=i);
if _N_ = 1 then set pctoutput ; /*add the percentiles row generated from proc univariate to lookup table*/
set have ;
/*identify practices where number of patients with diagnosis is greater than or equal to various percentiles from proc univariate*/
array diag{*} diag_: ;
array site{*} site_: ;
do i = 1 to dim(diag) ;
Site[i]=. ;
if diagsite_PCT >= diag[i] and ( acaSite >= 0 or commSite >= comm_25 or privSite >= priv_25 ) then Site[i] = 1 ;
end ;
run;
proc print; run;
/*Determine if percentile representativeness is around 25% of total diagnosis sites */
/*NOTE: I prefer to be able to use macrovariables rather than hardcode just in case we decide to choose different percentiles later*/
proc sql ;
create table haveAgg as
Select SiteID, Site_60, Site_65, Site_70, sum(Site_60)/count(diagsite_pct) as Agg_60,
sum(Site_65)/count(diagsite_pct) as Agg_65,sum(Site_70)/count(diagsite_pct) as Agg_70
from have
Quit ;
proc print ; run;
/* Choose percentile column where diagnosis representativeness is closest to 25% */
Not sure how to do this because in this example the Agg_70 variable is closest to 25% so in subsequent code I will need to continue with this project I will need to use the SITE_70 variable. The Agg_70 is the representation of SITE_70, Agg_65 is representing the SITE_65, etc.
psuedo code that I want:
distance_n = abs(AGG_n - .25) ;
if min(distance_n) then variablewant = SITE_n ;
If Site_n = 1 and diagsite_pct < .05 then flag=1;
The gist of my problem is if AGG_70 is closest to 25% then how do I refer to the associated SITE_ variable using the suffix?
Suggestions greatly appreciated!!
Thank you! With a few tweaks (added "end" to close the if loop, added "of" within the min function), this code worked to create variablewant.
data want ;
set have ;
array aggx Agg_60 Agg_65 Agg_70 ;
array disx dis1-dis3;
array sitx $ Site_60 Site_65 Site_70 ;
do i=1 to dim(aggx);
disx(i) = abs( aggx(i) - 0.25);
end;
min_agg = min(of dis: );
do i=1 to dim(aggx);
if disx(i) = min_agg then do;
variablewant = sitx(i);
call symput('aggSite' , variablewant); /* to be used on next steps */
leave;
end;
end;
run;
proc print ; run;
You may try:
data want;
set heveAgg;
array aggx Agg_60 Agg_65 Agg70;
array disx dis1-dis3;
array sitx $ 'Site_60' 'Site_65' 'Site_70' ;
do i=1 to dim(aggx);
disx(i) = abs( aggx(i) - 0.25);
end;
min_agg = min(dis: );
do i=1 to dim(aggx);
if disx(i) = min_agg then do;
variablewant = sitx(i);
call symput('aggSite' , variablewant); /* to be used on next steps */
leave;
end;
run;
then next step should be in a macro:
%macro name_it;
%global aggSite;
.... your code using &aggSite ....
%mend;
%name_it;
Thank you! With a few tweaks (added "end" to close the if loop, added "of" within the min function), this code worked to create variablewant.
data want ;
set have ;
array aggx Agg_60 Agg_65 Agg_70 ;
array disx dis1-dis3;
array sitx $ Site_60 Site_65 Site_70 ;
do i=1 to dim(aggx);
disx(i) = abs( aggx(i) - 0.25);
end;
min_agg = min(of dis: );
do i=1 to dim(aggx);
if disx(i) = min_agg then do;
variablewant = sitx(i);
call symput('aggSite' , variablewant); /* to be used on next steps */
leave;
end;
end;
run;
proc print ; run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.