DATA Step, Macro, Functions and more

select variable that has the closest value

Accepted Solution Solved
Reply
Frequent Contributor
Frequent Contributor
Posts: 133
Accepted Solution

select variable that has the closest value

Hello,

I created three variables based on how certain observations represent the total and based on percentile criteria. I need to choose which of these variables has the closest value to 25% and then be able to use another variable that is associated with that variable.  All observations are the same within each of these variables just because of the way I generated the results into the dataset (vs using macro variables).

 

The variable names represent the percentile criteria used. I need to choose which of the "AGG_" variables is closest to 25% so that I can use the associated "SITE_" variable which also uses the percentile in the variable name.  A picture of the resulting dataset is attached which shows what I am doing. Also, ideally, the percentiles will change so that I would prefer to not have to hardcode the variables in (as in the PROC SQL at the end), so any advice on how to use macro variables would be helpful .

 

Small Sample Code: 

 

data have ;
infile datalines dsd missover ;
input SiteID totalpt diagpts Totalpts DiagPts diagsite_pct acaSite commSite privSite diagmd_pct ;
datalines;
2958,137,3,42,1,0.02189781,,,137,0.023809524
2965,1366,13,104,1,0.009516837,,,1366,0.009615385
2999,100,2,43,2,0.02,,,100,0.046511628
3075,300,6,153,6,0.02,,,300,0.039215686
3101,429,10,95,3,0.023310023,,,429,0.031578947
3171,189,1,5,1,0.005291005,,,189,0.2
3279,37,2,2,1,0.054054054,,37,,0.5
3285,484,5,88,2,0.010330579,,,484,0.022727273
3307,2908,31,119,1,0.010660248,,,2908,0.008403361
3400,491,7,127,5,0.014256619,,,491,0.039370079
3434,1047,37,7,4,0.035339064,1047,,,0.571428571
3438,244,7,132,5,0.028688525,,244,,0.037878788
3450,2486,61,192,3,0.024537409,,,2486,0.015625
3493,434,9,66,1,0.020737327,,434,,0.015151515
3496,194,2,29,2,0.010309278,,,194,0.068965517
3530,77,23,25,9,0.298701299,77,,,0.36
3537,113,2,1,1,0.017699115,113,,,1
3547,470,6,42,1,0.012765957,,470,,0.023809524
3551,140,2,140,2,0.014285714,,140,,0.014285714
3638,914,83,54,23,0.090809628,914,,,0.425925926
;
run;


%let class=25 ; /*percentile for practice class type is set at 25 for now*/
proc univariate data=have noprint;
var acasite commsite privsite diagsite_pct diagsite_pct; /*diagsite_pct is duplicated to create the dummy site_ variables*/
output out=pctOutput pctlpre=aca_ comm_ priv_ diag_ site_ pctlpts=&class., 60 to 75 by 5;
run;
proc print; run;

data have (drop=i);
if _N_ = 1 then set pctoutput ; /*add the percentiles row generated from proc univariate to lookup table*/
set have ;

/*identify practices where number of patients with diagnosis is greater than or equal to various percentiles from proc univariate*/
array diag{*} diag_: ;
array site{*} site_: ;

do i = 1 to dim(diag) ;
Site[i]=. ;
if diagsite_PCT >= diag[i] and ( acaSite >= 0 or commSite >= comm_25 or privSite >= priv_25 ) then Site[i] = 1 ;
end ;

run;
proc print; run;

 

/*Determine if percentile representativeness is around 25% of total diagnosis sites */

/*NOTE: I prefer to be able to use macrovariables rather than hardcode just in case we decide to choose different percentiles later*/
proc sql ;
create table haveAgg as
Select SiteID, Site_60, Site_65, Site_70, sum(Site_60)/count(diagsite_pct) as Agg_60,
sum(Site_65)/count(diagsite_pct) as Agg_65,sum(Site_70)/count(diagsite_pct) as Agg_70
from have
Quit ;
proc print ; run;

/* Choose percentile column where diagnosis representativeness is closest to 25% */

 

Not sure how to do this because in this example the Agg_70 variable is closest to 25% so in subsequent code I will need to continue with this project I will need to use the SITE_70 variable. The Agg_70 is the representation of SITE_70, Agg_65 is representing the SITE_65, etc. 

 

psuedo code that I want:

 

distance_n = abs(AGG_n - .25)  ;

 

if min(distance_n) then variablewant = SITE_n ;

 

If Site_n = 1 and diagsite_pct < .05 then flag=1;

 

 

The gist of my problem is if AGG_70 is closest to 25% then how do I refer to the associated SITE_ variable using the suffix?

 

Suggestions greatly appreciated!!

 


sample specialty.PNG

Accepted Solutions
Solution
‎10-10-2016 11:11 AM
Frequent Contributor
Frequent Contributor
Posts: 133

Re: select variable that has the closest value

Thank you! With a few tweaks (added "end" to close the if loop, added "of" within the min function), this code worked to create variablewant. 

 

data want ;
set have ;

array aggx Agg_60 Agg_65 Agg_70 ;
array disx dis1-dis3;
array sitx $ Site_60 Site_65 Site_70 ;

do i=1 to dim(aggx);
disx(i) = abs( aggx(i) - 0.25);
end;
min_agg = min(of dis: );
do i=1 to dim(aggx);
if disx(i) = min_agg then do;
variablewant = sitx(i);
call symput('aggSite' , variablewant); /* to be used on next steps */
leave;
end;
end;
run;
proc print ; run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,370

Re: select variable that has the closest value

[ Edited ]

You may try:

 

data want;

 set heveAgg;

      array aggx Agg_60 Agg_65 Agg70;

      array disx   dis1-dis3;

      array sitx $ 'Site_60' 'Site_65' 'Site_70' ;

 

      do i=1 to dim(aggx);

           disx(i) = abs( aggx(i) - 0.25);

      end;

      min_agg = min(dis: );

      do i=1 to dim(aggx);

            if disx(i) = min_agg then do;

               variablewant = sitx(i);

               call symput('aggSite' , variablewant);   /* to be used on next steps */

               leave;

      end;

 run;

    

then next step should be in a macro:

 

%macro name_it;

       %global aggSite;

         .... your code using &aggSite ....

%mend;

%name_it;

 

                    

            

Solution
‎10-10-2016 11:11 AM
Frequent Contributor
Frequent Contributor
Posts: 133

Re: select variable that has the closest value

Thank you! With a few tweaks (added "end" to close the if loop, added "of" within the min function), this code worked to create variablewant. 

 

data want ;
set have ;

array aggx Agg_60 Agg_65 Agg_70 ;
array disx dis1-dis3;
array sitx $ Site_60 Site_65 Site_70 ;

do i=1 to dim(aggx);
disx(i) = abs( aggx(i) - 0.25);
end;
min_agg = min(of dis: );
do i=1 to dim(aggx);
if disx(i) = min_agg then do;
variablewant = sitx(i);
call symput('aggSite' , variablewant); /* to be used on next steps */
leave;
end;
end;
run;
proc print ; run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 322 views
  • 0 likes
  • 2 in conversation