Hello, I created three variables based on how certain observations represent the total and based on percentile criteria. I need to choose which of these variables has the closest value to 25% and then be able to use another variable that is associated with that variable. All observations are the same within each of these variables just because of the way I generated the results into the dataset (vs using macro variables). The variable names represent the percentile criteria used. I need to choose which of the "AGG_" variables is closest to 25% so that I can use the associated "SITE_" variable which also uses the percentile in the variable name. A picture of the resulting dataset is attached which shows what I am doing. Also, ideally, the percentiles will change so that I would prefer to not have to hardcode the variables in (as in the PROC SQL at the end), so any advice on how to use macro variables would be helpful . Small Sample Code: data have ; infile datalines dsd missover ; input SiteID totalpt diagpts Totalpts DiagPts diagsite_pct acaSite commSite privSite diagmd_pct ; datalines; 2958,137,3,42,1,0.02189781,,,137,0.023809524 2965,1366,13,104,1,0.009516837,,,1366,0.009615385 2999,100,2,43,2,0.02,,,100,0.046511628 3075,300,6,153,6,0.02,,,300,0.039215686 3101,429,10,95,3,0.023310023,,,429,0.031578947 3171,189,1,5,1,0.005291005,,,189,0.2 3279,37,2,2,1,0.054054054,,37,,0.5 3285,484,5,88,2,0.010330579,,,484,0.022727273 3307,2908,31,119,1,0.010660248,,,2908,0.008403361 3400,491,7,127,5,0.014256619,,,491,0.039370079 3434,1047,37,7,4,0.035339064,1047,,,0.571428571 3438,244,7,132,5,0.028688525,,244,,0.037878788 3450,2486,61,192,3,0.024537409,,,2486,0.015625 3493,434,9,66,1,0.020737327,,434,,0.015151515 3496,194,2,29,2,0.010309278,,,194,0.068965517 3530,77,23,25,9,0.298701299,77,,,0.36 3537,113,2,1,1,0.017699115,113,,,1 3547,470,6,42,1,0.012765957,,470,,0.023809524 3551,140,2,140,2,0.014285714,,140,,0.014285714 3638,914,83,54,23,0.090809628,914,,,0.425925926 ; run; %let class=25 ; /*percentile for practice class type is set at 25 for now*/ proc univariate data=have noprint; var acasite commsite privsite diagsite_pct diagsite_pct; /*diagsite_pct is duplicated to create the dummy site_ variables*/ output out=pctOutput pctlpre=aca_ comm_ priv_ diag_ site_ pctlpts=&class., 60 to 75 by 5; run; proc print; run; data have (drop=i); if _N_ = 1 then set pctoutput ; /*add the percentiles row generated from proc univariate to lookup table*/ set have ; /*identify practices where number of patients with diagnosis is greater than or equal to various percentiles from proc univariate*/ array diag{*} diag_: ; array site{*} site_: ; do i = 1 to dim(diag) ; Site[i]=. ; if diagsite_PCT >= diag[i] and ( acaSite >= 0 or commSite >= comm_25 or privSite >= priv_25 ) then Site[i] = 1 ; end ; run; proc print; run; /*Determine if percentile representativeness is around 25% of total diagnosis sites */ /*NOTE: I prefer to be able to use macrovariables rather than hardcode just in case we decide to choose different percentiles later*/ proc sql ; create table haveAgg as Select SiteID, Site_60, Site_65, Site_70, sum(Site_60)/count(diagsite_pct) as Agg_60, sum(Site_65)/count(diagsite_pct) as Agg_65,sum(Site_70)/count(diagsite_pct) as Agg_70 from have Quit ; proc print ; run; /* Choose percentile column where diagnosis representativeness is closest to 25% */ Not sure how to do this because in this example the Agg_70 variable is closest to 25% so in subsequent code I will need to continue with this project I will need to use the SITE_70 variable. The Agg_70 is the representation of SITE_70, Agg_65 is representing the SITE_65, etc. psuedo code that I want: distance_n = abs(AGG_n - .25) ; if min(distance_n) then variablewant = SITE_n ; If Site_n = 1 and diagsite_pct < .05 then flag=1; The gist of my problem is if AGG_70 is closest to 25% then how do I refer to the associated SITE_ variable using the suffix? Suggestions greatly appreciated!!
... View more