So, I am combining all of them based on the BEN_CD_#.
So for BEN_CD_01: NJ has 1,0,0 so then final result will be 1.
For BEN_CD_03: NJ has 1,0,2 so then final result will be 1 because according to my hierarchy, 1 >2>0
For BEN_CD_04: NJ has 3,1,1 so then final result will be 1 because according to my hierarchy, 1 >3
For BEN_CD_07: NJ has 5,6,0 so then final result will be 5 because according to my hierarchy 5>6>0
data demogs;
input id state_cd $ ben_cd_01-ben_cd_12;
datalines;
111 NJ 1 1 1 3 1 0 5 0 0 0 0 0
111 NJ 0 0 0 1 0 1 6 0 0 0 0 0
111 NJ 0 0 2 1 0 1 0 0 0 0 0 0
111 FL 0 0 0 0 0 1 1 1 1 1 1 1
;
run;
Here is some code I have tried to use thus far but am not getting what I want at all:
data DEMOGS_COLLAPSED;
set DEMOGS;
by state_cd;
array BENS(12) BENS_CD_01-BENS_CD_12;
do i = 01 to 12;
if BENS[i] = 0 then BENS[i] = 99;
end;
drop i;
run;
data DEMOGS_COLLAPSED_2;
set DEMOGS_COLLAPSED;
by state_cd;
array BENS_rst(12) BENS_CD_01-BENS_CD_12;
do i = 01 to 12;
if BENS_rst[i] ne . then BENS_rst[i]=min(BENS_rst[i]);
end;
drop i;
run;
data final;
input id state_cd $ ben_cd_01-ben_cd_12;
datalines;
111 NJ 1 1 1 1 1 1 5 0 0 0 0 0
111 FL 0 0 0 0 0 1 1 1 1 1 1 1
;
run;
Thanks in advance!
Your rules aren't clear. For NJ there are 3 records that have BEN_CD=1. Why did you choose the first record? Is it because it has more 1's than the other records?
Also, please show the code you have tried. It will help people help you.
I see. Are you familiar with PROC SQL? There is almost an easy SQL approach, because you almost want the minimum value. "Almost" because of the pesky problem that 0 lowest in your hierarchy.
You could almost just do:
proc sql ;
create table want as
select id,state_cd
,min(ben_cd_01) as ben_cd_01
,min(ben_cd_02) as ben_cd_02
,min(ben_cd_03) as ben_cd_03
,min(ben_cd_04) as ben_cd_04
from demogs
group by id,state_cd
;
quit ;
Then I went down the rathole of trying to deal with the zero values, and decided to treat them as 9s and came up with the horrendous:
proc sql ;
create table want as
select id,state_cd
,ifn(min(ifn(ben_cd_01=0,9,ben_cd_01))=9,0,min(ifn(ben_cd_01=0,9,ben_cd_01))) as ben_cd_01
,ifn(min(ifn(ben_cd_02=0,9,ben_cd_02))=9,0,min(ifn(ben_cd_02=0,9,ben_cd_02))) as ben_cd_02
,ifn(min(ifn(ben_cd_03=0,9,ben_cd_03))=9,0,min(ifn(ben_cd_03=0,9,ben_cd_03))) as ben_cd_03
,ifn(min(ifn(ben_cd_04=0,9,ben_cd_04))=9,0,min(ifn(ben_cd_04=0,9,ben_cd_04))) as ben_cd_04
from demogs
group by id,state_cd
;
quit ;
Or similar wallpaper code:
proc sql ;
create table want as
select id,state_cd
,ifn(ben_cd_01=9,0,ben_cd_01) as ben_cd_01
,ifn(ben_cd_02=9,0,ben_cd_02) as ben_cd_02
,ifn(ben_cd_03=9,0,ben_cd_03) as ben_cd_03
,ifn(ben_cd_04=9,0,ben_cd_04) as ben_cd_04
,ifn(ben_cd_05=9,0,ben_cd_05) as ben_cd_05
,ifn(ben_cd_06=9,0,ben_cd_06) as ben_cd_06
,ifn(ben_cd_07=9,0,ben_cd_07) as ben_cd_07
,ifn(ben_cd_08=9,0,ben_cd_08) as ben_cd_08
,ifn(ben_cd_09=9,0,ben_cd_09) as ben_cd_09
,ifn(ben_cd_10=9,0,ben_cd_10) as ben_cd_10
,ifn(ben_cd_11=9,0,ben_cd_11) as ben_cd_11
,ifn(ben_cd_12=9,0,ben_cd_12) as ben_cd_12
from (
select id,state_cd
,min(ifn(ben_cd_01=0,9,ben_cd_01)) as ben_cd_01
,min(ifn(ben_cd_02=0,9,ben_cd_02)) as ben_cd_02
,min(ifn(ben_cd_03=0,9,ben_cd_03)) as ben_cd_03
,min(ifn(ben_cd_04=0,9,ben_cd_04)) as ben_cd_04
,min(ifn(ben_cd_05=0,9,ben_cd_05)) as ben_cd_05
,min(ifn(ben_cd_06=0,9,ben_cd_06)) as ben_cd_06
,min(ifn(ben_cd_07=0,9,ben_cd_07)) as ben_cd_07
,min(ifn(ben_cd_08=0,9,ben_cd_08)) as ben_cd_08
,min(ifn(ben_cd_09=0,9,ben_cd_09)) as ben_cd_09
,min(ifn(ben_cd_10=0,9,ben_cd_10)) as ben_cd_10
,min(ifn(ben_cd_11=0,9,ben_cd_11)) as ben_cd_11
,min(ifn(ben_cd_12=0,9,ben_cd_12)) as ben_cd_12
from demogs
group by id,state_cd
) as a
;
quit ;
I think instead of that mess I would probably use a DATA step with an array to recode all of the zeros to nines, then run the simple PROC SQL step (or PROC MEANS) to get the minimum values, then another step to recode the nines back to zeros.
My other thought is that generally I don't like this wide format. So if I were working with this, I'd be likely to transpose it into a vertical dataset that had columns ID State Ben_CD_ID Ben_CD. So each record in your dataset would become 12 records in the vertical dataset. That would allow you to use the simple PROC SQL or PROC MEANS approach to grab the minimum value (after dealing with the zero problem). If you'd like to see a transpose approach, drop a note, and I or someone else will post one.
When you subtract 1 and use MOD by 7, the 0 turns into the highest value.
Since your not quite obvious rule is to search across observations then perhaps it means the data should be reshaped so that the rows become the variables and the Ben variables become rows. Then searching is a bit easier.
The code below assumes the values are at least grouped by ID and State. if not, sort by Id and state before the first step.
proc transpose data=demogs out=trans ; by id state_cd notsorted; var ben: ; run; data rankit; set trans; do i= 1 to 7,0; if whichn(i,of col:)>0 then do; rank=i; leave; end; end; run; proc transpose data=rankit out=want (drop=_name_); by id state_cd notsorted; var rank; id _name_; run;
The first transpose makes the row/column reshape.
The data step uses the order of the DO loop variable to search in a specific order. If you had really stupid hierarchy rules like 2>7>1>5>4>6 ... then listing them in order on the Do statement would work.
The WHICH function will search for the first value in the parameter list in a list of other values which here uses the variable list Col: as the default variables created by Proc transpose will be Col1, col2 ... The Whichn functions returns the position of the first match or 0. So the test for if the value of I was found. Only when found set a variable I called Rank to match your hierarchy level. The LEAVE instruction terminates the DO loop when executed. So we Leave the Loop when the first match is found.
Then use Proc Transpose to reshape back to your desired layout.
Hello @A_Halps,
I agree with the others that a dataset in a "long" format with only one variable (B) containing the BEN_CD_n values would make the task easier. PROC SQL could then select an expression such as
mod(min(mod(B+7, 8))+1, 8)
(similar to what @Kurt_Bremser suggested) from each "id state_cd n" BY group (in the sense of a GROUP BY clause). This expression would combine the two transformations needed for the special case B=0.
To avoid the two "transpose" steps creating (and reversing) the long format temporarily, you could fix your existing array approach and get the result in a single DATA step:
%macro f(a);
ifn(&a[i] in (0,9), 9-&a[i], &a[i])
%mend f;
data want(drop=i m:);
do until(last.state_cd);
set demogs;
by id state_cd notsorted;
array b[12] b:;
array m[12];
do i=1 to dim(b);
m[i]=min(m[i], %f(b));
end;
end;
do i=1 to dim(b);
b[i]=%f(m);
end;
run;
The DOW loop (do until(last.state_cd) ...) facilitates the collapsing of observations. The little macro F is just an abbreviation of the function used in two places to implement the special rule for zeros in array B. Array M (storing the "minimum" values) could also be defined as a _temporary_ array for possibly better performance. This would require a
call missing(of m[*]);
to be inserted, e.g., before the RUN statement (to override the automatic RETAIN of temporary arrays) and the "m:" in the DROP= dataset option to be removed.
If your data are already grouped by ID/STATE_CD:
data want (drop=_: );
set demogs;
by id state_cd notsorted;
retain _pref_ben1-_pref_ben12;
array pref_ben _pref_ben: ;
array ben_cd ben_cd_01-ben_cd_12 ;
if first.state_cd then do over ben_cd;
pref_ben=ben_cd;
end;
else do over ben_cd;
if mod(ben_cd+7,8)<mod(pref_ben+7,8) then pref_ben=ben_cd;
end;
if last.state_cd;
do over ben_cd;
ben_cd=pref_ben;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.