BookmarkSubscribeRSS Feed
A_Halps
Obsidian | Level 7
I want to collapse the data within each state to get 1 Record Per ID-State. on top of that, there is a hierarchy for who is the most eligible in that BEN_CD per state. The hierarchy of who should be selected: 1>2>3>4>5>6>7>0.
 

So, I am combining all of them based on the BEN_CD_#.

So for BEN_CD_01: NJ has 1,0,0 so then final result will be 1.

For BEN_CD_03: NJ has 1,0,2 so then final result will be 1 because according to my hierarchy, 1 >2>0

For BEN_CD_04: NJ has 3,1,1 so then final result will be 1 because according to my hierarchy, 1 >3

For BEN_CD_07: NJ has 5,6,0 so then final result will be 5 because according to my hierarchy 5>6>0

The code starts like this: 
data demogs;
input id state_cd $ ben_cd_01-ben_cd_12;
datalines;
111 NJ 1 1 1 3 1 0 5 0 0 0 0 0
111 NJ 0 0 0 1 0 1 6 0 0 0 0 0
111 NJ 0 0 2 1 0 1 0 0 0 0 0 0
111 FL 0 0 0 0 0 1 1 1 1 1 1 1
;
run;
 

Here is some code I have tried to use thus far but am not getting what I want at all:

data DEMOGS_COLLAPSED;
    set DEMOGS;
	    by state_cd;

    array BENS(12) BENS_CD_01-BENS_CD_12;
	    do i = 01 to 12;
		if BENS[i] = 0 then BENS[i] = 99;
	    end;
drop i;
run;

data DEMOGS_COLLAPSED_2;
    set DEMOGS_COLLAPSED;
	    by state_cd;

	array BENS_rst(12) BENS_CD_01-BENS_CD_12;

    do i = 01 to 12;
		if BENS_rst[i] ne . then BENS_rst[i]=min(BENS_rst[i]);
    end;

drop i;
run;
 
and I want it to end like this:
data final;
input id state_cd $ ben_cd_01-ben_cd_12;
datalines;
111 NJ 1 1 1 1 1 1 5 0 0 0 0 0
111 FL 0 0 0 0 0 1 1 1 1 1 1 1
;
run;

 

Thanks in advance!

7 REPLIES 7
Quentin
Super User

Your rules aren't clear.  For NJ there are 3 records that have BEN_CD=1.  Why did you choose the first record? Is it because it has more 1's than the other records?

 

Also, please show the code you have tried. It will help people help you.

A_Halps
Obsidian | Level 7
Changed the original post to answer your questions
Quentin
Super User

I see.  Are you familiar with PROC SQL?  There is almost an easy SQL approach, because you almost want the minimum value.  "Almost" because of the pesky problem that 0 lowest in your hierarchy.

 

You could almost just do:

proc sql ;
  create table want as
  select id,state_cd
        ,min(ben_cd_01) as ben_cd_01
        ,min(ben_cd_02) as ben_cd_02
        ,min(ben_cd_03) as ben_cd_03
        ,min(ben_cd_04) as ben_cd_04
  from demogs
  group by id,state_cd
   ;
quit ;

Then I went down the rathole of trying to deal with the zero values, and decided to treat them as 9s and came up with the horrendous:

proc sql ;
  create table want as
  select id,state_cd
        ,ifn(min(ifn(ben_cd_01=0,9,ben_cd_01))=9,0,min(ifn(ben_cd_01=0,9,ben_cd_01))) as ben_cd_01
        ,ifn(min(ifn(ben_cd_02=0,9,ben_cd_02))=9,0,min(ifn(ben_cd_02=0,9,ben_cd_02))) as ben_cd_02
        ,ifn(min(ifn(ben_cd_03=0,9,ben_cd_03))=9,0,min(ifn(ben_cd_03=0,9,ben_cd_03))) as ben_cd_03
        ,ifn(min(ifn(ben_cd_04=0,9,ben_cd_04))=9,0,min(ifn(ben_cd_04=0,9,ben_cd_04))) as ben_cd_04
  from demogs
  group by id,state_cd
   ;
quit ;

Or similar wallpaper code:

proc sql ;
  create table want as
  select id,state_cd
        ,ifn(ben_cd_01=9,0,ben_cd_01) as ben_cd_01
        ,ifn(ben_cd_02=9,0,ben_cd_02) as ben_cd_02
        ,ifn(ben_cd_03=9,0,ben_cd_03) as ben_cd_03
        ,ifn(ben_cd_04=9,0,ben_cd_04) as ben_cd_04
        ,ifn(ben_cd_05=9,0,ben_cd_05) as ben_cd_05
        ,ifn(ben_cd_06=9,0,ben_cd_06) as ben_cd_06
        ,ifn(ben_cd_07=9,0,ben_cd_07) as ben_cd_07
        ,ifn(ben_cd_08=9,0,ben_cd_08) as ben_cd_08
        ,ifn(ben_cd_09=9,0,ben_cd_09) as ben_cd_09
        ,ifn(ben_cd_10=9,0,ben_cd_10) as ben_cd_10
        ,ifn(ben_cd_11=9,0,ben_cd_11) as ben_cd_11
        ,ifn(ben_cd_12=9,0,ben_cd_12) as ben_cd_12
  from (
        select id,state_cd
        ,min(ifn(ben_cd_01=0,9,ben_cd_01)) as ben_cd_01
        ,min(ifn(ben_cd_02=0,9,ben_cd_02)) as ben_cd_02
        ,min(ifn(ben_cd_03=0,9,ben_cd_03)) as ben_cd_03
        ,min(ifn(ben_cd_04=0,9,ben_cd_04)) as ben_cd_04
        ,min(ifn(ben_cd_05=0,9,ben_cd_05)) as ben_cd_05
        ,min(ifn(ben_cd_06=0,9,ben_cd_06)) as ben_cd_06
        ,min(ifn(ben_cd_07=0,9,ben_cd_07)) as ben_cd_07
        ,min(ifn(ben_cd_08=0,9,ben_cd_08)) as ben_cd_08
        ,min(ifn(ben_cd_09=0,9,ben_cd_09)) as ben_cd_09
        ,min(ifn(ben_cd_10=0,9,ben_cd_10)) as ben_cd_10
        ,min(ifn(ben_cd_11=0,9,ben_cd_11)) as ben_cd_11
        ,min(ifn(ben_cd_12=0,9,ben_cd_12)) as ben_cd_12
        from demogs
        group by id,state_cd
        ) as a
   ;
quit ;

I think instead of that mess I would probably use a DATA step with an array to recode all of the zeros to nines, then run the simple PROC SQL step (or PROC MEANS) to get the minimum values, then another step to recode the nines back to zeros.

 

My other thought is that generally I don't like this wide format.  So if I were working with this, I'd be likely to transpose it into a vertical dataset that had columns ID State Ben_CD_ID Ben_CD.  So each record in your dataset would become 12 records in the vertical dataset.  That would allow you to use the simple PROC SQL or PROC MEANS approach to grab the minimum value (after dealing with the zero problem).  If you'd like to see a transpose approach, drop a note, and I or someone else will post one.

ballardw
Super User

Since your not quite obvious rule is to search across observations then perhaps it means the data should be reshaped so that the rows become the variables and the Ben variables become rows. Then searching is a bit easier.

 

The code below assumes the values are at least grouped by ID and State. if not, sort by Id and state before the first step.

proc transpose data=demogs out=trans
   ;
by id state_cd notsorted;
var ben: ;
run;

data rankit;
   set trans;
   do i= 1 to 7,0;
      if whichn(i,of col:)>0 then do;
         rank=i;
         leave;
      end;
   end;
run;

proc transpose data=rankit out=want (drop=_name_);
   by id state_cd notsorted;
   var rank;
   id _name_;
run;

The first transpose makes the row/column reshape.

The data step uses the order of the DO loop variable to search in a specific order. If you had really stupid hierarchy rules like 2>7>1>5>4>6 ... then listing them in order on the Do statement would work.

The WHICH function will search for the first value in the parameter list in a list of other values which here uses the variable list Col:  as the default variables created by Proc transpose will be Col1, col2 ... The Whichn functions returns the position of the first match or 0. So the test for if the value of I was found. Only when found set a variable I called Rank to match your hierarchy level. The LEAVE instruction terminates the DO loop when executed. So we Leave the Loop when the first match is found.

Then use Proc Transpose to reshape back to your desired layout.

FreelanceReinh
Jade | Level 19

Hello @A_Halps,

 

I agree with the others that a dataset in a "long" format with only one variable (B) containing the BEN_CD_n values would make the task easier. PROC SQL could then select an expression such as 

mod(min(mod(B+7, 8))+1, 8)

(similar to what @Kurt_Bremser suggested) from each "id state_cd n" BY group (in the sense of a GROUP BY clause). This expression would combine the two transformations needed for the special case B=0.

 

To avoid the two "transpose" steps creating (and reversing) the long format temporarily, you could fix your existing array approach and get the result in a single DATA step:

%macro f(a);
ifn(&a[i] in (0,9), 9-&a[i], &a[i])
%mend f;

data want(drop=i m:);
do until(last.state_cd);
  set demogs;
  by id state_cd notsorted;
  array b[12] b:;
  array m[12];
  do i=1 to dim(b);
    m[i]=min(m[i], %f(b));
  end;
end;
do i=1 to dim(b);
  b[i]=%f(m);
end;
run;

The DOW loop (do until(last.state_cd) ...) facilitates the collapsing of observations. The little macro F is just an abbreviation of the function used in two places to implement the special rule for zeros in array B. Array M (storing the "minimum" values) could also be defined as a _temporary_ array for possibly better performance. This would require a 

call missing(of m[*]);

to be inserted, e.g., before the RUN statement (to override the automatic RETAIN of temporary arrays) and the "m:" in the DROP= dataset option to be removed.

mkeintz
PROC Star

If your data are already grouped by ID/STATE_CD:

 


data want (drop=_: );
  set demogs;
  by id state_cd notsorted;
  retain _pref_ben1-_pref_ben12;
  array pref_ben _pref_ben: ;
  array ben_cd   ben_cd_01-ben_cd_12 ;
  if first.state_cd then do over ben_cd;
    pref_ben=ben_cd;
  end;
  else do over ben_cd;
    if mod(ben_cd+7,8)<mod(pref_ben+7,8) then pref_ben=ben_cd;
  end;

  if last.state_cd;
  do over ben_cd;
    ben_cd=pref_ben;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1842 views
  • 5 likes
  • 6 in conversation