Solved: Re: Identifying observations on different submission indicators

bknitch · Posted 04-09-2021 02:33 PM

data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1 
123 1 0 1 0 0 0 1 
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;
run;

Above is a sample of my table that contains roughly 8,000 IDs with flags indicating the ID has a CC present or not (=1 is present =0 is not present). There is also an indicator (C_IND) that lets me know what type of submission this was.

I'd like to find which CC is populated as 1 by ID and CC_IND. Essentially I'm trying to find what CC's are present on ID's with a CC_IND=1 that are not populated on a CC_IND=0.

An example would be ID 123 where CC2=1 on the CC_IND=1 record but not present on the CC_IND=0 record.

Kurt_Bremser · Posted 04-09-2021 03:06 PM

Such code is easy with a long dataset layout:

data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1 
123 1 0 1 0 0 0 1 
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;

proc transpose
  data=have
  out=long (
    rename=(_name_=cc col1=submitted)
    where=(submitted = 1)
  )
;
by id cc_ind;
run;

proc sort data=long;
by id cc;
run;

data want;
merge
  long (in=zero where=(cc_ind = 0))
  long (in=one where=(cc_ind = 1))
;
by id cc;
if one and not zero;
keep id cc;
run;

Edit: note that, with a long dataset layout, the code does not need to know the cc values present in the dataset. It is completely data-driven.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

PeterClemmensen · Posted 04-09-2021 02:42 PM

What does your desired result look like given this data ?

The DATA to DATA Step Macro
Blog: SASnrd

bknitch · Posted 04-09-2021 02:54 PM

hi @PeterClemmensen An optimal output would be to identify the header name as the output or an indication of what CC was in the submission identified. Or just retaining the flags where the observation was present on the C_IND=1 and not present on the C_IND=0

ID	C_IND	CC
123	1	CC2
234	1	CC1
657	1	CC8

Or something like this

ID	C_IND	CC1	CC2	CC6	CC8	CC9	CC10
123	1	0	1	0	0	0	0
234	1	1	0	0	0	0	0
657	1	0	0	0	1	0	0

Kurt_Bremser · Posted 04-09-2021 03:06 PM

Such code is easy with a long dataset layout:

data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1 
123 1 0 1 0 0 0 1 
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;

proc transpose
  data=have
  out=long (
    rename=(_name_=cc col1=submitted)
    where=(submitted = 1)
  )
;
by id cc_ind;
run;

proc sort data=long;
by id cc;
run;

data want;
merge
  long (in=zero where=(cc_ind = 0))
  long (in=one where=(cc_ind = 1))
;
by id cc;
if one and not zero;
keep id cc;
run;

Edit: note that, with a long dataset layout, the code does not need to know the cc values present in the dataset. It is completely data-driven.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

bknitch · Posted 04-09-2021 03:40 PM

Thanks @Kurt_Bremser this worked really well. I thought about transposing it back to long and skinny from wide but didn't know the next steps from there. This was very helpful!

PeterClemmensen · Posted 04-09-2021 03:07 PM

Try something like this

data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1 
123 1 0 1 0 0 0 1 
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;

data want(keep = ID cc);
   set have;
   where CC_IND = 1;
   array r{*} CC1 -- CC10;
   cc = vname(r[whichn(1, of r[*])]);
run;

The DATA to DATA Step Macro
Blog: SASnrd

bknitch · Posted 04-09-2021 03:38 PM

@PeterClemmensen Thank you for the code. This actually did not capture the correct scenarios. It did reduce the data set to those ID's with a CC_IND=1 but it identified all of them, no matter if the same ID had a CC = to a CC with a CC_IND=1.

Meaning, it identified scenarios where the ID had both a CC_IND= 1 & 0 and a CC1=1 on both records and this codes output gave me that ID.

PGStats · Posted 04-09-2021 03:09 PM

Given your example data:

proc sql;
select 
    id,
    range(cc_ind) = 1 and min(cc_ind=cc1) = 1 as CC1,
    range(cc_ind) = 1 and min(cc_ind=cc2) = 1 as CC2,
    range(cc_ind) = 1 and min(cc_ind=cc6) = 1 as CC6,
    range(cc_ind) = 1 and min(cc_ind=cc8) = 1 as CC8,
    range(cc_ind) = 1 and min(cc_ind=cc9) = 1 as CC9,
    range(cc_ind) = 1 and min(cc_ind=cc10) = 1 as CC10
from have
group by id;
quit;

PG

bknitch · Posted 04-09-2021 03:35 PM

Hi @PGStats thanks for the Proc SQL example, this actually works very well. Never thought about building a range out as i do something similar to calculate the CC flags. Thanks again!

Classroom Training Available!