data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1
123 1 0 1 0 0 0 1
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;
run;
Above is a sample of my table that contains roughly 8,000 IDs with flags indicating the ID has a CC present or not (=1 is present =0 is not present). There is also an indicator (C_IND) that lets me know what type of submission this was.
I'd like to find which CC is populated as 1 by ID and CC_IND. Essentially I'm trying to find what CC's are present on ID's with a CC_IND=1 that are not populated on a CC_IND=0.
An example would be ID 123 where CC2=1 on the CC_IND=1 record but not present on the CC_IND=0 record.
Such code is easy with a long dataset layout:
data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1
123 1 0 1 0 0 0 1
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;
proc transpose
data=have
out=long (
rename=(_name_=cc col1=submitted)
where=(submitted = 1)
)
;
by id cc_ind;
run;
proc sort data=long;
by id cc;
run;
data want;
merge
long (in=zero where=(cc_ind = 0))
long (in=one where=(cc_ind = 1))
;
by id cc;
if one and not zero;
keep id cc;
run;
Edit: note that, with a long dataset layout, the code does not need to know the cc values present in the dataset. It is completely data-driven.
What does your desired result look like given this data ?
hi @PeterClemmensen An optimal output would be to identify the header name as the output or an indication of what CC was in the submission identified. Or just retaining the flags where the observation was present on the C_IND=1 and not present on the C_IND=0
ID | C_IND | CC |
123 | 1 | CC2 |
234 | 1 | CC1 |
657 | 1 | CC8 |
Or something like this
ID | C_IND | CC1 | CC2 | CC6 | CC8 | CC9 | CC10 |
123 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
234 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
657 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
Such code is easy with a long dataset layout:
data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1
123 1 0 1 0 0 0 1
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;
proc transpose
data=have
out=long (
rename=(_name_=cc col1=submitted)
where=(submitted = 1)
)
;
by id cc_ind;
run;
proc sort data=long;
by id cc;
run;
data want;
merge
long (in=zero where=(cc_ind = 0))
long (in=one where=(cc_ind = 1))
;
by id cc;
if one and not zero;
keep id cc;
run;
Edit: note that, with a long dataset layout, the code does not need to know the cc values present in the dataset. It is completely data-driven.
Try something like this
data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1
123 1 0 1 0 0 0 1
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;
data want(keep = ID cc);
set have;
where CC_IND = 1;
array r{*} CC1 -- CC10;
cc = vname(r[whichn(1, of r[*])]);
run;
@PeterClemmensen Thank you for the code. This actually did not capture the correct scenarios. It did reduce the data set to those ID's with a CC_IND=1 but it identified all of them, no matter if the same ID had a CC = to a CC with a CC_IND=1.
Meaning, it identified scenarios where the ID had both a CC_IND= 1 & 0 and a CC1=1 on both records and this codes output gave me that ID.
Given your example data:
proc sql;
select
id,
range(cc_ind) = 1 and min(cc_ind=cc1) = 1 as CC1,
range(cc_ind) = 1 and min(cc_ind=cc2) = 1 as CC2,
range(cc_ind) = 1 and min(cc_ind=cc6) = 1 as CC6,
range(cc_ind) = 1 and min(cc_ind=cc8) = 1 as CC8,
range(cc_ind) = 1 and min(cc_ind=cc9) = 1 as CC9,
range(cc_ind) = 1 and min(cc_ind=cc10) = 1 as CC10
from have
group by id;
quit;
Hi @PGStats thanks for the Proc SQL example, this actually works very well. Never thought about building a range out as i do something similar to calculate the CC flags. Thanks again!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.