BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bknitch
Quartz | Level 8
data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1 
123 1 0 1 0 0 0 1 
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;
run;

Above is a sample of my table that contains roughly 8,000 IDs with flags indicating the ID has a CC present or not (=1 is present =0 is not present). There is also an indicator (C_IND) that lets me know what type of submission this was.

I'd like to find which CC is populated as 1 by ID and CC_IND. Essentially I'm trying to find what CC's are present on ID's with a CC_IND=1 that are not populated on a CC_IND=0.

An example would be ID 123 where CC2=1 on the CC_IND=1 record but not present on the CC_IND=0 record.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Such code is easy with a long dataset layout:

data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1 
123 1 0 1 0 0 0 1 
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;

proc transpose
  data=have
  out=long (
    rename=(_name_=cc col1=submitted)
    where=(submitted = 1)
  )
;
by id cc_ind;
run;

proc sort data=long;
by id cc;
run;

data want;
merge
  long (in=zero where=(cc_ind = 0))
  long (in=one where=(cc_ind = 1))
;
by id cc;
if one and not zero;
keep id cc;
run;

 

 

Edit: note that, with a long dataset layout, the code does not need to know the cc values present in the dataset. It is completely data-driven.

View solution in original post

8 REPLIES 8
PeterClemmensen
Tourmaline | Level 20

What does your desired result look like given this data ?

bknitch
Quartz | Level 8

hi @PeterClemmensen  An optimal output would be to identify the header name as the output or an indication of what CC was in the submission identified. Or just retaining the flags where the observation was present on the C_IND=1 and not present on the C_IND=0

ID C_IND CC
123 1 CC2
234 1 CC1
657 1 CC8

 

Or something like this 

ID C_IND CC1 CC2 CC6 CC8 CC9 CC10
123 1 0 1 0 0 0 0
234 1 1 0 0 0 0 0
657 1 0 0 0 1 0 0
Kurt_Bremser
Super User

Such code is easy with a long dataset layout:

data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1 
123 1 0 1 0 0 0 1 
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;

proc transpose
  data=have
  out=long (
    rename=(_name_=cc col1=submitted)
    where=(submitted = 1)
  )
;
by id cc_ind;
run;

proc sort data=long;
by id cc;
run;

data want;
merge
  long (in=zero where=(cc_ind = 0))
  long (in=one where=(cc_ind = 1))
;
by id cc;
if one and not zero;
keep id cc;
run;

 

 

Edit: note that, with a long dataset layout, the code does not need to know the cc values present in the dataset. It is completely data-driven.

bknitch
Quartz | Level 8
Thanks @Kurt_Bremser this worked really well. I thought about transposing it back to long and skinny from wide but didn't know the next steps from there. This was very helpful!
PeterClemmensen
Tourmaline | Level 20

Try something like this

 

data have;
input ID$ CC_IND CC1 CC2 CC6 CC8 CC9 CC10;
cards;
123 0 1 0 0 0 0 1 
123 1 0 1 0 0 0 1 
124 0 0 0 0 0 0 1
124 1 0 0 0 0 0 1
125 0 1 1 0 0 0 0
125 1 1 1 0 0 1 0
126 0 1 0 1 0 0 0
126 1 0 0 1 0 0 0
;

data want(keep = ID cc);
   set have;
   where CC_IND = 1;
   array r{*} CC1 -- CC10;
   cc = vname(r[whichn(1, of r[*])]);
run;
bknitch
Quartz | Level 8

@PeterClemmensen  Thank you for the code. This actually did not capture the correct scenarios. It did reduce the data set to those ID's with a CC_IND=1 but it identified all of them, no matter if the same ID had a CC = to a CC with a CC_IND=1. 

 

Meaning, it identified scenarios where the ID had both a CC_IND= 1 & 0 and a CC1=1 on both records and this codes output gave me that ID. 

PGStats
Opal | Level 21

Given your example data:

 

proc sql;
select 
    id,
    range(cc_ind) = 1 and min(cc_ind=cc1) = 1 as CC1,
    range(cc_ind) = 1 and min(cc_ind=cc2) = 1 as CC2,
    range(cc_ind) = 1 and min(cc_ind=cc6) = 1 as CC6,
    range(cc_ind) = 1 and min(cc_ind=cc8) = 1 as CC8,
    range(cc_ind) = 1 and min(cc_ind=cc9) = 1 as CC9,
    range(cc_ind) = 1 and min(cc_ind=cc10) = 1 as CC10
from have
group by id;
quit;

PGStats_0-1617995343967.png

 

PG
bknitch
Quartz | Level 8

Hi @PGStats  thanks for the Proc SQL example, this actually works very well. Never thought about building a range out as i do something similar to calculate the CC flags. Thanks again! 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 661 views
  • 3 likes
  • 4 in conversation