BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PeterBr
Obsidian | Level 7

Hi All,

 

I have the following have data where I only care about codes a, b, and c. I have an identifier variable for each of the codes that are 1 or 0 for whether that data row has the specific code. However, I'd like that identifier variable for each code to be 1 for every row of the ID if that ID had the code at some point. I've included the want data below. Thanks!

 

data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;

data want;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 1 1
1 b 1 1 1
1 c 1 1 1
2 b 0 1 0
2 f 0 1 0
3 a 1 0 1
3 c 1 0 1
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20
data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;

data want(drop=a b c);
   do until (last.id);
      set have;
      by id;
      a = max(code_a, a);
      b = max(code_b, b);
      c = max(code_c, c);
   end;

   do until (last.id);
      set have;
      by id;
      code_a = a;
      code_b = b;
      code_c = c;
      output;
   end;
run;

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20
data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;

data want(drop=a b c);
   do until (last.id);
      set have;
      by id;
      a = max(code_a, a);
      b = max(code_b, b);
      c = max(code_c, c);
   end;

   do until (last.id);
      set have;
      by id;
      code_a = a;
      code_b = b;
      code_c = c;
      output;
   end;
run;
mkeintz
PROC Star

Here is a program that takes advantage of how the sas data step populates a PDV (program data vector, think of it as a list of variables.

 

The strategy below does the following:

  1. Merges HAVE with 3 subsets of Have with matching ID's
    1. subset with code_a=1, and with 2 variables (id and code_a).
    2. subset with code_b=1, and with 2 variables (id and code_b).
    3. subset with code_c=1, and with 2 variables (id and code_c).
      The very first observation for each id will have a pattern of 1's and 0's for the codes the you want, but subsequent observations produced by merge would not.

  2. With the desired codes in hand (i.e. when the record-in-hand is the first produced by merge), read in all the other records for the id (except the code variables) and output the records.
data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;

data want;
  merge have
        have (keep=id code_a where=(code_a=1))
        have (keep=id code_b where=(code_b=1))
        have (keep=id code_c where=(code_c=1)) ;
  by id;
  if first.id then do until (last.id);
    set have (drop=code_a code_b code_c);
	by id;
	output;
  end;
run;

Ordinarily I would suggest using the proc summary solutioni offered by @Kurt_Bremser , but it produces one record per id, which you would have to re-merge with have if you want the original number of records.

 

And this program may look a little strange compared to the well-known double do offered by draycut.  I'd suggest sticking with @PeterClemmensen's suggestion.  But if you need to scale to situations with multiple variables for each code  (e.g.  code_a1 code_a2 code_a3 code_a4 code_a5, code_b1 code_b2 code_b3 code_b4 code_b5,  code_c1 code_c2 code_c3 code_c4 code_c5 ), then the code above might be easier to maintain.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

Hi @PeterBr 



data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;

data want;
 do _iorc_=1 by 1 until(last.id);
  set have;
  by id;
  array t  code_a code_b code_c ;
  array u(3) _temporary_;
  do _n_=1 to dim(t);
   if t(_n_) then u(_n_)=t(_n_);
  end;
 end;
 do _iorc_=1 to _iorc_;
  set have;
  do _n_=1 to dim(t);
   t(_n_)=^^u(_n_);
  end;
  output;
 end;
 call missing(of u(*));
run;

Ksharp
Super User
data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
proc sql;
create table want as
select id,code,
max(code_a) as code_a,
max(code_b) as code_b,
max(code_c) as code_c
from have 
 group by id;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2057 views
  • 2 likes
  • 6 in conversation