BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PeterBr
Obsidian | Level 7

Hi All,

 

I have the following have data where I only care about codes a, b, and c. I have an identifier variable for each of the codes that are 1 or 0 for whether that data row has the specific code. However, I'd like that identifier variable for each code to be 1 for every row of the ID if that ID had the code at some point. I've included the want data below. Thanks!

 

data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;

data want;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 1 1
1 b 1 1 1
1 c 1 1 1
2 b 0 1 0
2 f 0 1 0
3 a 1 0 1
3 c 1 0 1
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20
data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;

data want(drop=a b c);
   do until (last.id);
      set have;
      by id;
      a = max(code_a, a);
      b = max(code_b, b);
      c = max(code_c, c);
   end;

   do until (last.id);
      set have;
      by id;
      code_a = a;
      code_b = b;
      code_c = c;
      output;
   end;
run;

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20
data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;

data want(drop=a b c);
   do until (last.id);
      set have;
      by id;
      a = max(code_a, a);
      b = max(code_b, b);
      c = max(code_c, c);
   end;

   do until (last.id);
      set have;
      by id;
      code_a = a;
      code_b = b;
      code_c = c;
      output;
   end;
run;
mkeintz
PROC Star

Here is a program that takes advantage of how the sas data step populates a PDV (program data vector, think of it as a list of variables.

 

The strategy below does the following:

  1. Merges HAVE with 3 subsets of Have with matching ID's
    1. subset with code_a=1, and with 2 variables (id and code_a).
    2. subset with code_b=1, and with 2 variables (id and code_b).
    3. subset with code_c=1, and with 2 variables (id and code_c).
      The very first observation for each id will have a pattern of 1's and 0's for the codes the you want, but subsequent observations produced by merge would not.

  2. With the desired codes in hand (i.e. when the record-in-hand is the first produced by merge), read in all the other records for the id (except the code variables) and output the records.
data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;

data want;
  merge have
        have (keep=id code_a where=(code_a=1))
        have (keep=id code_b where=(code_b=1))
        have (keep=id code_c where=(code_c=1)) ;
  by id;
  if first.id then do until (last.id);
    set have (drop=code_a code_b code_c);
	by id;
	output;
  end;
run;

Ordinarily I would suggest using the proc summary solutioni offered by @Kurt_Bremser , but it produces one record per id, which you would have to re-merge with have if you want the original number of records.

 

And this program may look a little strange compared to the well-known double do offered by draycut.  I'd suggest sticking with @PeterClemmensen's suggestion.  But if you need to scale to situations with multiple variables for each code  (e.g.  code_a1 code_a2 code_a3 code_a4 code_a5, code_b1 code_b2 code_b3 code_b4 code_b5,  code_c1 code_c2 code_c3 code_c4 code_c5 ), then the code above might be easier to maintain.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

Hi @PeterBr 



data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;

data want;
 do _iorc_=1 by 1 until(last.id);
  set have;
  by id;
  array t  code_a code_b code_c ;
  array u(3) _temporary_;
  do _n_=1 to dim(t);
   if t(_n_) then u(_n_)=t(_n_);
  end;
 end;
 do _iorc_=1 to _iorc_;
  set have;
  do _n_=1 to dim(t);
   t(_n_)=^^u(_n_);
  end;
  output;
 end;
 call missing(of u(*));
run;

Ksharp
Super User
data have;
   input id $ code $ code_a code_b code_c ;
   datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
proc sql;
create table want as
select id,code,
max(code_a) as code_a,
max(code_b) as code_b,
max(code_c) as code_c
from have 
 group by id;
quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3277 views
  • 2 likes
  • 6 in conversation