BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have the following: 

 

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
  format Admission date9. Discharge date9.;
cards;
166  16FEB2019 26FEB2019  1   0  0
166  18MAR2019 25MAR2019  0   1  1
166  12APR2020 02JUN2020  0   0  0
170  22FEB2017 07MAR2017  1   0  0
170  22FEB2017 07MAR2017  0   1  1
170  30JAN2019 04MAR2019  0   0  0
313  03MAR2016 10MAR2016  1   0  0
313  03MAR2016 10MAR2016  0   1  1
313  12DEC2019 15DEC2019  0   0  0
215  22DEC2014 25DEC2014  1   1  1  
; 

I would like to "move" all values = 1 of Variable1 and Variable2 where Index = 1 to get the following: 

 

data DB1;
  input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
  format Admission date9. Discharge date9.;
cards;
166  16FEB2019 26FEB2019  1   1  1
166  18MAR2019 25MAR2019  0   0  0
166  12APR2020 02JUN2020  0   0  0
170  22FEB2017 07MAR2017  1   1  1
170  22FEB2017 07MAR2017  0   0  0
170  30JAN2019 04MAR2019  0   0  0
313  03MAR2016 10MAR2016  1   1  1
313  03MAR2016 10MAR2016  0   0  0
313  12DEC2019 15DEC2019  0   0  0
215  22DEC2014 25DEC2014  1   1  1  
; 

To do this for one variable I do: 

data db_temp;
	retain max_index;
	set DB;
	by ID;
	
	if first.ID then max_index=0;
	max_index=max(max_index,Variable1);
	if last.ID;
	keep ID max_index;
run;

data DB1;
	merge DB db_temp;
	by ID;
	if Index=1 then Variable1=max_index;
	else if Variable1=max_index then Variable1=0;
run;

How if I need to run also on Variable2 and potentially VariablesN instead of one variable? How the code should be modified? 

 

Thank you in advance

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

There needs to be a concrete rule before you can program it.

It sounds like perhaps you want the MAX values per ID to be attached to the observation where INDEX=1? If so just take the max and merge it back on.  Drop the original variables so you can have the other values be zero.

proc sort data=db;
  by id descending index;
run;

proc summary data=db  ;
  by id;
  output out=summary(drop=_type_ _freq_) max(index variable1 variable2)=;
run;


data want ;
  merge db(drop=variable1 variable2) summary;
  by id descending index;
  variable1+0;
  variable2+0;
run;

If you want the cases that do not have an index record left alone then split the data up before merging it back together.

data db_index(drop=variable1 variable2) db_other;
  merge db summary(keep=id index rename=(index=index2) where=(index2=1)) ;
  by id;
  if index2=1 then output db_index;
  else output db_other;
  drop index2;
run;
data want ;
  merge db_other db_index(where=(index=1)) summary(where=(index=1));
  by id descending index;
  variable1+0;
  variable2+0;
run;

But it sounds like a silly thing to do.  Is there a reason that motivates this desire?   Perhaps there is another way to do what you actually want to do if you could describe it.

View solution in original post

4 REPLIES 4
Tom
Super User Tom
Super User

There needs to be a concrete rule before you can program it.

It sounds like perhaps you want the MAX values per ID to be attached to the observation where INDEX=1? If so just take the max and merge it back on.  Drop the original variables so you can have the other values be zero.

proc sort data=db;
  by id descending index;
run;

proc summary data=db  ;
  by id;
  output out=summary(drop=_type_ _freq_) max(index variable1 variable2)=;
run;


data want ;
  merge db(drop=variable1 variable2) summary;
  by id descending index;
  variable1+0;
  variable2+0;
run;

If you want the cases that do not have an index record left alone then split the data up before merging it back together.

data db_index(drop=variable1 variable2) db_other;
  merge db summary(keep=id index rename=(index=index2) where=(index2=1)) ;
  by id;
  if index2=1 then output db_index;
  else output db_other;
  drop index2;
run;
data want ;
  merge db_other db_index(where=(index=1)) summary(where=(index=1));
  by id descending index;
  variable1+0;
  variable2+0;
run;

But it sounds like a silly thing to do.  Is there a reason that motivates this desire?   Perhaps there is another way to do what you actually want to do if you could describe it.

Kurt_Bremser
Super User

Calculate the max with PROC SUMMARY, load the result into a hash, and use it for the first observation.

Note that the names of the variables need to be determined first for use in the definition of the hash.

proc summary data=db nway;
by id notsorted;
var var:;
output out=maxed (drop=_type_ _freq_) max()=;
run;

proc sql noprint;
select quote(name) into :vars separated by ","
from dictionary.columns
where libname = "WORK" and memname = "DB" and upcase(name) like 'VAR%';
quit;

data want;
set db;
by id notsorted;
if _n_ = 1
then do;
  declare hash m (dataset:"maxed");
  m.definekey("id");
  m.definedata(&vars.);
  m.definedone();
end;
array var {*} var:;
if first.id
then i = m.find();
else do i = 1 to dim(var);
  var{i} = 0;
end;
drop i;
run;
Ksharp
Super User

Assuming I understood what you mean.

 

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
  format Admission date9. Discharge date9.;
cards;
166  16FEB2019 26FEB2019  1   0  0
166  18MAR2019 25MAR2019  0   1  1
166  12APR2020 02JUN2020  0   0  0
170  22FEB2017 07MAR2017  1   0  0
170  22FEB2017 07MAR2017  0   1  1
170  30JAN2019 04MAR2019  0   0  0
313  03MAR2016 10MAR2016  1   0  0
313  03MAR2016 10MAR2016  0   1  1
313  12DEC2019 15DEC2019  0   0  0
315  22DEC2014 25DEC2014  1   1  1  
; 

data want;
 do until(last.id);
  set DB;
  by id;
  if Variable1=1 then v1=1;
  if Variable2=1 then v2=1;
 end;
 do until(last.id);
  set DB;
  by id;
  if Index=1 then do;Variable1=coalesce(v1,0);Variable2=coalesce(v2,0);end;
   else do;Variable1=0;Variable2=0;end;
  output;
 end;
drop v1 v2;
run;
Patrick
Opal | Level 21

Just based on your sample data and desired outcome the code could be as simple as below to generate the desired result.

data have;
  input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
  format Admission date9. Discharge date9.;
cards;
166  16FEB2019 26FEB2019  1   0  0
166  18MAR2019 25MAR2019  0   1  1
166  12APR2020 02JUN2020  0   0  0
170  22FEB2017 07MAR2017  1   0  0
170  22FEB2017 07MAR2017  0   1  1
170  30JAN2019 04MAR2019  0   0  0
313  03MAR2016 10MAR2016  1   0  0
313  03MAR2016 10MAR2016  0   1  1
313  12DEC2019 15DEC2019  0   0  0
215  22DEC2014 25DEC2014  1   1  1  
; 

data want_1;
  set have;
  array vars{*} Variable1 Variable2;
  do i=1 to dim(vars);
    vars[i]=index;
  end;
  drop i;
run;

proc print data=want_1;
run;

But then looking at your code it appears your real data also contains other cases that require additional logic. Something like below could work.

data want_2;
  if _n_=1 then
    do;
      dcl hash h1();
      h1.defineKey('id');
      h1.defineDone();
      do until(last);
        set have end=last;
        array vars[*] Variable1 Variable2;
        if h1.check() ne 0 and max(of vars[*])=1 then h1.add();
      end;
    end;
  set have;
  do i=1 to dim(vars);
    if index=1 then vars[i]= (h1.check()=0);
    else vars[i]=0;
  end;
  drop i;
run;

proc print data=want_2;
run;

If none of the solutions provided work for your actual data then I suggest you provide amended sample data that includes the cases where things aren't working.