Solved: Re: Modify a code to run on many variables

NewUsrStat · Posted 11-24-2024 07:34 AM

Hi guys,

suppose to have the following:

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
  format Admission date9. Discharge date9.;
cards;
166  16FEB2019 26FEB2019  1   0  0
166  18MAR2019 25MAR2019  0   1  1
166  12APR2020 02JUN2020  0   0  0
170  22FEB2017 07MAR2017  1   0  0
170  22FEB2017 07MAR2017  0   1  1
170  30JAN2019 04MAR2019  0   0  0
313  03MAR2016 10MAR2016  1   0  0
313  03MAR2016 10MAR2016  0   1  1
313  12DEC2019 15DEC2019  0   0  0
215  22DEC2014 25DEC2014  1   1  1  
;

I would like to "move" all values = 1 of Variable1 and Variable2 where Index = 1 to get the following:

data DB1;
  input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
  format Admission date9. Discharge date9.;
cards;
166  16FEB2019 26FEB2019  1   1  1
166  18MAR2019 25MAR2019  0   0  0
166  12APR2020 02JUN2020  0   0  0
170  22FEB2017 07MAR2017  1   1  1
170  22FEB2017 07MAR2017  0   0  0
170  30JAN2019 04MAR2019  0   0  0
313  03MAR2016 10MAR2016  1   1  1
313  03MAR2016 10MAR2016  0   0  0
313  12DEC2019 15DEC2019  0   0  0
215  22DEC2014 25DEC2014  1   1  1  
;

To do this for one variable I do:

data db_temp;
	retain max_index;
	set DB;
	by ID;
	
	if first.ID then max_index=0;
	max_index=max(max_index,Variable1);
	if last.ID;
	keep ID max_index;
run;

data DB1;
	merge DB db_temp;
	by ID;
	if Index=1 then Variable1=max_index;
	else if Variable1=max_index then Variable1=0;
run;

How if I need to run also on Variable2 and potentially VariablesN instead of one variable? How the code should be modified?

Thank you in advance

Tom · Posted 11-24-2024 02:06 PM

There needs to be a concrete rule before you can program it.

It sounds like perhaps you want the MAX values per ID to be attached to the observation where INDEX=1? If so just take the max and merge it back on. Drop the original variables so you can have the other values be zero.

proc sort data=db;
  by id descending index;
run;

proc summary data=db  ;
  by id;
  output out=summary(drop=_type_ _freq_) max(index variable1 variable2)=;
run;


data want ;
  merge db(drop=variable1 variable2) summary;
  by id descending index;
  variable1+0;
  variable2+0;
run;

If you want the cases that do not have an index record left alone then split the data up before merging it back together.

data db_index(drop=variable1 variable2) db_other;
  merge db summary(keep=id index rename=(index=index2) where=(index2=1)) ;
  by id;
  if index2=1 then output db_index;
  else output db_other;
  drop index2;
run;
data want ;
  merge db_other db_index(where=(index=1)) summary(where=(index=1));
  by id descending index;
  variable1+0;
  variable2+0;
run;

But it sounds like a silly thing to do. Is there a reason that motivates this desire? Perhaps there is another way to do what you actually want to do if you could describe it.

View solution in original post

Tom · Posted 11-24-2024 02:06 PM

There needs to be a concrete rule before you can program it.

It sounds like perhaps you want the MAX values per ID to be attached to the observation where INDEX=1? If so just take the max and merge it back on. Drop the original variables so you can have the other values be zero.

proc sort data=db;
  by id descending index;
run;

proc summary data=db  ;
  by id;
  output out=summary(drop=_type_ _freq_) max(index variable1 variable2)=;
run;


data want ;
  merge db(drop=variable1 variable2) summary;
  by id descending index;
  variable1+0;
  variable2+0;
run;

If you want the cases that do not have an index record left alone then split the data up before merging it back together.

data db_index(drop=variable1 variable2) db_other;
  merge db summary(keep=id index rename=(index=index2) where=(index2=1)) ;
  by id;
  if index2=1 then output db_index;
  else output db_other;
  drop index2;
run;
data want ;
  merge db_other db_index(where=(index=1)) summary(where=(index=1));
  by id descending index;
  variable1+0;
  variable2+0;
run;

But it sounds like a silly thing to do. Is there a reason that motivates this desire? Perhaps there is another way to do what you actually want to do if you could describe it.

Kurt_Bremser · Posted 11-24-2024 02:32 PM

Calculate the max with PROC SUMMARY, load the result into a hash, and use it for the first observation.

Note that the names of the variables need to be determined first for use in the definition of the hash.

proc summary data=db nway;
by id notsorted;
var var:;
output out=maxed (drop=_type_ _freq_) max()=;
run;

proc sql noprint;
select quote(name) into :vars separated by ","
from dictionary.columns
where libname = "WORK" and memname = "DB" and upcase(name) like 'VAR%';
quit;

data want;
set db;
by id notsorted;
if _n_ = 1
then do;
  declare hash m (dataset:"maxed");
  m.definekey("id");
  m.definedata(&vars.);
  m.definedone();
end;
array var {*} var:;
if first.id
then i = m.find();
else do i = 1 to dim(var);
  var{i} = 0;
end;
drop i;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 11-24-2024 08:00 PM

Assuming I understood what you mean.

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
  format Admission date9. Discharge date9.;
cards;
166  16FEB2019 26FEB2019  1   0  0
166  18MAR2019 25MAR2019  0   1  1
166  12APR2020 02JUN2020  0   0  0
170  22FEB2017 07MAR2017  1   0  0
170  22FEB2017 07MAR2017  0   1  1
170  30JAN2019 04MAR2019  0   0  0
313  03MAR2016 10MAR2016  1   0  0
313  03MAR2016 10MAR2016  0   1  1
313  12DEC2019 15DEC2019  0   0  0
315  22DEC2014 25DEC2014  1   1  1  
; 

data want;
 do until(last.id);
  set DB;
  by id;
  if Variable1=1 then v1=1;
  if Variable2=1 then v2=1;
 end;
 do until(last.id);
  set DB;
  by id;
  if Index=1 then do;Variable1=coalesce(v1,0);Variable2=coalesce(v2,0);end;
   else do;Variable1=0;Variable2=0;end;
  output;
 end;
drop v1 v2;
run;

Patrick · Posted 11-24-2024 09:50 PM

Just based on your sample data and desired outcome the code could be as simple as below to generate the desired result.

data have;
  input ID :$20. Admission :date09. Discharge :date09. Index Variable1 Variable2;
  format Admission date9. Discharge date9.;
cards;
166  16FEB2019 26FEB2019  1   0  0
166  18MAR2019 25MAR2019  0   1  1
166  12APR2020 02JUN2020  0   0  0
170  22FEB2017 07MAR2017  1   0  0
170  22FEB2017 07MAR2017  0   1  1
170  30JAN2019 04MAR2019  0   0  0
313  03MAR2016 10MAR2016  1   0  0
313  03MAR2016 10MAR2016  0   1  1
313  12DEC2019 15DEC2019  0   0  0
215  22DEC2014 25DEC2014  1   1  1  
; 

data want_1;
  set have;
  array vars{*} Variable1 Variable2;
  do i=1 to dim(vars);
    vars[i]=index;
  end;
  drop i;
run;

proc print data=want_1;
run;

But then looking at your code it appears your real data also contains other cases that require additional logic. Something like below could work.

data want_2;
  if _n_=1 then
    do;
      dcl hash h1();
      h1.defineKey('id');
      h1.defineDone();
      do until(last);
        set have end=last;
        array vars[*] Variable1 Variable2;
        if h1.check() ne 0 and max(of vars[*])=1 then h1.add();
      end;
    end;
  set have;
  do i=1 to dim(vars);
    if index=1 then vars[i]= (h1.check()=0);
    else vars[i]=0;
  end;
  drop i;
run;

proc print data=want_2;
run;

If none of the solutions provided work for your actual data then I suggest you provide amended sample data that includes the cases where things aren't working.

Modify a code to run on many variables

Re: Modify a code to run on many variables

Re: Modify a code to run on many variables

Re: Modify a code to run on many variables

Re: Modify a code to run on many variables

Re: Modify a code to run on many variables

Registration is open