Solved: Re: Move values of variables where a variable takes a value

NewUsrStat · Posted 12-19-2024 09:41 AM

Hi guys,

suppose to have this highly sparse dataset:

data DB;
  input ID :$20. Index Variable1  Variable2  Variable3 Variable4;
cards;
0001 1  .  1  .   .
0001 0  1  .  .   1
0002 1  1  .  .   1
0002 0  .  1  .   .
0003 1  .  .  .   .
0003 0  2  .  4   .
0003 0  .  3  .   .
0003 0  .  .  .   .
0003 0  .  .  .   8  
run;

Is there a way to move all values of Variable: where Index = 1 to have the following?

data DB1;
  input ID :$20. Index Variable1  Variable2  Variable3 Variable4;
cards;
0001 1  1  1  .   1
0001 0  .  .  .   .
0002 1  1  1  .   1
0002 0  .  .  .   .
0003 1  2  3  4   8
0003 0  .  .  .   .
0003 0  .  .  .   .
0003 0  .  .  .   .
0003 0  .  .  .   .  
run;

Note: no replicates are present by ID for each Variable:

Thank you in advance

Tom · Posted 12-19-2024 09:54 AM

Just collapse and then remerge. If you only have one observation with INDEX=1 per ID value then using a conditional SET statement will work.

data DB;
  input ID :$20. Index Variable1-Variable4;
cards;
0001 1  .  1  .   .
0001 0  1  .  .   1
0002 1  1  .  .   1
0002 0  .  1  .   .
0003 1  .  .  .   .
0003 0  2  .  4   .
0003 0  .  3  .   .
0003 0  .  .  .   .
0003 0  .  .  .   8
;

data final;
  update db(obs=0) db;
  by id;
  drop index;
run;

data want;
  set db (keep=id index) ;
  if index then set final ;
  else call missing(of variable1-variable4);
run;

Having to do things like this makes it seems that you have issues in your upstream processes. How did you end up with such a messy dataset? What are you going to do with the resulting dataset? It does not look that useful for analysis. What value do all of those observations with only missing values add??

View solution in original post

Tom · Posted 12-19-2024 09:54 AM

Just collapse and then remerge. If you only have one observation with INDEX=1 per ID value then using a conditional SET statement will work.

data DB;
  input ID :$20. Index Variable1-Variable4;
cards;
0001 1  .  1  .   .
0001 0  1  .  .   1
0002 1  1  .  .   1
0002 0  .  1  .   .
0003 1  .  .  .   .
0003 0  2  .  4   .
0003 0  .  3  .   .
0003 0  .  .  .   .
0003 0  .  .  .   8
;

data final;
  update db(obs=0) db;
  by id;
  drop index;
run;

data want;
  set db (keep=id index) ;
  if index then set final ;
  else call missing(of variable1-variable4);
run;

Having to do things like this makes it seems that you have issues in your upstream processes. How did you end up with such a messy dataset? What are you going to do with the resulting dataset? It does not look that useful for analysis. What value do all of those observations with only missing values add??

ballardw · Posted 12-19-2024 10:23 AM

@Tom wrote:

Having to do things like this makes it seems that you have issues in your upstream processes. How did you end up with such a messy dataset? What are you going to do with the resulting dataset? It does not look that useful for analysis. What value do all of those observations with only missing values add??

I suspect it may be related to the https://communities.sas.com/t5/New-SAS-User/Fill-with-values-a-set-of-variables-to-reach-the-total/m... bit where the "fill to value" was apparently to fix an incorrect count. Perhaps this data the index=0 with all the missing values are the "not an event" that the filled in values represented. I agree that this seems to represent a moderately flawed data structure at some point of the process. Look at https://communities.sas.com/t5/New-SAS-User/From-long-format-to-short-data-format/td-p/953512 for where the Variable1-Variable4 may have come from as well.

russt_sas · Posted 12-19-2024 09:58 AM

This is one way to accomplish your task:

data DB;
input ID :$20. Index Variable1 Variable2 Variable3 Variable4;
cards;
0001 1 . 1 . .
0001 0 1 . . 1
0002 1 1 . . 1
0002 0 . 1 . .
0003 1 . . . .
0003 0 2 . 4 .
0003 0 . 3 . .
0003 0 . . . .
0003 0 . . . 8
;

proc sort out=newdb;
by descending id index;
run;

data new;
set newdb;
retain tvar1 tvar2 tvar3 tvar4;
array vars(*) variable1-variable4;
array tempvars(*) tvar1-tvar4;
by descending id index;
if index=0 then do;
do i= 1 to dim(vars);
if vars(i) ne . then tempvars(i)=vars(i);
vars(i)=.;
end;
end;
else if index=1 then do;
do i= 1 to dim(vars);
if vars(i)=. then vars(i)=tempvars(i);
end;
call missing(tvar1,tvar2,tvar3,tvar4);
end;
drop tvar: i;
run;

proc sort out=newdb;
by id descending index;
run;

proc print;

run;

Tom · Posted 12-19-2024 10:37 AM

Wouldn't it be simpler to just collapse to one observation per ID?

proc summary data=db ;
  by id;
  var index variable1-variable4 ;
  output out=want(drop=_type_ _freq_) max= ;
run;

Move values of variables where a variable takes a value

Re: Move values of variables where a variable takes a value

Re: Move values of variables where a variable takes a value

Re: Move values of variables where a variable takes a value

Re: Move values of variables where a variable takes a value

Re: Move values of variables where a variable takes a value

Registration is open