Hi guys,
suppose to have this highly sparse dataset:
data DB;
input ID :$20. Index Variable1 Variable2 Variable3 Variable4;
cards;
0001 1 . 1 . .
0001 0 1 . . 1
0002 1 1 . . 1
0002 0 . 1 . .
0003 1 . . . .
0003 0 2 . 4 .
0003 0 . 3 . .
0003 0 . . . .
0003 0 . . . 8
run;
Is there a way to move all values of Variable: where Index = 1 to have the following?
data DB1;
input ID :$20. Index Variable1 Variable2 Variable3 Variable4;
cards;
0001 1 1 1 . 1
0001 0 . . . .
0002 1 1 1 . 1
0002 0 . . . .
0003 1 2 3 4 8
0003 0 . . . .
0003 0 . . . .
0003 0 . . . .
0003 0 . . . .
run;
Note: no replicates are present by ID for each Variable:
Thank you in advance
Just collapse and then remerge. If you only have one observation with INDEX=1 per ID value then using a conditional SET statement will work.
data DB;
input ID :$20. Index Variable1-Variable4;
cards;
0001 1 . 1 . .
0001 0 1 . . 1
0002 1 1 . . 1
0002 0 . 1 . .
0003 1 . . . .
0003 0 2 . 4 .
0003 0 . 3 . .
0003 0 . . . .
0003 0 . . . 8
;
data final;
update db(obs=0) db;
by id;
drop index;
run;
data want;
set db (keep=id index) ;
if index then set final ;
else call missing(of variable1-variable4);
run;
Having to do things like this makes it seems that you have issues in your upstream processes. How did you end up with such a messy dataset? What are you going to do with the resulting dataset? It does not look that useful for analysis. What value do all of those observations with only missing values add??
Just collapse and then remerge. If you only have one observation with INDEX=1 per ID value then using a conditional SET statement will work.
data DB;
input ID :$20. Index Variable1-Variable4;
cards;
0001 1 . 1 . .
0001 0 1 . . 1
0002 1 1 . . 1
0002 0 . 1 . .
0003 1 . . . .
0003 0 2 . 4 .
0003 0 . 3 . .
0003 0 . . . .
0003 0 . . . 8
;
data final;
update db(obs=0) db;
by id;
drop index;
run;
data want;
set db (keep=id index) ;
if index then set final ;
else call missing(of variable1-variable4);
run;
Having to do things like this makes it seems that you have issues in your upstream processes. How did you end up with such a messy dataset? What are you going to do with the resulting dataset? It does not look that useful for analysis. What value do all of those observations with only missing values add??
@Tom wrote:
Having to do things like this makes it seems that you have issues in your upstream processes. How did you end up with such a messy dataset? What are you going to do with the resulting dataset? It does not look that useful for analysis. What value do all of those observations with only missing values add??
I suspect it may be related to the https://communities.sas.com/t5/New-SAS-User/Fill-with-values-a-set-of-variables-to-reach-the-total/m... bit where the "fill to value" was apparently to fix an incorrect count. Perhaps this data the index=0 with all the missing values are the "not an event" that the filled in values represented. I agree that this seems to represent a moderately flawed data structure at some point of the process. Look at https://communities.sas.com/t5/New-SAS-User/From-long-format-to-short-data-format/td-p/953512 for where the Variable1-Variable4 may have come from as well.
This is one way to accomplish your task:
data DB;
input ID :$20. Index Variable1 Variable2 Variable3 Variable4;
cards;
0001 1 . 1 . .
0001 0 1 . . 1
0002 1 1 . . 1
0002 0 . 1 . .
0003 1 . . . .
0003 0 2 . 4 .
0003 0 . 3 . .
0003 0 . . . .
0003 0 . . . 8
;
proc sort out=newdb;
by descending id index;
run;
data new;
set newdb;
retain tvar1 tvar2 tvar3 tvar4;
array vars(*) variable1-variable4;
array tempvars(*) tvar1-tvar4;
by descending id index;
if index=0 then do;
do i= 1 to dim(vars);
if vars(i) ne . then tempvars(i)=vars(i);
vars(i)=.;
end;
end;
else if index=1 then do;
do i= 1 to dim(vars);
if vars(i)=. then vars(i)=tempvars(i);
end;
call missing(tvar1,tvar2,tvar3,tvar4);
end;
drop tvar: i;
run;
proc sort out=newdb;
by id descending index;
run;
proc print;
run;
Wouldn't it be simpler to just collapse to one observation per ID?
proc summary data=db ;
by id;
var index variable1-variable4 ;
output out=want(drop=_type_ _freq_) max= ;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.