BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have this highly sparse dataset: 

 

data DB;
  input ID :$20. Index Variable1  Variable2  Variable3 Variable4;
cards;
0001 1  .  1  .   .
0001 0  1  .  .   1
0002 1  1  .  .   1
0002 0  .  1  .   .
0003 1  .  .  .   .
0003 0  2  .  4   .
0003 0  .  3  .   .
0003 0  .  .  .   .
0003 0  .  .  .   8  
run;

Is there a way to move all values of Variable: where Index = 1 to have the following? 

data DB1;
  input ID :$20. Index Variable1  Variable2  Variable3 Variable4;
cards;
0001 1  1  1  .   1
0001 0  .  .  .   .
0002 1  1  1  .   1
0002 0  .  .  .   .
0003 1  2  3  4   8
0003 0  .  .  .   .
0003 0  .  .  .   .
0003 0  .  .  .   .
0003 0  .  .  .   .  
run;

Note: no replicates are present by ID for each Variable:

Thank you in advance

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Just collapse and then remerge.  If you only have one observation with INDEX=1 per ID value then using a conditional SET statement will work.

data DB;
  input ID :$20. Index Variable1-Variable4;
cards;
0001 1  .  1  .   .
0001 0  1  .  .   1
0002 1  1  .  .   1
0002 0  .  1  .   .
0003 1  .  .  .   .
0003 0  2  .  4   .
0003 0  .  3  .   .
0003 0  .  .  .   .
0003 0  .  .  .   8
;

data final;
  update db(obs=0) db;
  by id;
  drop index;
run;

data want;
  set db (keep=id index) ;
  if index then set final ;
  else call missing(of variable1-variable4);
run;

Having to do things like this makes it seems that you have issues in your upstream processes.  How did you end up with such a messy dataset?  What are you going to do with the resulting dataset? It does not look that useful for analysis.  What value do all of those observations with only missing values add??

 

View solution in original post

4 REPLIES 4
Tom
Super User Tom
Super User

Just collapse and then remerge.  If you only have one observation with INDEX=1 per ID value then using a conditional SET statement will work.

data DB;
  input ID :$20. Index Variable1-Variable4;
cards;
0001 1  .  1  .   .
0001 0  1  .  .   1
0002 1  1  .  .   1
0002 0  .  1  .   .
0003 1  .  .  .   .
0003 0  2  .  4   .
0003 0  .  3  .   .
0003 0  .  .  .   .
0003 0  .  .  .   8
;

data final;
  update db(obs=0) db;
  by id;
  drop index;
run;

data want;
  set db (keep=id index) ;
  if index then set final ;
  else call missing(of variable1-variable4);
run;

Having to do things like this makes it seems that you have issues in your upstream processes.  How did you end up with such a messy dataset?  What are you going to do with the resulting dataset? It does not look that useful for analysis.  What value do all of those observations with only missing values add??

 

ballardw
Super User

@Tom wrote:

Having to do things like this makes it seems that you have issues in your upstream processes.  How did you end up with such a messy dataset?  What are you going to do with the resulting dataset? It does not look that useful for analysis.  What value do all of those observations with only missing values add??

 


I suspect it may be related to the https://communities.sas.com/t5/New-SAS-User/Fill-with-values-a-set-of-variables-to-reach-the-total/m... bit where the "fill to value" was apparently to fix an incorrect count. Perhaps this data the index=0 with all the missing values are the "not an event" that the filled in values represented. I agree that this seems to represent a moderately flawed data structure at some point of the process. Look at https://communities.sas.com/t5/New-SAS-User/From-long-format-to-short-data-format/td-p/953512 for where the Variable1-Variable4 may have come from as well.

russt_sas
SAS Employee

This is one way to accomplish your task:

 

data DB;
input ID :$20. Index Variable1 Variable2 Variable3 Variable4;
cards;
0001 1 . 1 . .
0001 0 1 . . 1
0002 1 1 . . 1
0002 0 . 1 . .
0003 1 . . . .
0003 0 2 . 4 .
0003 0 . 3 . .
0003 0 . . . .
0003 0 . . . 8
;

proc sort out=newdb;
by descending id index;
run;

data new;
set newdb;
retain tvar1 tvar2 tvar3 tvar4;
array vars(*) variable1-variable4;
array tempvars(*) tvar1-tvar4;
by descending id index;
if index=0 then do;
do i= 1 to dim(vars);
if vars(i) ne . then tempvars(i)=vars(i);
vars(i)=.;
end;
end;
else if index=1 then do;
do i= 1 to dim(vars);
if vars(i)=. then vars(i)=tempvars(i);
end;
call missing(tvar1,tvar2,tvar3,tvar4);
end;
drop tvar: i;
run;

proc sort out=newdb;
by id descending index;
run;

proc print;

run;

Tom
Super User Tom
Super User

Wouldn't it be simpler to just collapse to one observation per ID?

proc summary data=db ;
  by id;
  var index variable1-variable4 ;
  output out=want(drop=_type_ _freq_) max= ;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1761 views
  • 3 likes
  • 4 in conversation