BookmarkSubscribeRSS Feed
smithb
Calcite | Level 5

Greetings,

 

I have a data set of patients with scores corresponding to a specific visit (from 1 to 3). Here is a snippet of the data: 

 

Capture.PNG

As you can see, id 1 has three separate visits. However, id 3 has only 1 visit recorded. What I would like to do is keep the original format of the data set (long) but insert "visit = 2, score = ." or "visit = 2, score = ."  and "visit = 3, score = ." for subjects who have either one or two subsequent missing visits, respectively. I can do this by converting the data set to long format, with three separate scores (score1, score2, score3), but I would like to keep the original format. Also, is there a way to do this in a single data step?

 

Here's my code (converting from long to wide currently):

data center_a;
	input id gender age visit score @@;
	datalines;
1 1 49 1 1.33333
1 1 49 2 4.55556
1 1 49 3 0.55556
2 1 51 1 2.66667
2 1 51 2 4.88889
2 1 51 3 4.22222
3 0 52 1 3.44444
4 0 70 1 3.44444
4 0 70 2 0.44444
4 0 70 3 4.88889
5 0 49 1 3.33333
;
	CENTER = 1;
run;
proc sort data = center_a;
	by id visit;
run;
data center_a_revised;
	set center_a;
	by id visit;
	array subject{3} score1 score2 score3;
	retain score1 score2 score3;
	if (first.id) then do i = 1 to 3;
		subject{i} = .;
	end;
	subject{visit} = score;
	if (lsat.id) then output;
	keep id gender age visit score CENTER;
run;
8 REPLIES 8
novinosrin
Tourmaline | Level 20

Hi @smithb   By any chance are you asking for this?

 


 data center_a;
	input id gender age visit score ;
	datalines;
1 1 49 1 1.33333
1 1 49 2 4.55556
1 1 49 3 0.55556
2 1 51 1 2.66667
2 1 51 2 4.88889
2 1 51 3 4.22222
3 0 52 1 3.44444
4 0 70 1 3.44444
4 0 70 2 0.44444
4 0 70 3 4.88889
5 0 49 1 3.33333
;
%let n=3;/*number of visits*/
data want;
 do until(last.id);
  set center_a;
  by id;
  output;
 end;
 score=.;
 do visit=visit+1 to &n;
  output;
 end;
run;
 
smithb
Calcite | Level 5

Hello @novinosrin , thanks for all your help. I can't tell you how much I appreciate the time you've taken to help me in this. I ran the code, and yes, this is what I was looking for. Thank you so much!

unison
Lapis Lazuli | Level 10

Something like this. I'm sure you can do it in one step, but this is how I think about it using proc transpose.

 

data center_a;
	input id gender age visit score @@;
	datalines;
1 1 49 1 1.33333
1 1 49 2 4.55556
1 1 49 3 0.55556
2 1 51 1 2.66667
2 1 51 2 4.88889
2 1 51 3 4.22222
3 0 52 1 3.44444
4 0 70 1 3.44444
4 0 70 2 0.44444
4 0 70 3 4.88889
5 0 49 1 3.33333
;
run;
proc sort data=center_a; by id visit; run;

proc transpose data=center_a out=center_trx(drop=_name_) prefix=vst_;
by id gender age;
id visit;
var score;
run;

proc transpose data=center_trx out=center_trx_back(rename=col1=score);
by id gender age;
run; 

data want(drop=_name_);
set center_trx_back;
visit = 1*scan(_name_,2,"_");
run;
-unison
Tom
Super User Tom
Super User

Are the missing visits always at the end? 

data center_a_revised;
  set center_a;
  by id visit;
  output;
  if last.id then do visit=visit+1 to 3 ;
    score=.;
    output;
  end;
run;

Or is it possible that some visits are missing in the middle.  So someone has visits 1 and 3 and is missing visit 2?

data center_a_all_visits;
  set center_a(keep=id visit);
  by id;
  if first.id then do visit=1 to 3;
    output;
  end;
run;

data center_a_revised;
  merge center_a center_a_all_visits;
  by id visit;
run;
smithb
Calcite | Level 5

Hello @Tom , the data includes subjects whom have an initial first visit, but some of them did not appear for any further subsequent visits. So no, none of the observations are missing visit two but have visits 1 and 3 accounted for. Thank you for this interesting take on the data. And thank you for taking the time to work on this code, I really appreciate it. I ran your top code, and that is the output I am looking for. Thanks again

r_behata
Barite | Level 11
 data center_a;
	input id gender age visit score ;
	datalines;
1 1 49 1 1.33333
1 1 49 2 4.55556
1 1 49 3 0.55556
2 1 51 1 2.66667
2 1 51 2 4.88889
2 1 51 3 4.22222
3 0 52 1 3.44444
4 0 70 1 3.44444
4 0 70 2 0.44444
4 0 70 3 4.88889
5 0 49 1 3.33333
;
run;

data want;
	set center_a;
	
	by id;
	
	output;
	
	if last.id then do;
	do while(visit < 3);
		visit=visit +1;
		score=.;
		output;
	end;
	end;
run;
hashman
Ammonite | Level 13

@smithb:

I strongly dislike hard-coding, so instead of setting the max number of visits to 3, I'd rather first determine from the data itself what that number really is and then use it downstream in the step:

data have ;                                                                                                                             
  input id gender age visit score ;                                                                                                     
  cards ;                                                                                                                               
1 1 49 1 1.33333                                                                                                                        
1 1 49 2 4.55556                                                                                                                        
1 1 49 3 0.55556                                                                                                                        
2 1 51 1 2.66667                                                                                                                        
2 1 51 2 4.88889                                                                                                                        
2 1 51 3 4.22222                                                                                                                        
3 0 52 1 3.44444                                                                                                                        
4 0 70 1 3.44444                                                                                                                        
4 0 70 2 0.44444                                                                                                                        
4 0 70 3 4.88889                                                                                                                        
5 0 49 1 3.33333                                                                                                                        
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want ;                                                                                                                             
  if _n_ = 1 then do until (z) ;                                                                                                        
    set have (keep = visit) end = z ;                                                                                                   
    last.visit = max (last.visit, visit) ;                                                                                              
  end ;                                                                                                                                 
  set have ;                                                                                                                            
  by id ;                                                                                                                               
  output ;                                                                                                                              
  call missing (score) ;                                                                                                                
  if last.id and visit < last.visit then do visit = visit + 1 to last.visit ;                                                           
    output ;                                                                                                                            
  end ;                                                                                                                                 
run ;                  

An interesting question is whether or not the same can be done in a single step if the input data set is completely disordered. The answer is that with a modicum of memorizing stuff in a hash table, it is affirmative:

data have ;                                                                                                                             
  input id gender age visit score ;                                                                                                     
  cards ;                                                                                                                               
4 0 70 3 4.88889                                                                                                                        
1 1 49 3 0.55556                                                                                                                        
2 1 51 1 2.66667                                                                                                                        
1 1 49 2 4.55556                                                                                                                        
2 1 51 3 4.22222                                                                                                                        
4 0 70 1 3.44444                                                                                                                        
2 1 51 2 4.88889                                                                                                                        
3 0 52 1 3.44444                                                                                                                        
4 0 70 2 0.44444                                                                                                                        
5 0 49 1 3.33333                                                                                                                        
1 1 49 1 1.33333                                                                                                                        
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want ;                                                                                                                             
  if _n_ = 1 then do ;                                                                                                                  
    dcl hash h (ordered:"a") ;                                                                                                          
    h.definekey ("id") ;                                                                                                                
    h.definedata ("id", "_iorc_", "_n_") ;                                                                                              
    h.definedone () ;                                                                                                                   
    do until (z) ;                                                                                                                      
      set have (keep = id visit) curobs = q end = z ;                                                                                   
      last.visit = max (last.visit, visit) ;                                                                                            
      if h.find() ne 0 then call missing (_iorc_, _n_) ;                                                                                
      if _iorc_ < visit then do ;                                                                                                       
        _iorc_ = visit ;                                                                                                                
        _n_ = q ;                                                                                                                       
      end ;                                                                                                                             
      h.replace() ;                                                                                                                     
    end ;                                                                                                                               
  end ;                                                                                                                                 
  set have curobs = q ;                                                                                                                 
  output ;                                                                                                                              
  h.find() ;                                                                                                                            
  call missing (score) ;                                                                                                                
  if _n_ = q then do visit = _iorc_ + 1 to last.visit ;                                                                                 
    output ;                                                                                                                            
  end ;                                                                                                                                 
run ;                                                

Though it might look like a codeful, the scheme is actually pretty simple:

  1. Read the input file, find the highest visit across all records, and store it in the variable LAST.VISIT.
  2. For each ID, memorize its highest visit and the record number on which it occurs.
  3. Read the file again and output each record. If for the current record's ID, the stored highest visit is less than LAST.VISIT, augment the output with visit numbers from the stored value up by 1 to last.visit.

Note that the variable LAST.VISIT has nothing to do with BY processing. It's just an arbitrary temporary variable, whose convenience lies in the fact that it is set to 0 at compile, auto-retained, and auto-dropped.   

 

Kind regards

Paul D. 

 

novinosrin
Tourmaline | Level 20

Nice one Guru @hashman  after paying attention to your post, my leaning was little bit towards utilizing SPARSE

tables id*visit/out=temp(keep=id visit) sparse;

in proc freq to get all max combinations in an ordered form and perform a direct dynamic look up. 

 

However when i tested on some large data, the performance of SPARSE was worse than awful.  Yours was lightning fast. Of course as expected. 

 

Okay, my post isn't about just words of praise that you anyway are used to receiving from SAS world, but many procedure options are not really performing at least at citizens SAS installation. Is it only at citizens resources or just expected normal behavior that you've seen from your enormous experiences? I guess you or Tom are likely to be best judge of it.

 

PS Thank you for the text. SAS certification seems far easier than passing road test. lol. We'll see i i make it in achieving my drivers. 🙂

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1840 views
  • 4 likes
  • 6 in conversation