BookmarkSubscribeRSS Feed
MelBelle93
Calcite | Level 5

Hello! 

 

I am relatively new to SAS programming, and need help cleaning up a data set. Here is what my data looks like. 

 

PARTICIPANT 1               Var_1        Var_2           Var_3             Var_4

Visit 1                            32423.2     23432.4       8345.3          10203.76

Visit 2                            23498.9    86872.3       23443.5         872491.2

Visit 3                                .                  .                    .                     . 

PARTICIPANT 2              

Visit 1                            32423.2     23432.4       8345.3          10203.76

Visit 2                            23498.9    86872.3       23443.5         872491.2

Visit 3                            34232.3    23423.1       32432.9         234343.4

PARTICIPANT 3               

Visit 1                            32423.2     23432.4       8345.3          10203.76

Visit 2                            23498.9    86872.3       23443.5         872491.2

Visit 3                                .                  .                    .                     . 

PARTICIPANT 4               

Visit 1                            32423.2     23432.4       8345.3          10203.76

Visit 2                            23498.9    86872.3       23443.5         872491.2

Visit 3                                .                 .                       .                   . 

 

If a participant returned for visit 3, it means that something was wrong with their numbers for visit 2 and we needed to retest. How do I get SAS to drop all of the values for var1-var4 associated with visit 2 and replace them with the values for visit 3? I have many more variables but used var1-var4 to illustrate my point.  Data is grouped by the study visits. 

 

Any help would be very appreciated. 

 

Thank you! 

 

5 REPLIES 5
Reeza
Super User

Can you please provide example data. For example, I'm assuming you have some ID variable that indicates participants?

 

Currently you have it appearing in the same column so not sure how your data is actually structured.

 

Here are instructions on how to provide sample data as a data step:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

yabwon
Onyx | Level 15

Hi,

 

If I understood your post right self merge sounds like an option here.

 

data have;
infile cards4 dlm = "|";
input Subject : $ 30. visit : $ 15. Var_1 Var_2 Var_3 Var_4;
cards4;
PARTICIPANT 1|Visit 1|32423.2|23432.4|8345.3|10203.76
PARTICIPANT 1|Visit 2|23498.9|86872.3|23443.5|872491.2
PARTICIPANT 1|Visit 3|.|.|.|. 
PARTICIPANT 2|Visit 1|32423.2|23432.4|8345.3|10203.76
PARTICIPANT 2|Visit 2|23498.9|86872.3|23443.5|872491.2
PARTICIPANT 2|Visit 3|34232.3|23423.1|32432.9|234343.4
PARTICIPANT 3|Visit 1|32423.2|23432.4|8345.3|10203.76
PARTICIPANT 3|Visit 2|23498.9|86872.3|23443.5|872491.2
PARTICIPANT 3|Visit 3|.|.|.|. 
PARTICIPANT 4|Visit 1|32423.2|23432.4|8345.3|10203.76
PARTICIPANT 4|Visit 2|23498.9|86872.3|23443.5|872491.2
PARTICIPANT 4|Visit 3|.|.|.|.
;;;;
run;

data want;
  merge have 
        have(keep = Subject visit Var_1
             rename = (visit = _visit Var_1 = _Var_1)
             where = (_visit = 'Visit 3' and _Var_1 ne .)  );
  by Subject;

  if 
    (not (visit = 'Visit 2' and _visit = 'Visit 3') and not (visit = 'Visit 3'))
    or  
    (visit = 'Visit 3' and _visit = 'Visit 3') 
  then output;
  drop _:;
run;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



hashman
Ammonite | Level 13

@MelBelle93:

When program control is at VISIT=2, read VAR_1-VAR_4 from the next record and overwrite the current record's VAR values with them. If VAR_1 (or any other VAR) is missing, read the current record again to restore the overwritten values. If VISIT=3, just null the VAR variables. In the SAS language:

data have ;                                                                                                                             
  input subjid participant visit var_1-var_4 ;                                                                                          
  cards ;                                                                                                                               
1  1  1  111  112  113  114                                                                                                             
1  1  2  121  122  123  124                                                                                                             
1  1  3    .    .    .    .                                                                                                             
1  2  1  211  212  213  214                                                                                                             
1  2  2  221  222  223  224                                                                                                             
1  2  3  231  232  233  234                                                                                                             
1  3  1  311  312  313  314                                                                                                             
1  3  2  321  322  323  324                                                                                                             
1  3  3    .    .    .    .                                                                                                             
1  4  1  411  412  413  414                                                                                                             
1  4  2  421  422  423  424                                                                                                             
1  4  3    .    .    .    .                                                                                                             
1  5  1  511  512  513  514                                                                                                             
1  5  2  521  522  523  524                                                                                                             
1  5  3  531  532  533  534                                                                                                             
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want ;                                                                                                                             
  set have ;                                                                                                                            
  if visit = 2 then do ;                                                                                                                
    p = _n_ + 1 ;                                                                                                                       
    set have (keep = var:) point = p ;                                                                                                  
    if missing (var_1) then set have point = _n_ ;                                                                                      
  end ;                                                                                                                                 
  else if visit = 3 then call missing (of var:) ;                                                                                       
run ;                                                

Of course, it assumes that the file is grouped as indicated.

 

Kind regards

Paul D.  

mkeintz
PROC Star

Here's similar logic, but by using the MERGE statement with a "firstobs=2" parameter, it retrieves VAR_1 (renamed to _nxt_var_1) from visit 3 at the same time as it reads in all of visit 2.  Then it can replace visit 2 data with visit 3 data only when needed.  The advantage is there is no need to re-read visit 2 when visit 3 is found to be empty.

 

data have ;                                                                                                                 
  input subjid participant visit var_1-var_4 ;
cards ;
1  1  1  111  112  113  114
1  1  2  121  122  123  124
1  1  3    .    .    .    .
1  2  1  211  212  213  214
1  2  2  221  222  223  224
1  2  3  231  232  233  234
1  3  1  311  312  313  314
1  3  2  321  322  323  324
1  3  3    .    .    .    .
1  4  1  411  412  413  414
1  4  2  421  422  423  424
1  4  3    .    .    .    .
1  5  1  511  512  513  514
1  5  2  521  522  523  524
1  5  3  531  532  533  534
run ;                                                                                                                    

data want (drop=_:);
  merge have  have (firstobs=2 keep=var_1 rename=(var_1=_nxtvar_1));
  p=_n_+1;
  if visit=2 and not missing(_nxtvar_1) then set have (keep=var:) point=p;
  else if visit=3 then call missing(of var:);
run;

Regards,

Mark

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hashman
Ammonite | Level 13

@mkeintz: Neato! 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 849 views
  • 3 likes
  • 5 in conversation