Hello!
I am relatively new to SAS programming, and need help cleaning up a data set. Here is what my data looks like.
PARTICIPANT 1 Var_1 Var_2 Var_3 Var_4
Visit 1 32423.2 23432.4 8345.3 10203.76
Visit 2 23498.9 86872.3 23443.5 872491.2
Visit 3 . . . .
PARTICIPANT 2
Visit 1 32423.2 23432.4 8345.3 10203.76
Visit 2 23498.9 86872.3 23443.5 872491.2
Visit 3 34232.3 23423.1 32432.9 234343.4
PARTICIPANT 3
Visit 1 32423.2 23432.4 8345.3 10203.76
Visit 2 23498.9 86872.3 23443.5 872491.2
Visit 3 . . . .
PARTICIPANT 4
Visit 1 32423.2 23432.4 8345.3 10203.76
Visit 2 23498.9 86872.3 23443.5 872491.2
Visit 3 . . . .
If a participant returned for visit 3, it means that something was wrong with their numbers for visit 2 and we needed to retest. How do I get SAS to drop all of the values for var1-var4 associated with visit 2 and replace them with the values for visit 3? I have many more variables but used var1-var4 to illustrate my point. Data is grouped by the study visits.
Any help would be very appreciated.
Thank you!
Can you please provide example data. For example, I'm assuming you have some ID variable that indicates participants?
Currently you have it appearing in the same column so not sure how your data is actually structured.
Here are instructions on how to provide sample data as a data step:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Hi,
If I understood your post right self merge sounds like an option here.
data have;
infile cards4 dlm = "|";
input Subject : $ 30. visit : $ 15. Var_1 Var_2 Var_3 Var_4;
cards4;
PARTICIPANT 1|Visit 1|32423.2|23432.4|8345.3|10203.76
PARTICIPANT 1|Visit 2|23498.9|86872.3|23443.5|872491.2
PARTICIPANT 1|Visit 3|.|.|.|.
PARTICIPANT 2|Visit 1|32423.2|23432.4|8345.3|10203.76
PARTICIPANT 2|Visit 2|23498.9|86872.3|23443.5|872491.2
PARTICIPANT 2|Visit 3|34232.3|23423.1|32432.9|234343.4
PARTICIPANT 3|Visit 1|32423.2|23432.4|8345.3|10203.76
PARTICIPANT 3|Visit 2|23498.9|86872.3|23443.5|872491.2
PARTICIPANT 3|Visit 3|.|.|.|.
PARTICIPANT 4|Visit 1|32423.2|23432.4|8345.3|10203.76
PARTICIPANT 4|Visit 2|23498.9|86872.3|23443.5|872491.2
PARTICIPANT 4|Visit 3|.|.|.|.
;;;;
run;
data want;
merge have
have(keep = Subject visit Var_1
rename = (visit = _visit Var_1 = _Var_1)
where = (_visit = 'Visit 3' and _Var_1 ne .) );
by Subject;
if
(not (visit = 'Visit 2' and _visit = 'Visit 3') and not (visit = 'Visit 3'))
or
(visit = 'Visit 3' and _visit = 'Visit 3')
then output;
drop _:;
run;
All the best
Bart
When program control is at VISIT=2, read VAR_1-VAR_4 from the next record and overwrite the current record's VAR values with them. If VAR_1 (or any other VAR) is missing, read the current record again to restore the overwritten values. If VISIT=3, just null the VAR variables. In the SAS language:
data have ;
input subjid participant visit var_1-var_4 ;
cards ;
1 1 1 111 112 113 114
1 1 2 121 122 123 124
1 1 3 . . . .
1 2 1 211 212 213 214
1 2 2 221 222 223 224
1 2 3 231 232 233 234
1 3 1 311 312 313 314
1 3 2 321 322 323 324
1 3 3 . . . .
1 4 1 411 412 413 414
1 4 2 421 422 423 424
1 4 3 . . . .
1 5 1 511 512 513 514
1 5 2 521 522 523 524
1 5 3 531 532 533 534
;
run ;
data want ;
set have ;
if visit = 2 then do ;
p = _n_ + 1 ;
set have (keep = var:) point = p ;
if missing (var_1) then set have point = _n_ ;
end ;
else if visit = 3 then call missing (of var:) ;
run ;
Of course, it assumes that the file is grouped as indicated.
Kind regards
Paul D.
Here's similar logic, but by using the MERGE statement with a "firstobs=2" parameter, it retrieves VAR_1 (renamed to _nxt_var_1) from visit 3 at the same time as it reads in all of visit 2. Then it can replace visit 2 data with visit 3 data only when needed. The advantage is there is no need to re-read visit 2 when visit 3 is found to be empty.
data have ;
input subjid participant visit var_1-var_4 ;
cards ;
1 1 1 111 112 113 114
1 1 2 121 122 123 124
1 1 3 . . . .
1 2 1 211 212 213 214
1 2 2 221 222 223 224
1 2 3 231 232 233 234
1 3 1 311 312 313 314
1 3 2 321 322 323 324
1 3 3 . . . .
1 4 1 411 412 413 414
1 4 2 421 422 423 424
1 4 3 . . . .
1 5 1 511 512 513 514
1 5 2 521 522 523 524
1 5 3 531 532 533 534
run ;
data want (drop=_:);
merge have have (firstobs=2 keep=var_1 rename=(var_1=_nxtvar_1));
p=_n_+1;
if visit=2 and not missing(_nxtvar_1) then set have (keep=var:) point=p;
else if visit=3 then call missing(of var:);
run;
Regards,
Mark
@mkeintz: Neato!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.