Use by-group processing, and add a loop when last.id is reached:
data have;
input id $ visit val1 val2;
datalines;
ABC123 1 50 54
ABC123 2 33 33
ABC123 3 21 44
ABC123 4 33 64
ABC121 1 90 34
ABC121 2 32 39
ABC121 3 51 24
ABC122 1 73 83
ABC122 2 10 14
ABC122 3 53 77
ABC124 1 50 54
ABC124 2 32 33
ABC124 3 51 94
ABC124 4 33 44
;
data want;
set have;
by id notsorted;
output;
if last.id
then do visit = visit + 1 to 4;
output;
end;
run;
proc print data=want noobs;
run;
Result:
id visit val1 val2 ABC123 1 50 54 ABC123 2 33 33 ABC123 3 21 44 ABC123 4 33 64 ABC121 1 90 34 ABC121 2 32 39 ABC121 3 51 24 ABC121 4 51 24 ABC122 1 73 83 ABC122 2 10 14 ABC122 3 53 77 ABC122 4 53 77 ABC124 1 50 54 ABC124 2 32 33 ABC124 3 51 94 ABC124 4 33 44
What does your desired output look like?
Use by-group processing, and add a loop when last.id is reached:
data have;
input id $ visit val1 val2;
datalines;
ABC123 1 50 54
ABC123 2 33 33
ABC123 3 21 44
ABC123 4 33 64
ABC121 1 90 34
ABC121 2 32 39
ABC121 3 51 24
ABC122 1 73 83
ABC122 2 10 14
ABC122 3 53 77
ABC124 1 50 54
ABC124 2 32 33
ABC124 3 51 94
ABC124 4 33 44
;
data want;
set have;
by id notsorted;
output;
if last.id
then do visit = visit + 1 to 4;
output;
end;
run;
proc print data=want noobs;
run;
Result:
id visit val1 val2 ABC123 1 50 54 ABC123 2 33 33 ABC123 3 21 44 ABC123 4 33 64 ABC121 1 90 34 ABC121 2 32 39 ABC121 3 51 24 ABC121 4 51 24 ABC122 1 73 83 ABC122 2 10 14 ABC122 3 53 77 ABC122 4 53 77 ABC124 1 50 54 ABC124 2 32 33 ABC124 3 51 94 ABC124 4 33 44
By-group processing is initiated with the by statement; for every variable in the by statement, automatic boolean variables first. and last. are created, which signal when a group starts and when it ends.
My code runs a do loop when the last observation of a group is reached, and the do loop is specified in a way that it only runs when the last observation of a group has a visit number smaller than 4.
If you only have values 1 to 4, it makes sense to save space by storing it as $1 (numeric variables need at least 3 bytes).
Still, if you do not have space issues, storing such values as numeric might be better.
For this, you use the "look-ahead" technique:
data have;
input id $ visit val1 val2;
datalines;
ABC123 1 50 54
ABC123 2 33 33
ABC123 4 33 64
ABC121 1 90 34
ABC121 3 51 24
ABC122 1 73 83
ABC122 2 10 14
ABC122 3 53 77
ABC124 1 50 54
ABC124 2 32 33
ABC124 3 51 94
ABC124 4 33 44
;
data want;
merge
have
have (
firstobs=2
keep=id visit
rename=(id=_id visit=_visit)
)
;
output;
if id ne _id then _visit = 5;
do visit = visit + 1 to _visit - 1;
output;
end;
drop _:;
run;
I do a merge without a by; the firstobs=2 causes the second read in the merge to always be one observation ahead of the first read.
I only keep two variables and rename them, so I have two additional variables in the PDV, "future" versions of the originals.
One of these is used to detect if we're still in the same group (id), and the other gives me the next visit number.
The do loop is structured in a way that it writes the current observation (one do loop iteration if next visit = current visit + 1), or multiple observations if there is a "hole". At the end of the group, I set a virtual 5th visit, so the loop works right up to visit 4.
This code was created mostly by applying Maxim 4: start with a simple idea (the look-ahead), run it, look where the result differs from the expectations, adapt and run again. Rinse, lather, repeat.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.