Hello Community,
I need to create a dataset from a raw data file that only retains individuals in a study who completed a study assessment at two data points- one at study baseline (VISNO=01) AND one at study follow-up (VISNO=02). Individuals who only completed a study assessment at only one time point should not be included in the new dataset. I have provided an example of my input file as well as my desired output file below. Any help with how to program this would be greatly appreciated!
Have:
PID | VISNO | Other variables |
01 | 01 | … |
01 | 02 | … |
02 | 01 | … |
03 | 01 | … |
03 | 02 | … |
04 | 02 | … |
05 | 02 | … |
Want:
PID | VISNO | Other variables |
01 | 01 | … |
01 | 02 | … |
03 | 01 | … |
03 | 02 | … |
If your data is sorted as shown this might work:
data want; set have; by pid ; if not (first.pid and last.pid); run;
BY creates automatic boolean variables that indicate whether a record is the first or last of a by group. The variables are accessed using the First. and Last. (note the dots there) before the variable.
So if there is only a single record then it is both first and last at the same time.
@wj2 wrote:
Hello Community,
I need to create a dataset from a raw data file that only retains individuals in a study who completed a study assessment at two data points- one at study baseline (VISNO=01) AND one at study follow-up (VISNO=02). Individuals who only completed a study assessment at only one time point should not be included in the new dataset. I have provided an example of my input file as well as my desired output file below. Any help with how to program this would be greatly appreciated!
Have:
PID
VISNO
Other variables
01
01
…
01
02
…
02
01
…
03
01
…
03
02
…
04
02
…
05
02
…
Want:
PID
VISNO
Other variables
01
01
…
01
02
…
03
01
…
03
02
…
If your data is sorted as shown this might work:
data want; set have; by pid ; if not (first.pid and last.pid); run;
BY creates automatic boolean variables that indicate whether a record is the first or last of a by group. The variables are accessed using the First. and Last. (note the dots there) before the variable.
So if there is only a single record then it is both first and last at the same time.
@wj2 wrote:
Hello Community,
I need to create a dataset from a raw data file that only retains individuals in a study who completed a study assessment at two data points- one at study baseline (VISNO=01) AND one at study follow-up (VISNO=02). Individuals who only completed a study assessment at only one time point should not be included in the new dataset. I have provided an example of my input file as well as my desired output file below. Any help with how to program this would be greatly appreciated!
Have:
PID
VISNO
Other variables
01
01
…
01
02
…
02
01
…
03
01
…
03
02
…
04
02
…
05
02
…
Want:
PID
VISNO
Other variables
01
01
…
01
02
…
03
01
…
03
02
…
If you make VISNO numeric instead of character, this is pretty simple
proc sql;
create table want as select * from have
where max(visno)=2 and min(visno)=1
group by pid;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.