Hello Community,
I need to create a dataset from a raw data file that only retains individuals in a study who completed a study assessment at two data points- one at study baseline (VISNO=01) AND one at study follow-up (VISNO=02). Individuals who only completed a study assessment at only one time point should not be included in the new dataset. I have provided an example of my input file as well as my desired output file below. Any help with how to program this would be greatly appreciated!
Have:
PID | VISNO | Other variables |
01 | 01 | … |
01 | 02 | … |
02 | 01 | … |
03 | 01 | … |
03 | 02 | … |
04 | 02 | … |
05 | 02 | … |
Want:
PID | VISNO | Other variables |
01 | 01 | … |
01 | 02 | … |
03 | 01 | … |
03 | 02 | … |
If your data is sorted as shown this might work:
data want; set have; by pid ; if not (first.pid and last.pid); run;
BY creates automatic boolean variables that indicate whether a record is the first or last of a by group. The variables are accessed using the First. and Last. (note the dots there) before the variable.
So if there is only a single record then it is both first and last at the same time.
@wj2 wrote:
Hello Community,
I need to create a dataset from a raw data file that only retains individuals in a study who completed a study assessment at two data points- one at study baseline (VISNO=01) AND one at study follow-up (VISNO=02). Individuals who only completed a study assessment at only one time point should not be included in the new dataset. I have provided an example of my input file as well as my desired output file below. Any help with how to program this would be greatly appreciated!
Have:
PID
VISNO
Other variables
01
01
…
01
02
…
02
01
…
03
01
…
03
02
…
04
02
…
05
02
…
Want:
PID
VISNO
Other variables
01
01
…
01
02
…
03
01
…
03
02
…
If your data is sorted as shown this might work:
data want; set have; by pid ; if not (first.pid and last.pid); run;
BY creates automatic boolean variables that indicate whether a record is the first or last of a by group. The variables are accessed using the First. and Last. (note the dots there) before the variable.
So if there is only a single record then it is both first and last at the same time.
@wj2 wrote:
Hello Community,
I need to create a dataset from a raw data file that only retains individuals in a study who completed a study assessment at two data points- one at study baseline (VISNO=01) AND one at study follow-up (VISNO=02). Individuals who only completed a study assessment at only one time point should not be included in the new dataset. I have provided an example of my input file as well as my desired output file below. Any help with how to program this would be greatly appreciated!
Have:
PID
VISNO
Other variables
01
01
…
01
02
…
02
01
…
03
01
…
03
02
…
04
02
…
05
02
…
Want:
PID
VISNO
Other variables
01
01
…
01
02
…
03
01
…
03
02
…
If you make VISNO numeric instead of character, this is pretty simple
proc sql;
create table want as select * from have
where max(visno)=2 and min(visno)=1
group by pid;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.