I have data in an excel file that has ~80 variables starting with N, subject ID, deceased..... There are multiple lines for subject ID as each follow up visit is added in another row and aligned under the 'follow up visit' variable. My issue now is when I'm running reports, the number of "missing" is much higher than it should be because it is counting every repeated subject_id as missing. For example, subject id AMC001 below, is showing sex is missing 4 times. How can I get says to only look at the first subject id and ignore the rest if they are the same subject_id? I tried truncover without an input statement because I have ~80 variables and didn't want to list them all. I figured there has to be an easier way.
variables N subject_id Deceased Reason_for_Death Sex Ethnicity 12 008IGG 0 1 13 009IGG 0 1 14 AMC001 0 0 5 AMC001 AMC001 AMC001 AMC001
data=datasets.redcap_pde1;
Infile miss truncover;
run;
Once you read this monstrosity of an Excel file into SAS, you can create a new data set using only the first record for each subject_id and then do your statistics on this new data set.
data want;
set have;
by subject_id;
if first.subject_id;
run;
It is not clear in what form your data is now. You mentioned EXCEL and then show code using INFILE which is not valid for reading a binary file like an Excel workbook.
Let's assume you actually have an EXCEL file and have converted it into a SAS dataset using PROC IMPORT.
Let's also assume you have a variable like SUBJID_ID that is non-missing on every observation that ca be used to group the data.
In that case you could treat the whole file like a series of transactions using the UPDATE statement. That will ignore missing values on the transaction records to that the existing value of the variable is not replaced. So essentially values will be carried forward until a non-missing value appears for that variable. If there are variables that you do not want carried forward the add them to the KEEP= option on the extra SET statement.
So here is an example using HAVE as the name of the existing dataset, SUBJECT_ID as the key variable, and VISIT_DATE as an example of a variable where you do not want carried forward.
data want;
update have(obs=0) have;
by subject_id;
set have(keep=visit_date);
output;
run;
@Tom Sorry, that was unclear. Your assumptions were correct. The original data was an xlsx file and I imported to SAS. I tried your code and then ran a proc freq on MRI's but I'm still showing 471 'missing' . My total N is 116, so it is still counting the multiple entries for each subject_id.
data work.PDE_by_subjectid; update datasets.REDCap_PDE0(obs=0)datasets.REDCap_PDE0; by subject_id; set datasets.REDCap_PDE0; *(keep=redcap_event_name); output; run; *MRI performed; Proc freq data=work.PDE_by_subjectid; tables MRI_performed /missing; format mri_performed $mri_performed.; run;
*RESULTS; The FREQ Procedure MRI performed mri_performed Frequency Percent Cum Frequency Cum Percent 471 80.79 471 80.79 No 14 2.40 485 83.19 Yes 98 16.81 583 100.00
Try it without that extra SET and see if you like the results.
The point I was trying to make by including is that it might be that you only want to carry forward the non-missing values on SOME of the variables, like SEX, and not others. Having the extra SET statement re-read ALL of the variables like you did completely undoes anything the UPDATE statement might have done to carry forward values.
@kristiepauly wrote:
I have data in an excel file that has ~80 variables starting with N, subject ID, deceased..... There are multiple lines for subject ID as each follow up visit is added in another row and aligned under the 'follow up visit' variable. My issue now is when I'm running reports, the number of "missing" is much higher than it should be because it is counting every repeated subject_id as missing. For example, subject id AMC001 below, is showing sex is missing 4 times. How can I get says to only look at the first subject id and ignore the rest if they are the same subject_id? I tried truncover without an input statement because I have ~80 variables and didn't want to list them all. I figured there has to be an easier way.
variables N subject_id Deceased Reason_for_Death Sex Ethnicity 12 008IGG 0 1 13 009IGG 0 1 14 AMC001 0 0 5 AMC001 AMC001 AMC001 AMC001
data=datasets.redcap_pde1;
Infile miss truncover;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.