BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SAS_Novice22
Quartz | Level 8

Hi SAS Experts,

 

I have a number of patient reported outcome questionnaires where I would like to evaluate very basic descriptive information on the number of completed by subjects at each visit. The dataset is large for each questionnaire as subjects have multiple rows per questionnaire for each visit.

 

Here is what a snapshot of the data looks like:

Patient_IDVisit_NumberVisit_DescriptionClinical_Assessment_Short_NameClinical_Assessment_Coded_Respon
0011EnrollmentWPAIGH12
0011EnrollmentWPAIGH2.
0011EnrollmentWPAIGH3.
0011EnrollmentWPAIGH4.
0011EnrollmentWPAIGH5.
0011EnrollmentWPAIGH6.
0012Month_3WPAIGH11
0012Month_3WPAIGH2.
0012Month_3WPAIGH3.
0012Month_3WPAIGH4.
0012Month_3WPAIGH5.
0012Month_3WPAIGH6.
0013Month_6WPAIGH12
0013.........

 

There are around 160 subjects in my study and I only want to know how many subjects completed the questionnaire at each study visit.

I just want the data to look like this:

QuestionnaireEnrollmentMonth 3Month 6Month 9Month 12Month 15
WPAIGHN=164N=155N=139N=124N=115N=89 ...
TSQM-9... ................

 

Each questionnaire is in its own dataset, so my example is only showing one questionnaire (WPAIGH). The problem I am having is the multiple rows per subject per visit and how to deal with that. Each subject may have completed a different number of follow-up assessments so patient 001 may have up to month 26 while patient 002 may only have up to month 18.

I was considering deriving a new variable and then using if and then statements to code but that does not seem to work. Here is my SAS code and the log output as well.

SAS Code:

/**Number of WPAIGH completed**/
DATA WORK.WPAIGH_ERT;
	SET WPAIGH_ERT;
	WPAIGH = .;
RUN;

TITLE "Count of WPAIGH completed per visit per subject"; 
DATA WORK.HAEA_NEW;
	SET HAEA_NEW;
	IF Visit_Description="Enrollment" AND Clinical_Assessment_Short_Name="WPAIGH1" THEN WPAIGH=1;
	ELSE WPAIGH = .;
RUN;

Log Output:

 

244  TITLE "Count of WPAIGH completed per visit per subject";
245  DATA WORK.HAEA_NEW;
246      SET HAEA_NEW;
247      IF Visit_Description="Enrollment" AND Clinical_Assessment_Short_Name="WPAIGH1" THEN WPAIGH=1;
248      ELSE WPAIGH = .;
249  RUN;

NOTE: There were 0 observations read from the data set WORK.HAEA_NEW.
NOTE: The data set WORK.HAEA_NEW has 0 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


NOTE: This SAS session is using a registry in WORK.  All changes will be lost at the end of this
      session.
NOTE: This SAS session is using a registry in WORK.  All changes will be lost at the end of this
      session

 

 

Again, I am only looking for basic counts for number of questionnaires completed at each visit for each subject (counted only once per subject/per visit).

Any assistance that you can provide would be greatly appreciated.

Thank you,

T.
 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Very likely you do not want a data set. You say " I only want to know how many subjects completed the questionnaire at each study visit."

That sounds like a report of some sort.

 

Before we get started, you have destroyed your starting data set with this code as shown by the Work.Haea_new has 0 observations.

DATA WORK.HAEA_NEW;
	SET HAEA_NEW;

when you use the same data set name on Set and Data statements you completely replace the set. It is a good idea, especially while learning to always create new data sets.

 

You don't mention what the actual name of the data set that looks like your example. If you have multiple data set then you likely need to append them together to get a single set to work with.

If you have a data set with those variables something like: HOWEVER "clinical_assessment_coded_response" is not actually a valid SAS variable name. It is 34 characters long and the limit is 32. So you may have to fudge the code shown below to match your actual SAS data set name(s)

 

Proc tabulate data=yourdatasetname;
   where not missing (clinical_assessment_coded_ );
   class visit_description clinical_assessment_short_name;
   table visit_description ,
          clinical_assessment_short_name
    ;
run;

 

 

View solution in original post

4 REPLIES 4
ballardw
Super User

Very likely you do not want a data set. You say " I only want to know how many subjects completed the questionnaire at each study visit."

That sounds like a report of some sort.

 

Before we get started, you have destroyed your starting data set with this code as shown by the Work.Haea_new has 0 observations.

DATA WORK.HAEA_NEW;
	SET HAEA_NEW;

when you use the same data set name on Set and Data statements you completely replace the set. It is a good idea, especially while learning to always create new data sets.

 

You don't mention what the actual name of the data set that looks like your example. If you have multiple data set then you likely need to append them together to get a single set to work with.

If you have a data set with those variables something like: HOWEVER "clinical_assessment_coded_response" is not actually a valid SAS variable name. It is 34 characters long and the limit is 32. So you may have to fudge the code shown below to match your actual SAS data set name(s)

 

Proc tabulate data=yourdatasetname;
   where not missing (clinical_assessment_coded_ );
   class visit_description clinical_assessment_short_name;
   table visit_description ,
          clinical_assessment_short_name
    ;
run;

 

 

SAS_Novice22
Quartz | Level 8

Hi @ballardw,
Thank you. I now understand that I am writing over my dataset name when I call in a dataset and set that dataset using the exact same name. How do I make these 2 different names? I have tried and I get the following in my log:

426  PROC IMPORT OUT = WPAIGH_ERT
427      DATAFILE = 'M:\SAS\TAKEDA EDR/WPAIGH_ERT.xlsx'
428      DBMS = xlsx;
429  RUN;

NOTE:    Variable Name Change.  Study Identifier -> Study_Identifier
NOTE:    Variable Name Change.  Patient ID -> Patient_ID
NOTE:    Variable Name Change.  Questionnaire Type -> Questionnaire_Type
NOTE:    Variable Name Change.  Time of Questionnaire Completion -> Time_of_Questionnaire_Completion
NOTE:    Variable Name Change.  Visit Number -> Visit_Number
NOTE: One or more variables were converted because the data type is not supported by the V9 engine.
      For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 5112 observations and 10 variables.
NOTE: WORK.WPAIGH_ERT data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           2.64 seconds
      cpu time            0.48 seconds


430  DATA WPAIGH_ERT;
431      SET WPAIGH_2;
ERROR: File WORK.WPAIGH_2.DATA does not exist.
432      OVERALL = "Overall";
433  RUN;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WPAIGH_ERT may be incomplete.  When this step was stopped there were 0
         observations and 1 variables.
WARNING: Data set WORK.WPAIGH_ERT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

I have modified the names of my variables so that SAS can read them and the code you provided worked exactly how I wanted it to!

SAS_Novice22_0-1654796551707.png

So I can understand how you knew to code this way, what are lines 469-472 exactly doing in English?

467  TITLE "Count of WPAIGH completed per visit per subject";
468  PROC TABULATE data=WPAIGH_ERT;
469      where not missing (CACodedRes);
470      class VisitName CAShort;
471      table VisitName,
472            CAShort
473      ;
474  RUN;

NOTE: There were 5112 observations read from the data set WORK.WPAIGH_ERT.
      WHERE not MISSING(CACodedRes);
NOTE: PROCEDURE TABULATE used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

This is all very useful for my learning and thank you again for providing the solution, I can now apply these same steps to my other PRO datasets.

Best,

T.

ballardw
Super User

The name on the DATA statement is the New (or output data set(s) ), the name on the SET statement is the old or source data set(s).

So when you used

430  DATA WPAIGH_ERT;
431      SET WPAIGH_2;
ERROR: File WORK.WPAIGH_2.DATA does not exist.
432      OVERALL = "Overall";
433  RUN;

you told SAS to read the data from WPAIGH_2 and write it out to WPAIGHT_ERT.

Since you just imported WPAIGHT_ERT you likely intended this if you want to add a variable Overall with that value.

DATA WPAIGH_2;
      SET WPAIGH_ERT;
      OVERALL = "Overall";
RUN;

 

So I can understand how you knew to code this way, what are lines 469-472 exactly doing in English?

467  TITLE "Count of WPAIGH completed per visit per subject";
468  PROC TABULATE data=WPAIGH_ERT;
469      where not missing (CACodedRes);
470      class VisitName CAShort;
471      table VisitName,
472            CAShort
473      ;
474  RUN;

 Everything is related to Proc Tabulate and there are a lot of possibilities for that procedure so you want to read some of the documentation.

Short:

Where statement, available in most procedures, filters data. Your original data seemed to show that you wanted to count only when there were values in that variable (not missing).

The Class statement(s) identify variables that are used for categories which are basically restricted to count type statitistics, N and percentages: pctn colpctn rowpctn pagepctn

If you had variables to sum, find stats like max, min, median, standard deviation they would go on a VAR statement.

Proc Tabulate needs to know which role a variable will be used as each has different options.

The Table statements in Proc Tabulate (you can have more than one in a single procedure call) use commas to separate dimensions. If there are two variables, or a variable and statistic separated by a comma the first is the row variable and the stuff following the comma is the Column dimension. If you have a third comma the first group is a separate table (page), the 2nd becomes row and the 3rd is the column. So this table has Visitname for row and CAshort for the columns. Since we did not request any specific statistic then by default N is reported for the column.

 

A caveat for working with Proc Tabulate: The default behavior for CLASS variables is if one or more of the Class variables is missing then the record is excluded from the result. There are options to keep them if needed.

 

This procedure has enough options even before we look at things like how to highlight cells that SAS publishes a book on how to use the features. The online documentation has more than 15 examples.

 

You may want to the add the option Order=data to the class statement. The might make the result look nicer as it will populate things in the order the data set is read instead of using the default order of unformatted which makes Month 3 appear after Month 12 because character comparisons make the 3 come after 1 and ignore the 2.

470      class VisitName CAShort / order=data;

 

Proc Report could also be used.

 

SAS_Novice22
Quartz | Level 8

@ballardw 

 

Thank you very much, this is extremely useful information for me to know! I will do some further reading and also look up the examples provided.

Thank you also for letting me know about the code I can use to order my data, it worked perfectly. 

There is so much to learn but I feel like the possibilities are endless which is exciting.

 

I will implement the DATA and SET correctly going forward. Once again, I really appreciate the time you have afforded in answering my questions.

 

Best wishes,

T.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 945 views
  • 1 like
  • 2 in conversation