BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ScottM8
Fluorite | Level 6

Hello,

I have one dataset with responses from a patient questionnaire. The questions are all standard and the responses are multiple choice.

 

I have another dataset that has the questions, all possible responses, and separate columns for a goal and plan of action for each corresponding response.

 

Ex: Dataset 1

Column A - Question

1. What medical condition do you have

Column B - Response

Asthma

 

Dataset 2

Column A - Question

1. What medical condition do you have

Column B - Response

Asthma

Column C - Goal

Ensure Asthma Medications are current

Column D - Plan of Care

Review medications with physician

 

I'm trying to find a way to join the data using the criteria that if the question and answer match on both datasets, it will pull column A and B from Dataset 1 and Column C and D from Dataset 2

 

I've tried to following code:

 

 

PROC SQL;
   CREATE TABLE WORK.'MEMBER DATA'n AS 
   SELECT t1.ClientID, 
          
          t1.Question, 
          t1.Answer, 
          t1.AssessmentCreateDate,  
          t2.Need, 
          t2.'Problem 
          t2.'Care Goal
          t2.'Activity/Intervention
      FROM WORK.'MEMBER DATA'n t1
           LEFT JOIN WORK.'MAPPING'n t2 ON (t1.Question = t2.'Question from Questionnaire (Item'n) AND (t1.Answer = 
          t2.'Trigger Response(s) from Questionnaire'n);
QUIT;

 

There are multiple questions that have the same response options, but different questions so both the question and answer criteria have to match to pull the correct corresponding columns in table 2 (Ex: Do you smoke? Yes/No, Do you drink alcohol? Yes/No).

 

What am I doing wrong? So sorry if I didn't submit this correctly...I'm very new to SAS and this forum.

 

I'm on SAS EG 7.1

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
unison
Lapis Lazuli | Level 10

If I were to guess, the variable naming going on in your datasets is giving you issues. Work on getting the variable names into a standard format (starts with A-z or _ and remove any spaces) and then your problem can be solved with a left join as you have done.

 

Here's an example:

 

proc format;
	value quest
	11 = "11. In the past year, how many times have you been to the emergency room?"
	;
	value ans
	1 = "Once"
	2 = "Twice"
	3 = "3 or more"
	;
run;


data survey_resp;
input clientid question answer;
datalines;
1 11 1
2 11 3
3 11 2
;
run;

data resp_xwalk;
infile datalines dlm=',';
length question 8. answer 8. need $50. problem $100. goal $200.;
input question answer need $ problem $ goal $;
datalines;
11,1,Emergency Room Visits, Emergency Room Visits - One visit reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11,2,Emergency Room Visits, Emergency Room Visits - Two visits reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11,3,Emergency Room Visits, Emergency Room Visits - Three or more visits reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
;
run;

data survey_resp_formatted(drop=question answer);
set survey_resp;
quest = put(question,quest.);
ans = put(answer,ans.);
run;

data resp_xwalk_formatted(drop= question answer);
set resp_xwalk;
quest = put(question,quest.);
ans = put(answer,ans.);
run;

proc sql;
create table want as
select a.clientid, a.quest, a.ans, b.need, b.problem, b.goal 
from survey_resp_formatted a
left join resp_xwalk_formatted b
on a.quest=b.quest and a.ans=b.ans
order by a.clientid, a.quest, a.ans
;quit;

-unison

-unison

View solution in original post

11 REPLIES 11
Kurt_Bremser
Super User

First of all, make your coding easier by orders of magnitude by getting rid of the stupid nonstandard names. Really. I mean it.

Then, remove the spurious single quotes as they will cause syntax ERRORS.

See this short example

data ds1;
infile datalines dlm='|' dsd;
length question response $50;
input question response;
datalines;
1. What medical condition do you have|Asthma
;

data ds2;
infile datalines dlm='|' dsd;
length question response goal plan $50;
input question response goal plan;
datalines;
1. What medical condition do you have|Asthma|Ensure Asthma Medications are current|Review medications with physician
;

proc sql;
create table want as
select ds1.*, ds2.goal, ds2.plan
from ds1 left join ds2
on ds1.question = ds2.question and ds1.response = ds2.response;
quit;
ScottM8
Fluorite | Level 6

"First of all, make your coding easier by orders of magnitude by getting rid of the stupid nonstandard names. Really. I mean it." seems unnecessarily hostile, but thanks

Kurt_Bremser
Super User

@ScottM8 wrote:

"First of all, make your coding easier by orders of magnitude by getting rid of the stupid nonstandard names. Really. I mean it." seems unnecessarily hostile, but thanks


It's not meant to be hostile. It may save you years of grief in your future SAS life. Non-standard names are stupid, as they cause unnecessary work. A good programmer is lazy by nature and avoids unnecessary work like the plague.

 

Just compare your join condition with mine.

ScottM8
Fluorite | Level 6

That makes sense. I'm very new to this so I guess I was more focused on the join and getting the desired result than setting up the data correctly and efficiently. I'll keep this in mind going forward as it will make everything easier. Thank you.

ballardw
Super User

@ScottM8 wrote:

"First of all, make your coding easier by orders of magnitude by getting rid of the stupid nonstandard names. Really. I mean it." seems unnecessarily hostile, but thanks


I would say not.

 

I used to do contract data work for a very large computer company's hardware branch. They sent us file extracts 3 times a week and they had the columns change order almost every other file and column headings randomly acquire or lose characters constantly. If we had attempted to use those inconsistent column labels we would have had to change up to 100's of lines of code multiple times per week.

 

By using our consistent "standard" names we only had to modify our read program to handle the column order issue.

 

Interesting enough when we added some work from one of the exact same company's SOFTWARE divisions we never ran into either of these issues.

ballardw
Super User

I don't believe that your data set 2 is properly organized for this purpose and this approach may not be viable.

 

Provide some example data from both data sets AND what you expect the resulting data set to look like.

 

When I am working with similar data I generally do not attempt what I think you may be doing.

Instead the "question" information goes into my variable labels to display as needed and question responses, especially if common across many questions go into formats so that a response code of "A" (or what ever, you don't show what your actual response values might be) can show the appropriate text as needed.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

You don't need to show all of the data or anything sensitive but maybe 3 of the multiple response variables, the categories and what you expect.

 

And an example of Label/format display.

proc format library=work;
value somevalue
1 = 'Always'
2 = 'Often'
3 = 'Sometimes'
4 = 'Never'
;

data example;
   input var1 - var5;
   label 
      var1='Question1: Response1'
      var2='Question1: Response2'
      var3='Question1: Response3'
      var4='Question2: Response1'
      var5='Question2: Response1'
   ;
datalines;
1 2 1 3 4
1 1 2 2 3
3 3 1 1 4
1 1 1 1 1
2 2 2 3 3
;

proc print data=example label noobs;
format var1 - var5 somevalue.;
run;

Multiple response questions may also take some more manipulation depending on how the data is recorded for some analysis or reporting.

ScottM8
Fluorite | Level 6

I'm very sorry...I tried to follow the instructions to turn the SAS data in to a data step, but was unsuccessful. Here is an example of a response, the response crosswalk, and the desired outcome

 

Survey Response Data

ClientIDQuestionAnswer
111. In the past year, how many times have you been to the emergency room?Twice

 

Survey Crosswalk

QuestionResponseNeedProblemGoal
11. In the past year, how many times have you been to the emergency room?OnceEmergency Room VisitsEmergency Room Visits - One visit reported.Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11. In the past year, how many times have you been to the emergency room?

Twice
Emergency Room VisitsEmergency Room Visits - Two visits reported.Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11. In the past year, how many times have you been to the emergency room?Three or moreEmergency Room VisitsEmergency Room Visits - Three or more visits reported.Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.

 

 

Desired Outcome:

ClientIDQuestionAnswerNeedProblemGoal
111. In the past year, how many times have you been to the emergency room?TwiceEmergency Room VisitsEmergency Room Visits - Two visits reported.Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
unison
Lapis Lazuli | Level 10

See my edited response below!

One thing to keep in mind is that, according to your tables, the 'response' is called 'answer' in your client table and 'response' in your crosswalk.

-unison

-unison
ScottM8
Fluorite | Level 6

Thank you very much! It looks like I wasn't inputting the data correctly. This helps a lot. Thank you again.

ballardw
Super User

@ScottM8 wrote:

I'm very sorry...I tried to follow the instructions to turn the SAS data in to a data step, but was unsuccessful. Here is an example of a response, the response crosswalk, and the desired outcome

 

Survey Response Data

ClientID Question Answer
1 11. In the past year, how many times have you been to the emergency room? Twice

 

Survey Crosswalk

Question Response Need Problem Goal
11. In the past year, how many times have you been to the emergency room? Once Emergency Room Visits Emergency Room Visits - One visit reported. Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11. In the past year, how many times have you been to the emergency room?

Twice
Emergency Room Visits Emergency Room Visits - Two visits reported. Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11. In the past year, how many times have you been to the emergency room? Three or more Emergency Room Visits Emergency Room Visits - Three or more visits reported. Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.

 

 

Desired Outcome:

ClientID Question Answer Need Problem Goal
1 11. In the past year, how many times have you been to the emergency room? Twice Emergency Room Visits Emergency Room Visits - Two visits reported. Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.

Do you have an example of one or two of these where 1) the Need Changes and/or 2) the Goal Changes?

And is the "Need" common to other questions?

If the Need does not change then "Problem" is quite doable with a format.

If the Goal does not change then either a format may well be workable as well.

Does you question 11 have any response choice of Zero or None? Is that all of the possible responses to question?

 

A slightly more targeted format and label approach.

proc format library=work;
value $q11_P
"Once"         = "Emergency Room Visits - One visit reported"
"Twice"        = "Emergency Room Visits - Two visits reported"
"Three or more"= "Emergency Room Visits - Three or more visits reported"
"None"         = " "
;
value $q11_G
"Once",        
"Twice",       
"Three or more"= "Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate."
"None"         = " "
;
run;
data have;
   infile datalines dlm=',';
   input clientid $ q11 $15.;
   label 
      clientid ="Client Identifier"
      Q11= "In the past year, how many times have you been to the emergency room?"
   ;
datalines;
1,Once
2,Twice
3,Three or more
4,None
;

proc print data=have noobs label;
   title "Q11 values printed with Problem format";
   format q11 $q11_p.;
run; title;
proc print data=have noobs label;
   title "Q11 values printed with Goal format";
   format q11 $q11_G.;
run; title;

I am afraid that to get output that looks exactly like your desired outcome is very dependent on your actual data structure and content and without a good example I'm not sure what to suggest.

 

I will say that  if I have a document like your crosswalk then it is not difficult in Excel to create the Format statements similar to what I show. And if the document is well structured creating a variable label statement is not much work either.

 

If you can't turn the SAS data set, or part of it into data step code can you copy some lines of the data from your data source and paste that as text? Change anything sensitive like client identity information to some that can't link back to your actual data.

Copy the data lines, including a header row if any, into a text box opened with the forums {I}.

 

I can say from working with survey data for more than 20 years I have yet to have a software package output the responses with question identity and value one the same row aw your "response data" implies.

unison
Lapis Lazuli | Level 10

If I were to guess, the variable naming going on in your datasets is giving you issues. Work on getting the variable names into a standard format (starts with A-z or _ and remove any spaces) and then your problem can be solved with a left join as you have done.

 

Here's an example:

 

proc format;
	value quest
	11 = "11. In the past year, how many times have you been to the emergency room?"
	;
	value ans
	1 = "Once"
	2 = "Twice"
	3 = "3 or more"
	;
run;


data survey_resp;
input clientid question answer;
datalines;
1 11 1
2 11 3
3 11 2
;
run;

data resp_xwalk;
infile datalines dlm=',';
length question 8. answer 8. need $50. problem $100. goal $200.;
input question answer need $ problem $ goal $;
datalines;
11,1,Emergency Room Visits, Emergency Room Visits - One visit reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11,2,Emergency Room Visits, Emergency Room Visits - Two visits reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11,3,Emergency Room Visits, Emergency Room Visits - Three or more visits reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
;
run;

data survey_resp_formatted(drop=question answer);
set survey_resp;
quest = put(question,quest.);
ans = put(answer,ans.);
run;

data resp_xwalk_formatted(drop= question answer);
set resp_xwalk;
quest = put(question,quest.);
ans = put(answer,ans.);
run;

proc sql;
create table want as
select a.clientid, a.quest, a.ans, b.need, b.problem, b.goal 
from survey_resp_formatted a
left join resp_xwalk_formatted b
on a.quest=b.quest and a.ans=b.ans
order by a.clientid, a.quest, a.ans
;quit;

-unison

-unison

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3675 views
  • 3 likes
  • 4 in conversation