Hello,
I have one dataset with responses from a patient questionnaire. The questions are all standard and the responses are multiple choice.
I have another dataset that has the questions, all possible responses, and separate columns for a goal and plan of action for each corresponding response.
Ex: Dataset 1
Column A - Question
1. What medical condition do you have
Column B - Response
Asthma
Dataset 2
Column A - Question
1. What medical condition do you have
Column B - Response
Asthma
Column C - Goal
Ensure Asthma Medications are current
Column D - Plan of Care
Review medications with physician
I'm trying to find a way to join the data using the criteria that if the question and answer match on both datasets, it will pull column A and B from Dataset 1 and Column C and D from Dataset 2
I've tried to following code:
PROC SQL;
CREATE TABLE WORK.'MEMBER DATA'n AS
SELECT t1.ClientID,
t1.Question,
t1.Answer,
t1.AssessmentCreateDate,
t2.Need,
t2.'Problem
t2.'Care Goal
t2.'Activity/Intervention
FROM WORK.'MEMBER DATA'n t1
LEFT JOIN WORK.'MAPPING'n t2 ON (t1.Question = t2.'Question from Questionnaire (Item'n) AND (t1.Answer =
t2.'Trigger Response(s) from Questionnaire'n);
QUIT;
There are multiple questions that have the same response options, but different questions so both the question and answer criteria have to match to pull the correct corresponding columns in table 2 (Ex: Do you smoke? Yes/No, Do you drink alcohol? Yes/No).
What am I doing wrong? So sorry if I didn't submit this correctly...I'm very new to SAS and this forum.
I'm on SAS EG 7.1
Thank you.
If I were to guess, the variable naming going on in your datasets is giving you issues. Work on getting the variable names into a standard format (starts with A-z or _ and remove any spaces) and then your problem can be solved with a left join as you have done.
Here's an example:
proc format;
value quest
11 = "11. In the past year, how many times have you been to the emergency room?"
;
value ans
1 = "Once"
2 = "Twice"
3 = "3 or more"
;
run;
data survey_resp;
input clientid question answer;
datalines;
1 11 1
2 11 3
3 11 2
;
run;
data resp_xwalk;
infile datalines dlm=',';
length question 8. answer 8. need $50. problem $100. goal $200.;
input question answer need $ problem $ goal $;
datalines;
11,1,Emergency Room Visits, Emergency Room Visits - One visit reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11,2,Emergency Room Visits, Emergency Room Visits - Two visits reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11,3,Emergency Room Visits, Emergency Room Visits - Three or more visits reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
;
run;
data survey_resp_formatted(drop=question answer);
set survey_resp;
quest = put(question,quest.);
ans = put(answer,ans.);
run;
data resp_xwalk_formatted(drop= question answer);
set resp_xwalk;
quest = put(question,quest.);
ans = put(answer,ans.);
run;
proc sql;
create table want as
select a.clientid, a.quest, a.ans, b.need, b.problem, b.goal
from survey_resp_formatted a
left join resp_xwalk_formatted b
on a.quest=b.quest and a.ans=b.ans
order by a.clientid, a.quest, a.ans
;quit;
-unison
First of all, make your coding easier by orders of magnitude by getting rid of the stupid nonstandard names. Really. I mean it.
Then, remove the spurious single quotes as they will cause syntax ERRORS.
See this short example
data ds1;
infile datalines dlm='|' dsd;
length question response $50;
input question response;
datalines;
1. What medical condition do you have|Asthma
;
data ds2;
infile datalines dlm='|' dsd;
length question response goal plan $50;
input question response goal plan;
datalines;
1. What medical condition do you have|Asthma|Ensure Asthma Medications are current|Review medications with physician
;
proc sql;
create table want as
select ds1.*, ds2.goal, ds2.plan
from ds1 left join ds2
on ds1.question = ds2.question and ds1.response = ds2.response;
quit;
"First of all, make your coding easier by orders of magnitude by getting rid of the stupid nonstandard names. Really. I mean it." seems unnecessarily hostile, but thanks
@ScottM8 wrote:
"First of all, make your coding easier by orders of magnitude by getting rid of the stupid nonstandard names. Really. I mean it." seems unnecessarily hostile, but thanks
It's not meant to be hostile. It may save you years of grief in your future SAS life. Non-standard names are stupid, as they cause unnecessary work. A good programmer is lazy by nature and avoids unnecessary work like the plague.
Just compare your join condition with mine.
That makes sense. I'm very new to this so I guess I was more focused on the join and getting the desired result than setting up the data correctly and efficiently. I'll keep this in mind going forward as it will make everything easier. Thank you.
@ScottM8 wrote:
"First of all, make your coding easier by orders of magnitude by getting rid of the stupid nonstandard names. Really. I mean it." seems unnecessarily hostile, but thanks
I would say not.
I used to do contract data work for a very large computer company's hardware branch. They sent us file extracts 3 times a week and they had the columns change order almost every other file and column headings randomly acquire or lose characters constantly. If we had attempted to use those inconsistent column labels we would have had to change up to 100's of lines of code multiple times per week.
By using our consistent "standard" names we only had to modify our read program to handle the column order issue.
Interesting enough when we added some work from one of the exact same company's SOFTWARE divisions we never ran into either of these issues.
I don't believe that your data set 2 is properly organized for this purpose and this approach may not be viable.
Provide some example data from both data sets AND what you expect the resulting data set to look like.
When I am working with similar data I generally do not attempt what I think you may be doing.
Instead the "question" information goes into my variable labels to display as needed and question responses, especially if common across many questions go into formats so that a response code of "A" (or what ever, you don't show what your actual response values might be) can show the appropriate text as needed.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
You don't need to show all of the data or anything sensitive but maybe 3 of the multiple response variables, the categories and what you expect.
And an example of Label/format display.
proc format library=work; value somevalue 1 = 'Always' 2 = 'Often' 3 = 'Sometimes' 4 = 'Never' ; data example; input var1 - var5; label var1='Question1: Response1' var2='Question1: Response2' var3='Question1: Response3' var4='Question2: Response1' var5='Question2: Response1' ; datalines; 1 2 1 3 4 1 1 2 2 3 3 3 1 1 4 1 1 1 1 1 2 2 2 3 3 ; proc print data=example label noobs; format var1 - var5 somevalue.; run;
Multiple response questions may also take some more manipulation depending on how the data is recorded for some analysis or reporting.
I'm very sorry...I tried to follow the instructions to turn the SAS data in to a data step, but was unsuccessful. Here is an example of a response, the response crosswalk, and the desired outcome
Survey Response Data
ClientID | Question | Answer |
1 | 11. In the past year, how many times have you been to the emergency room? | Twice |
Survey Crosswalk
Question | Response | Need | Problem | Goal |
11. In the past year, how many times have you been to the emergency room? | Once | Emergency Room Visits | Emergency Room Visits - One visit reported. | Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate. |
11. In the past year, how many times have you been to the emergency room? | Twice | Emergency Room Visits | Emergency Room Visits - Two visits reported. | Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate. |
11. In the past year, how many times have you been to the emergency room? | Three or more | Emergency Room Visits | Emergency Room Visits - Three or more visits reported. | Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate. |
Desired Outcome:
ClientID | Question | Answer | Need | Problem | Goal |
1 | 11. In the past year, how many times have you been to the emergency room? | Twice | Emergency Room Visits | Emergency Room Visits - Two visits reported. | Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate. |
See my edited response below!
One thing to keep in mind is that, according to your tables, the 'response' is called 'answer' in your client table and 'response' in your crosswalk.
-unison
Thank you very much! It looks like I wasn't inputting the data correctly. This helps a lot. Thank you again.
@ScottM8 wrote:
I'm very sorry...I tried to follow the instructions to turn the SAS data in to a data step, but was unsuccessful. Here is an example of a response, the response crosswalk, and the desired outcome
Survey Response Data
ClientID Question Answer 1 11. In the past year, how many times have you been to the emergency room? Twice
Survey Crosswalk
Question Response Need Problem Goal 11. In the past year, how many times have you been to the emergency room? Once Emergency Room Visits Emergency Room Visits - One visit reported. Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate. 11. In the past year, how many times have you been to the emergency room?
TwiceEmergency Room Visits Emergency Room Visits - Two visits reported. Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate. 11. In the past year, how many times have you been to the emergency room? Three or more Emergency Room Visits Emergency Room Visits - Three or more visits reported. Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
Desired Outcome:
ClientID Question Answer Need Problem Goal 1 11. In the past year, how many times have you been to the emergency room? Twice Emergency Room Visits Emergency Room Visits - Two visits reported. Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
Do you have an example of one or two of these where 1) the Need Changes and/or 2) the Goal Changes?
And is the "Need" common to other questions?
If the Need does not change then "Problem" is quite doable with a format.
If the Goal does not change then either a format may well be workable as well.
Does you question 11 have any response choice of Zero or None? Is that all of the possible responses to question?
A slightly more targeted format and label approach.
proc format library=work; value $q11_P "Once" = "Emergency Room Visits - One visit reported" "Twice" = "Emergency Room Visits - Two visits reported" "Three or more"= "Emergency Room Visits - Three or more visits reported" "None" = " " ; value $q11_G "Once", "Twice", "Three or more"= "Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate." "None" = " " ; run; data have; infile datalines dlm=','; input clientid $ q11 $15.; label clientid ="Client Identifier" Q11= "In the past year, how many times have you been to the emergency room?" ; datalines; 1,Once 2,Twice 3,Three or more 4,None ; proc print data=have noobs label; title "Q11 values printed with Problem format"; format q11 $q11_p.; run; title; proc print data=have noobs label; title "Q11 values printed with Goal format"; format q11 $q11_G.; run; title;
I am afraid that to get output that looks exactly like your desired outcome is very dependent on your actual data structure and content and without a good example I'm not sure what to suggest.
I will say that if I have a document like your crosswalk then it is not difficult in Excel to create the Format statements similar to what I show. And if the document is well structured creating a variable label statement is not much work either.
If you can't turn the SAS data set, or part of it into data step code can you copy some lines of the data from your data source and paste that as text? Change anything sensitive like client identity information to some that can't link back to your actual data.
Copy the data lines, including a header row if any, into a text box opened with the forums {I}.
I can say from working with survey data for more than 20 years I have yet to have a software package output the responses with question identity and value one the same row aw your "response data" implies.
If I were to guess, the variable naming going on in your datasets is giving you issues. Work on getting the variable names into a standard format (starts with A-z or _ and remove any spaces) and then your problem can be solved with a left join as you have done.
Here's an example:
proc format;
value quest
11 = "11. In the past year, how many times have you been to the emergency room?"
;
value ans
1 = "Once"
2 = "Twice"
3 = "3 or more"
;
run;
data survey_resp;
input clientid question answer;
datalines;
1 11 1
2 11 3
3 11 2
;
run;
data resp_xwalk;
infile datalines dlm=',';
length question 8. answer 8. need $50. problem $100. goal $200.;
input question answer need $ problem $ goal $;
datalines;
11,1,Emergency Room Visits, Emergency Room Visits - One visit reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11,2,Emergency Room Visits, Emergency Room Visits - Two visits reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
11,3,Emergency Room Visits, Emergency Room Visits - Three or more visits reported., Have 0 ER visits over next 60 days. See your doctor or urgent care facility when appropriate.
;
run;
data survey_resp_formatted(drop=question answer);
set survey_resp;
quest = put(question,quest.);
ans = put(answer,ans.);
run;
data resp_xwalk_formatted(drop= question answer);
set resp_xwalk;
quest = put(question,quest.);
ans = put(answer,ans.);
run;
proc sql;
create table want as
select a.clientid, a.quest, a.ans, b.need, b.problem, b.goal
from survey_resp_formatted a
left join resp_xwalk_formatted b
on a.quest=b.quest and a.ans=b.ans
order by a.clientid, a.quest, a.ans
;quit;
-unison
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.