Everyone has more than one result in my dataset. I am trying to create a new column that shows if they answered 1 and 2 OR 1 and 3 OR 1 and 5 OR 3 and 5, etc.
This is what I currently have, but it is not looking at the data by ssn which is what I need it to do:
proc sql;
create table diff_answer_combos as
select *
, case when status_flg in (1,2) then "1+2"
when status_flg in (1,3) then "1+3"
when status_flg in (1,4) then "1+4"
when status_flg in (1,5) then "1+5"
when status_flg in (1,6) then "1+6"
when status_flg in (2,3) then "2+3"
when status_flg in (2,4) then "2+4"
when status_flg in (2,5) then "2+5"
when status_flg in (2,6) then "2+6"
when status_flg in (3,4) then "3+4"
when status_flg in (3,5) then "3+5"
when status_flg in (3,6) then "3+6"
when status_flg in (4,5) then "4+5"
when status_flg in (4,6) then "4+6"
when status_flg in (5,6) then "5+6"
else ""
end as status_combined
from duplicates_diff_answer
order by ssn;
quit;
Please help!
Thanks 🙂
So if you have two observations per group you can just build the new string from the MIN and MAX values.
proc sql;
create table diff_answer_combos as
select ssn
, count(*) as nobs
, catx('+',min(status_flg),max(status_flg)) as status_combined length=5
from duplicates_diff_answer
group by ssn
;
quit;
So you will always have exactly two observations per ssn?
Then (IMO) a data step will be your best solution:
proc sort data=duplicates_diff_answer;
by ssn status_flg;
run;
data diff_answer_combos;
set duplicates_diff_answer;
by ssn;
if first.ssn
then status_combined = " "; /* set a sufficient length here, if more is needed */
else status_combined = catx("+",status_combined,status_flg);
if last.ssn;
drop status_flg;
run;
If you need to make sure that only 2 rows exist per ssn, count the number of plus signs.
If you're looking for status_flg across more than one row this type of logic will not work, it only evaluates once per row so I don't think this will achieve what you're looking to accomplish.
Can you show a sample of input data and desired results?
So if you have two observations per group you can just build the new string from the MIN and MAX values.
proc sql;
create table diff_answer_combos as
select ssn
, count(*) as nobs
, catx('+',min(status_flg),max(status_flg)) as status_combined length=5
from duplicates_diff_answer
group by ssn
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.