Everyone has more than one result in my dataset. I am trying to create a new column that shows if they answered 1 and 2 OR 1 and 3 OR 1 and 5 OR 3 and 5, etc.
This is what I currently have, but it is not looking at the data by ssn which is what I need it to do:
proc sql;
create table diff_answer_combos as
select *
, case when status_flg in (1,2) then "1+2"
when status_flg in (1,3) then "1+3"
when status_flg in (1,4) then "1+4"
when status_flg in (1,5) then "1+5"
when status_flg in (1,6) then "1+6"
when status_flg in (2,3) then "2+3"
when status_flg in (2,4) then "2+4"
when status_flg in (2,5) then "2+5"
when status_flg in (2,6) then "2+6"
when status_flg in (3,4) then "3+4"
when status_flg in (3,5) then "3+5"
when status_flg in (3,6) then "3+6"
when status_flg in (4,5) then "4+5"
when status_flg in (4,6) then "4+6"
when status_flg in (5,6) then "5+6"
else ""
end as status_combined
from duplicates_diff_answer
order by ssn;
quit;
Please help!
Thanks 🙂
So if you have two observations per group you can just build the new string from the MIN and MAX values.
proc sql;
create table diff_answer_combos as
select ssn
, count(*) as nobs
, catx('+',min(status_flg),max(status_flg)) as status_combined length=5
from duplicates_diff_answer
group by ssn
;
quit;
So you will always have exactly two observations per ssn?
Then (IMO) a data step will be your best solution:
proc sort data=duplicates_diff_answer;
by ssn status_flg;
run;
data diff_answer_combos;
set duplicates_diff_answer;
by ssn;
if first.ssn
then status_combined = " "; /* set a sufficient length here, if more is needed */
else status_combined = catx("+",status_combined,status_flg);
if last.ssn;
drop status_flg;
run;
If you need to make sure that only 2 rows exist per ssn, count the number of plus signs.
If you're looking for status_flg across more than one row this type of logic will not work, it only evaluates once per row so I don't think this will achieve what you're looking to accomplish.
Can you show a sample of input data and desired results?
So if you have two observations per group you can just build the new string from the MIN and MAX values.
proc sql;
create table diff_answer_combos as
select ssn
, count(*) as nobs
, catx('+',min(status_flg),max(status_flg)) as status_combined length=5
from duplicates_diff_answer
group by ssn
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.