Hello,
I am trying to create columns from a dataset. I tried to use the if then/else statement but I'm not getting the desired results.
data have;
input case $ id $ flag30 flag60;
datalines;
PAA 001199905 0 0
PAA 001199905 0 0
PAA 001199905 1 1
PostDDA 001199905 1 1
PAA 001587306 1 1
PostDDA 001587306 0 0
PostDDA 001546292 1 1
;
run;
From the data. I have ID where case PAA and PostDDA are both 1.
I want to category any ID where PAA and PostDDA is equal to 1 as multi_form
ID with PAA, where flag30=1 should be categorized as PAA_only
ID with PostDDA, where flag30=1 should be categorized as PostDDA_only.
data need;
*do until(last.id);
set have;
by id;
if case='PAA' and flag30=1 then PAA_only=1;
if case='PostDDA' and flag30=1 then PostDDA_only=1;
if case='PAA' and case='PostDDA' and flag30=1 then multi_form=1;
end;
run;
Here is my desired output:
id | flag30 | flag60 | PAA_only | PostDDA_only | multi_form |
00119990 | 1 | 1 | . | . | 1 |
00154629 | 1 | 1 | . | 1 | . |
00158730 | 1 | 1 | 1 | . | . |
I think this might require multiple data steps. Am help will be highly appreciated. Thanks
We can proc transpose data by ID where flag30=1, this will pick up both PAA and POSTDDA. The only concern is flag60, if this is always equal to flag30, then we can hardcode it 1.
eg:
proc sort data=have; by id case; run;
proc transpose data=have out=wide(drop=_name_);
where flag30=1;
by id;
var flag30;
id case;
run;
data want;
set wide;
if paa=1 and PostDDA=1 then do;
multi_form=1;
call missing(paa, postdda);
end;
flag30=1;
flag60=1; /*if always 1 when FLAG30=1*/;
proc print;run;
variable order slightly changes, but it could be fixed in data step or in the proc step where it needed. Or it could me merged back from the HAVE dataset.
Obs | id | PAA | PostDDA | multi_form | flag30 | flag60 |
---|---|---|---|---|---|---|
1 | 00119990 | . | . | 1 | 1 | 1 |
2 | 00154629 | . | 1 | . | 1 | 1 |
3 | 00158730 | 1 | . | . | 1 | 1 |
Ugly but works.
data have;
input case $ id $ flag30 flag60;
datalines;
PAA 001199905 0 0
PAA 001199905 0 0
PAA 001199905 1 1
PostDDA 001199905 1 1
PAA 001587306 1 1
PostDDA 001587306 0 0
PostDDA 001546292 1 1
;
run;
proc sql;
create table want(drop = PAA_flag POSTDDA_FLAG) as
select id, max(flag30) as flag30, max(flag60) as flag60,
sum(case when caseID = 'PAA' then 1*flag30 else 0 end) as PAA_flag,
sum(case when caseID = 'PostDDA' then 1*flag30 else 0 end) as PostDDA_flag,
case when calculated PAA_flag=calculated PostDDA_flag=1 then 1 else . end as multi_form,
case when calculated PAA_flag=1 and calculated PostDDA_flag=0 then 1 else . end as PAA_only,
case when calculated PAA_flag=0 and calculated PostDDA_flag=1 then 1 else . end as PostDDA_only
from have(rename=case=caseID)
group by id;
quit;
If you're already sorted, a data step will do this in one step:
proc sort data=have;
by id;
run;
data want;
set have;
by id;
retain _flag30 _flag60 paa postdda;
if first.id then
call missing(_flag30, _flag60, paa, postdda, paa_only, postdda_only,
multi_form);
_flag30=max(flag30, _flag30);
_flag60=max(flag60, _flag60);
if case='PAA' and flag30=1 then
paa=1;
else if case='PostDDA' and flag30=1 then
postDDA=1;
if last.id then
do;
flag30=_flag30;
flag60=_flag60;
if paa=1 and postdda=1 then
multi_form=1;
else if paa=1 then
paa_only=1;
else if postdda=1 then
postdda_only=1;
OUTPUT;
end;
keep id flag30 flag60 paa_only postdda_only multi_form;
run;
@CathyVI wrote:
Hello,
I am trying to create columns from a dataset. I tried to use the if then/else statement but I'm not getting the desired results.
data have; input case $ id $ flag30 flag60; datalines; PAA 001199905 0 0 PAA 001199905 0 0 PAA 001199905 1 1 PostDDA 001199905 1 1 PAA 001587306 1 1 PostDDA 001587306 0 0 PostDDA 001546292 1 1 ; run;
From the data. I have ID where case PAA and PostDDA are both 1.
I want to category any ID where PAA and PostDDA is equal to 1 as multi_form
ID with PAA, where flag30=1 should be categorized as PAA_only
ID with PostDDA, where flag30=1 should be categorized as PostDDA_only.data need; *do until(last.id); set have; by id; if case='PAA' and flag30=1 then PAA_only=1; if case='PostDDA' and flag30=1 then PostDDA_only=1; if case='PAA' and case='PostDDA' and flag30=1 then multi_form=1; end; run;
Here is my desired output:
id
flag30
flag60
PAA_only
PostDDA_only
multi_form
00119990
1
1
.
.
1
00154629
1
1
.
1
.
00158730
1
1
1
.
.
I think this might require multiple data steps. Am help will be highly appreciated. Thanks
Not sure why you want the missing values, but here is a simple data step to check all of the observations for an ID and keep track of whether any of the two specific cases happen or not. Then just use normal boolean logic to make your new flags.
data want;
do until(last.id);
set have;
by id;
if flag30 then do;
if case='PAA' then PAA=1;
else if case='PostDDA' then PostDDA=1;
end;
_flag30=max(_flag30,flag30);
_flag60=max(_flag60,flag60);
end;
flag30=_flag30=1;
flag60=_flag60=1;
PAA_only= PAA and not PostDDA;
PostDDA_only = PostDDA and not PAA;
Multi_form = PostDDA and PAA ;
keep id flag30 flag60 PAA_only PostDDA_only Multi_form ;
run;
Let's try it.
data have;
length id $9 case $8 ;
input case id flag30 flag60;
datalines;
PAA 001199905 0 0
PAA 001199905 0 0
PAA 001199905 1 1
PostDDA 001199905 1 1
PAA 001587306 1 1
PostDDA 001587306 0 0
PostDDA 001546292 1 1
xxx YYYY 1 1
PAA ZZZZ 0 0
;
Result
PostDDA_ Multi_ Obs id flag30 flag60 PAA_only only form 1 001199905 1 1 0 0 1 2 001546292 1 1 0 1 0 3 001587306 1 1 1 0 0 4 YYYY 1 1 0 0 0 5 ZZZZ 0 0 0 0 0
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.