Hi all,
I need some help manipulating a dataset I have.
In a data step, the code below produces the table that follows
DATA Out.tmp1a;
set out.tmp1;
if TOS_CAT1_CD = 'FAC_IP' and MAP_CD in ('170','171','172','173','174','175','179') then TOS_CAT2_CD = 'MATER';
else TOS_CAT2_CD = ‘UNKNOWN’;
RUN;
INT_CLM_NUM CLM_LINE_NUM MAP_CD TOS_CAT1_CD TOS_CAT2_CD
123456789012345 0010 173 FAC_IP MATER
123456789012345 0020 174 FAC_IP MATER
123456789012345 0030 250 FAC_IP
123456789012345 0040 300 FAC_IP
123456789012345 0050 410 FAC_IP
Although the produced table is right, I want to rewrite my code to affect every claim (INT_CLM_NUM) that meets the criteria listed instead of the claim line(CLM_LINE_NUM)
So in this case, TOS_CAT2_CD should equal ‘MATER’ for every row in the table.
I’m having trouble figuring out how to write this.
Thank you for the help
Could you explain better what you need?
Give example of your desired output....
So currently, it's applying it only on the rows where the if statement is true. I want it applied to every row so long as the claim number(INT_CLM_NUM) is the same
I want the desired output to look like this (only when the if statement is true):
INT_CLM_NUM CLM_LINE_NUM MAP_CD TOS_CAT1_CD TOS_CAT2_CD
123456789012345 0010 173 FAC_IP MATER
123456789012345 0020 174 FAC_IP MATER
123456789012345 0030 250 FAC_IP MATER
123456789012345 0040 300 FAC_IP MATER
888888888888888 0010 133 FAC_IP
888888888888888 0020 216 FAC_IP
777777777777777 0010 170 FAC_IP MATER
777777777777777 0020 412 FAC_IP MATER
Would this be good?
/* Create test data */
data tmp1;
input INT_CLM_NUM :$15. CLM_LINE_NUM $ MAP_CD $ TOS_CAT1_CD $;
cards;
123456789012345 0010 173 FAC_IP
123456789012345 0020 174 FAC_IP
123456789012345 0030 250 FAC_IP
123456789012345 0040 300 FAC_IP
123456789012345 0050 410 FAC_IP
123456789012346 0010 273 FAC_IP
123456789012346 0020 274 FAC_IP
123456789012347 0010 273 ABC_IP
123456789012347 0020 179 ABC_IP
;
/* Create dataset with the desired additional variable */
proc sql;
create table tmp1a as
select a.*, coalesce(tmpvar, 'UNKNOWN') as TOS_CAT2_CD length=7 from
tmp1 a left join
(select distinct INT_CLM_NUM, 'MATER' as tmpvar
from tmp1
where TOS_CAT1_CD = 'FAC_IP'
and MAP_CD in ('170','171','172','173','174','175','179')) b
on a.INT_CLM_NUM=b.INT_CLM_NUM
order by a.INT_CLM_NUM, CLM_LINE_NUM;
quit;
proc print data=tmp1a;
run;
Here's a DATA step solution that assumes you have already sorted your data by INT_CLM_NUM:
data want;
tos_cat2_cd='UNKNOWN';
do until (last.int_clm_num);
set have;
by int_clm_num;
if TOS_CAT1_CD = 'FAC_IP' and MAP_CD in ('170','171','172','173','174','175','179') then TOS_CAT2_CD = 'MATER';
end;
do until (last.int_clm_num);
set have;
by int_clm_num;
output;
end;
run;
The top loop reads all the records for one INT_CLM_NUM and re-sets TOS_CAT2_CD if any record meet the conditions. The bottom loop then reads the same records, and outputs.
If you actually want blank values, instead of "UNKNOWN" when no observations meet the conditions, just remove the line that assigns the value "UNKNOWN".
Good luck.
Show your code, and we can fix it.
The most likely cause is that your data already contains TOS_CAT2_CD so you will need to drop it:
set have (drop=TOS_CAT2_CD);
It looks like you want mark the whole group .
data tmp1;
input INT_CLM_NUM :$15. CLM_LINE_NUM $ MAP_CD $ TOS_CAT1_CD $;
cards;
123456789012345 0010 173 FAC_IP
123456789012345 0020 174 FAC_IP
123456789012345 0030 250 FAC_IP
123456789012345 0040 300 FAC_IP
888888888888888 0010 133 FAC_IP
888888888888888 0020 216 FAC_IP
777777777777777 0010 170 FAC_IP
777777777777777 0020 412 FAC_IP
;
run;
proc sql;
create table want as
select *,case
when sum(TOS_CAT1_CD = 'FAC_IP')=count(*) and sum(MAP_CD in ('170','171','172','173','174','175','179')) gt 0 then 'MATER'
else 'UNKNOWN'
end as TOS_CAT2_CD length=40
from tmp1
group by INT_CLM_NUM ;
quit;
@chemph: Now you've got three solutions. If you go for a PROC SQL approach, I would recommend @Ksharp's, because it's more elegant than mine. (It lets PROC SQL perform the join of grouped and original data automatically.) Please note that his definition of TOS_CAT2_CD is slightly different from the definition used in the other two suggestions: He requires all observations with the same INT_CLM_NUM to have TOS_CAT1_CD='FAC_IP', whereas in the other approaches a single occurrence of 'FAC_IP' in that group of observations would trigger the "MATER" designation for the group. But this could be adapted easily by deleting "=count(*)" in the WHEN condition. Also, an ORDER BY clause (as in my code) could be added to sort the output dataset.
@Astounding's DOW loop approach would need only a minor adaptation if it were to work with grouped, but unsorted input data (like 888... before 777...): You would just need to add the NOTSORTED option to the two BY statements: by int_clm_num notsorted;.
As a benefit, it would maintain the same sort order in the output dataset (unlike PROC SQL).
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.