Hi all, I have the data in following format
medicaid_Id | total_months | start_month | end_month |
00001463A | 12 | 22013 | 112015 |
00001773A | 12 | 32013 | 12015 |
00004763A | 1 | 32014 | 32014 |
00006014A | 1 | 62014 | 32015 |
00007453A | 12 | 12013 | 12016 |
00008433A | 5 | 42013 | 112013 |
00009943A | 7 | 72013 | 12014 |
00010933A | 1 | 72013 | 72013 |
00010933A | 1 | 32014 | 32014 |
00011183A | 7 | 12013 | 12014 |
00011183A | 1 | 62014 | 72015 |
00011184A | 2 | 72013 | 82013 |
00043093A | 4 | 72013 | 102013 |
00043093A | 9 | 102013 | 122078 |
I need to count total_months for each medicaid_id. Fairly simple, but here is the catch. Some of those id's- which have multiple records- have the first time end_month as same as second time start_month or the second time end_month same as third time start_month. I need to subtract total months by 1 in those cases. Here is what I have coded and its not working. Help will be really appreciated.
data want;
set hAVE;
by medicaid_id start_month;
retain end_month =n;
if first.medicaid_id then total = total_months;
if not first.medicaid_id and start_month NE n -1 then total + total_months; else total +total_months - 1 ;
end;
run;
Ah, that makes a whole lot more sense. OK, here's one way:
data want;
set have;
by medicaid_id;
prior_end = lag(end_month);
if first.medicaid_id then total = total_months;
else do;
total + total_months;
if start_month = prior_end then total = total - 1;
end;
drop prior_end;
run;
First thing you better do is fix your data. If you have a date like 12115, you can't tell if that's December 1 or January 21.
Thanks. My bad, yes, i didnt notice that. Actually i used following method to get month year combine. On the other hand, it may not matter as its characters we are comparing, so it solves the purpose of identifying those cases where start month and end month of previous iteration is same.
data inal (drop = year month year_ month_);
set hh_2014_4;
month = month(start_dt);
year = year(start_dt);
start_month = compress(month||year);
month_ = month(end_dt);
year_ = year(end_dt);
end_month = compress(month_||year_);
run;
I'm sure there is better way to extract month and year than above?
If that code is producing the sample data you posted, there is definitely something wrong. Why do some of your dates contain the day, when you are only putting together the month and the year?
I wouldn't feel confident in any solution until I know more about your original date variables. When you run a PROC CONTENTS, what is in them? I assume they are numeric, but do they have a format connected with them? What do they look like when they are printed (unformatted vs. formatted)?
Once that is sorted out, it would probably be simpler to keep four variables instead of two: start_month, start_year, end_month, and end_year.
No sir, there are no days in the data i have or posted. First character is month (1 to 12) and after that its year (2013, 2014, 2015, 2016 or 2078)
These are the results after proc contents:
SAS Output
# | Variable | Type | Len | Format | Informat |
---|---|---|---|---|---|
3 | end_Dt | Num | 8 | MMDDYY10. | MMDDYY10. |
6 | end_month | Char | 24 | ||
1 | medicaid_Id | Char | 20 | $20. | |
2 | start_Dt | Num | 8 | MMDDYY10. | MMDDYY10. |
5 | start_month | Char | 24 | ||
4 | total_months | Num | 8 |
Ah, that makes a whole lot more sense. OK, here's one way:
data want;
set have;
by medicaid_id;
prior_end = lag(end_month);
if first.medicaid_id then total = total_months;
else do;
total + total_months;
if start_month = prior_end then total = total - 1;
end;
drop prior_end;
run;
Wonderful! Yes, that did it..so it seems retain was not required here--im still new to lag function..
I have to further do this below to get distinct id's with total months..
proc sql;
create table want_ as select medicaid_id, max(total) as total_ from want group by medicaid_id; quit;
Hope that's correct.
In a way, RETAIN is required. This statement is key:
total + total_months;
That form of statement (variable name + some numeric value) actually has a more complex set of meanings, one of which is to retain TOTAL.
Also note, if you only want a single observation per ID, the same DATA step can accomplish that. Just before the RUN statement, add:
if last.medicaid_id;
Your SQL code looks like it works as well.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.