Hi! I’m quite new to SAS 9.4 and would need som help to solve a problem. I have a dataset that contains insurance-ID, year when the insurance started to and year when it ended. But everytime something happened (like a claim) there is a new row with a new period. So it looks something like this:
InsNr StartDate EndDate
112233 20010409 20030431
112233 20030501 20060808
112233 20060809 20100502
223344 20070812 20091010
223344 20091011 20151118
But I would like one row for each year the insurance has been active. Like this:
I don’t know where to start actually, I’ve tried some things, but it didn’t seem to lead anywhere. I’m thankful for all help or ideas on this, Thank you in advance!
Use a do loop:
data want (keep=insnr claimyear); set have; do claimyear = year(input(startdate,yymmdd8.)) to year(input(enddate,yymmdd8.)); output; end; run; proc sort data=want nodupkey; by insnr claimyear; run;
The second step eliminates doubles caused by breaks not happening on dec 31.
Thank you for your response.
I tried your code. Unfortunately, It didn't work. My dataset does not contain the variable ClaimYear yet, it is a variable that I wan't to create instead of StartDate and EndDate.
Now I've added two new columns in the data set that only contain the years from StartDate and EndDate (I thought it might be easier):
Can I somehow use DO here too?
First of all,
"it didn't work"
is about as informative as
"as sack of rice dropped in China"
Post the log of a failed step, and in case there are no ERRORs or WARNINGs, describe how the result differed from what you expected.
Second, the variable claimyear is created by the do loop, and does not have to be there at all before the step runs.
When testing, I found that you posted bogus example data; there can never be a 31st of April!
Now see this:
data have; input InsNr $ StartDate $ EndDate $; cards; 112233 20010409 20030430 112233 20030501 20060808 112233 20060809 20100502 223344 20070812 20091010 223344 20091011 20151118 ; run; data want (keep=insnr claimyear); set have; do claimyear = year(input(startdate,yymmdd8.)) to year(input(enddate,yymmdd8.)); output; end; run; proc sort data=want nodupkey; by insnr claimyear; run; proc print data=want noobs; run;
InsNr claimyear 112233 2001 112233 2002 112233 2003 112233 2004 112233 2005 112233 2006 112233 2007 112233 2008 112233 2009 112233 2010 223344 2007 223344 2008 223344 2009 223344 2010 223344 2011 223344 2012 223344 2013 223344 2014 223344 2015
Tuesday - last edited Tuesday
Thank you for your answer.
I'm sorry, I'm new here. And yes I was a little too hasty when I created the example data it seems.
But I will try my best to be more informative, here is the log where the error ocurred (english isn't my mother's tongue so the variables has other names in my program, but it means the same thing):
NOTE: The data set WORK.PREMIER has 134077 observations and 10 variables. NOTE: DATA statement used (Total process time): real time 4.13 seconds cpu time 0.18 seconds 83 data premier2 (keep=Forsnr SkadeAr); 84 set premier; 85 do SkadeAr = year(input(begynnelsedatum,yymmdd8.)) to 85 ! year(input(forfallodatum,yymmdd8.)); 86 output; 87 end; 88 run; NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 85:25 85:66 NOTE: Invalid argument to function INPUT at line 85 column 60. NOTE: Invalid argument to function INPUT at line 85 column 19. ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid.
This is why we usually ask to post example datasets in a data step. Use the macro provided in https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to convert your dataset to a data step that can be posted in a code window. This method will preserve the dataset structure and the data as is, so we don't have to make guesses about variable types and formats from data that is just displayed.
In your case, I made the assumption that your dates are still character, which was obviously wrong.
So omit the input functions:
data want (keep=insnr claimyear); set have; do claimyear = year(startdate) to year(enddate); output; end; run;
This of course assumes that you really have SAS date values (days from 01-01-1960, formatted with a SAS date format) and not just some numbers that look like dates.
Thank you for your answer and information about how to generate data for forums. I appreciate it.
I think that the dates are just numbers to SAS, because the data originally comes from another system.
Tuesday - last edited Tuesday
Now if a date is stored as a number that looks like a date, eg 20170717 (20 million 170 thousand 717), that should be converted to a real SAS date value as early as possible in your analytics chain, preferrably when you read external data. How you do that depends on how data is delivered to you (connection to database, text file, Excel (yucc) file, ...).
If you have a number like that, convert it like this:
startdate = input(put(startdate,8.),yymmdd8.); format startdate yymmddn8.; /* variable will look the same, but have a different internal value */ format startdate date9.; /* the standard SAS date notation */ format startdate yymmddd8.; /* ISO-compliant notation, used in databases */ format startdate ddmmyyp10.; /* typical in Germany */ format startdate mmddyy10.; /* US and UK */
Use one of the formats that suits you.
Do a Google search for "SAS date values", and one of the first hits will be this https://support.sas.com/resources/papers/proceedings15/1334-2015.pdf, which is a very good paper about how SAS stores date and time values, and how you import, handle and display/export them.