BookmarkSubscribeRSS Feed
AnnieFrannie
Calcite | Level 5

Hi! I’m quite new to SAS 9.4 and would need som help to solve a problem. I have a dataset that contains insurance-ID, year when the insurance started to and year when it ended. But everytime something happened (like a claim) there is a new row with a new period. So it looks something like this:

 

InsNr                  StartDate          EndDate

112233               20010409          20030431
112233               20030501          20060808
112233               20060809          20100502
223344               20070812          20091010
223344               20091011          20151118

 

But I would like one row for each year the insurance has been active. Like this:

 

InsNr                  ClaimYear

112233               2001
112233               2002
112233               2003

112233               2004

112233               2005

112233               2006

112233               2007

112233               2008

112233               2009

112233               2010

223344               2007

223344               2008

223344               2009

223344               2010

223344               2011

223344               2012

223344               2013

223344               2014

223344               2015

 

I don’t know where to start actually, I’ve tried some things, but it didn’t seem to lead anywhere. I’m thankful for all help or ideas on this, Thank you in advance!

7 REPLIES 7
Kurt_Bremser
Super User

Use a do loop:

data want (keep=insnr claimyear);
set have;
do claimyear = year(input(startdate,yymmdd8.)) to year(input(enddate,yymmdd8.));
  output;
end;
run;

proc sort data=want nodupkey;
by insnr claimyear;
run;

The second step eliminates doubles caused by breaks not happening on dec 31.

AnnieFrannie
Calcite | Level 5

Thank you for your response.
I tried your code. Unfortunately, It didn't work. 😞 My dataset does not contain the variable ClaimYear yet, it is a variable that I wan't to create instead of StartDate and EndDate.

Now I've added two new columns in the data set that only contain the years from StartDate and EndDate (I thought it might be easier):

 

data have;

set "C:/blablabla...";

StartYear=substrn(StartDate,1,4);

EndYear=substrn(EndDate,1,4);

run;

 

Can I somehow use DO here too?

Kurt_Bremser
Super User

First of all,

"it didn't work"

is about as informative as

"as sack of rice dropped in China"

!

Post the log of a failed step, and in case there are no ERRORs or WARNINGs, describe how the result differed from what you expected.

Second, the variable claimyear is created by the do loop, and does not have to be there at all before the step runs.

When testing, I found that you posted bogus example data; there can never be a 31st of April!

Now see this:

data have;
input InsNr $ StartDate $ EndDate $;
cards;
112233               20010409          20030430
112233               20030501          20060808
112233               20060809          20100502
223344               20070812          20091010
223344               20091011          20151118
;
run;

data want (keep=insnr claimyear);
set have;
do claimyear = year(input(startdate,yymmdd8.)) to year(input(enddate,yymmdd8.));
  output;
end;
run;

proc sort data=want nodupkey;
by insnr claimyear;
run;

proc print data=want noobs;
run;

Result:

InsNr     claimyear

112233       2001  
112233       2002  
112233       2003  
112233       2004  
112233       2005  
112233       2006  
112233       2007  
112233       2008  
112233       2009  
112233       2010  
223344       2007  
223344       2008  
223344       2009  
223344       2010  
223344       2011  
223344       2012  
223344       2013  
223344       2014  
223344       2015  
AnnieFrannie
Calcite | Level 5

Thank you for your answer.
I'm sorry, I'm new here. And yes I was a little too hasty when I created the example data it seems.
But I will try my best to be more informative, here is the log where the error ocurred (english isn't my mother's tongue so the variables has other names in my program, but it means the same thing):

 

NOTE: The data set WORK.PREMIER has 134077 observations and 10 variables.
 NOTE: DATA statement used (Total process time):
       real time           4.13 seconds
       cpu time            0.18 seconds
       

 
 83         data premier2 (keep=Forsnr SkadeAr);
 84         set premier;
 85         do SkadeAr = year(input(begynnelsedatum,yymmdd8.)) to
 85       ! year(input(forfallodatum,yymmdd8.));
 86           output;
 87         end;
 88         run;
 
 NOTE: Numeric values have been converted to character values at the places given by: 
       (Line):(Column).
       85:25   85:66   
 NOTE: Invalid argument to function INPUT at line 85 column 60.
 NOTE: Invalid argument to function INPUT at line 85 column 19.
 ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the 
        BY expression is missing, zero, or invalid.

 

 

 
Kurt_Bremser
Super User

This is why we usually ask to post example datasets in a data step. Use the macro provided in https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to convert your dataset to a data step that can be posted in a code window. This method will preserve the dataset structure and the data as is, so we don't have to make guesses about variable types and formats from data that is just displayed.

 

In your case, I made the assumption that your dates are still character, which was obviously wrong.

So omit the input functions:

data want (keep=insnr claimyear);
set have;
do claimyear = year(startdate) to year(enddate);
  output;
end;
run;

This of course assumes that you really have SAS date values (days from 01-01-1960, formatted with a SAS date format) and not just some numbers that look like dates.

AnnieFrannie
Calcite | Level 5

Thank you for your answer and information about how to generate data for forums. I appreciate it.
I think that the dates are just numbers to SAS, because the data originally comes from another system.

 

Kurt_Bremser
Super User

Now if a date is stored as a number that looks like a date, eg 20170717 (20 million 170 thousand 717), that should be converted to a real SAS date value as early as possible in your analytics chain, preferrably when you read external data. How you do that depends on how data is delivered to you (connection to database, text file, Excel (yucc) file, ...).

If you have a number like that, convert it like this:

startdate = input(put(startdate,8.),yymmdd8.);
format startdate yymmddn8.; /* variable will look the same, but have a different internal value */
format startdate date9.; /* the standard SAS date notation */
format startdate yymmddd8.; /* ISO-compliant notation, used in databases */
format startdate ddmmyyp10.; /* typical in Germany */
format startdate mmddyy10.; /* US and UK */

Use one of the formats that suits you.

Do a Google search for "SAS date values", and one of the first hits will be this https://support.sas.com/resources/papers/proceedings15/1334-2015.pdf, which is a very good paper about how SAS stores date and time values, and how you import, handle and display/export them.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 918 views
  • 0 likes
  • 2 in conversation