BookmarkSubscribeRSS Feed
wh95
Fluorite | Level 6

I am trying to select the most recent information on a Member and Plan basis, but the payments can be adjusted in future reports, and I would like to select those values. The data is coming from a SQL database, but I have provided sample data here:

 

proc sql;
create table HAVE
	(Member char(10),
	PlanType char(3),
	ReportDate num informat=yymmn6.
			format=yymmn6.,
	StartDate num informat=yymmdd10.
			format=yymmdd10.,
	EndDate num informat=yymmdd10.
			format=yymmdd10.,
	AdjustmentReason char(64),
Code char(1), Payment num); insert into HAVE values('123AAABBBC', '001', '01JAN24'd, '01JAN24'd, '31JAN24'd, '', 'A', 500) values('123AAABBBC', '001', '01FEB24'd, '01FEB24'd, '29FEB24'd, '', 'A', 500) values('123AAABBBC', '001', '01MAR24'd, '01MAR24'd, '31MAR24'd, '', 'A', 500) values('123AAABBBC', '001', '01APR24'd, '01APR24'd, '30APR24'd, '', 'A', 500) values('123AAABBBC', '001', '01MAY24'd, '01APR24'd, '30APR24'd, 'Retro Disenroll', 'A', 500) values('123AAABBBC', '002', '01MAY24'd, '01APR24'd, '30APR24'd, 'Retro Enroll', 'A', 500) values('123AAABBBC', '002', '01MAY24'd, '01MAY24'd, '31MAY24'd, '', 'A', 500) values('123AAABBBC', '001', '01JUN24'd, '01JUN24'd, '30JUN24'd, '', 'A', 500) values('123AAABBBC', '001', '01JUN24'd, '01JAN24'd, '31MAR24'd, 'Retro Adjustment', 'B', 450) values('123AAABBBC', '001', '01JUN24'd, '01JUN24'd, '30JUN24'd, 'Retro Adjustment', 'B', 650); select * from HAVE; quit; run;

 

I have tried to create dummy variables by month and ordered them, so that the most recent value is the closest to the top of the table, but I am struggling to actually select the value I want.

PaymentJan2024 = 0; PaymentFeb2024 = 0; PaymentMar2024 = 0; PaymentApr2024 = 0; PaymentMay2024 = 0; PaymentJun2024 = 0;

length CodeJan2024 CodeFeb2024 CodeMar2024 CodeApr2024 CodeMay2024 CodeJun2024 $1.;

if StartDate <= '01JAN2024'd < EndDate then do; PaymentJan2024 = Payment; CodeJan2024 = Code; end; 
if StartDate <= '01FEB2024'd < EndDate then do; PaymentFeb2024 = Payment; CodeFeb2024 = Code; end;
if StartDate <= '01MAR2024'd < EndDate then do; PaymentMar2024 = Payment; CodeMar2024 = Code; end;
if StartDate <= '01APR2024'd < EndDate then do; PaymentApr2024 = Payment; CodeApr2024 = Code; end;
if StartDate <= '01MAY2024'd < EndDate then do; PaymentMay2024 = Payment; CodeMay2024 = Code; end;
if StartDate <= '01JUN2024'd < EndDate then do; PaymentJun2024 = Payment; CodeJun2024 = Code; end;

 

I believe the end result should look something like the following example, though I am open to feedback and adjustments to make the results understandable. I use '' to indicate no value.

Member | Plan | PaymentJan2024 | CodeJan2024 | PaymentFeb2024 | CodeFeb2024 | PaymentMar2024 | CodeMar2024 | PaymentApr2024 | CodeApr2024 | PaymentMay2024 | CodeMay2024 | PaymentJun2024 | CodeJun2024 | TotalPayment2024

123AAABBBC | 001 | 450 | B | 450 | B | 450 | B | 0 | '' | 0 | '' | 650 | B | 2000

123AAABBBC | 002 | 0 | '' | 0 | '' | 0 | '' | 500 | A | 500 | A | 0 | '' | 1000

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

I don't understand what you're trying to achieve here? Why do you want the data structured with a column for each month?

 

And shouldn't the second row of your desired output have plan = '002'?

wh95
Fluorite | Level 6

The reason is because I am trying to use the dummied columns to do calculations, i.e. if code A then assign a certain value to do calculations off of. There are other values that I am trying to extract as well, but there are many and they come from a SQL database. I think the logic should carry through for the other values. You are correct, something didn't submit correctly, so I have updated the post with more code I meant to include originally.

ballardw
Super User

Are you trying to write code to execute in an external database or strictly within SAS? I ask because sometimes "dates" are treated enough differently in other systems that SAS may need some extra coding.

 

If the values are SAS dates in SAS data sets then the are numeric and "most recent" usually translates to "largest value" (unless someone is putting stupid dummy dates like 31DEC9999 instead of using missing values) so with SQL grouping by variables and Having datevariable=max(datevariable) or similar.

 

Like this:

proc sql;
   create table work.mostrecentstart as
   select * 
   from have
   group by member, plantype
   having startdate=max(startdate)
   ;
quit;

Which does yield duplicate startdates because your example has multiple startdates the same. You weren't very clear. Every one of your example data set date variables has repeats by Member and Plantype so it isn't very clear what "most recent" actually means here.

 

I will submit that creating all those date-data-in-the-variable-names is quite likely not helpful. For one thing the next report with more dates would require more variables to use in multiple places.

I would say that BY group processing or careful use of date variables in Proc Report or Tabulate is likely more useful but I can't follow what you really need because your HAVE data set does not have any variables named STARTPAYMENT or ENDPAYMENT.

 

Maybe something similar to :

proc report data=have;
   columns member plantype reportdate,(code payment);
   define member /group;
   define plantype/ group;
   define reportdate/across ;
run;
wh95
Fluorite | Level 6

The data will end up written back to a SQL database and I am pulling all of the data from a report that is extracted into a SQL table that I then use an ODBC connection to pull into SAS. I gave the example result to explain what I meant by most recent.  To clarify, yes, I will be making new variables for future months. The dates are given to me as in the sample data, but as a text string ('202401' for ReportDate and '2024-01-01' for StartDate/EndDate), however, I have already reformatted them into SAS datetime. The problem lies in the fact that each report can go back and retroactively address previous months. The most recent data for each month may not be in the most recent report if that date span is not addressed. The example of the second to last data line addresses Jan-Mar, but the report is in Jun, and that is the data I want to select for. Let me know if I'm not explaining it well enough. I have corrected the StartPayment > StartDate error in the original post as well. Your solution with proc report gives many rows for each member, when I only want 1 for every plan type at most. The first solution gives most recent payment dates, but those do not necessarily contain the data for every month. Sorry, it's a difficult problem to explain and I've gotten pretty close to what I want, I am just lost on how to then select the correct data.

wh95_0-1724271416959.png

This shows how the columns look when dummied, and what I would want would be the same as the result I originally gave.

Tom
Super User Tom
Super User

You variable names will be easier to work with if you use date strings in YMD order.  They will sort properly and you can use variable lists since they will have numeric suffixes.

 

Perhaps something like this?

data want;
  set have;
  array p Payment_2024_01 - Payment_2024_06 ;
  array c $3 Code_2024_01 - Code_2024_06 ;
  do index=1 to dim(p);
    if StartDate <= intnx('month','01JAN2024'd,index-1) < EndDate then do;
       p[index] = payment;
       c[index] = code;
    end;
    else p[index] =0 ;
  end;
run;

Note it is much easier to enter sample data using a data step then trying to use the overly verbose coding style of SQL.

data have;
  length Member $12 PlanType $3 ReportDate StartDate EndDAte 8
         AdjustmentReason $64 Code $1 Payment 8
  ;
  infile datalines dsd truncover;
  input Member -- Payment;
  informat ReportDate StartDate EndDAte date.;
  format ReportDate StartDate EndDAte yymmdd10.;
datalines ;
123AAABBBC,001,01JAN24,01JAN24,31JAN24,,A,500
123AAABBBC,001,01FEB24,01FEB24,29FEB24,,A,500
123AAABBBC,001,01MAR24,01MAR24,31MAR24,,A,500
123AAABBBC,001,01APR24,01APR24,30APR24,,A,500
123AAABBBC,001,01MAY24,01APR24,30APR24,Retro Disenroll,A,500
123AAABBBC,002,01MAY24,01APR24,30APR24,Retro Enroll,A,500
123AAABBBC,002,01MAY24,01MAY24,31MAY24,,A,500
123AAABBBC,001,01JUN24,01JUN24,30JUN24,,A,500
123AAABBBC,001,01JUN24,01JAN24,31MAR24,Retro Adjustment,B,450
123AAABBBC,001,01JUN24,01JUN24,30JUN24,Retro Adjustment,B,650
;
ballardw
Super User

@wh95 wrote:

I am trying to select the most recent information on a Member and Plan basis, but the payments can be adjusted in future reports, and I would like to select those values. The data is coming from a SQL database, but I have provided sample data here:

 

proc sql;
create table HAVE
	(Member char(10),
	PlanType char(3),
	ReportDate num informat=yymmn6.
			format=yymmn6.,
	StartDate num informat=yymmdd10.
			format=yymmdd10.,
	EndDate num informat=yymmdd10.
			format=yymmdd10.,
	AdjustmentReason char(64),
Code char(1), Payment num); insert into HAVE values('123AAABBBC', '001', '01JAN24'd, '01JAN24'd, '31JAN24'd, '', 'A', 500) values('123AAABBBC', '001', '01FEB24'd, '01FEB24'd, '29FEB24'd, '', 'A', 500) values('123AAABBBC', '001', '01MAR24'd, '01MAR24'd, '31MAR24'd, '', 'A', 500) values('123AAABBBC', '001', '01APR24'd, '01APR24'd, '30APR24'd, '', 'A', 500) values('123AAABBBC', '001', '01MAY24'd, '01APR24'd, '30APR24'd, 'Retro Disenroll', 'A', 500) values('123AAABBBC', '002', '01MAY24'd, '01APR24'd, '30APR24'd, 'Retro Enroll', 'A', 500) values('123AAABBBC', '002', '01MAY24'd, '01MAY24'd, '31MAY24'd, '', 'A', 500) values('123AAABBBC', '001', '01JUN24'd, '01JUN24'd, '30JUN24'd, '', 'A', 500) values('123AAABBBC', '001', '01JUN24'd, '01JAN24'd, '31MAR24'd, 'Retro Adjustment', 'B', 450) values('123AAABBBC', '001', '01JUN24'd, '01JUN24'd, '30JUN24'd, 'Retro Adjustment', 'B', 650); select * from HAVE; quit; run;

 

I have tried to create dummy variables by month and ordered them, so that the most recent value is the closest to the top of the table, but I am struggling to actually select the value I want.

PaymentJan2024 = 0; PaymentFeb2024 = 0; PaymentMar2024 = 0; PaymentApr2024 = 0; PaymentMay2024 = 0; PaymentJun2024 = 0;

length CodeJan2024 CodeFeb2024 CodeMar2024 CodeApr2024 CodeMay2024 CodeJun2024 $1.;

if StartDate <= '01JAN2024'd < EndDate then do; PaymentJan2024 = Payment; CodeJan2024 = Code; end; 
if StartDate <= '01FEB2024'd < EndDate then do; PaymentFeb2024 = Payment; CodeFeb2024 = Code; end;
if StartDate <= '01MAR2024'd < EndDate then do; PaymentMar2024 = Payment; CodeMar2024 = Code; end;
if StartDate <= '01APR2024'd < EndDate then do; PaymentApr2024 = Payment; CodeApr2024 = Code; end;
if StartDate <= '01MAY2024'd < EndDate then do; PaymentMay2024 = Payment; CodeMay2024 = Code; end;
if StartDate <= '01JUN2024'd < EndDate then do; PaymentJun2024 = Payment; CodeJun2024 = Code; end;

 

I believe the end result should look something like the following example, though I am open to feedback and adjustments to make the results understandable. I use '' to indicate no value.

Member | Plan | PaymentJan2024 | CodeJan2024 | PaymentFeb2024 | CodeFeb2024 | PaymentMar2024 | CodeMar2024 | PaymentApr2024 | CodeApr2024 | PaymentMay2024 | CodeMay2024 | PaymentJun2024 | CodeJun2024 | TotalPayment2024

123AAABBBC | 001 | 450 | B | 450 | B | 450 | B | 0 | '' | 0 | '' | 650 | B | 2000

123AAABBBC | 002 | 0 | '' | 0 | '' | 0 | '' | 500 | A | 500 | A | 0 | '' | 1000


If this is true for Startdate then your Enddate better not be any later than 31JAN2024 as it would then be true for ofr the comparison with FEB (and possibly Mar, Apr, ... depending on an actual value of enddate. And the result would have multiple variables with the same values. Is that expected? And just where to Startdate and Enddate come from?

 

Provide enough information to calculate your result based on your values of codes. Provide sufficient description of all the details needed to do such.

if StartDate <= '01JAN2024'd < EndDate then do; PaymentJan2024 = Payment; CodeJan2024 = Code; end; 
wh95
Fluorite | Level 6
StartDate and EndDate refer to the span of dates that the record is affecting. The example of the second to last values input indicates that the record is retroactively affecting the Jan-Mar data.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 571 views
  • 0 likes
  • 4 in conversation