BookmarkSubscribeRSS Feed
catkat96
Obsidian | Level 7

I'm trying to create a table from information on another table. So, on the first one, let's call it Data1, I have panel data (many observations/individuals for many days).

On the second one, which I called "DataOutcome", I am trying to get one row per observation, and summarize something on the other table: if the last date of an observation is later than some day, then a variable will take value 0, or 1 in any other case. 

This is my code so far, it just gets the id's from another table and creates the date that would define this new variable's outcome.  

 

Apologies if this is a very bad question, I'm quite new at SAS/SQL. 

Here's a data snippet:

 

data OriginalData;
   infile datalines delimiter=',';
   input date $ agreement_number $ outcome;
   datalines;
05/11/2020,1000,Answered
06/11/2020,1001,Answered
08/11/2020,1002,Not Answered
08/11/2020,1003,Answered
;
data Data1;
   infile datalines delimiter=',';
   input agreement_number $ date;
   datalines;
1000, 06/11/2020
1000, 07/11/2020
1001, 06/11/2020
1001, 07/11/2020
1001, 08/11/2020
1001, 09/11/2020
1002, 08/11/2020
1003, 08/11/2020
1003, 08/11/2020
;

 

 

proc SQL;
create table DataOutcome as
select Agreement_Number,
	Date,
Outcome, intnx("day",Date,2) as new_date format=ddmmyy11. from work.OriginalData where outcome = "Answered";
/* for each Agreement_Number
if the last date in Data1 is between Date and new_date
New_var = 1 */ Group by 1; Quit;

 

6 REPLIES 6
maguiremq
SAS Super FREQ

Please do provide a data snippet using DATALINES. This seems like it would be done more efficiently in a DATA step, but I can't really tell. I don't see a join condition anywhere in your code, either.

catkat96
Obsidian | Level 7

Included a datalines snippet on the original post. 

I'm not using joins because I don't really need to join anything (I believe), I'm creating a completely new table... but perhaps there is a use for JOIN here that I'm not aware of.

utrocketeng
Quartz | Level 8

if i understand your objective correctly, i believe this code should do the trick:

data OriginalData;
   infile datalines delimiter=',';
   input date :ddmmyy10. agreement_number $ outcome $ ;
   format date ddmmyy10.;
   datalines;
05/11/2020,1000,Answered
06/11/2020,1001,Answered
08/11/2020,1002,Not Answered
08/11/2020,1003,Answered
;

data Data1;
   infile datalines delimiter=',';
   input agreement_number $ date :ddmmyy10.;
   format date ddmmyy10.;
   datalines;
1000, 06/11/2020
1000, 07/11/2020
1001, 06/11/2020
1001, 07/11/2020
1001, 08/11/2020
1001, 09/11/2020
1002, 08/11/2020
1003, 08/11/2020
1003, 08/11/2020
;

PROC SQL;
CREATE TABLE work.dataMaxDate AS SELECT agreement_number, Max(Date) as MostRecentDate FROM work.data1 GROUP BY agreement_number;
CREATE TABLE work.final AS
SELECT 
	a.*
	,CASE when b.MostRecentDate is not null then 1 else 0 end as New_Var
FROM 
	originaldata a
	LEFT JOIN work.dataMaxDate b on a.agreement_number = b.agreement_number and b.MostRecentDate >= a.date and b.MostRecentDate < a.date+2
;
QUIT;

note that you want to look at the work.final table.

 

good luck

catkat96
Obsidian | Level 7

Hi, thanks for your reply! 

I think it's kinda doing what I want sans the filtering by "Answered". I tried adding that but I can't get it to work, can you tell me where am I mistaken here?

 

PROC SQL;
CREATE TABLE work.dataMaxDate AS SELECT agreement_number, Max(Date) as MostRecentDate FROM work.data1 GROUP BY agreement_number;
CREATE TABLE work.final AS
SELECT 
	a.*
	,CASE when b.MostRecentDate is not null then 1 else 0 end as New_Var
FROM 
	originaldata a
	LEFT JOIN work.dataMaxDate (WHERE = (outcome = "Answered")) b on a.agreement_number = b.agreement_number and b.MostRecentDate >= a.date and b.MostRecentDate < a.date+2
;
QUIT;

 

maguiremq
SAS Super FREQ

I still think we're missing something. This code doesn't capture exactly what you put in your "OriginalData", but I don't know if you intend to count duplicate dates as "Answered". Some clarification there might help. You could also use PROC TRANSPOSE, I think. Nonetheless, PROC SQL:

 

proc sql;
	create table 	want as
		select		
					agreement_number,
					min(date) as min_date
						format = mmddyy10.,
					case
						when calculated min_date = max(date) then "Not Answered"
						else "Answered"
					end as outcome
		from		
					data1
		group by
					agreement_number
		having
					agreement_number = min(agreement_number);
quit;

I highly recommend that you format your code. It helps spot errors more easily, like your semicolon after your WHERE clause in your query. Some people probably think my formatting is a little overkill, but consistency is key. Edit: it looks as though the "OriginalData" and "Data1" don't have the same dates. I hope I'm not missing something obvious.

 

Thank you providing some example data.

catkat96
Obsidian | Level 7

Hi, thanks for the reply!

I think I might have not explained my problem properly. I have both OriginalData and Data1 already, I want to create a new table mixing info from both: I want to

-check OriginalData to identify those agreements that are "Answered"

-identify the date when that happened

-identify what is two days after that date (date + 2)

-check on the Data1 if those agreements' last date is larger than date + 2 (then assign value 0) or is between date and date + 2 (then assign value 1).

 

Is there anything I'm forgetting to make it clear?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1328 views
  • 1 like
  • 3 in conversation