I just learned about proc expand, I have the code set up to run and get my desired results but the file seems too big and it's not working. I think my syntax is off.
I have attached what I have and what I'm looking for, followed by my proc expand syntax, followed by the warnings I'm seeing in the log.
I'm first using the proc expand to add missing rows in a time series file, then I'm deleting the missing row and preceding row. If there is a better solution I'm open to that also, but would still like to learn what I'm doing wrong with proc expand.
Thank You Very Much,
Mark
Arthur and user24feb,
Thank you for your help on this. I was thinking about this last night and using thoughts from both of your solutions came up with another idea. I'm hoping you can poke a hole in this or confirm that this will work too. What are your thoughts on this?
data test;
Input new_date:MMDDYY8. acct_full_acct_id $; Format new_date MMDDYY8.;
Datalines;
10/01/07 A
09/01/07 A
08/01/07 A
06/01/07 A
05/01/07 A
04/01/07 A
10/01/07 B
09/01/07 B
08/01/07 B
06/01/07 B
05/01/07 B
04/01/07 B
;
proc sort data=test;
by acct_full_acct_id descending new_date ;
run;
data testII;
format ldate mmddyy8.;
set test;
by acct_full_acct_id;
ldate = lag(new_date);
if first.acct_full_acct_id then do;
ldate = .;
end;
if intck('month',new_date,ldate) lt 2;
run;
I added options nonotes; before and options notes; after the expand and it runs very smoothly but I'm still open to any other suggestions for the task, or corrections for the expand procedure.
Try: (Previous version had a typo=
Data test;
Input date:MMDDYY8. Time_Key ID $; Format date MMDDYY8.;
Datalines;
10/01/07 20071031 A
09/01/07 20070930 A
08/01/07 20070831 A
06/01/07 20070630 A
05/01/07 20070531 A
04/01/07 20070430 A
10/01/07 20071031 B
09/01/07 20070930 B
08/01/07 20070831 B
06/01/07 20070630 B
05/01/07 20070531 B
04/01/07 20070430 B
;
Run;
Proc Sort Data=test;
By ID date;
Run;
Proc Expand Data=Test Out=TestII To=Month Method=None;
ID date;
By ID;
Run;
Data TestIII (Drop=Time_Key_L);
Obs=_N_+1;
Set TestII End=Eof;
By ID;
If not Eof Then Set TestII (Keep=Time_Key Rename=(Time_Key=Time_Key_L)) Point=Obs;
Else Call Missing (Time_Key_l);
If Last.ID Then Time_Key_L=1;
If Missing (Time_Key_L) or Missing (Time_Key) Then Delete;
Run;
In this version the values for id and by are reversed, it works the same but I'm still getting all of the warnings, log to big for sas. If I process too much it errors out with a pop up, no error to the log.
I was wrong, the option nonotes does not work with the entire file. Same issue, I added:
proc printto log= "location/filename.log";
run;
This works but I'd much rather have another solution.
Wait. Could it be that simply need to fill in the "Time_Key" variable. If I run the program below, I get the same Warnings you get. I think you need to insert the modify step, then it sould work:
Data test;
Input date:MMDDYY8. Time_Key ID $; Format date MMDDYY8.;
Datalines;
10/01/07 . A
09/01/07 . A
10/01/07 20071031 B
09/01/07 20070930 B
08/01/07 20070831 B
06/01/07 20070630 B
05/01/07 20070531 B
04/01/07 20070430 B
;
Run;
/*
Data test;
Modify test;
If Missing (Time_Key) Then Do;
Time_Key=IntNx('month',Date,1,'same')-1;
Time_Key=Year(Time_Key)*10000+Month(Time_key)*100+Day(Time_Key); * this depends if time_key is a number or a date;
End;
Run;
*/
Proc Sort Data=test;
By ID date;
Run;
Proc Expand Data=Test Out=TestII To=Month Method=None;
ID date;
By ID;
Run;
WARNING: The variable Time_Key has only 0 nonmissing observations, which is too few to apply the conversion method. The result
series is set to missing.
the time_key doesn't have any missing values, there are many missing values in this table but none for the id or the date. Because of the word method I am looking up what 'method=' options there are, That's just a guess though.
I guess the solution has already been posted. The only other way I could think about is to create a complete "date-dataset" then merge it with the original data and finally remove the missing values and their predecessor. But of course this is much longer than the solution above.
Data test;
Input date:MMDDYY8. Time_Key ID $ ; Format date MMDDYY8.;
Datalines;
12/01/07 20071031 A
09/01/07 20070930 A
08/01/07 20070831 A
06/01/07 20070630 A
05/01/07 20070531 A
04/01/07 20070430 A
10/01/07 20071031 B
09/01/07 20070930 B
08/01/07 20070831 B
06/01/07 20070630 B
05/01/07 20070531 B
04/01/07 20070430 B
;
Run;
Proc Sort Data=Test;
By ID Date;
Run;
Data TestII;
Format First_Last Date9. ;
Do i=1 By 1 Until (Last.ID);
Set Test;
By ID;
If First.ID Then Do; First_Last=Date; Output; End;
Else If Last.ID Then Do; First_Last=Date; Output; End;
End;
Run;
Proc Sort Data=TestII;
By ID First_Last;
Run;
Proc Expand Data=TestII Out=TestIII (Keep=First_Last ID Rename=(First_Last=Date)) To=Month Method=None;
ID First_Last;
By ID;
Run;
Data Test;
Merge Test TestIII (in=in3);
By ID Date;
Run;
Data TestIV (Drop=Time_Key_L);
Obs=_N_+1;
Set Test End=Eof;
By ID;
If not Eof Then Set Test (Keep=Time_Key Rename=(Time_Key=Time_Key_L)) Point=Obs;
Else Call Missing (Time_Key_l);
If Last.ID Then Time_Key_L=1;
If Missing (Time_Key_L) or Missing (Time_Key) Then Delete;
Run;
If I correctly understand what you are trying to do, then I think that proc expand might be overkill. Does the following accomplish what you want?:
data test;
Input date:MMDDYY8. Time_Key ID $; Format date MMDDYY8.;
Datalines;
10/01/07 20071031 A
09/01/07 20070930 A
08/01/07 20070831 A
06/01/07 20070630 A
05/01/07 20070531 A
04/01/07 20070430 A
10/01/07 20071031 B
09/01/07 20070930 B
08/01/07 20070831 B
06/01/07 20070630 B
05/01/07 20070531 B
04/01/07 20070430 B
;
proc sort data=test;
by ID date;
run;
data want (drop=next_date);
set test;
by id;
set test ( firstobs = 2 keep = date rename = (date = next_date) )
test ( obs = 1 drop = _all_ );
next_date = ifn( last.ID, (.), next_date );
if intck('month',date,next_date) lt 2;
run;
Arthur, That is nothing I would have ever come up with, but yes, that is exactly what I want. Put together very nicely.
Thank You,
Arthur and user24feb,
Thank you for your help on this. I was thinking about this last night and using thoughts from both of your solutions came up with another idea. I'm hoping you can poke a hole in this or confirm that this will work too. What are your thoughts on this?
data test;
Input new_date:MMDDYY8. acct_full_acct_id $; Format new_date MMDDYY8.;
Datalines;
10/01/07 A
09/01/07 A
08/01/07 A
06/01/07 A
05/01/07 A
04/01/07 A
10/01/07 B
09/01/07 B
08/01/07 B
06/01/07 B
05/01/07 B
04/01/07 B
;
proc sort data=test;
by acct_full_acct_id descending new_date ;
run;
data testII;
format ldate mmddyy8.;
set test;
by acct_full_acct_id;
ldate = lag(new_date);
if first.acct_full_acct_id then do;
ldate = .;
end;
if intck('month',new_date,ldate) lt 2;
run;
I can't poke any holes in it and it is more efficient than the solution I had proposed.
You could simplify the code a tad. e.g.:
data testII (drop=ldate);
set test;
by acct_full_acct_id;
ldate = ifn(first.acct_full_acct_id,.,lag(new_date));
if intck('month',new_date,ldate) lt 2;
run;
Awesome, Thank you for checking. I like the ifn/ifc function that is new to me. I always appreciate your thoughtful answers.
Thanks!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.