Solved: Re: proc expand, too many warnings causes SAS to error

Steelers_In_DC · Posted 12-09-2014 10:08 AM

I just learned about proc expand, I have the code set up to run and get my desired results but the file seems too big and it's not working. I think my syntax is off.

I have attached what I have and what I'm looking for, followed by my proc expand syntax, followed by the warnings I'm seeing in the log.

I'm first using the proc expand to add missing rows in a time series file, then I'm deleting the missing row and preceding row. If there is a better solution I'm open to that also, but would still like to learn what I'm doing wrong with proc expand.

Thank You Very Much,

Mark

Steelers_In_DC · Posted 12-10-2014 09:44 AM

Arthur and user24feb,

Thank you for your help on this. I was thinking about this last night and using thoughts from both of your solutions came up with another idea. I'm hoping you can poke a hole in this or confirm that this will work too. What are your thoughts on this?

data test;

Input new_date:MMDDYY8. acct_full_acct_id $; Format new_date MMDDYY8.;

Datalines;

10/01/07 A

09/01/07 A

08/01/07 A

06/01/07 A

05/01/07 A

04/01/07 A

10/01/07 B

09/01/07 B

08/01/07 B

06/01/07 B

05/01/07 B

04/01/07 B

;

proc sort data=test;

by acct_full_acct_id descending new_date ;

run;

data testII;

format ldate mmddyy8.;

set test;

by acct_full_acct_id;

ldate = lag(new_date);

if first.acct_full_acct_id then do;

ldate = .;

end;

if intck('month',new_date,ldate) lt 2;

run;

View solution in original post

Steelers_In_DC · Posted 12-09-2014 10:23 AM

I added options nonotes; before and options notes; after the expand and it runs very smoothly but I'm still open to any other suggestions for the task, or corrections for the expand procedure.

user24feb · Posted 12-09-2014 12:03 PM

Try: (Previous version had a typo=

Data test;
Input date:MMDDYY8. Time_Key ID $; Format date MMDDYY8.;
Datalines;
10/01/07 20071031 A
09/01/07 20070930 A
08/01/07 20070831 A
06/01/07 20070630 A
05/01/07 20070531 A
04/01/07 20070430 A
10/01/07 20071031 B
09/01/07 20070930 B
08/01/07 20070831 B
06/01/07 20070630 B
05/01/07 20070531 B
04/01/07 20070430 B
;
Run;

Proc Sort Data=test;
By ID date;
Run;

Proc Expand Data=Test Out=TestII To=Month Method=None;
ID date;
By ID;
Run;

Data TestIII (Drop=Time_Key_L);
Obs=_N_+1;
Set TestII End=Eof;
By ID;
If not Eof Then Set TestII (Keep=Time_Key Rename=(Time_Key=Time_Key_L)) Point=Obs;
Else Call Missing (Time_Key_l);
If Last.ID Then Time_Key_L=1;
If Missing (Time_Key_L) or Missing (Time_Key) Then Delete;
Run;

Steelers_In_DC · Posted 12-09-2014 12:53 PM

In this version the values for id and by are reversed, it works the same but I'm still getting all of the warnings, log to big for sas. If I process too much it errors out with a pop up, no error to the log.

Steelers_In_DC · Posted 12-09-2014 01:03 PM

I was wrong, the option nonotes does not work with the entire file. Same issue, I added:

proc printto log= "location/filename.log";

run;

This works but I'd much rather have another solution.

user24feb · Posted 12-09-2014 01:14 PM

Wait. Could it be that simply need to fill in the "Time_Key" variable. If I run the program below, I get the same Warnings you get. I think you need to insert the modify step, then it sould work:

Data test;
   Input date:MMDDYY8. Time_Key ID $; Format date MMDDYY8.;
   Datalines;
10/01/07 . A
09/01/07 . A
10/01/07 20071031 B
09/01/07 20070930 B
08/01/07 20070831 B
06/01/07 20070630 B
05/01/07 20070531 B
04/01/07 20070430 B
;
Run;
/*
Data test;
Modify test;
If Missing (Time_Key) Then Do;
    Time_Key=IntNx('month',Date,1,'same')-1;
    Time_Key=Year(Time_Key)*10000+Month(Time_key)*100+Day(Time_Key); * this depends if time_key is a number or a date;
End;
Run;
*/
Proc Sort Data=test;
By ID date;
Run;

Proc Expand Data=Test Out=TestII To=Month Method=None;
ID date;
By ID;
Run;

WARNING: The variable Time_Key has only 0 nonmissing observations, which is too few to apply the conversion method. The result
series is set to missing.

Steelers_In_DC · Posted 12-09-2014 01:29 PM

the time_key doesn't have any missing values, there are many missing values in this table but none for the id or the date. Because of the word method I am looking up what 'method=' options there are, That's just a guess though.

user24feb · Posted 12-09-2014 03:03 PM

I guess the solution has already been posted. The only other way I could think about is to create a complete "date-dataset" then merge it with the original data and finally remove the missing values and their predecessor. But of course this is much longer than the solution above.

Data test;
Input date:MMDDYY8. Time_Key ID $ ; Format date MMDDYY8.;
Datalines;
12/01/07 20071031 A
09/01/07 20070930 A
08/01/07 20070831 A
06/01/07 20070630 A
05/01/07 20070531 A
04/01/07 20070430 A
10/01/07 20071031 B
09/01/07 20070930 B
08/01/07 20070831 B
06/01/07 20070630 B
05/01/07 20070531 B
04/01/07 20070430 B
;
Run;

Proc Sort Data=Test;
By ID Date;
Run;

Data TestII;
Format First_Last Date9. ;
Do i=1 By 1 Until (Last.ID);
Set Test;
By ID;
If First.ID Then Do; First_Last=Date; Output; End;
Else If Last.ID Then Do; First_Last=Date; Output; End;
End;
Run;

Proc Sort Data=TestII;
By ID First_Last;
Run;

Proc Expand Data=TestII Out=TestIII (Keep=First_Last ID Rename=(First_Last=Date)) To=Month Method=None;
ID First_Last;
By ID;
Run;

Data Test;
Merge Test TestIII (in=in3);
By ID Date;
Run;

Data TestIV (Drop=Time_Key_L);
Obs=_N_+1;
Set Test End=Eof;
By ID;
If not Eof Then Set Test (Keep=Time_Key Rename=(Time_Key=Time_Key_L)) Point=Obs;
Else Call Missing (Time_Key_l);
If Last.ID Then Time_Key_L=1;
If Missing (Time_Key_L) or Missing (Time_Key) Then Delete;
Run;

art297 · Posted 12-09-2014 02:49 PM

If I correctly understand what you are trying to do, then I think that proc expand might be overkill. Does the following accomplish what you want?:

data test;

Input date:MMDDYY8. Time_Key ID $; Format date MMDDYY8.;

Datalines;

10/01/07 20071031 A

09/01/07 20070930 A

08/01/07 20070831 A

06/01/07 20070630 A

05/01/07 20070531 A

04/01/07 20070430 A

10/01/07 20071031 B

09/01/07 20070930 B

08/01/07 20070831 B

06/01/07 20070630 B

05/01/07 20070531 B

04/01/07 20070430 B

;

proc sort data=test;

by ID date;

run;

data want (drop=next_date);

set test;

by id;

set test ( firstobs = 2 keep = date rename = (date = next_date) )

test ( obs = 1 drop = _all_ );

next_date = ifn( last.ID, (.), next_date );

if intck('month',date,next_date) lt 2;

run;

Steelers_In_DC · Posted 12-09-2014 04:17 PM

Arthur, That is nothing I would have ever come up with, but yes, that is exactly what I want. Put together very nicely.

Thank You,

Steelers_In_DC · Posted 12-10-2014 09:44 AM

Arthur and user24feb,

Thank you for your help on this. I was thinking about this last night and using thoughts from both of your solutions came up with another idea. I'm hoping you can poke a hole in this or confirm that this will work too. What are your thoughts on this?

data test;

Input new_date:MMDDYY8. acct_full_acct_id $; Format new_date MMDDYY8.;

Datalines;

10/01/07 A

09/01/07 A

08/01/07 A

06/01/07 A

05/01/07 A

04/01/07 A

10/01/07 B

09/01/07 B

08/01/07 B

06/01/07 B

05/01/07 B

04/01/07 B

;

proc sort data=test;

by acct_full_acct_id descending new_date ;

run;

data testII;

format ldate mmddyy8.;

set test;

by acct_full_acct_id;

ldate = lag(new_date);

if first.acct_full_acct_id then do;

ldate = .;

end;

if intck('month',new_date,ldate) lt 2;

run;

art297 · Posted 12-10-2014 10:11 AM

I can't poke any holes in it and it is more efficient than the solution I had proposed.

You could simplify the code a tad. e.g.:

data testII (drop=ldate);

set test;

by acct_full_acct_id;

ldate = ifn(first.acct_full_acct_id,.,lag(new_date));

if intck('month',new_date,ldate) lt 2;

run;

Steelers_In_DC · Posted 12-10-2014 10:25 AM

Awesome, Thank you for checking. I like the ifn/ifc function that is new to me. I always appreciate your thoughtful answers.

Thanks!

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away