BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Steelers_In_DC
Barite | Level 11

I just learned about proc expand, I have the code set up to run and get my desired results but the file seems too big and it's not working.  I think my syntax is off.

I have attached what I have and what I'm looking for, followed by my proc expand syntax, followed by the warnings I'm seeing in the log.

I'm first using the proc expand to add missing rows in a time series file, then I'm deleting the missing row and preceding row.  If there is a better solution I'm open to that also, but would still like to learn what I'm doing wrong with proc expand.

Thank You Very Much,

Mark

1 ACCEPTED SOLUTION

Accepted Solutions
Steelers_In_DC
Barite | Level 11

Arthur and user24feb,

Thank you for your help on this.  I was thinking about this last night and using thoughts from both of your solutions came up with another idea.  I'm hoping you can poke a hole in this or confirm that this will work too.  What are your thoughts on this?

data test;

  Input new_date:MMDDYY8. acct_full_acct_id $; Format new_date MMDDYY8.;

  Datalines;

10/01/07 A

09/01/07 A

08/01/07 A

06/01/07 A

05/01/07 A

04/01/07 A

10/01/07 B

09/01/07 B

08/01/07 B

06/01/07 B

05/01/07 B

04/01/07 B

;

proc sort data=test;

  by acct_full_acct_id descending new_date ;

run;

data testII;

format ldate mmddyy8.;

set test;

by acct_full_acct_id;

ldate = lag(new_date);

if first.acct_full_acct_id then do;

ldate = .;

end;

if intck('month',new_date,ldate) lt 2;

run;

View solution in original post

12 REPLIES 12
Steelers_In_DC
Barite | Level 11

I added options nonotes; before and options notes; after the expand and it runs very smoothly but I'm still open to any other suggestions for the task, or corrections for the expand procedure.

user24feb
Barite | Level 11

Try: (Previous version had a typo=


Data test;
   Input date:MMDDYY8. Time_Key ID $; Format date MMDDYY8.;
   Datalines;
10/01/07 20071031 A
09/01/07 20070930 A
08/01/07 20070831 A
06/01/07 20070630 A
05/01/07 20070531 A
04/01/07 20070430 A
10/01/07 20071031 B
09/01/07 20070930 B
08/01/07 20070831 B
06/01/07 20070630 B
05/01/07 20070531 B
04/01/07 20070430 B
;
Run;

Proc Sort Data=test;
  By ID date;
Run;

Proc Expand Data=Test Out=TestII To=Month Method=None;
  ID date;
  By ID;
Run;

Data TestIII (Drop=Time_Key_L);
  Obs=_N_+1;
  Set TestII End=Eof;
  By ID;
  If not Eof Then Set TestII (Keep=Time_Key Rename=(Time_Key=Time_Key_L)) Point=Obs;
  Else Call Missing (Time_Key_l);
  If Last.ID Then Time_Key_L=1;
  If Missing (Time_Key_L) or Missing (Time_Key) Then Delete;
Run;

Steelers_In_DC
Barite | Level 11

In this version the values for id and by are reversed, it works the same but I'm still getting all of the warnings, log to big for sas.  If I process too much it errors out with a pop up, no error to the log.

Steelers_In_DC
Barite | Level 11

I was wrong, the option nonotes does not work with the entire file.  Same issue, I added:

proc printto log= "location/filename.log";

run;

This works but I'd much rather have another solution.

user24feb
Barite | Level 11

Wait. Could it be that simply need to fill in the "Time_Key" variable. If I run the program below, I get the same Warnings you get. I think you need to insert the modify step, then it sould work:


Data test;
   Input date:MMDDYY8. Time_Key ID $; Format date MMDDYY8.;
   Datalines;
10/01/07 . A
09/01/07 . A
10/01/07 20071031 B
09/01/07 20070930 B
08/01/07 20070831 B
06/01/07 20070630 B
05/01/07 20070531 B
04/01/07 20070430 B
;
Run;
/*
Data test;
  Modify test;
  If Missing (Time_Key) Then Do;
    Time_Key=IntNx('month',Date,1,'same')-1;
    Time_Key=Year(Time_Key)*10000+Month(Time_key)*100+Day(Time_Key); * this depends if time_key is a number or a date;
  End;
Run;
*/
Proc Sort Data=test;
  By ID date;
Run;

Proc Expand Data=Test Out=TestII To=Month Method=None;
  ID date;
  By ID;
Run;


WARNING: The variable Time_Key has only 0 nonmissing observations, which is too few to apply the conversion method. The result
         series is set to missing.

Steelers_In_DC
Barite | Level 11

the time_key doesn't have any missing values, there are many missing values in this table but none for the id or the date.   Because of the word method I am looking up what 'method=' options there are, That's just a guess though.

user24feb
Barite | Level 11

I guess the solution has already been posted. The only other way I could think about is to create a complete "date-dataset" then merge it with the original data and finally remove the missing values and their predecessor. But of course this is much longer than the solution above.


Data test;
   Input date:MMDDYY8. Time_Key ID $ ; Format date MMDDYY8.;
   Datalines;
12/01/07 20071031 A
09/01/07 20070930 A
08/01/07 20070831 A
06/01/07 20070630 A
05/01/07 20070531 A
04/01/07 20070430 A
10/01/07 20071031 B
09/01/07 20070930 B
08/01/07 20070831 B
06/01/07 20070630 B
05/01/07 20070531 B
04/01/07 20070430 B
;
Run;

Proc Sort Data=Test;
  By ID Date;
Run;

Data TestII;
  Format First_Last Date9. ;
  Do i=1 By 1  Until (Last.ID);
    Set Test;
By ID;
If First.ID Then Do; First_Last=Date; Output; End;
Else If Last.ID Then Do; First_Last=Date; Output; End;
  End;
Run;

Proc Sort Data=TestII;
  By ID First_Last;
Run;

Proc Expand Data=TestII Out=TestIII (Keep=First_Last ID Rename=(First_Last=Date)) To=Month Method=None;
  ID First_Last;
  By ID;
Run;

Data Test;
  Merge Test TestIII (in=in3);
  By ID Date;
Run;

Data TestIV (Drop=Time_Key_L);
  Obs=_N_+1;
  Set Test End=Eof;
  By ID;
  If not Eof Then Set Test (Keep=Time_Key Rename=(Time_Key=Time_Key_L)) Point=Obs;
  Else Call Missing (Time_Key_l);
  If Last.ID Then Time_Key_L=1;
  If Missing (Time_Key_L) or Missing (Time_Key) Then Delete;
Run;

art297
Opal | Level 21

If I correctly understand what you are trying to do, then I think that proc expand might be overkill. Does the following accomplish what you want?:

data test;

  Input date:MMDDYY8. Time_Key ID $; Format date MMDDYY8.;

  Datalines;

10/01/07 20071031 A

09/01/07 20070930 A

08/01/07 20070831 A

06/01/07 20070630 A

05/01/07 20070531 A

04/01/07 20070430 A

10/01/07 20071031 B

09/01/07 20070930 B

08/01/07 20070831 B

06/01/07 20070630 B

05/01/07 20070531 B

04/01/07 20070430 B

;

proc sort data=test;

  by ID date;

run;

data want (drop=next_date);

  set test;

  by id;

  set test ( firstobs = 2 keep = date rename = (date = next_date) )

      test (      obs = 1 drop = _all_ );

    next_date = ifn(  last.ID, (.), next_date );

  if intck('month',date,next_date) lt 2;

run;

Steelers_In_DC
Barite | Level 11

Arthur, That is nothing I would have ever come up with, but yes, that is exactly what I want.  Put together very nicely. 

Thank You,

Steelers_In_DC
Barite | Level 11

Arthur and user24feb,

Thank you for your help on this.  I was thinking about this last night and using thoughts from both of your solutions came up with another idea.  I'm hoping you can poke a hole in this or confirm that this will work too.  What are your thoughts on this?

data test;

  Input new_date:MMDDYY8. acct_full_acct_id $; Format new_date MMDDYY8.;

  Datalines;

10/01/07 A

09/01/07 A

08/01/07 A

06/01/07 A

05/01/07 A

04/01/07 A

10/01/07 B

09/01/07 B

08/01/07 B

06/01/07 B

05/01/07 B

04/01/07 B

;

proc sort data=test;

  by acct_full_acct_id descending new_date ;

run;

data testII;

format ldate mmddyy8.;

set test;

by acct_full_acct_id;

ldate = lag(new_date);

if first.acct_full_acct_id then do;

ldate = .;

end;

if intck('month',new_date,ldate) lt 2;

run;

art297
Opal | Level 21

I can't poke any holes in it and it is more efficient than the solution I had proposed.

You could simplify the code a tad. e.g.:

data testII (drop=ldate);

  set test;

  by acct_full_acct_id;

  ldate = ifn(first.acct_full_acct_id,.,lag(new_date));

  if intck('month',new_date,ldate) lt 2;

run;

Steelers_In_DC
Barite | Level 11

Awesome,  Thank you for checking.  I like the ifn/ifc function that is new to me.  I always appreciate your thoughtful answers.

Thanks!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 3956 views
  • 6 likes
  • 3 in conversation