BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LzEr23
Obsidian | Level 7

Dear all,

 

The subject I wrote seem a bit unclear; however, what I want to ask is very simple.

 

What I'm trying to do is to cut a certain period of dataset with the same number of observations, around a certain reference point.

For example, the following is how my dataset looks:

 

data work.item_output;
   infile datalines truncover;
   input date: yymmddn8. volume;
   ;
   datalines;
    20190101 30;
    20190103 40;
    20190104 25;
...
    20190302 20;
    20190304 20;
    20190706 10;     

 

What I want to do is to take a reference date, say, 20190203, and extract the data containing of observations around that date, for example (-5 days) ~ (+5 days). And repeat this process for other reference dates as well.

 

But, as you can notice, the date variable is not always present. For example, there are 20190101 and 20190103 observations, but there isn't 20190102, and I want to count simply by the number of observations, not by the actual date.

 

So, for example, if I take +5 days data from 20190101, it would look something like this:

20190101 -> reference point

20190103

20190104

20190105

20190106

20190107  -> +5 days after the reference point

 

So, by saying '+5 days,' I'm actually meaning '+5 observations.'

And I want to extract that certain period of data as a separate dataset table in sas.

 

For now, since I don't know how to make sas do this job automatically,

I manually opened the original table, looked for the reference points, looked for the date that's 5 observations away from the reference point, copied that and extract that period of data manually by using data procedure.

However, it is too time-consuming and burdensome because there are more than just one or two reference periods.

So, I'd be great to know how to code this to work automatically (at least less manually than what I did), if there is a way to do so.

 

I hope what I wrote made sense.

It'd be great if anyone can help me out with this.

1 ACCEPTED SOLUTION

Accepted Solutions
hashman
Ammonite | Level 13

@PGStats:

Neato! Yet since you've mentioned efficiency, why not eschew the writing and reading of POS altogether and do it all in a single step? E.g.:

 

data have ;                                                    
  input date: yymmdd8. volume;                                 
  format date yymmdd10.;                                       
  cards ;                                                      
20190101 30                                                    
20190103 40                                                    
20190104 25                                                    
20190105 99                                                    
20190301 99                                                    
20190302 20                                                    
20190304 20                                                    
20190706 10                                                    
run ;                                                          
                                                               
data ref ;                                                     
  format date yymmdd10.;                                       
  input date yymmdd10.;                                        
  cards ;                                                      
20190103                                                       
20190304                                                       
run ;                                                          
                                                               
%let w = 1 ;                                                   
                                                               
data want ;                                                    
  merge have ref (in = ref) ;                                  
  by date ;                                                    
  if ref then do _n_ = max (1, _n_ - &w) to min (n, _n_ + &w) ;
    set have point = _n_ nobs = n ;                            
    output ;                                                   
  end ;                                                        
run ;                                                          

The above assumes, of course, that REF is sorted by DATE. If not:

 

 

data want ;                                                              
  if _n_ = 1 then do ;                                                   
    dcl hash h (dataset: "ref") ;                                        
    h.definekey ("date") ;                                               
    h.definedone () ;                                                    
  end ;                                                                  
  set have ;                                                             
  if h.check() = 0 then do _n_ = max (1, _n_ - &w) to min (n, _n_ + &w) ;
    set have point = _n_ nobs = n ;                                      
    output ;                                                             
  end ;                                                                  
run ;                                                                    

As a side note, I've noticed that your code works when the data set option RENAME with a single variable being renamed is coded without parentheses as:

 

pos(rename=date=refDate)  

 

I didn't know SAS can swallow that; but now I do. Thanks.

 

Kind regards

Paul D.

 

 

 

 

View solution in original post

7 REPLIES 7
Astounding
PROC Star
Let's begin by assuming you have a SAS data set holding a set of values for REFDATE. Then you could use:

data want;
set ref_point_list;
found=0;
do _n_=1 to ndates;
set item_output nobs=ndates point=_n_ ;
if date > refdate then found + 1;
if (1 <= found <= 5) then output;
else if found = 6 then delete;
end;
run;
hashman
Ammonite | Level 13

@LzEr23:

If you mean to get meaningful assistance, please provide:

 

- a representative sample data set HAVE

- a sample data set containing reference dates REF

- a sample data set WANT illustrating the output you expect from processing HAVE and REF

 

To anyone looking at your problem, it would mean a whole lot more than verbose explanations of your input and output, though a terse meaningful intelligible explanation is always welcome.

 

Kind regards

Paul D.

 

 

PGStats
Opal | Level 21

Here is an efficient way to do this (for a window of + or - 1 obs):

 

data have;
input date: yymmdd8. volume;
format date yymmdd10.;
datalines;     
20190101 30     
20190103 40     
20190104 25  
20190105 99  
20190301 99     
20190302 20     
20190304 20     
20190706 10
;

data ref;
format date yymmdd10.;
input date yymmdd10.;
datalines;
20190103
20190304
;

data pos;
merge have ref(in=ref); by date;
if ref then do;
    pos = _n_;
    output;
    end;
keep date pos;
run;

%let window=1;

data want;
set pos(rename=date=refDate);
do point = max(1, pos - &window) to min(nobs, pos + &window);
    set have point=point nobs=nobs;
    output;
    end;
drop pos;
run;
PG
hashman
Ammonite | Level 13

@PGStats:

Neato! Yet since you've mentioned efficiency, why not eschew the writing and reading of POS altogether and do it all in a single step? E.g.:

 

data have ;                                                    
  input date: yymmdd8. volume;                                 
  format date yymmdd10.;                                       
  cards ;                                                      
20190101 30                                                    
20190103 40                                                    
20190104 25                                                    
20190105 99                                                    
20190301 99                                                    
20190302 20                                                    
20190304 20                                                    
20190706 10                                                    
run ;                                                          
                                                               
data ref ;                                                     
  format date yymmdd10.;                                       
  input date yymmdd10.;                                        
  cards ;                                                      
20190103                                                       
20190304                                                       
run ;                                                          
                                                               
%let w = 1 ;                                                   
                                                               
data want ;                                                    
  merge have ref (in = ref) ;                                  
  by date ;                                                    
  if ref then do _n_ = max (1, _n_ - &w) to min (n, _n_ + &w) ;
    set have point = _n_ nobs = n ;                            
    output ;                                                   
  end ;                                                        
run ;                                                          

The above assumes, of course, that REF is sorted by DATE. If not:

 

 

data want ;                                                              
  if _n_ = 1 then do ;                                                   
    dcl hash h (dataset: "ref") ;                                        
    h.definekey ("date") ;                                               
    h.definedone () ;                                                    
  end ;                                                                  
  set have ;                                                             
  if h.check() = 0 then do _n_ = max (1, _n_ - &w) to min (n, _n_ + &w) ;
    set have point = _n_ nobs = n ;                                      
    output ;                                                             
  end ;                                                                  
run ;                                                                    

As a side note, I've noticed that your code works when the data set option RENAME with a single variable being renamed is coded without parentheses as:

 

pos(rename=date=refDate)  

 

I didn't know SAS can swallow that; but now I do. Thanks.

 

Kind regards

Paul D.

 

 

 

 

PGStats
Opal | Level 21

I would add that your hash based code is more robust to missing and duplicate dates in either dataset.

 

Cheers!

PG
hashman
Ammonite | Level 13

@PGStats:

Agree. In fact, any lookup table approach (hash table, key-indexed table, bitmap, binary-searched array, etc.) vs the sorted serial match approach (aka merge) has the advantages you've pointed out.

 

Gaudeamus igitur!

Paul D.    

LzEr23
Obsidian | Level 7
Sorry for the delay. I forgot I posted this here. I've tried your way, just to see if it works, and it worked perfectly. Thank you for your help! It really was helpful.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 788 views
  • 4 likes
  • 4 in conversation