BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I have a data set that has cross-sectional (state level) and time-series (quarterly) elements. It is arranged as follows:
Date State CPI
1976:1 CA 100
1976:2 CA 101
1976:3 CA 102
1976:4 CA 103
1977:1 CA 104
1977:3 CA 106
1977:4 CA 107
1976:1 PA 100
1976:3 PA 102
1976:4 PA 103
1977:1 PA 104
1977:3 PA 106
1977:4 PA 107

Note there are missing observations for both states (1977:2 for CA and 1976:2 and 1976:3 for PA). I would like to create a lag variable that is the "true lag" for each observation (and not just the previous value in the input stream). Note the first observation of the lagged variable for each state would be missing as well as the lag for the 1977:3 for CA and 1976:3 and 1977:3 for PA). The resulting dataset should look as follows:
Date State CPI CPI_l1
1976:1 CA 100 .
1976:2 CA 101 100
1976:3 CA 102 101
1976:4 CA 103 102
1977:1 CA 104 103
1977:3 CA 106 .
1977:4 CA 107 106
1976:1 PA 100 .
1976:3 PA 102 .
1976:4 PA 103 102
1977:1 PA 104 103
1977:3 PA 106 .
1977:4 PA 107 106

Is there an easy way to do this?

Thank you
7 REPLIES 7
ArtC
Rhodochrosite | Level 12
One way is to first sparse the data. You can do this by creating a template of all combinations of state and date (I hate using the variable name date for non-sas dates so I have used YRQTR here).
[pre]* Build template of data;
data allvals(keep=yrqtr state);
do state = 'CA', 'PA';
do year = 1976 to 1977 by 1;
do qtr = 1 to 4;
yrqtr = cats(put(year,4.),':',put(qtr,1.));
output allvals;
end;
end;
end;
run;[/pre]
The resulting dataset WORK.ALLVARS can now be merged or joined onto your original data. Now working with the sparsed data (like with the LAG function) will be a lot easier.
deleted_user
Not applicable
Thank you for the advice--I really like the idea that you have suggested, unfortunately the implementation is a little more difficult than the example I gave. I actually have a dataset that has more than 1.5 million observations. There are approximately 20,000 cross-sectional components (i.e. "states"). Some cross-sectional elements may have more time-series elements (quarters of data) than other cross-sectional elements. Can you think of a way to use the first.yrqtr and last.yrqtr to generate the sparsed dataset.

Also, assume that the date is a SAS date variable.

Thank you.
data_null__
Jade | Level 19
If you make DATE a SAS date then you can use a simple look ahead read and some date functions to generate the missing observations(quarters). Computing the LAG as you go.

[pre]
data CPI;
infile cards;
input @;
_infile_ = translate(_infile_,'Q',':');
input Date :yyq. State:$1. CPI;
format date yyq.;
cards;
1976:1 CA 100
1976:2 CA 101
1976:3 CA 102
1976:4 CA 103
1977:1 CA 104
1977:3 CA 106
1977:4 CA 107
1976:1 PA 100
1976:3 PA 102
1976:4 PA 103
1977:1 PA 104
1977:3 PA 106
1977:4 PA 107
;;;;
run;
proc print;
run;
data CPI;
set CPI end=eof;
by state date;
if not eof then set CPI(firstobs=2 keep=date rename=date=nextQ);
q = ifN(not last.state,intCK('QTR',date,nextQ),1);
do i = 1 to q;
date = intNX('QTR',date,i-1);
CPI_L1 = lag1(CPI);
if first.state then CPI_L1 = .;
output;
CPI = .;
end;
*drop nextQ q i;
run;
proc print;
by state;
id state;
run;
[/pre]
deleted_user
Not applicable
Thank you again for your help. I noticed just one "minor" problem with the code that was provided. If the observation is the first observation for a state and it is then followed by a missing quarter, the reported lagged value of the variable (CPI) for the newly generated quarter (1976:2) is missing instead of being equal to the previous quarters value. In my example, there is no observation for CPI for PA in 1976:2. When this observation is generated in the final dataset, the lagged value of CPI is missing and it should be the value of CPI in 1976:1, which is 100.
deleted_user
Not applicable
I believe that I have a fix to that problem. In the statement checking if it is the first observation for the state, you should only set the lag cpi to missing if it is the first time through the loop. I have modified the statement as follows:

if (first.state and i=1) then CPI_L1 = .;

Thank you.
deleted_user
Not applicable
I found another problem for the situation when there is more than one quarter of missing data between valid observations. In this case, the counter did not augment the date variable correctly. the following code works.

data CPI;
infile cards;
input @;
_infile_ = translate(_infile_,'Q',':');
input Date :yyq. State:$1. CPI;
format date yyq.;
cards;
1976:1 CA 100
1976:2 CA 101
1976:3 CA 102
1976:4 CA 103
1977:1 CA 104
1977:3 CA 106
1977:4 CA 107
1976:1 PA 100
1976:4 PA 103
1977:1 PA 104
1977:3 PA 106
1977:4 PA 107
;;;;
run;
proc print;
run;
data CPI;
set CPI end=eof;
by state date;
if not eof then set CPI(firstobs=2 keep=date rename=date=nextQ);
q = ifN(not last.state,intCK('QTR',date,nextQ),1);
do i = 1 to q;
if i=1 then date = intNX('QTR',date,0);
if i>1 then date = intNX('QTR',date,1);
CPI_L1 = lag1(CPI);
if (first.state and i=1) then CPI_L1 = .;
output;
CPI = .;
end;
*drop nextQ q i;
run;
data_null__
Jade | Level 19
Sorry I did not check this more closely.

For DATE it is probably easier if you just create a new variable from DATE by INTNXing DATE.

[pre]
data CPI;
infile cards;
input @;
_infile_ = translate(_infile_,'Q',':');
input Date :yyq. State:$2. CPI;
format date yyq.;
cards;
1976:1 CA 100
1976:2 CA 101
1976:3 CA 102
1976:4 CA 103
1977:1 CA 104
1977:3 CA 106
1977:4 CA 107
1976:1 PA 100
1976:4 PA 103
1977:1 PA 104
1977:3 PA 106
1977:4 PA 107
;;;;
run;
proc print;
run;
data CPI;
set CPI end=eof;
by state date;
if not eof then set CPI(firstobs=2 keep=date rename=date=nextQ);
q = ifN(not last.state,intCK('QTR',date,nextQ),1)-1;
do i = 0 to q;
ndate = intNX('QTR',date,i);
CPI_L1 = lag1(CPI);
if first.state and i eq 0 then CPI_L1 = .;
output;
CPI = .;
end;
drop nextQ q i date;
format ndate yyq.;
run;
proc print;
by state;
id state;
run;
[/pre]

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2180 views
  • 0 likes
  • 3 in conversation