Hi,
I am trying to compute the mid-value of the value of two years.
For table 1,
country | year | CC |
AFGHANISTAN | 1996 | -1.29 |
AFGHANISTAN | 1998 | -1.18 |
AFGHANISTAN | 2000 | -1.30 |
AFGHANISTAN | 2002 | -1.26 |
ALBANIA | 1996 | -0.89 |
ALBANIA | 1998 | -1.03 |
ALBANIA | 2000 | -0.86 |
ALBANIA | 2002 | -0.87 |
for each country, I have the value of 1996, 1998, 2000, 2002.
I am trying to get the value of 1997 by using the mid-value of 1996 and 1998, get the value of 1999 by using the mid-value of 1998 and 2000,
get the value of 2001 by using the mid-value of 2000 and 2001.
for example, the CC of AFGHANISTAN in 1997 is expect to equal (-1.29+-1.18)/2 =-1.24
the final result of AFGHANISTAN should look like below
country | year | CC |
AFGHANISTAN | 1996 | -1.29 |
AFGHANISTAN | 1997 | -1.24 |
AFGHANISTAN | 1998 | -1.18 |
AFGHANISTAN | 1999 | -1.24 |
AFGHANISTAN | 2000 | -1.30 |
AFGHANISTAN | 2000 | -1.28 |
AFGHANISTAN | 2002 | -1.26 |
Could you please give me some suggestion about this?
thank in advance.
data table1; infile cards dsd dlm=","; input country :$50. year :8. CC :8. ; cards; AFGHANISTAN,1996,-1.29 AFGHANISTAN,1998,-1.18 AFGHANISTAN,2000,-1.30 AFGHANISTAN,2002,-1.26 ALBANIA,1996,-0.89 ALBANIA,1998,-1.03 ALBANIA,2000,-0.86 ALBANIA,2002,-0.87 ;;;; run;
Hi @Alexxxxxxx Basically,- Lag, plug, re-read the record that's plugged using POINT=. That's just it
data table1;
infile cards dsd dlm=",";
input
country :$50.
year :8.
CC :8.
;
cards;
AFGHANISTAN,1996,-1.29
AFGHANISTAN,1998,-1.18
AFGHANISTAN,2000,-1.30
AFGHANISTAN,2002,-1.26
ALBANIA,1996,-0.89
ALBANIA,1998,-1.03
ALBANIA,2000,-0.86
ALBANIA,2002,-0.87
;;;;
run;
data want;
do until(last.country);
set table1 ;
by country;
n+1;
l_cc = lag(cc);
l_year = lag(year);
if not first.country then do;
year=l_year+1;
cc=round(mean(l_cc,cc),.01);
output;
set table1 point=n;
output;
end;
else output;
end;
drop l_:;
format cc 8.2;
run;
Data step is pretty simple.
data want;
set table;
by country year;
prev_value = lag(cc);
prev_year = lag(year);
if not first.country then do;
output;
year = year+1;
cc = mean(cc, prev_value);
if not last.country then output;
end;
*not sure if this is needed;
else output;
run;
You will want to sort afterwards, untested.
Hi @Reeza
Many thanks for your reply.
But I do not get the value of 1997.
Could you please have a look?
country | year | CC | prev_value | prev_year |
AFGHANISTAN | 1996 | -1.29 | . | . |
AFGHANISTAN | 1998 | -1.18 | -1.29 | 1996 |
AFGHANISTAN | 1999 | -1.235 | -1.29 | 1996 |
AFGHANISTAN | 2000 | -1.3 | -1.18 | 1998 |
AFGHANISTAN | 2001 | -1.24 | -1.18 | 1998 |
AFGHANISTAN | 2002 | -1.26 | -1.3 | 2000 |
ALBANIA | 1996 | -0.89 | -1.26 | 2002 |
ALBANIA | 1998 | -1.03 | -0.89 | 1996 |
ALBANIA | 1999 | -0.96 | -0.89 | 1996 |
ALBANIA | 2000 | -0.86 | -1.03 | 1998 |
ALBANIA | 2001 | -0.945 | -1.03 | 1998 |
ALBANIA | 2002 | -0.87 | -0.86 | 2000 |
Hi @Alexxxxxxx Basically,- Lag, plug, re-read the record that's plugged using POINT=. That's just it
data table1;
infile cards dsd dlm=",";
input
country :$50.
year :8.
CC :8.
;
cards;
AFGHANISTAN,1996,-1.29
AFGHANISTAN,1998,-1.18
AFGHANISTAN,2000,-1.30
AFGHANISTAN,2002,-1.26
ALBANIA,1996,-0.89
ALBANIA,1998,-1.03
ALBANIA,2000,-0.86
ALBANIA,2002,-0.87
;;;;
run;
data want;
do until(last.country);
set table1 ;
by country;
n+1;
l_cc = lag(cc);
l_year = lag(year);
if not first.country then do;
year=l_year+1;
cc=round(mean(l_cc,cc),.01);
output;
set table1 point=n;
output;
end;
else output;
end;
drop l_:;
format cc 8.2;
run;
Hi @novinosrin ,
Many thanks for your reply.
May I ask what is the logic to use following code,
set table1 point=n; output; end;
I saw when I delete these code, I lose the observation of the year 1998, 2000 and 2002. Point refers to a specific observation in a data set. but why it generates the obs of year 1998, 2000 and 2002 rather than the other year?
Could you please explain this for me?
Hi @Alexxxxxxx Basically the POINT= directly reads an observation from a SAS dataset that is the value of the variable specified in the (POINT= variable) option. You would have noticed, processing is required for only those records of a by group that is not the 1st record. Therefore, while processing these(2nd to last record of a by group) records using the LAG information, the original information that was read into the PDV is lost for that particular record. Rather, we would only have the computed information. In order to fetch the original information we need a way to re-read the same record that gets our original information for these records viz. in your example 1998,2000,2002. I hope this makes sense?
Perhaps, the following depiction may clarify better-
if not first.country then do;
year=l_year+1; /*This computation resets to computed value using lags, thus losing original record*/
cc=round(mean(l_cc,cc),.01);/*This computation resets to computed value using lags, thus losing original record*/
output;
set table1 point=n;/*This computation resets to original value by re-reading the record the dataset again*/
output;
end;
You can interpolate with proc expand:
proc expand data=table1 out=table2 factor=(2:1);
by country;
id year;
convert CC=CCC / method=join;
run;
proc print data=table2 noobs; run;
My preferred method: do a look-ahead MERGE:
data table1;
infile cards dsd dlm=",";
input
country :$50.
year :8.
CC :8.
;
cards;
AFGHANISTAN,1996,-1.29
AFGHANISTAN,1998,-1.18
AFGHANISTAN,2000,-1.30
AFGHANISTAN,2002,-1.26
ALBANIA,1996,-0.89
ALBANIA,1998,-1.03
ALBANIA,2000,-0.86
ALBANIA,2002,-0.87
;
data want;
merge
table1
table1 (
firstobs=2
rename=(
country=_country
year=_year
cc=_cc
)
)
;
output;
if country = _country
then do;
cc = (cc + _cc) / 2;
do year = year + 1 to _year - 1;
output;
end;
end;
drop _:;
run;
data table1; infile cards dsd dlm=","; input country :$50. year :8. CC :8. ; cards; AFGHANISTAN,1996,-1.29 AFGHANISTAN,1998,-1.18 AFGHANISTAN,2000,-1.30 AFGHANISTAN,2002,-1.26 ALBANIA,1996,-0.89 ALBANIA,1998,-1.03 ALBANIA,2000,-0.86 ALBANIA,2002,-0.87 ;;;; run; data want; merge table1 table1(firstobs=2 rename=(country=_country year=_year cc=_cc)); output; if country=_country then do; do i=year+1 to _year-1; cc=(_cc+cc)/(_year-year); year=i;output; end; end; drop _: i ; run;
Hi @Alexxxxxxx,
I like @Reeza's idea of reading the input dataset only once. Here's another (similar) implementation of that idea:
data want(drop=_:);
set table1;
by country year;
_cc=round(mean(cc, lag(cc)), 1e-9);
if first.country then output;
else do;
_c=cc; cc=_cc; year+-1; output;
cc=_c; year+ 1; output;
end;
format cc 8.2;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.