I have two data points (1990 & 2018) for each variable and each cluster. I want to create another row for each cluster and each variable for the year 2047 by using this formula: 2047 val = (2018 val - 1990 val) + 2018 val for emp, lq, and emp_share_perc.
My current data set:
| cluster | emp | lq | year2 | emp_share_perc |
| Accommodation and Food Svcs | 12868 | 1.074518089 | 01Jan1990 | 6.2331372 |
| Accommodation and Food Svcs | 31869.5 | 1.281039607 | 01Jan2018 | 10.3577914 |
| Advanced Materials | 7477.416667 | 0.6539712 | 01Jan1990 | 3.6219897 |
| Advanced Materials | 3339.25 | 0.3381481 | 01Jan2018 | 1.0852776 |
| Agribusiness | 5567.5 | 0.928833971 | 01Jan1990 | 2.6968442 |
| Agribusiness | 3619.5 | 0.494140709 | 01Jan2018 | 1.1763607 |
| Biomedical | 673.8333333 | 0.279438995 | 01Jan1990 | 0.3263985 |
| Biomedical | 863.5 | 0.236455726 | 01Jan2018 | 0.280643 |
My desired output:
| cluster | emp | lq | year2 | emp_share_perc |
| Accommodation and Food Svcs | 12868 | 1.074518 | 01Jan1990 | 6.233137 |
| Accommodation and Food Svcs | 31869.5 | 1.28104 | 01Jan2018 | 10.35779 |
| Accommodation and Food Svcs | 50871 | 1.487561 | 01Jan2047 | 14.48245 |
| Advanced Materials | 7477.417 | 0.653971 | 01Jan1990 | 3.62199 |
| Advanced Materials | 3339.25 | 0.338148 | 01Jan2018 | 1.085278 |
| Advanced Materials | -798.917 | 0.022325 | 01Jan2047 | -1.45143 |
| Agribusiness | 5567.5 | 0.928834 | 01Jan1990 | 2.696844 |
| Agribusiness | 3619.5 | 0.494141 | 01Jan2018 | 1.176361 |
| Agribusiness | 1671.5 | 0.059447 | 01Jan2047 | -0.34412 |
| Biomedical | 673.8333 | 0.279439 | 01Jan1990 | 0.326399 |
| Biomedical | 863.5 | 0.236456 | 01Jan2018 | 0.280643 |
| Biomedical | 1053.167 | 0.193472 | 01Jan2047 | 0.234888 |
data EP_data_2047;
set EP_data_1d;
LENGTH year3 8;
FORMAT year3 DATE9.;
INFORMAT year3 DATE9.;
by cluster;
if cluster = cluster then lq_2047 = (lq - lag(lq)) + lq;
year3 = '01JAN2047'd;
run;
You can use a _TEMPORARY_ array to hold the 1990 values (_TEMPORARY_ values are automatically retained). Then after output the 2018 value, you can calculate and output the 2047 values:
data have;
input cluster $27. emp lq year2 :date9. emp_share_perc;
format year2 date9. ;
datalines;
Accommodation and Food Svcs 12868 1.074518089 01Jan1990 6.2331372
Accommodation and Food Svcs 31869.5 1.281039607 01Jan2018 10.3577914
Advanced Materials 7477.416667 0.6539712 01Jan1990 3.6219897
Advanced Materials 3339.25 0.3381481 01Jan2018 1.0852776
Agribusiness 5567.5 0.928833971 01Jan1990 2.6968442
Agribusiness 3619.5 0.494140709 01Jan2018 1.1763607
Biomedical 673.8333333 0.279438995 01Jan1990 0.3263985
Biomedical 863.5 0.236455726 01Jan2018 0.280643
run;
data want (drop=i);
set have;
by cluster;
output;
array tmp {3} _temporary_;
array var {3} emp lq emp_share_perc;
if year2='01jan1990'd then do i=1 to 3;
tmp{i}=var{i};
end;
if year2='01jan2018'd;
year2='01jan2047'd;
do i=1 to 3;
var{i}=2*var{i}-tmp{i};
end;
output;
run;
This program assumes each cluster has one 1990 record followed by one 2018 record.
You can use a _TEMPORARY_ array to hold the 1990 values (_TEMPORARY_ values are automatically retained). Then after output the 2018 value, you can calculate and output the 2047 values:
data have;
input cluster $27. emp lq year2 :date9. emp_share_perc;
format year2 date9. ;
datalines;
Accommodation and Food Svcs 12868 1.074518089 01Jan1990 6.2331372
Accommodation and Food Svcs 31869.5 1.281039607 01Jan2018 10.3577914
Advanced Materials 7477.416667 0.6539712 01Jan1990 3.6219897
Advanced Materials 3339.25 0.3381481 01Jan2018 1.0852776
Agribusiness 5567.5 0.928833971 01Jan1990 2.6968442
Agribusiness 3619.5 0.494140709 01Jan2018 1.1763607
Biomedical 673.8333333 0.279438995 01Jan1990 0.3263985
Biomedical 863.5 0.236455726 01Jan2018 0.280643
run;
data want (drop=i);
set have;
by cluster;
output;
array tmp {3} _temporary_;
array var {3} emp lq emp_share_perc;
if year2='01jan1990'd then do i=1 to 3;
tmp{i}=var{i};
end;
if year2='01jan2018'd;
year2='01jan2047'd;
do i=1 to 3;
var{i}=2*var{i}-tmp{i};
end;
output;
run;
This program assumes each cluster has one 1990 record followed by one 2018 record.
Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.
Explore Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.