BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KRA15
Fluorite | Level 6

I have two data points (1990 & 2018) for each variable and each cluster. I want to create another row for each cluster and each variable for the year 2047 by using this formula: 2047 val =  (2018 val - 1990 val) + 2018 val for emp, lq, and emp_share_perc. 

 

My current data set: 

 

clusteremplqyear2emp_share_perc
Accommodation and Food Svcs128681.07451808901Jan19906.2331372
Accommodation and Food Svcs31869.51.28103960701Jan201810.3577914
Advanced Materials7477.4166670.653971201Jan19903.6219897
Advanced Materials3339.250.338148101Jan20181.0852776
Agribusiness5567.50.92883397101Jan19902.6968442
Agribusiness3619.50.49414070901Jan20181.1763607
Biomedical673.83333330.27943899501Jan19900.3263985
Biomedical863.50.23645572601Jan20180.280643

 

My desired output:

clusteremplqyear2emp_share_perc
Accommodation and Food Svcs128681.07451801Jan19906.233137
Accommodation and Food Svcs31869.51.2810401Jan201810.35779
Accommodation and Food Svcs508711.48756101Jan204714.48245
Advanced Materials7477.4170.65397101Jan19903.62199
Advanced Materials3339.250.33814801Jan20181.085278
Advanced Materials-798.9170.02232501Jan2047-1.45143
Agribusiness5567.50.92883401Jan19902.696844
Agribusiness3619.50.49414101Jan20181.176361
Agribusiness1671.50.05944701Jan2047-0.34412
Biomedical673.83330.27943901Jan19900.326399
Biomedical863.50.23645601Jan20180.280643
Biomedical1053.1670.19347201Jan20470.234888

 

data EP_data_2047;   
	set EP_data_1d;
	LENGTH year3 8;
	FORMAT year3 DATE9.;
	INFORMAT year3 DATE9.; 
	by cluster; 
	if cluster = cluster then lq_2047 = (lq - lag(lq)) + lq;
	year3 = '01JAN2047'd;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You can use a _TEMPORARY_ array to hold the 1990 values (_TEMPORARY_ values are automatically retained).  Then after output the 2018 value, you can calculate and output the 2047 values:

 

data have;
  input cluster  $27.      emp         lq          year2 :date9.   emp_share_perc;
  format year2 date9. ;
datalines;
Accommodation and Food Svcs 12868       1.074518089 01Jan1990    6.2331372
Accommodation and Food Svcs 31869.5     1.281039607 01Jan2018   10.3577914
Advanced Materials          7477.416667 0.6539712   01Jan1990    3.6219897
Advanced Materials          3339.25     0.3381481   01Jan2018    1.0852776
Agribusiness                5567.5      0.928833971 01Jan1990    2.6968442
Agribusiness                3619.5      0.494140709 01Jan2018    1.1763607
Biomedical                  673.8333333 0.279438995 01Jan1990    0.3263985
Biomedical                  863.5       0.236455726 01Jan2018    0.280643
run;

data want (drop=i);
  set have;
  by cluster;
  output;

  array tmp {3} _temporary_;
  array var {3} emp lq emp_share_perc;

  if year2='01jan1990'd then do i=1 to 3;
    tmp{i}=var{i};
  end;
  if year2='01jan2018'd;
  year2='01jan2047'd;
  do i=1 to 3;
    var{i}=2*var{i}-tmp{i};
  end;
  output;
run;

  

This program assumes each cluster has one 1990 record followed by one 2018 record.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

1 REPLY 1
mkeintz
PROC Star

You can use a _TEMPORARY_ array to hold the 1990 values (_TEMPORARY_ values are automatically retained).  Then after output the 2018 value, you can calculate and output the 2047 values:

 

data have;
  input cluster  $27.      emp         lq          year2 :date9.   emp_share_perc;
  format year2 date9. ;
datalines;
Accommodation and Food Svcs 12868       1.074518089 01Jan1990    6.2331372
Accommodation and Food Svcs 31869.5     1.281039607 01Jan2018   10.3577914
Advanced Materials          7477.416667 0.6539712   01Jan1990    3.6219897
Advanced Materials          3339.25     0.3381481   01Jan2018    1.0852776
Agribusiness                5567.5      0.928833971 01Jan1990    2.6968442
Agribusiness                3619.5      0.494140709 01Jan2018    1.1763607
Biomedical                  673.8333333 0.279438995 01Jan1990    0.3263985
Biomedical                  863.5       0.236455726 01Jan2018    0.280643
run;

data want (drop=i);
  set have;
  by cluster;
  output;

  array tmp {3} _temporary_;
  array var {3} emp lq emp_share_perc;

  if year2='01jan1990'd then do i=1 to 3;
    tmp{i}=var{i};
  end;
  if year2='01jan2018'd;
  year2='01jan2047'd;
  do i=1 to 3;
    var{i}=2*var{i}-tmp{i};
  end;
  output;
run;

  

This program assumes each cluster has one 1990 record followed by one 2018 record.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 849 views
  • 1 like
  • 2 in conversation