Solved: Re: Insert new rows by ID with change in previous values + last value

KRA15 · Posted 10-19-2019 02:46 PM

I have two data points (1990 & 2018) for each variable and each cluster. I want to create another row for each cluster and each variable for the year 2047 by using this formula: 2047 val = (2018 val - 1990 val) + 2018 val for emp, lq, and emp_share_perc.

My current data set:

cluster	emp	lq	year2	emp_share_perc
Accommodation and Food Svcs	12868	1.074518089	01Jan1990	6.2331372
Accommodation and Food Svcs	31869.5	1.281039607	01Jan2018	10.3577914
Advanced Materials	7477.416667	0.6539712	01Jan1990	3.6219897
Advanced Materials	3339.25	0.3381481	01Jan2018	1.0852776
Agribusiness	5567.5	0.928833971	01Jan1990	2.6968442
Agribusiness	3619.5	0.494140709	01Jan2018	1.1763607
Biomedical	673.8333333	0.279438995	01Jan1990	0.3263985
Biomedical	863.5	0.236455726	01Jan2018	0.280643

My desired output:

cluster	emp	lq	year2	emp_share_perc
Accommodation and Food Svcs	12868	1.074518	01Jan1990	6.233137
Accommodation and Food Svcs	31869.5	1.28104	01Jan2018	10.35779
Accommodation and Food Svcs	50871	1.487561	01Jan2047	14.48245
Advanced Materials	7477.417	0.653971	01Jan1990	3.62199
Advanced Materials	3339.25	0.338148	01Jan2018	1.085278
Advanced Materials	-798.917	0.022325	01Jan2047	-1.45143
Agribusiness	5567.5	0.928834	01Jan1990	2.696844
Agribusiness	3619.5	0.494141	01Jan2018	1.176361
Agribusiness	1671.5	0.059447	01Jan2047	-0.34412
Biomedical	673.8333	0.279439	01Jan1990	0.326399
Biomedical	863.5	0.236456	01Jan2018	0.280643
Biomedical	1053.167	0.193472	01Jan2047	0.234888

data EP_data_2047;   
	set EP_data_1d;
	LENGTH year3 8;
	FORMAT year3 DATE9.;
	INFORMAT year3 DATE9.; 
	by cluster; 
	if cluster = cluster then lq_2047 = (lq - lag(lq)) + lq;
	year3 = '01JAN2047'd;
run;

mkeintz · Posted 10-19-2019 10:11 PM

You can use a _TEMPORARY_ array to hold the 1990 values (_TEMPORARY_ values are automatically retained). Then after output the 2018 value, you can calculate and output the 2047 values:

data have;
  input cluster  $27.      emp         lq          year2 :date9.   emp_share_perc;
  format year2 date9. ;
datalines;
Accommodation and Food Svcs 12868       1.074518089 01Jan1990    6.2331372
Accommodation and Food Svcs 31869.5     1.281039607 01Jan2018   10.3577914
Advanced Materials          7477.416667 0.6539712   01Jan1990    3.6219897
Advanced Materials          3339.25     0.3381481   01Jan2018    1.0852776
Agribusiness                5567.5      0.928833971 01Jan1990    2.6968442
Agribusiness                3619.5      0.494140709 01Jan2018    1.1763607
Biomedical                  673.8333333 0.279438995 01Jan1990    0.3263985
Biomedical                  863.5       0.236455726 01Jan2018    0.280643
run;

data want (drop=i);
  set have;
  by cluster;
  output;

  array tmp {3} _temporary_;
  array var {3} emp lq emp_share_perc;

  if year2='01jan1990'd then do i=1 to 3;
    tmp{i}=var{i};
  end;
  if year2='01jan2018'd;
  year2='01jan2047'd;
  do i=1 to 3;
    var{i}=2*var{i}-tmp{i};
  end;
  output;
run;

This program assumes each cluster has one 1990 record followed by one 2018 record.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 10-19-2019 10:11 PM

You can use a _TEMPORARY_ array to hold the 1990 values (_TEMPORARY_ values are automatically retained). Then after output the 2018 value, you can calculate and output the 2047 values:

data have;
  input cluster  $27.      emp         lq          year2 :date9.   emp_share_perc;
  format year2 date9. ;
datalines;
Accommodation and Food Svcs 12868       1.074518089 01Jan1990    6.2331372
Accommodation and Food Svcs 31869.5     1.281039607 01Jan2018   10.3577914
Advanced Materials          7477.416667 0.6539712   01Jan1990    3.6219897
Advanced Materials          3339.25     0.3381481   01Jan2018    1.0852776
Agribusiness                5567.5      0.928833971 01Jan1990    2.6968442
Agribusiness                3619.5      0.494140709 01Jan2018    1.1763607
Biomedical                  673.8333333 0.279438995 01Jan1990    0.3263985
Biomedical                  863.5       0.236455726 01Jan2018    0.280643
run;

data want (drop=i);
  set have;
  by cluster;
  output;

  array tmp {3} _temporary_;
  array var {3} emp lq emp_share_perc;

  if year2='01jan1990'd then do i=1 to 3;
    tmp{i}=var{i};
  end;
  if year2='01jan2018'd;
  year2='01jan2047'd;
  do i=1 to 3;
    var{i}=2*var{i}-tmp{i};
  end;
  output;
run;

This program assumes each cluster has one 1990 record followed by one 2018 record.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Insert new rows by ID with change in previous values + last value

Re: Insert new rows by ID with change in previous values + last value

Re: Insert new rows by ID with change in previous values + last value

Catch up on SAS Innovate 2026