Re: Creating a new record based on a subsequent record

malena · Posted 01-20-2017 10:09 AM

I will like to write a program that creates a file with 19 records with several columns. The issue is that in some cases a column willb e based on a value for a subsequent record, not yet created.

For example: here for the 3rd record (ageg=5-9), the value of dx is based on the lx value for the 4th record - the lx value for the 3rd record. Does SAS allow you to some these kind of calculations?

Ageg

x

n

a_x

pop

death

M_x

q_x

p_x

l_x

d_x

<1

0

1

0.1

73200

321

0.004385

0.0043680

0.995632

100000

437

1-4

1

4

0.5

292855

57

0.000195

0.0007782

0.999222

99563

77

5-9

5

0.5

376322

32

0.000085

0.0004251

0.999575

99486

42

10-14

10

5

0.5

398163

40

0.000100

0.0005022

0.999498

99443

50

15-19

15

5

0.5

457196

131

0.000287

0.0014316

0.998568

99393

142

20-24

20

5

0.5

491392

310

0.000631

0.0031493

0.996851

99251

313

25-29

25

5

0.5

482730

497

0.001030

0.0051346

0.994865

98939

508

30-34

30

5

0.5

447672

508

0.001135

0.0056577

0.994342

98431

557

35-39

35

5

0.5

404949

521

0.001287

0.0064123

0.993588

97874

628

40-44

40

5

0.5

432783

709

0.001638

0.0081578

0.991842

97246

793

45-49

45

5

0.5

475213

1233

0.002595

0.0128895

0.987110

96453

1243

50-54

50

5

0.5

511398

1975

0.003862

0.0191252

0.980875

95210

1821

55-59

55

5

0.5

479748

2787

0.005809

0.0286307

0.971369

93389

2674

60-64

60

5

0.5

405550

3219

0.007937

0.0389146

0.961085

90715

3530

65-69

65

5

0.5

330937

4086

0.012347

0.0598853

0.940115

87185

5221

70-74

70

5

0.5

231859

4592

0.019805

0.0943540

0.905646

81964

7734

75-79

75

5

0.5

165604

5300

0.032004

0.1481655

0.851834

74230

10998

80-85

80

5

0.5

130643

7484

0.057286

0.2505474

0.749453

63232

15843

85+

85

15

0.5

157194

21355

0.135851

1.0000000

0.000000

47389

ballardw · Posted 01-20-2017 10:35 AM

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

I'm not going to try to guess from an Excel file what the needed processing may be.

How are you going to calculate them? Any calculation code based on missing values will generally generate a missing value.

Since you don't show any missing values in the example data or which variables are dependent on others it is a bit difficult to see what you are doing.

PeterClemmensen · Posted 01-20-2017 10:37 AM

How many of the variables to you have as a starting point and how many do you want to calculate from the data you have? 🙂

The DATA to DATA Step Macro
Blog: SASnrd

lmignone · Posted 01-20-2017 10:50 AM

Hi Malena,

You can create a new variable with the value of the next record doing this:

data have;
length ageg $5. dx 8.;
infile datalines dlm="|";
input ageg dx;
datalines;
<1|436.800659691464
1-4|77.4839981349214
5-9|42.2891335719469
10-14|49.9385713503871
15-19|142.293749725199
20-24|312.575508974929
25-29|508.009167195865
30-34|556.895428355652
35-39|627.594050495114
40-44|793.310623042926
45-49|1243.23053041441
50-54|1820.89859881431
55-59|2673.78223114848
60-64|3530.13806725502
65-69|5221.08807892018
70-74|7733.59757376662
75-79|10998.3392139526
80-85|15842.5442862508
85+|47389.1905289392
;
run;

data want ;
retain dx;
set have (rename=(dx=next_dx)) end=end ;
if _n_ > 1 then output;
dx = next_dx;
if not end then return;
call missing(next_dx);
output;
run ;

This example creates a variable, next_dx, with the subsequent value of dx. Later you can use if-then logic according the age group to keep that value or not.

Regards,

mkeintz · Posted 01-20-2017 05:11 PM

You have a table of survivor counts (Lx) for each age group and you want the count of deaths (dx) in each age group calculated by subtracting Lx for next age group from Lx for the current age group. So "merge" the dataset have with itself, in which the 2nd reference to have has only one variable: Lx renamed to next_lx. Also offset the second reference to have by one record ("firstobs=2").

data want;

merge have

have (firstobs=2 keep=Lx rename=(Lx=next_Lx));

dx= Lx-coalesce(next_lx,0);

drop next_lx;

run;

The coalesce function returns the first non-missing value of next_lx and zero. This is relevant only for the last record, in which there is no next_lx (i.e. it has a missing value). Since everybody must die in the oldest age group, subtract zero from Lx to get 100% deaths in the 85+ cohorts.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Creating a new record based on a subsequent record