Solved: Setting weekly variables in relation to the birth of a child

RasmusT45 · Posted 03-12-2021 06:26 AM

Hi guys

I have the following register-based dataset. It offers information about parents employment history (e.g. y_1452 is week 52 in 2014, etc.) What I need to do is to set this employment data in relation to the birth of a child.

pnr	week of child-birth (for children born in 2015)	y_1452	y_1501	y_1502	y_1503	y_1504
X	1
Y	2

So for example when a child is born in the first week of 2015 week_0 must be equal to y_1501 and week_1 must be equal to y_1502 and so on and soforth. This means that the data-set above have to look something like this:

pnr	birth	week_prior to birth	week_0	week_1	week_2	week_3	week_4
X	1	value of y_1452	value of y_1501	value of y_1502	value of y_1503	value of y_1504	value of y_1505
Y	2	value of y_1501	value of y_1502	value of y_1503	value of y_1504	value of y_1505	value of y_1506

I am rather new to SAS so what i have tried so far is to use the rename-function, so for example for children born in the first week of 2015 i have written the following code:

data want;
set have;
if week=1;
rename y_1452-y_1401 = week__1 - week__52;
rename y_1501-y_1553 = week_0 - week_52;
rename y_1601-1651 = week_53-week_103;
run;

where_ is after birth and __ is before birth. Maybe a bad naming convention btw.

So I can potentially do this for the 53 weeks in 2015, and after that merge the datasets. However, it is a lot of work, so maybe any of you have a smart solution?

Thanks in advance.

Best regards

Rasmus Thomsen

Ksharp · Posted 03-12-2021 07:36 AM

data have;
input pnr $ week y_1452	y_1501	y_1502	y_1503	y_1504 y_1505;
cards;
X	1	 1 2 3 4 5	6 	 	 	 
Y	2	 1 2 3 4 5 6
;

data want;
 set have;
week_0=vvaluex(cats('y_15',put(week,z2.)));
week_1=vvaluex(cats('y_15',put(week+1,z2.)));
week_2=vvaluex(cats('y_15',put(week+2,z2.)));
week_3=vvaluex(cats('y_15',put(week+3,z2.)));
drop y_:;
run;

View solution in original post

PaigeMiller · Posted 03-12-2021 06:33 AM

This is a great example of a situation where a LONG dataset would be easier to program and easier to analyze, than your proposed WIDE data set. See Maxim 19. Among other benefits, no renaming is needed now.

Example:

pnr    birth       week              value
a         1        29DEC2014         Y
a         1        03JAN2015         N
a         1        10JAN2015         Y
b         2        03JAN2015         Y
b         2        10JAN2015         Y

If I am understanding you properly, and you need to compute weeks after birth, then you can do something like this:

data want;
    set have;
    retain birth_week;
    by pnr;
    if first.pnr then birth_week = intnx('week',mdy(1,1,2015)+birth*7,-1,'b');
    weeks_after_birth=floor((week-birth_week)/7);
run;

If your data arrives in this unfortunate wide format, then transpose it to long.

--
Paige Miller

ballardw · Posted 03-12-2021 07:27 AM

Renaming is not going to help because you have data in the variable names, a year and week as a minimum. And you attempt to compound things by creating a "before" using a hyphen in the name. To quote Bill the Cat "Gaack!"

One thing dealing with "week" is defining when a week starts and how to deal with things crossing a year boundary. Because there are not exactly 52 weeks your y_1452 is somewhat questionable as to how it would mean that.

Merge what datasets?????

One question, is the "week of child-birth" ONLY and ALWAYS in 2015? Or will you have to deal with 2016, 2017 etc?

Here's a stub of a program that eventually makes a report similar to what you request with only a few variables. I am not going make 100's of variables in a poor data structure.

The first is just to make a data set that looks something roughly like what you have.

Then it reshapes the data to one observation per "pnr" whatever that may be, birth week and week value. This results in a VARIABLE that holds the week of the "value".

Very important: IF you have duplicate PNR values with the same birthweek this is not going to work because you did not provide any actual example. This approach works with one unique PNR Birthweek combination. If that is not the case in your data then you need additional identification variable(s).

Proc Transpose is designed to some pretty nifty things with changing data from wide to long or long to wide.

With the transposed data the next data step uses the information to get an approximate date of birth using a fixed year and the birthweek variable, and an approximate date of the value by parsing values from the week name.

The function INTNX shifts a date by a given interval. Then use the INTCK function, that returns the number of intervals between two dates find the "week of the value".

A report writes out something similar to your requested result.

An exercise for the interested reader is to use Proc Transpose to make that hard to work with wide format.

Hint: ID BW, Prefix=week

Data stupid;
   input pnr $  birthweek  y_1452  y_1501-y_1506;
datalines;
a 1  1 2 3 4 5 6 7
b 2  11 22 33 44 55 66 77
;

proc sort data=stupid;
   by pnr birthweek;
run;

proc transpose data=stupid
   out=trans prefix=value;
 by pnr birthweek;
 var y_:;
run;

data dated;
   set trans;
   /* approximate dob calculation Assumes ALL 
   the births are in 2015*/
   DOB = intnx('week','01JAN2015'd,birthweek);
   format dob date9.;
   /* and an approximate date of the value*/
   /* have to make the dangerous assumption your values
   are all past year 2000, need numeric year and week values
   */
   Vyear = input(substr(_name_,3,2),f2.) + 2000;
   Vweek = input(substr(_name_,5,2),f2.);
   DoV = intnx('week',mdy(1,1,vyear),vweek);
   format Dov date9.;
   /* and now we can calculate weeks between birth and the value*/
   Bw = intck('week',dob,dov);

   /* assume you only want values from BW=-1 (week prior to birth) and greater*/
   if bw ge -1;
   /* for most purposes you can drop the vyear, vweek and possibly DOB and DoV
     but leave them here so you can see the process
   */
run;

/* REPORT to look at the valus instead of creating another poorly structured data set*/
proc format;
value wkp
-1='Prior'
;
run;

options missing=' ';
proc report data=dated;
   column pnr birthweek bw, value1;
   define pnr/group;
   define birthweek /group;
   define bw/order=data across "Week" format=wkp.;
   define value1/max " ";
run;

It appears that you have data to y_1651. Why not y_1652? y_1701? Exactly how many variables are involved?

Using 2 digit years for anything should have died 21 years ago as that often complicates issues.

It is MUCH preferred to have an actual date for activities if possible. The SAS provides functions that will allow you to determine differences. Often "wide" data as you show is hard to work with. Partially because any code that is written needs to be completely rewritten when you move to the next "year", or birth year.

If you have the opportunity I might suggest going back to a different stage where you actually have dates of information collected.

Ksharp · Posted 03-12-2021 07:36 AM

data have;
input pnr $ week y_1452	y_1501	y_1502	y_1503	y_1504 y_1505;
cards;
X	1	 1 2 3 4 5	6 	 	 	 
Y	2	 1 2 3 4 5 6
;

data want;
 set have;
week_0=vvaluex(cats('y_15',put(week,z2.)));
week_1=vvaluex(cats('y_15',put(week+1,z2.)));
week_2=vvaluex(cats('y_15',put(week+2,z2.)));
week_3=vvaluex(cats('y_15',put(week+3,z2.)));
drop y_:;
run;

Setting weekly variables in relation to the birth of a child

Re: Setting weekly variables in relation to the birth of a child

Re: Setting weekly variables in relation to the birth of a child

Re: Setting weekly variables in relation to the birth of a child

Re: Setting weekly variables in relation to the birth of a child

Registration is open

Setting weekly variables in relation to the birth of a child

Re: Setting weekly variables in relation to the birth of a child

Re: Setting weekly variables in relation to the birth of a child

Re: Setting weekly variables in relation to the birth of a child

Re: Setting weekly variables in relation to the birth of a child

Registration is open

SAS Training: Just a Click Away