Hi guys
I have the following register-based dataset. It offers information about parents employment history (e.g. y_1452 is week 52 in 2014, etc.) What I need to do is to set this employment data in relation to the birth of a child.
pnr | week of child-birth (for children born in 2015) | y_1452 | y_1501 | y_1502 | y_1503 | y_1504 |
X | 1 | |||||
Y | 2 |
So for example when a child is born in the first week of 2015 week_0 must be equal to y_1501 and week_1 must be equal to y_1502 and so on and soforth. This means that the data-set above have to look something like this:
pnr | birth | week_prior to birth | week_0 | week_1 | week_2 | week_3 | week_4 |
X | 1 | value of y_1452 | value of y_1501 | value of y_1502 | value of y_1503 | value of y_1504 | value of y_1505 |
Y | 2 | value of y_1501 | value of y_1502 | value of y_1503 | value of y_1504 | value of y_1505 | value of y_1506 |
I am rather new to SAS so what i have tried so far is to use the rename-function, so for example for children born in the first week of 2015 i have written the following code:
data want;
set have;
if week=1;
rename y_1452-y_1401 = week__1 - week__52;
rename y_1501-y_1553 = week_0 - week_52;
rename y_1601-1651 = week_53-week_103;
run;
where_ is after birth and __ is before birth. Maybe a bad naming convention btw.
So I can potentially do this for the 53 weeks in 2015, and after that merge the datasets. However, it is a lot of work, so maybe any of you have a smart solution?
Thanks in advance.
Best regards
Rasmus Thomsen
data have;
input pnr $ week y_1452 y_1501 y_1502 y_1503 y_1504 y_1505;
cards;
X 1 1 2 3 4 5 6
Y 2 1 2 3 4 5 6
;
data want;
set have;
week_0=vvaluex(cats('y_15',put(week,z2.)));
week_1=vvaluex(cats('y_15',put(week+1,z2.)));
week_2=vvaluex(cats('y_15',put(week+2,z2.)));
week_3=vvaluex(cats('y_15',put(week+3,z2.)));
drop y_:;
run;
This is a great example of a situation where a LONG dataset would be easier to program and easier to analyze, than your proposed WIDE data set. See Maxim 19. Among other benefits, no renaming is needed now.
Example:
pnr birth week value a 1 29DEC2014 Y a 1 03JAN2015 N a 1 10JAN2015 Y b 2 03JAN2015 Y b 2 10JAN2015 Y
If I am understanding you properly, and you need to compute weeks after birth, then you can do something like this:
data want;
set have;
retain birth_week;
by pnr;
if first.pnr then birth_week = intnx('week',mdy(1,1,2015)+birth*7,-1,'b');
weeks_after_birth=floor((week-birth_week)/7);
run;
If your data arrives in this unfortunate wide format, then transpose it to long.
Renaming is not going to help because you have data in the variable names, a year and week as a minimum. And you attempt to compound things by creating a "before" using a hyphen in the name. To quote Bill the Cat "Gaack!"
One thing dealing with "week" is defining when a week starts and how to deal with things crossing a year boundary. Because there are not exactly 52 weeks your y_1452 is somewhat questionable as to how it would mean that.
Merge what datasets?????
One question, is the "week of child-birth" ONLY and ALWAYS in 2015? Or will you have to deal with 2016, 2017 etc?
Here's a stub of a program that eventually makes a report similar to what you request with only a few variables. I am not going make 100's of variables in a poor data structure.
The first is just to make a data set that looks something roughly like what you have.
Then it reshapes the data to one observation per "pnr" whatever that may be, birth week and week value. This results in a VARIABLE that holds the week of the "value".
Very important: IF you have duplicate PNR values with the same birthweek this is not going to work because you did not provide any actual example. This approach works with one unique PNR Birthweek combination. If that is not the case in your data then you need additional identification variable(s).
Proc Transpose is designed to some pretty nifty things with changing data from wide to long or long to wide.
With the transposed data the next data step uses the information to get an approximate date of birth using a fixed year and the birthweek variable, and an approximate date of the value by parsing values from the week name.
The function INTNX shifts a date by a given interval. Then use the INTCK function, that returns the number of intervals between two dates find the "week of the value".
A report writes out something similar to your requested result.
An exercise for the interested reader is to use Proc Transpose to make that hard to work with wide format.
Hint: ID BW, Prefix=week
Data stupid; input pnr $ birthweek y_1452 y_1501-y_1506; datalines; a 1 1 2 3 4 5 6 7 b 2 11 22 33 44 55 66 77 ; proc sort data=stupid; by pnr birthweek; run; proc transpose data=stupid out=trans prefix=value; by pnr birthweek; var y_:; run; data dated; set trans; /* approximate dob calculation Assumes ALL the births are in 2015*/ DOB = intnx('week','01JAN2015'd,birthweek); format dob date9.; /* and an approximate date of the value*/ /* have to make the dangerous assumption your values are all past year 2000, need numeric year and week values */ Vyear = input(substr(_name_,3,2),f2.) + 2000; Vweek = input(substr(_name_,5,2),f2.); DoV = intnx('week',mdy(1,1,vyear),vweek); format Dov date9.; /* and now we can calculate weeks between birth and the value*/ Bw = intck('week',dob,dov); /* assume you only want values from BW=-1 (week prior to birth) and greater*/ if bw ge -1; /* for most purposes you can drop the vyear, vweek and possibly DOB and DoV but leave them here so you can see the process */ run; /* REPORT to look at the valus instead of creating another poorly structured data set*/ proc format; value wkp -1='Prior' ; run; options missing=' '; proc report data=dated; column pnr birthweek bw, value1; define pnr/group; define birthweek /group; define bw/order=data across "Week" format=wkp.; define value1/max " "; run;
It appears that you have data to y_1651. Why not y_1652? y_1701? Exactly how many variables are involved?
Using 2 digit years for anything should have died 21 years ago as that often complicates issues.
It is MUCH preferred to have an actual date for activities if possible. The SAS provides functions that will allow you to determine differences. Often "wide" data as you show is hard to work with. Partially because any code that is written needs to be completely rewritten when you move to the next "year", or birth year.
If you have the opportunity I might suggest going back to a different stage where you actually have dates of information collected.
data have;
input pnr $ week y_1452 y_1501 y_1502 y_1503 y_1504 y_1505;
cards;
X 1 1 2 3 4 5 6
Y 2 1 2 3 4 5 6
;
data want;
set have;
week_0=vvaluex(cats('y_15',put(week,z2.)));
week_1=vvaluex(cats('y_15',put(week+1,z2.)));
week_2=vvaluex(cats('y_15',put(week+2,z2.)));
week_3=vvaluex(cats('y_15',put(week+3,z2.)));
drop y_:;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.