BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RasmusT45
Calcite | Level 5

Hi guys

 

I have the following register-based dataset. It offers information about parents employment history (e.g. y_1452 is week 52 in 2014, etc.) What I need to do is to set this employment data in relation to the birth of a child.

pnrweek of child-birth (for children born in 2015)y_1452y_1501y_1502y_1503y_1504
X1     
Y2     

 

So for example when a child is born in the first week of 2015 week_0 must be equal to y_1501 and week_1 must be equal to y_1502 and so on and soforth. This means that the data-set above have to look something like this:

pnrbirthweek_prior to birthweek_0week_1week_2week_3week_4
X1value of y_1452value of y_1501value of y_1502value of y_1503value of y_1504value of y_1505
Y2value of y_1501value of y_1502value of y_1503value of y_1504value of y_1505value of y_1506

 

I am rather new to SAS so what i have tried so far is to use the rename-function, so for example for children born in the first week of 2015 i have written the following code:

data want;
set have;
if week=1;
rename y_1452-y_1401 = week__1 - week__52;
rename y_1501-y_1553 = week_0 - week_52;
rename y_1601-1651 = week_53-week_103;
run;

where_ is after birth and __ is before birth. Maybe a bad naming convention btw.

So I can potentially do this for the 53 weeks in 2015, and after that merge the datasets. However, it is a lot of work, so maybe any of you have a smart solution?

 

Thanks in advance.

 

Best regards

Rasmus Thomsen

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input pnr $ week y_1452	y_1501	y_1502	y_1503	y_1504 y_1505;
cards;
X	1	 1 2 3 4 5	6 	 	 	 
Y	2	 1 2 3 4 5 6
;

data want;
 set have;
week_0=vvaluex(cats('y_15',put(week,z2.)));
week_1=vvaluex(cats('y_15',put(week+1,z2.)));
week_2=vvaluex(cats('y_15',put(week+2,z2.)));
week_3=vvaluex(cats('y_15',put(week+3,z2.)));
drop y_:;
run;

View solution in original post

3 REPLIES 3
PaigeMiller
Diamond | Level 26

This is a great example of a situation where a LONG dataset would be easier to program and easier to analyze, than your proposed WIDE data set. See Maxim 19. Among other benefits, no renaming is needed now.

 

Example:

 

pnr    birth       week              value
a         1        29DEC2014         Y
a         1        03JAN2015         N
a         1        10JAN2015         Y
b         2        03JAN2015         Y
b         2        10JAN2015         Y

If I am understanding you properly, and you need to compute weeks after birth, then you can do something like this:

 

data want;
    set have;
    retain birth_week;
    by pnr;
    if first.pnr then birth_week = intnx('week',mdy(1,1,2015)+birth*7,-1,'b');
    weeks_after_birth=floor((week-birth_week)/7);
run;

 

If your data arrives in this unfortunate wide format, then transpose it to long.

 

--
Paige Miller
ballardw
Super User

Renaming is not going to help because you have data in the variable names, a year and week as a minimum. And you attempt to compound things by creating a "before" using a hyphen in the name. To quote Bill the Cat "Gaack!"

 

One thing dealing with "week" is defining when a week starts and how to deal with things crossing a year boundary. Because there are not exactly 52 weeks your y_1452 is somewhat questionable as to how it would mean that.

 

Merge what datasets?????

 

One question, is the "week of child-birth" ONLY and ALWAYS in 2015? Or will you have to deal with 2016, 2017 etc?

 

Here's a stub of a program that eventually makes a report similar to what you request with only a few variables. I am not going make 100's of variables in a poor data structure.

The first is just to make a data set that looks something roughly like what you have.

Then it reshapes the data to one observation per "pnr" whatever that may be, birth week and week value. This results in a VARIABLE that holds the week of the "value".

Very important: IF you have duplicate PNR values with the same birthweek this is not going to work because you did not provide any actual example. This approach works with one unique PNR Birthweek combination. If that is not the case in your data then you need additional identification variable(s).

Proc Transpose is designed to some pretty nifty things with changing data from wide to long or long to wide.

With the transposed data the next data step uses the information to get an approximate date of birth using a fixed year and the birthweek variable, and an approximate date of the value by parsing values from the week name.

The function INTNX shifts a date by a given interval. Then use the INTCK function, that returns the number of intervals between two dates find the "week of the value".

A report writes out something similar to your requested result.

An exercise for the interested reader is to use Proc Transpose to make that hard to work with wide format.

Hint: ID BW, Prefix=week

 

 

Data stupid;
   input pnr $  birthweek  y_1452  y_1501-y_1506;
datalines;
a 1  1 2 3 4 5 6 7
b 2  11 22 33 44 55 66 77
;

proc sort data=stupid;
   by pnr birthweek;
run;

proc transpose data=stupid
   out=trans prefix=value;
 by pnr birthweek;
 var y_:;
run;

data dated;
   set trans;
   /* approximate dob calculation Assumes ALL 
   the births are in 2015*/
   DOB = intnx('week','01JAN2015'd,birthweek);
   format dob date9.;
   /* and an approximate date of the value*/
   /* have to make the dangerous assumption your values
   are all past year 2000, need numeric year and week values
   */
   Vyear = input(substr(_name_,3,2),f2.) + 2000;
   Vweek = input(substr(_name_,5,2),f2.);
   DoV = intnx('week',mdy(1,1,vyear),vweek);
   format Dov date9.;
   /* and now we can calculate weeks between birth and the value*/
   Bw = intck('week',dob,dov);

   /* assume you only want values from BW=-1 (week prior to birth) and greater*/
   if bw ge -1;
   /* for most purposes you can drop the vyear, vweek and possibly DOB and DoV
     but leave them here so you can see the process
   */
run;

/* REPORT to look at the valus instead of creating another poorly structured data set*/
proc format;
value wkp
-1='Prior'
;
run;

options missing=' ';
proc report data=dated;
   column pnr birthweek bw, value1;
   define pnr/group;
   define birthweek /group;
   define bw/order=data across "Week" format=wkp.;
   define value1/max " ";
run;

 

 

It appears that you have data to y_1651. Why not y_1652? y_1701? Exactly how many variables are involved?

 

 

Using 2 digit years for anything should have died 21 years ago as that often complicates issues.

 

It is MUCH preferred to have an actual date for activities if possible. The SAS provides functions that will allow you to determine differences. Often "wide" data as you show is hard to work with. Partially because any code that is written needs to be completely rewritten when you move to the next "year", or birth year.

 

If you have the opportunity I might suggest going back to a different stage where you actually have dates of information collected. 

Ksharp
Super User
data have;
input pnr $ week y_1452	y_1501	y_1502	y_1503	y_1504 y_1505;
cards;
X	1	 1 2 3 4 5	6 	 	 	 
Y	2	 1 2 3 4 5 6
;

data want;
 set have;
week_0=vvaluex(cats('y_15',put(week,z2.)));
week_1=vvaluex(cats('y_15',put(week+1,z2.)));
week_2=vvaluex(cats('y_15',put(week+2,z2.)));
week_3=vvaluex(cats('y_15',put(week+3,z2.)));
drop y_:;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1095 views
  • 1 like
  • 4 in conversation