Hi there
I have a relatively simple problem, or at least i think so, but after two hours of search, i must admit that i still can't solve it.
What I have is a dataset looking like:
id week_start week_end year x1
1 1 52 1998 0
1 53 79 1999 1
1 78 104 1999 1
2 . . .
And so on for the all my id. So I have that each observation have som characterisk for some interval of time (week_end-week_start).
What I want is to "split each observation by into monthly interval:
id week_start week_end year x1
1 1 4 1998 0
1 5 8 1998 0
1 9 12 1998 0
1 13 16 1998 0
1 17 20 1998 0
1 21 24 1998 0
1 25 28 1998 0
1 29 32 1998 0
1 33 36 1998 0
1 37 41 1998 0
1 42 46 1998 0
1 47 52 1998 0
1 53 56 1999 1
And so on...
I do not have a clue, what i am going to do. It could be really nice, if someone could help me.
Thanks
Data want;
set have;
do week_start = 1 to 47 by 4;
week_end = week_start + 3;
output;
end;
run;
Should do it.
This almost solve my problem.
But what do i do with the years after that?
I have tried:
if year=1998 then do:
week_start = 1 to 47 by 4;
week_end = week_start + 3;
output;
end;
if year=1999 then do
week_start = 53 to 99 by 4;
week_end = week_start + 3;
output;
end;
run;
But it does not seem work.
run;
I think you may want to back up a bit and actually use DATE values as there are functions to do manipulations and formats for display that work. There are several definitions of WEEK, especially as to how the begining and end of year weeks are treated. Please see for example the online documentation of the formats WEEK, WEEKU, WEEKV and WEEKW, or the WEEK function and the options u, v and w.
Choices may well depend on what you are doing next.
I think a slight variation on ballardw's program would do it:
data want;
set have;
original_week_end = week_end;
do week_start = week_start to original_week_end by 4;
week_end = min(original_week_end, week_start + 3);
output;
end;
drop original_week_end;
run;
I'm not sure that you're actually looking at my program. I haven't hard-coded any part of the loop such as:
1 to 47
Instead, I start the loop at the value of WEEK_START.
The only issue that I didn't try to address was when to keep 5 weeks in a goup instead of 4. The rules really aren't specified, so I left it as all 4-week groups, and the leftover week(s) go into their own observation even if it's only 1 week that is left over.
Try it and see how close it comes to what you need.
Hi again Astounding
I'm sorry i got the copy-paste wrong. So my reply to you was a little different.
I am close to the goal with your code
What i need (and what i have not specified!) is each observation to "dividided" into 12 month - some consistent of 4 others of 5.
I have not adressed which to consist of 5 because i did not believe that it could be coded?
But okay: Each year i know which month to consist of 5 weeks.
Lets say that in first year: 1-52 i know that it is month 5 and 7,8 which should contain 5 weeks,
and it is different in the next year.
Is it possible to code that?
If not. I just want 12 observations per year, and the last month to contain 4 weeks.
PS: Thanks for the quick reply!!
Anything is possible. But there are a couple of missing ingredients here.
(1) How is it possible that your second observation goes from week 53 to week 79?
(2) Is the information about which months have 4 weeks and which have 5 in your head, or is it contained in a data set? (if in a data set, what does the data set look like?)
There may be further small pieces to fill in here and there, but those two are the dealbreakers.
Fantastic response!
1)It is because there are two observations on id=1 in the following year (1999), which i may forgot to point out.
2) It is not explicitly in the dataset, the observations per id is now sat so that they "split" each year, and maybe
if they change their x1-state (as id=1 do in 1999) in a given year.
I dont know if that makes sense. If not, just say so and i would like to try a formulate it again.
With that being said it is not completely random which month to have 4 and which to have 5,
it follows the year calender/a specific year calender.
Do you need more information?
Thank you!!!
I think the best way to proceed would be for you to create a SAS data set that indicates which time periods contain 5 weeks. After what I hope is enough thought, the structure should be like this:
(a) Just two variables: year and week
(b) Just one observation indicating each 5-week time period
(c) The value for WEEK should be the week that is the 4th week in the 5-week time period
It would also simplify things slightly if you could supply the range of years in the data.
But my dataset still look like:
id week_start week_end year x1
1 1 52 1998 0
1 53 79 1999 1
1 78 104 1999 1
2 1 52 1998 1
So how do i create these observations indicating each 5-week time period?
What if i choose the first 9 period to contain 4 weeks, and the last 3 to contain 5 weeks,
would that be possible??
Im sorry. Im new to coding. I have tried to search on "do statement sas examples", but none seems to match my problem here.
You're jumping a step ahead, worrying about how to program using all the data. For now, start with getting the 5-week periods defined. For example (fictitious example):
Year Week
1999 23
1999 34
1999 41
2000 14
2000 25
The first record indicates that in 1999, the first five-week grouping is from week 20 through week 24. (When week=23 in the data, it is the fourth week of a five-week group.)
I you can create a data set that defines all the five-week groupings, we'll figure out a way to use it.
I misunderstood you.
I have created that now. Should I merge the new dataset with the old?
OK, let's give this a shot. This is untested code, so I hope it requires a minimum of debugging.
data want;
array addaweek {1998:2000, 53};
if _n_=1 then do until (done);
set _5_week_periods end=done;
addaweek{year, week}=1;
retain addaweek:;
end;
set have;
original_week_end = week_end;
do week_start = week_start to original_week_end by 4;
week_end = min(original_week_end, week_start + 3);
add1 = 0;
do week = week_start to week_end;
add1 + addaweek{year, week};
end;
if add1=0 then output;
else do;
week_end = min(original_week_end, week_end + 1);
output;
week_start = week_start + 1;
end;
end;
drop original_week_end add1 addaweek:;
run;
I think it works, but like I said it's untested. The idea is that the top DO loop loads the additional data set (the one that indicates the 5-week time periods) into an array. The rest of the DATA step breaks up the time periods into blocks of 4 (or 5) weeks. There are still loose ends such as what should happen when a block of 4 weeks ends in the middle of a 5-week month. But the code at least does make a decision about that.
See how close this comes to what you need. Good luck.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.