I am stuck on how to create a new dataset based on a few variables I have now.
What I have now is this:
data have;
input ID DATE :mmddyy10. PERIOD_01 PERIOD_02 PERIOD_03;
FORMAT ID z14.
DATE mmddyy10.;
datalines;
01133244410000 01/03/2000 0 0 2
02019220960000 02/05/2000 0 2 2
07167297020000 03/15/2000 2 2 2
17067210480000 04/02/2000 0 0 0
;
run;
I want the dataset structure below where DAY_01 = January 1, 2000 and DAY_N = December 31, 2015.
And I want to incorporate two levels of coding:
Level One:
Level Two:
data want;
input ID DAY_01 DAY_02 DAY_03 DAY_04 DAY_N;
FORMAT ID z14.;
datalines;
01133244410000 0 1 1 2 2
02019220960000 0 0 1 2 2
07167297020000 1 1 2 2 0
17067210480000 1 2 0 0 0
;
run;
Any help would be really appreciated!
This does the calculations in a few steps of code, creating a long data set. The similar wide data set would take lots more coding.
data have;
input ID $14. +1 DATE :mmddyy10. ;
FORMAT DATE mmddyy10.;
datalines;
01133244410000 01/03/2000
02019220960000 02/05/2000
07167297020000 03/15/2000
17067210480000 04/02/2000
;
data want;
set have;
do all_dates='01JAN2000'd to '31DEC2015'd;
if date>all_dates then indicator=2;
else if date=all_dates then indicator=1;
else if date<all_dates then indicator=0;
output;
end;
format all_dates mmddyyd10.;
run;
I want the dataset structure below where DAY_01 = January 1, 2000 and DAY_N = December 31, 2015.
This is a truly horrible data structure, and one that will be a nightmare to program. Not only will programming be much easier if you leave the data in a long data set (rather than the proposed long data set), and then all of your calculations become simple. Maybe it works in Excel, but it doesn't work in SAS.
This does the calculations in a few steps of code, creating a long data set. The similar wide data set would take lots more coding.
data have;
input ID $14. +1 DATE :mmddyy10. ;
FORMAT DATE mmddyy10.;
datalines;
01133244410000 01/03/2000
02019220960000 02/05/2000
07167297020000 03/15/2000
17067210480000 04/02/2000
;
data want;
set have;
do all_dates='01JAN2000'd to '31DEC2015'd;
if date>all_dates then indicator=2;
else if date=all_dates then indicator=1;
else if date<all_dates then indicator=0;
output;
end;
format all_dates mmddyyd10.;
run;
Hi Paige,
I see what you're saying about the benefits of a long vs wide dataset. Thank you for the suggestion!
I have reorganized my dataset, and was hoping you could help with my last hurdle.
I want all WELL_API per ID where PHASE = 1 or 2 for any day between EXPOSURE_START and EXPOSURE_END.
I'm stuck on how to do it with a long dataset.
data have;
input WELL_API $14. ID $11. +4 DAY :mmddyy10. EXPOSURE_START :mmddyy10. EXPOSURE_END :mmddyy10. PHASE $1.;
format DAY mmddyy10.
EXPOSURE_START mmddyy10.
EXPOSURE_END mmddyy10.;
datalines;
01133244410000 B1392272014 01/01/2010 01/01/2000 12/23/2014 0
01133244410000 B1392272014 01/02/2010 01/01/2000 12/23/2014 1
01133244410000 B1392272014 01/03/2010 01/01/2000 12/23/2014 2
01133244410000 B1392272014 01/04/2010 01/01/2000 12/23/2014 3
01133244410000 B1392272014 01/05/2010 01/01/2000 12/23/2014 0
;
run;
I want all WELL_API per ID where PHASE = 1 or 2 for any day between EXPOSURE_START and EXPOSURE_END.
I don't know what this means. Show me the output you want.
Given this:
data have;
input WELL_API $ ID $ DAY :mmddyy10. EXPOSURE_START :mmddyy10. EXPOSURE_END :mmddyy10. PHASE :best.;
format DAY mmddyy10.
EXPOSURE_START mmddyy10.
EXPOSURE_END mmddyy10.;
datalines;
01133244410000 B1392272014 01/01/2010 01/01/2000 01/05/2000 0
01133244410000 B1392272014 01/02/2010 01/01/2000 01/05/2000 1
01133244410000 B1392272014 01/03/2010 01/01/2000 01/05/2000 2
01133244410000 B1392272014 01/04/2010 01/01/2000 01/05/2000 3
01133244410000 B1392272014 01/05/2010 01/01/2000 01/05/2000 2
01133244410000 B1392272014 01/06/2010 01/01/2000 01/05/2000 2
01133244410000 B1392272014 01/07/2010 01/01/2000 01/05/2000 2
01133244410000 B1392272014 01/08/2010 01/01/2000 01/05/2000 2
01133244410000 B1392272014 01/09/2010 01/01/2000 01/05/2000 2
01133244410000 B1392272014 01/10/2010 01/01/2000 01/05/2000 2
;
run;
I want this:
data want;
input WELL_API $ ID $ DAY :mmddyy10. EXPOSURE_START :mmddyy10. EXPOSURE_END :mmddyy10. PHASE best.;
format DAY mmddyy10.
EXPOSURE_START mmddyy10.
EXPOSURE_END mmddyy10.;
datalines;
01133244410000 B1392272014 01/03/2010 01/01/2000 01/05/2000 2
01133244410000 B1392272014 01/04/2010 01/01/2000 01/05/2000 3
01133244410000 B1392272014 01/05/2010 01/01/2000 01/05/2000 2
;
run;
I have assumed that the column DAY is actually days in year 2000 instead of days in year 2010. Otherwise your output data set does not agree with your words.
data want;
set have;
where phase in (2,3) and exposure_start<=day<=exposure_end;
run;
The day variable refers to days, I just truncated it to the first 5 days in 2000.
The code is not working for me.
Please do not ever say "the code is not working for me" and then provide no additional information. We cannot help you if all you say is "the code is not working for me".
Show us the log, by pasting the log as text into the window that appears when you click on </>. Show us the ENTIRE log of the DATA step, do not chop anything out.
You also need to address how day can be in year 2010 but somehow winds up in the output.
Ahh I see, there was a typo in the day variable. I wanted the first 5 days in 2000, but I put the first 5 days in 2010.
Everything works great now, thank you!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.