BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JJ_83
Obsidian | Level 7

I am stuck on how to create a new dataset based on a few variables I have now. 

What I have now is this:

have.JPG

 

data have;
	input ID DATE :mmddyy10. PERIOD_01 PERIOD_02 PERIOD_03;
	FORMAT	ID z14.
			DATE mmddyy10.;
	datalines;
		01133244410000	01/03/2000	0	0	2
		02019220960000 	02/05/2000	0	2	2
		07167297020000  03/15/2000	2	2	2
		17067210480000  04/02/2000	0	0	0
	;
run;

 

I want the dataset structure below where DAY_01 = January 1, 2000 and DAY_N = December 31, 2015.

And I want to incorporate two levels of coding:

Level One:

  • DAY_01 - DAY_31 = PERIOD_01
  • DAY_32 - DAY_60 = PERIOD_02
  • DAY_61 - DAY_91 = PERIOD_03

Level Two:

  • All days prior to DATE = 0
  • The day corresponding to DATE = 1
  • All days after DATE = 1 until the first day that = 2

want.JPG

data want;
	input ID DAY_01 DAY_02 DAY_03 DAY_04 DAY_N;
	FORMAT ID z14.;
	datalines;
		01133244410000 0 1 1 2 2
		02019220960000 0 0 1 2 2
		07167297020000 1 1 2 2 0
		17067210480000 1 2 0 0 0
	;
run;

Any help would be really appreciated!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

This does the calculations in a few steps of code, creating a long data set. The similar wide data set would take lots more coding.

 

data have;
	input ID $14. +1 DATE :mmddyy10. ;
	FORMAT DATE mmddyy10.;
	datalines;
01133244410000 01/03/2000
02019220960000 02/05/2000 
07167297020000 03/15/2000
17067210480000 04/02/2000
;

data want;
    set have;
    do all_dates='01JAN2000'd to '31DEC2015'd;
        if date>all_dates then indicator=2;
        else if date=all_dates then indicator=1;
        else if date<all_dates then indicator=0;
    	output;
	end;
	format all_dates mmddyyd10.;
run;
--
Paige Miller

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

I want the dataset structure below where DAY_01 = January 1, 2000 and DAY_N = December 31, 2015.

This is a truly horrible data structure, and one that will be a nightmare to program. Not only will programming be much easier if you leave the data in a long data set (rather than the proposed long data set), and then all of your calculations become simple. Maybe it works in Excel, but it doesn't work in SAS.

 

 

--
Paige Miller
PaigeMiller
Diamond | Level 26

This does the calculations in a few steps of code, creating a long data set. The similar wide data set would take lots more coding.

 

data have;
	input ID $14. +1 DATE :mmddyy10. ;
	FORMAT DATE mmddyy10.;
	datalines;
01133244410000 01/03/2000
02019220960000 02/05/2000 
07167297020000 03/15/2000
17067210480000 04/02/2000
;

data want;
    set have;
    do all_dates='01JAN2000'd to '31DEC2015'd;
        if date>all_dates then indicator=2;
        else if date=all_dates then indicator=1;
        else if date<all_dates then indicator=0;
    	output;
	end;
	format all_dates mmddyyd10.;
run;
--
Paige Miller
JJ_83
Obsidian | Level 7

Hi Paige, 

 

I see what you're saying about the benefits of a long vs wide dataset. Thank you for the suggestion!

 

I have reorganized my dataset, and was hoping you could help with my last hurdle.

 

I want all WELL_API per ID where PHASE = 1 or 2 for any day between EXPOSURE_START and EXPOSURE_END. 

 

I'm stuck on how to do it with a long dataset. 

 

data have;
	input WELL_API $14. ID $11. +4 DAY :mmddyy10. EXPOSURE_START :mmddyy10. EXPOSURE_END :mmddyy10. PHASE $1.;
	format	DAY mmddyy10.
			EXPOSURE_START mmddyy10.
			EXPOSURE_END mmddyy10.;
	datalines;
		01133244410000 B1392272014 01/01/2010 01/01/2000 12/23/2014 0
		01133244410000 B1392272014 01/02/2010 01/01/2000 12/23/2014 1
		01133244410000 B1392272014 01/03/2010 01/01/2000 12/23/2014 2
		01133244410000 B1392272014 01/04/2010 01/01/2000 12/23/2014 3
		01133244410000 B1392272014 01/05/2010 01/01/2000 12/23/2014 0
	;
run;
PaigeMiller
Diamond | Level 26

I want all WELL_API per ID where PHASE = 1 or 2 for any day between EXPOSURE_START and EXPOSURE_END.

I don't know what this means. Show me the output you want.

--
Paige Miller
JJ_83
Obsidian | Level 7

Given this:

have.JPG

data have;
	input WELL_API $ ID $ DAY :mmddyy10. EXPOSURE_START :mmddyy10. EXPOSURE_END :mmddyy10. PHASE :best.;
	format	DAY mmddyy10.
			EXPOSURE_START mmddyy10.
			EXPOSURE_END mmddyy10.;
	datalines;
		01133244410000 B1392272014 01/01/2010 01/01/2000 01/05/2000 0
		01133244410000 B1392272014 01/02/2010 01/01/2000 01/05/2000 1
		01133244410000 B1392272014 01/03/2010 01/01/2000 01/05/2000 2
		01133244410000 B1392272014 01/04/2010 01/01/2000 01/05/2000 3
		01133244410000 B1392272014 01/05/2010 01/01/2000 01/05/2000 2
		01133244410000 B1392272014 01/06/2010 01/01/2000 01/05/2000 2
		01133244410000 B1392272014 01/07/2010 01/01/2000 01/05/2000 2
		01133244410000 B1392272014 01/08/2010 01/01/2000 01/05/2000 2
		01133244410000 B1392272014 01/09/2010 01/01/2000 01/05/2000 2
		01133244410000 B1392272014 01/10/2010 01/01/2000 01/05/2000 2
	;
run;

I want this:

  • All entries where
    • Phase = (2 or 3)
    • Exposure_start <= Day <= Exposure end

want.JPG

data want;
	input WELL_API $ ID $ DAY :mmddyy10. EXPOSURE_START :mmddyy10. EXPOSURE_END :mmddyy10. PHASE best.;
	format	DAY mmddyy10.
			EXPOSURE_START mmddyy10.
			EXPOSURE_END mmddyy10.;
	datalines;
		01133244410000 B1392272014 01/03/2010 01/01/2000 01/05/2000 2
		01133244410000 B1392272014 01/04/2010 01/01/2000 01/05/2000 3
		01133244410000 B1392272014 01/05/2010 01/01/2000 01/05/2000 2
	;
run;
PaigeMiller
Diamond | Level 26

I have assumed that the column DAY is actually days in year 2000 instead of days in year 2010. Otherwise your output data set does not agree with your words.

 

data want;
    set have;
    where phase in (2,3) and exposure_start<=day<=exposure_end;    
run;

 

--
Paige Miller
JJ_83
Obsidian | Level 7

The day variable refers to days, I just truncated it to the first 5 days in 2000. 

 

The code is not working for me. 

PaigeMiller
Diamond | Level 26

Please do not ever say "the code is not working for me" and then provide no additional information. We cannot help you if all you say is "the code is not working for me".

 

Show us the log, by pasting the log as text into the window that appears when you click on </>. Show us the ENTIRE log of the DATA step, do not chop anything out.

 

You also need to address how day can be in year 2010 but somehow winds up in the output.

--
Paige Miller
JJ_83
Obsidian | Level 7

Ahh I see, there was a typo in the day variable. I wanted the first 5 days in 2000, but I put the first 5 days in 2010. 

 

Everything works great now, thank you!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1483 views
  • 2 likes
  • 2 in conversation