Identifying minimums and creating a new dataset

Occasional Contributor
Posts: 16

Sorry to bother, but if anyone has a suggestion I'd be very grateful!  I know I've received a lot of help here before, and I'm hoping someone can let me know what's happening here.

 

I have a dataset that looks like the following

 

ID, MonthYear_r2, ODED_date1, ODHD_date1, ODOO_date1
1, 1, , ,
1, 2, , ,
1, 3, 3060, ,
1, 4, , ,
2, 1, , 3040,
2, 2, ,
3, 1, , 2945, 2965
3, 2, , ,

 

I hope this makes sense: each row has an ID, a month, and numbers representing the date of an event.  I'm looking to identify the maximum number across the three event columns (ODED_date1, ODHD_date1, and ODOO_date1) by making an indicator variable, and then use this indicator to keep the 24 months following this event in a dataset.

 

Does anyone have an idea where this code might be having a hiccup?  I'm receiving an "expression using equals has components that are of different data types" message in my log.  Or does anyone have a better idea of how to achieve my result?

 

Thank you so much in advance!

 

 

PROC SQL;
	CREATE TABLE test4 as
	SELECT *,
	case
		when (ODED_date1 eq max(of ODED_date1 ODHD_date1 ODOO_date1)) OR (ODHD_date1 eq max(of ODED_date1 ODHD_date1 ODOO_date1)) OR (ODOO_date1 eq max(of ODED_date1 ODHD_date1 ODOO_date1)) then 1
		else 0
	end as max_OD_date1
		from test3
			group by ID
	;
QUIT;


** keeps only observations 24 months after the initial event **;
DATA test5;
	SET test4;
	BY ID;
	IF FIRST.ID THEN COUNTER=0;
	IF COUNTER EQ 0 and max_OD_date1 EQ 1 THEN DO;
		COUNTER=1;
		OUTPUT;
	END;
	ELSE IF 1<=COUNTER<=24 THEN DO;
		COUNTER+1;
		OUTPUT;
	END;
RUN;

 


All Replies
Super User
Posts: 10,466

Re: Identifying minimums and creating a new dataset

Please post the log with the code and error message into a code box opened with the forum {i} icon. The error message may provide details of where the error is occurring, and posting in the code box reduces the likelihood of the forum reformatting the text and removing some of the diagnostic information.

 

A likely issue revolves around this type of code:

max(of ODED_date1 ODHD_date1 ODOO_date1)

I would expect an error message underlining the first variable as the MAX function in SQL is not the same as the max function in a data step that uses "of".
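
One possible way around this inside PROC SQL, offered as a sketch rather than a tested fix: PROC SQL treats MAX with several comma-separated arguments as the row-wise function, and MAX with a single argument as the group-level summary function. Nesting one inside the other should give the largest value across the three columns for each ID. The table and column names are taken from the question; the IS NOT MISSING guard and the ORDER BY clause are assumptions added here so that rows with no event are not flagged and the row order is predictable for a later BY-group DATA step.

```sas
/* Sketch only: assumes test3 holds ID, MonthYear_r2 and three numeric date columns */
proc sql;
  create table test4 as
  select *,
         case
           /* comma-separated arguments: row-wise maximum of the three columns */
           when max(ODED_date1, ODHD_date1, ODOO_date1) is not missing
            and max(ODED_date1, ODHD_date1, ODOO_date1)
                /* single argument (the nested result): maximum over the ID group */
                = max(max(ODED_date1, ODHD_date1, ODOO_date1))
           then 1
           else 0
         end as max_OD_date1
  from test3
  group by ID
  /* assumed ordering: PROC SQL does not promise row order without ORDER BY */
  order by ID, MonthYear_r2;
quit;
```

Because the summary value is compared with row-level values, the log should show a note that the query requires remerging summary statistics back with the original data; that note is expected here.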

Occasional Contributor
Posts: 16

Re: Identifying minimums and creating a new dataset

Hi! Sorry, I'm not sure why my reply didn't post yesterday. I think you're right about the "of" keyword in the max function. Do you have any ideas how to get around this?

 

85
86
87   PROC SQL;
88       CREATE TABLE test4 as
89       SELECT *,
90       case
91           when (ODED_date1 eq max(of ODED_date1 ODHD_date1 ODOO_date1)) OR (ODHD_date1 eq
                                        ----------
----------
                                        22
22
                                        202
202
91 ! max(of ODED_date1 ODHD_date1 ODOO_date1)) OR (ODOO_date1 eq max(of ODED_date1 ODHD_date1
91 ! ODOO_date1)) then 1
91           when (ODED_date1 eq max(of ODED_date1 ODHD_date1 ODOO_date1)) OR (ODHD_date1 eq
91 ! max(of ODED_date1 ODHD_date1 ODOO_date1)) OR (ODOO_date1 eq max(of ODED_date1 ODHD_date1
                                                                        ----------
                                                                        22
91 ! ODOO_date1)) then 1
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, ), *, **, +, ',', -,
              '.', /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, GE, GET, GT,
              GTT, IN, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN, OR, ^, ^=, |, ||, ~,
              ~=.

ERROR 202-322: The option or parameter is not recognized and will be ignored.

91           when (ODED_date1 eq max(of ODED_date1 ODHD_date1 ODOO_date1)) OR (ODHD_date1 eq
91 ! max(of ODED_date1 ODHD_date1 ODOO_date1)) OR (ODOO_date1 eq max(of ODED_date1 ODHD_date1
                                                                        ----------
                                                                        202
91 ! ODOO_date1)) then 1
ERROR 202-322: The option or parameter is not recognized and will be ignored.

92           else 0
93       end as max_OD_date1
94           from test3
95               group by ID
96       ;
97   QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.15 seconds
      cpu time            0.07 seconds

98
99
100  ** keeps only observations 24 months after the initial event **;


101  DATA test5;
102      SET test4;
ERROR: File WORK.TEST4.DATA does not exist.
103      BY ID;
104      IF FIRST.ID THEN COUNTER=0;
105      IF COUNTER EQ 0 and max_OD_date1 EQ 1 THEN DO;
106          COUNTER=1;
107          OUTPUT;
108      END;
109      ELSE IF 1<=COUNTER<=24 THEN DO;
110          COUNTER+1;
111          OUTPUT;
112      END;
113  RUN;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST5 may be incomplete.  When this step was stopped there were 0
         observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.01 seconds


PROC Star
Posts: 7,356

Re: Identifying minimums and creating a new dataset

Are you simply trying to (1) identify the record (within an ID) that has the highest date value for that ID (regardless of whether that value is in ODED_date1, ODHD_date1, or ODOO_date1) and then (2) output that record and the up to 24 records that follow it?

 

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 16

Re: Identifying minimums and creating a new dataset

I am! Sorry for asking so many questions about it- I thought I could figure it out piecewise but I feel I'm making it more complicated.

Solution
‎04-15-2017 03:49 PM
PROC Star
Posts: 7,356

Re: Identifying minimums and creating a new dataset

Then I think that something like the following does what you want:

 

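data test3;
  /* truncover: a short line such as "2, 2, ," leaves the remaining
     variables missing instead of reading on into the next line */
  infile cards dlm=',' truncover;
  input ID MonthYear_r2 ODED_date1 ODHD_date1 ODOO_date1;
  cards;
1, 1, , , 
1, 2, , ,
1, 3, 3060, ,
1, 4, , , 
2, 1, , 3040, 
2, 2, ,
3, 1, , 2945, 2965
3, 2, , ,
;

data test5 (drop=max counter);
  /* Pass 1: read all rows of one ID and keep the largest date
     found in any of the three columns */
  do until (last.id);
    set test3;
    by ID;
    if first.id then max=max(of ODED_date1 ODHD_date1 ODOO_date1);
    else max=max(of max ODED_date1 ODHD_date1 ODOO_date1);
  end;

  /* Pass 2: re-read the same ID; output the row holding that maximum
     and then up to 24 rows that follow it */
  do until (last.id);
    set test3;
    by ID;
    if first.id then counter=0;
    /* counter=0 means the event row has not been reached yet */
    if counter eq 0 and max=max(of ODED_date1 ODHD_date1 ODOO_date1) then do;
      counter=1;
      output;
    end;
    /* counter runs from 1 to 24 on the rows after the event row */
    else if 1<=counter<=24 then do;
      counter+1;
      output;
    end;
  end;
run;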

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 16

Re: Identifying minimums and creating a new dataset

Wow, I would have never thought to run a do step until a last ID.  That worked perfectly!

 

Seriously, thank you so much!! I've spent the last couple days trying to figure this out.

PROC Star
Posts: 7,356

Re: Identifying minimums and creating a new dataset

It's known as a double DOW loop and seemed like a logical way to accomplish what you were trying to do.

 

Art, CEO, AnalystFinder.com
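
For readers who have not met the pattern before, here is a stripped-down sketch of a double DOW loop. All names in it are hypothetical and not from the thread: it assumes a dataset called have, sorted by id, with a numeric variable called value. The first loop reads one BY group to build a group-level statistic; the second loop re-reads the same group with that statistic already available.

```sas
/* Hypothetical names: dataset have (sorted by id) with numeric variable value */
data want;
  /* Pass 1: read every row of one BY group and accumulate the group maximum.
     group_max starts as missing on each DATA step iteration, and MAX ignores
     missing arguments, so no separate first.id branch is needed. */
  do until (last.id);
    set have;
    by id;
    group_max = max(group_max, value);
  end;

  /* Pass 2: the second SET has its own read position, so it re-reads the
     same BY group; group_max now holds the final value for that group. */
  do until (last.id);
    set have;
    by id;
    is_max = (value = group_max);  /* 1 when this row holds the group maximum */
    output;
  end;
run;
```

Each iteration of the DATA step handles one whole BY group, which is why the group-level value can be used on every row without a separate summary step and merge.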

 
