BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Barite | Level 11

Hi Everyone,

I have a FULL_data file in which each date has a number of characteristics.

I have another condition file that specifies condition for a give 2 characteristic value.

 

For any condition in condition file, I want to pull all date from FULL_data that meet this condition.

(in my real work, the condition could be 4 or 5 characteristics )

 

I don't know how to get that kind of merge.

Any help is very much appreciated.

 

HHC
The WANT file

condition_ID

condt1

value1

cond2

value2

date

1

a

4

b

5

11

1

a

4

b

5

12

2

c

1

d

2

14

 

data FULLDATA;
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 4 1 2 6
14 8 3 1 2
;run;

 

data condition;
input condition_ID cond1 $ value1 cond2 $ value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;

 

*************************

Also, I want to have the WANT file for all records that meet 1 of the 2 conditions (OR), please give me a hint if it is not long.

Thank you.

 


*After getting the condition-combination file,
this code will help to merge the condition-combination file with the Full data (retunr) file

In the output file, each condition has all record that meet it. and can be use to analyze quality of condition;


data FULLDATA; 
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 0 1 2 2
14 8 5 1 2
20 4 5 1 2
;run;
 
data condition; 
input condition_ID cond1 $ value1 cond2 $ value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;

******************************************************************************************
*Combination of AND condition1 and condition2;

filename tmp temp;
data _null_;
  set condition end=last_cond;
  file tmp;
	put 'if (' cond1 '=' value1 'AND ' cond2 '=' value2 ')'
	' then do; condition_id=' condition_id '; output;end;' ;
	*if last_cond=0  then put 'else';
run;

options source2;
data want_and;
  set fulldata;
  %include tmp;
run;
proc sort data=want_and; by condition_id date;run;

		*look into the temfile, the second code is like that;
		data want2;
		  set fulldata;
		if (a =4 and b =5)  then do; condition_id=1 ; output;end;
		if (c =1 and d =2)  then do; condition_id=2 ; output;end;
		run;

******************************************************************************************
*Combination of OR condition1 and condition2;

filename tmp temp;
data _null_;
  set condition end=last_cond;
  file tmp;
	put 'if (' cond1 '=' value1 'OR ' cond2 '=' value2 ')' 
	' then do; condition_id=' condition_id '; output;end;' ;
	*if last_cond=0  then put 'else';
run;


options source2;
data want_or;
  set fulldata;
  %include tmp;
run;
proc sort; by condition_id date;run;

		*look into the temfile, the second code is like that;
		data want2;
		  set fulldata;
		if (a =4 OR b =5)  then do; condition_id=1 ; output;end;
		if (c =1 OR d =2)  then do; condition_id=2 ; output;end;
		run;
		proc sort; by condition_id date;run;
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Use  sql  "select into :macrovar" to construct the where condition from the CONDITION dataset. Then use it in a data step:

 

proc sql noprint;
   select distinct '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
     into  :cond
     separated by ' or '
   from condition;
quit;

%put &=cond;
data want;
 set fulldata;
 where &cond ;
run;

 

And change the "AND" to "OR" in the  select distinct expression to satisfy your 2nd requirement.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
mkeintz
PROC Star

Use  sql  "select into :macrovar" to construct the where condition from the CONDITION dataset. Then use it in a data step:

 

proc sql noprint;
   select distinct '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
     into  :cond
     separated by ' or '
   from condition;
quit;

%put &=cond;
data want;
 set fulldata;
 where &cond ;
run;

 

And change the "AND" to "OR" in the  select distinct expression to satisfy your 2nd requirement.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hhchenfx
Barite | Level 11

Thank you for your help.

 

I wonder if there is any way to get the  Condition_ID in the want file?

 

HHC

hhchenfx
Barite | Level 11

Since my condition file is large, I create the below marco to run 1 condition at a time.

I wonder how I can tell SAS to do the:  DO i=1 to "END of File" in the first step.

Also, is there any way to include the condition_id from the original file into the Want file?

 

Thank you for your help.

HHC


%macro cond;
	%do i=1 %to 2;	*HOW TO MAKE IT: i=1 to END OF FILE ;
		data single_condition; set condition;
		if _N_=&i;
		run;
 
		proc sql ;
		   select distinct  '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
		     into  :cond
		     separated by ' or '
		   from single_condition;
		quit;

		%put &=cond;
			data want_temp;
			 set fulldata;
			 where &cond ;
			 condition_id=&i; *I have to get the Condition_ID manually instead of pulling it from original file;
			run;

		%if &i=1 %then %do;
		data want; set want_temp;
		%end;
		%else %do;
		data want; set want want_temp;
		%end;
	%end;
%mend;

%cond;run;
mkeintz
PROC Star

If (1) there are too many conditions in the CONDITION datast, and (2) you want to label conditions, then you could forget the SQL approach and use a DATA _NULL_ step to write out a series of statements like

 

  if   x1=1 and y1=1 then condition_id=1;

  else   if   x2=12and y2=12then condition_id=2;

 

for use in the "data want" step.  Then a subsetting if "if condition_id>=1" gets you what you want.  There is definitely no need to iterate through the conditions, assuming they are mutually exclusive: 

 

filename tmp temp;
data _null_;
  set condition end=last_cond;
  file tmp;
  put 'if ' cond1 '=' value1 'and ' cond2 '=' value2  
      ' then condition_id=' condition_id ';' @;
  if last_cond=0  then put 'else';
run;

options source2;
data want;
  set fulldata;
  %include tmp;
  if condition_id >=1;
run;

 

Notes:

  1. The "filename tmp temp" says to assign the name "TMP" to a temporary file (to be automatically deleted at end of sas session.
  2. The FILE TMP tells SAS that the PUT statements will write to TMP in the first DATA step
  3. The "%include tmp;" statement tells sas to read statement from the TMP file  (and the "options source2" statement tells SAS to print to the log any %INCLUDEd statements, so you can see what it happening.
  4. If a record satisfies multiple conditions, only the first one will be identified in this program.  (You loop program would identify them all - and would also duplicate the corresponding records from FULLDATA.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hhchenfx
Barite | Level 11
I want to thank you first for details explanation.
I cannot comprehend it as I am on call right now.
I will study it.
HHC
hhchenfx
Barite | Level 11

Hi MKeintz,

The current code eliminates duplicate record from Fulldata.
I actually want to have that duplicate for each condition it satisfies.

In the below data, I add the last record (15) and it should be in both condition.

I am not sure how to change your code to get it.

Can you help me?

Thank you,

HCC


data condition;
input condition_ID cond1 :$1. value1 cond2 :$1. value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;
data FULLDATA;
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 4 1 2 6
14 8 3 1 2
15 4 5 1 2
;run;

mkeintz
PROC Star

To get such "duplicates" change these two lines in the first data step:

 

put 'if ' cond1 '=' value1 'and ' cond2 '=' value2

   ' then condition_id=' condition_id ';' @;

if last_cond=0 then put 'else';

 

to this line:

 

put 'if ' cond1 '=' value1 'and ' cond2 '=' value2

' then do; condition_id=' condition_id '; output;end;' ;

 

 

And in the second data step, remove this line:

 

if condition_id >=1;

 

Result: if a single record satisfies multiple conditions, it will be output multiple times. 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hhchenfx
Barite | Level 11

It works now, Mkeintz!!!

 

it is a powerful approach, I don't know that kind of method before.

 

I can read the tmp file to see what the "if then" command.

 

So basically, the %include tmp will turn into this 


data want;
  set fulldata;
if a =4 and b =5  then condition_id=1 ;if c =1 and d =2  then condition_id=2 ;
run;

But where the Merge happen? I do not see something like that. Where it is built in your code?


data want1;
  set fulldata;
if  a =4 and b =5 ;
run;

data want2;
  set fulldata;
if  c =1 and d =2  ;
run;

data want; set want1 want2;run;
mkeintz
PROC Star

xxx

 


@hhchenfx wrote:

It works now, Mkeintz!!!

 

it is a powerful approach, I don't know that kind of method before.

 

I can read the tmp file to see what the "if then" command.

 

So basically, the %include tmp will turn into this 


data want;
  set fulldata;
if a =4 and b =5  then condition_id=1 ;if c =1 and d =2  then condition_id=2 ;
run;

But where the Merge happen? I do not see something like that. Where it is built in your code?


data want1;
  set fulldata;
if  a =4 and b =5 ;
run;

data want2;
  set fulldata;
if  c =1 and d =2  ;
run;

data want; set want1 want2;run;

 

 

No.  Itr will turn into:

 

data want;

  set fulldata;

  if a=4 and b=5 then do; condition_id=1; output; end;

  if c=1 and d=2 then do; condition_id=2; output; end;

 run;

 

Notice there is an OUTPUT statement for every successful IF test.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hhchenfx
Barite | Level 11

that's output command is so good.

Thanks,

HHC

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1435 views
  • 0 likes
  • 2 in conversation