BookmarkSubscribeRSS Feed
Siva_Harish
Obsidian | Level 7

Hi,

I have two datasets one is raw_data and another is conditions dataset where conditions are to be applied on the raw data 

data condtns;
input F_st_dt date9. F_end_dt date9. fn$ 2. fc$9. fv$10.;format F_st_dt date9.;
datalines4;
01jan2023 31dec2023 CN in       'test'
01jan2024 01nov2024 CN not in   'devlop'
01jan2024 01dec2024 CN like     'validate'
01jan2025 31dec2025 CN not like 'check'
;;;;
run;
data raw_data;
input st_dt date9.  cn$10. ;format st_dt date9.;
datalines4;
01jan2023 test
01jan2024 test
01jan2024 devlop
01jan2024 check
01jan2024 validate
01jan2025 validate
;;;;
run;
proc sql;
		   create table s1 as
		       select a.*,b.* from  raw_data a , condtns b
			   where a.st_dt  <=  b.F_end_dt 			       			   
			   and a.st_dt >=  b.F_st_dt 			       			   
		;
		quit;

In the conditions dataset, the column name is specified under 'fn', which needs to be applied to the raw data. The 'cn' column is also present in the raw data. Based on the fc(field criteria) and fv(field value), I need to create a flag in the raw dataset indicating whether the condition is satisfied then 'Y'. If the condition is not satisfied, it should be marked as 'N'.

Note: Above is the sample data and dates can be changed to test as required

 

Any help would be greatly appreciated!

 

 

10 REPLIES 10
Tom
Super User Tom
Super User

Use the dataset with the metadata about the conditions to generate CODE.  

It looks like you want to generate code so you can run a step like this to create 4 new Y/N variables.

 

data want;
  set raw_data;
test1 =ifc(('01JAN2023'd <= st_dt <= '31DEC2023'd)
  and (CN in ('test') ),"Y","N");
test2 =ifc(('01JAN2024'd <= st_dt <= '01NOV2024'd)
  and (CN not in ('devlop') ),"Y","N");
test3 =ifc(('01JAN2024'd <= st_dt <= '01DEC2024'd)
  and (CN = 'validate' ),"Y","N");
test4 =ifc(('01JAN2025'd <= st_dt <= '31DEC2025'd)
  and (CN ne 'check' ),"Y","N");
run;

 

So just make a temporary file with the code and use %INCLUDE to make it part of the data step.

Example:

 

Spoiler
data tests;
  infile datalines truncover ;
  input F_st_dt :date. F_end_dt :date. test $50. ;
  format F_st_dt F_end_dt date9.;
datalines4;
01jan2023 31dec2023 CN in ('test')
01jan2024 01nov2024 CN not in ('devlop')
01jan2024 01dec2024 CN = 'validate'
01jan2025 31dec2025 CN ne 'check'
;;;;
options ls=132;
filename code temp;
data _null_;
  set tests;
  file code;
  put "test" _n_ "=ifc(('" f_st_dt date9. "'d <= st_dt <= '" f_end_dt date9. "'d)"
    / "  and (" test '),"Y","N");' ;
run;

data raw_data;
input st_dt date9.  cn$10. ;format st_dt date9.;
datalines4;
01jan2023 test
01jan2024 test
01jan2024 devlop
01jan2024 check
01jan2024 validate
01jan2025 validate
;;;;

data want;
  set raw_data;
  %include code/ source2;
run;

proc print;
run;

Results:

Tom_0-1742585736471.png

 

 

 

Note if you are going to generate SAS code (as opposed to SQL code) you cannot use LIKE as one your tests. (or if you already have that in your metadata you will have to make the translation when generating the code from the metadata).

 

 

Ksharp
Super User

I think the condition you wrote is not right. 

And you want this ?

data condtns;
input F_st_dt :date9. F_end_dt :date9. fn :$  fc$9. fv$18.;format F_st_dt F_end_dt date9.;
datalines4;
01jan2023 31dec2023 CN in       ('test')
01jan2024 01nov2024 CN not in   ('devlop')
01jan2024 01dec2024 CN like     '%validate%'
01jan2025 31dec2025 CN not like '%check%'
;;;;
run;
data raw_data;
input st_dt date9.  cn$10. ;format st_dt date9.;
datalines4;
01jan2023 test
01jan2024 test
01jan2024 devlop
01jan2024 check
01jan2024 validate
01jan2025 validate
;;;;
run;
proc sql;
		   create table s1 as
		       select a.*,b.* from  raw_data a , condtns b
			   where a.st_dt between  b.F_st_dt and  b.F_end_dt	   
		;
		quit;
data s1;
 set s1;
 n+1;
run;
filename x temp;
data _null_;
 set s1 end=last;
 file x;
 if _n_=1 then put 'proc sql; create table want as select *,case ';
 condition=catx(' ',fn,fc,fv);
 put 'when n=' n  ' and ' condition ' then "Y"';
 if last then put 'else "N" end as flag  from s1;quit;';
run;
 
 %include x/source;

Siva_Harish
Obsidian | Level 7

Hi @Ksharp ,

Thank you for sharing the code. I have been trying different ways to implement this for a long time, and I really appreciate your help.

I have made a small tweak to the code as " condition=catx(' ',field_name,field_criteria,'('||field_value||')') " since business will not populate braces for any of the condition but they do mentione '%' based on the condition in conditions dataset. However, for one of the time period(1stJan2022-31stDec2023) in conditions dataset, we have nulls in these columns(FN,FV,FC)  and it is throwing an error due to that.  Could you help me resolving this?

 

when n=5  and (                     )  then "Y"
        ----
        180

ERROR 180-322: Statement is not valid or it is used out of proper order.

I believe this works for all logical, comparison and arithmetic operators and not only for any specific operator. Please let me know if my understanding is correct.

Thank you very much again!

Ksharp
Super User

The key point here is to form the right expression by using FN,FV,FC , if that doesn't lead to the right one, you can adjust it manually by using IF statement.

 

data condtns;
input F_st_dt :date9. F_end_dt :date9. fn :$  fc$9. fv$18.;format F_st_dt F_end_dt date9.;
datalines4;
01jan2023 31dec2023 CN in       'test'
01jan2024 01nov2024 CN not in   'devlop'
01jan2024 01dec2024 CN like     'validate'
01jan2025 31dec2025 CN not like 'check'
;;;;
run;

data raw_data;
input st_dt date9.  cn$10. ;format st_dt date9.;
datalines4;
01jan2023 test
01jan2024 test
01jan2024 devlop
01jan2024 check
01jan2024 validate
01jan2025 validate
;;;;
run;

data condtns;
 length fv $ 200;
 set condtns;
 if cmiss(fn,fc,fv) then delete;
 fv=cats('(',fv,')');
run;

proc sql;
		   create table s1 as
		       select a.*,b.* from  raw_data a , condtns b
			   where a.st_dt between  b.F_st_dt and  b.F_end_dt	   
		;
		quit;
data s1;
 set s1;
 n+1;
run;
filename x temp;
data _null_;
 set s1 end=last;
 file x;
 if _n_=1 then put 'proc sql; create table want as select *,case ';
 condition=catx(' ',fn,fc,fv);
 put 'when n=' n  ' and ' condition ' then "Y"';
 if last then put 'else "N" end as flag  from s1;quit;';
run;
 
 %include x/source;

 

 

Siva_Harish
Obsidian | Level 7

Hi @Ksharp ,

Thanks for your inputs!

I have created two datasets to store nulls and non nulls and processing the above piece of code on non nulls dataset however I am receiving frequent insufficient memory error as my dataset(s) contains more than 4Lakhs records, is there any way to store fileref 'X' in library instead of in temp location, I tried multiple options but it did not work. Could you help if there is any option to optimize this to apply on huge data

Tom
Super User Tom
Super User

Not sure what a Lakhs is, but which dataset are you talking about?

Are you saying the RULES dataset is large?  Or the actual DATA that you want to apply the rules against?

 

Note that writing the code into a temporary work catalog instead of a temporary work file will not actually use any less temporary disk space.

Siva_Harish
Obsidian | Level 7

@Tom I have 0.5 millions of records in my raw dataset where I need to apply the conditions, so while applying the conditions to that raw dataset it is throwing insufficient memory error

Tom
Super User Tom
Super User

@Siva_Harish wrote:

@Tom I have 0.5 millions of records in my raw dataset where I need to apply the conditions, so while applying the conditions to that raw dataset it is throwing insufficient memory error


That is not a large number.  So unless you have many extremely long character variables it should not cause any issues with memory or disk space.

 

Are you generating a DATA step to apply the conditions?  A data step reading in and writing out 500K observations should run in just a few seconds.

 

Or did you generate SQL code?  If so then does it involve any joins?  SQL joins if not done properly could end up trying to perform a cartesian product which multiple the number of observations by a large amount.

 

Perhaps share a simple example of the code you ended up generating.

Siva_Harish
Obsidian | Level 7

I am using below step as shared by @Ksharp  in the latest thread

 

data s1;
 set s1;
 n+1;
run;
filename x temp;
data _null_;
 set s1 end=last;
 file x;
 if _n_=1 then put 'proc sql; create table want as select *,case ';
 condition=catx(' ',fn,fc,fv);
 put 'when n=' n  ' and ' condition ' then "Y"';
 if last then put 'else "N" end as flag  from s1;quit;';
run;
Ksharp
Super User

"I am receiving frequent insufficient memory error as my dataset(s)"

Yes. the code generated by sas is way too huge, you can't running/compile it under limited memory.

 

 

"is there any way to store fileref 'X' in library instead of in temp location,"

I don't understand, it is just a plain text file. you can store it at another path like:

 

filename x temp;
------>
filename x "c:\temp\code.sas";

 

 

 

"Could you help if there is any option to optimize this to apply on huge data"

You could split this big table into many small tables and walk through my code one by one by building macro.

And you could remove RESOURCE option to avoid to print the code in LOG.

Here is an example:

 

data condtns;
input F_st_dt :date9. F_end_dt :date9. fn :$  fc$9. fv$18.;format F_st_dt F_end_dt date9.;
datalines4;
01jan2023 31dec2023 CN in       'test'
01jan2024 01nov2024 CN not in   'devlop'
01jan2024 01dec2024 CN like     'validate'
01jan2025 31dec2025 CN not like 'check'
;;;;
run;

data raw_data;
input st_dt date9.  cn$10. ;format st_dt date9.;
datalines4;
01jan2023 test
01jan2024 test
01jan2024 devlop
01jan2024 check
01jan2024 validate
01jan2025 validate
;;;;
run;

data condtns;
 length fv $ 200;
 set condtns;
 if cmiss(fn,fc,fv) then delete;
 fv=cats('(',fv,')');
run;

proc sql;
		   create table s1 as
		       select a.*,b.* from  raw_data a , condtns b
			   where a.st_dt between  b.F_st_dt and  b.F_end_dt	   
		;
		quit;
data s1;
 set s1;
 n+1;
run;





%let n=10 ;  *split dataset S1 into 10 sub-tables;

%macro go_through;
proc surveyselect data=s1 out=_s1 groups=&n. seed=123;
run;
filename x temp;
%do i=1 %to &n.;
data x;
 set _s1(where=(GroupID=&i.));
run;
data _null_;
 set x end=last;
 file x;
 if _n_=1 then put "proc sql; create table _want&i. as select *,case ";
 condition=catx(' ',fn,fc,fv);
 put 'when n=' n  ' and ' condition ' then "Y"';
 if last then put 'else "N" end as flag  from x;quit;';
run;
 %include x;
%end;

data want;
 set _want:;
run;
/*proc sort data=want;by n;run;*/
%mend;
%go_through

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1500 views
  • 1 like
  • 3 in conversation