Hi,
I have two datasets one is raw_data and another is conditions dataset where conditions are to be applied on the raw data
data condtns; input F_st_dt date9. F_end_dt date9. fn$ 2. fc$9. fv$10.;format F_st_dt date9.; datalines4; 01jan2023 31dec2023 CN in 'test' 01jan2024 01nov2024 CN not in 'devlop' 01jan2024 01dec2024 CN like 'validate' 01jan2025 31dec2025 CN not like 'check' ;;;; run; data raw_data; input st_dt date9. cn$10. ;format st_dt date9.; datalines4; 01jan2023 test 01jan2024 test 01jan2024 devlop 01jan2024 check 01jan2024 validate 01jan2025 validate ;;;; run; proc sql; create table s1 as select a.*,b.* from raw_data a , condtns b where a.st_dt <= b.F_end_dt and a.st_dt >= b.F_st_dt ; quit;
In the conditions dataset, the column name is specified under 'fn', which needs to be applied to the raw data. The 'cn' column is also present in the raw data. Based on the fc(field criteria) and fv(field value), I need to create a flag in the raw dataset indicating whether the condition is satisfied then 'Y'. If the condition is not satisfied, it should be marked as 'N'.
Note: Above is the sample data and dates can be changed to test as required
Any help would be greatly appreciated!
Use the dataset with the metadata about the conditions to generate CODE.
It looks like you want to generate code so you can run a step like this to create 4 new Y/N variables.
data want;
set raw_data;
test1 =ifc(('01JAN2023'd <= st_dt <= '31DEC2023'd)
and (CN in ('test') ),"Y","N");
test2 =ifc(('01JAN2024'd <= st_dt <= '01NOV2024'd)
and (CN not in ('devlop') ),"Y","N");
test3 =ifc(('01JAN2024'd <= st_dt <= '01DEC2024'd)
and (CN = 'validate' ),"Y","N");
test4 =ifc(('01JAN2025'd <= st_dt <= '31DEC2025'd)
and (CN ne 'check' ),"Y","N");
run;
So just make a temporary file with the code and use %INCLUDE to make it part of the data step.
Example:
data tests;
infile datalines truncover ;
input F_st_dt :date. F_end_dt :date. test $50. ;
format F_st_dt F_end_dt date9.;
datalines4;
01jan2023 31dec2023 CN in ('test')
01jan2024 01nov2024 CN not in ('devlop')
01jan2024 01dec2024 CN = 'validate'
01jan2025 31dec2025 CN ne 'check'
;;;;
options ls=132;
filename code temp;
data _null_;
set tests;
file code;
put "test" _n_ "=ifc(('" f_st_dt date9. "'d <= st_dt <= '" f_end_dt date9. "'d)"
/ " and (" test '),"Y","N");' ;
run;
data raw_data;
input st_dt date9. cn$10. ;format st_dt date9.;
datalines4;
01jan2023 test
01jan2024 test
01jan2024 devlop
01jan2024 check
01jan2024 validate
01jan2025 validate
;;;;
data want;
set raw_data;
%include code/ source2;
run;
proc print;
run;
Results:
Note if you are going to generate SAS code (as opposed to SQL code) you cannot use LIKE as one your tests. (or if you already have that in your metadata you will have to make the translation when generating the code from the metadata).
I think the condition you wrote is not right.
And you want this ?
data condtns; input F_st_dt :date9. F_end_dt :date9. fn :$ fc$9. fv$18.;format F_st_dt F_end_dt date9.; datalines4; 01jan2023 31dec2023 CN in ('test') 01jan2024 01nov2024 CN not in ('devlop') 01jan2024 01dec2024 CN like '%validate%' 01jan2025 31dec2025 CN not like '%check%' ;;;; run; data raw_data; input st_dt date9. cn$10. ;format st_dt date9.; datalines4; 01jan2023 test 01jan2024 test 01jan2024 devlop 01jan2024 check 01jan2024 validate 01jan2025 validate ;;;; run; proc sql; create table s1 as select a.*,b.* from raw_data a , condtns b where a.st_dt between b.F_st_dt and b.F_end_dt ; quit; data s1; set s1; n+1; run; filename x temp; data _null_; set s1 end=last; file x; if _n_=1 then put 'proc sql; create table want as select *,case '; condition=catx(' ',fn,fc,fv); put 'when n=' n ' and ' condition ' then "Y"'; if last then put 'else "N" end as flag from s1;quit;'; run; %include x/source;
Hi @Ksharp ,
Thank you for sharing the code. I have been trying different ways to implement this for a long time, and I really appreciate your help.
I have made a small tweak to the code as " condition=catx(' ',field_name,field_criteria,'('||field_value||')') " since business will not populate braces for any of the condition but they do mentione '%' based on the condition in conditions dataset. However, for one of the time period(1stJan2022-31stDec2023) in conditions dataset, we have nulls in these columns(FN,FV,FC) and it is throwing an error due to that. Could you help me resolving this?
when n=5 and ( ) then "Y" ---- 180 ERROR 180-322: Statement is not valid or it is used out of proper order.
I believe this works for all logical, comparison and arithmetic operators and not only for any specific operator. Please let me know if my understanding is correct.
Thank you very much again!
The key point here is to form the right expression by using FN,FV,FC , if that doesn't lead to the right one, you can adjust it manually by using IF statement.
data condtns; input F_st_dt :date9. F_end_dt :date9. fn :$ fc$9. fv$18.;format F_st_dt F_end_dt date9.; datalines4; 01jan2023 31dec2023 CN in 'test' 01jan2024 01nov2024 CN not in 'devlop' 01jan2024 01dec2024 CN like 'validate' 01jan2025 31dec2025 CN not like 'check' ;;;; run; data raw_data; input st_dt date9. cn$10. ;format st_dt date9.; datalines4; 01jan2023 test 01jan2024 test 01jan2024 devlop 01jan2024 check 01jan2024 validate 01jan2025 validate ;;;; run; data condtns; length fv $ 200; set condtns; if cmiss(fn,fc,fv) then delete; fv=cats('(',fv,')'); run; proc sql; create table s1 as select a.*,b.* from raw_data a , condtns b where a.st_dt between b.F_st_dt and b.F_end_dt ; quit; data s1; set s1; n+1; run; filename x temp; data _null_; set s1 end=last; file x; if _n_=1 then put 'proc sql; create table want as select *,case '; condition=catx(' ',fn,fc,fv); put 'when n=' n ' and ' condition ' then "Y"'; if last then put 'else "N" end as flag from s1;quit;'; run; %include x/source;
Hi @Ksharp ,
Thanks for your inputs!
I have created two datasets to store nulls and non nulls and processing the above piece of code on non nulls dataset however I am receiving frequent insufficient memory error as my dataset(s) contains more than 4Lakhs records, is there any way to store fileref 'X' in library instead of in temp location, I tried multiple options but it did not work. Could you help if there is any option to optimize this to apply on huge data
Not sure what a Lakhs is, but which dataset are you talking about?
Are you saying the RULES dataset is large? Or the actual DATA that you want to apply the rules against?
Note that writing the code into a temporary work catalog instead of a temporary work file will not actually use any less temporary disk space.
@Tom I have 0.5 millions of records in my raw dataset where I need to apply the conditions, so while applying the conditions to that raw dataset it is throwing insufficient memory error
@Siva_Harish wrote:
@Tom I have 0.5 millions of records in my raw dataset where I need to apply the conditions, so while applying the conditions to that raw dataset it is throwing insufficient memory error
That is not a large number. So unless you have many extremely long character variables it should not cause any issues with memory or disk space.
Are you generating a DATA step to apply the conditions? A data step reading in and writing out 500K observations should run in just a few seconds.
Or did you generate SQL code? If so then does it involve any joins? SQL joins if not done properly could end up trying to perform a cartesian product which multiple the number of observations by a large amount.
Perhaps share a simple example of the code you ended up generating.
I am using below step as shared by @Ksharp in the latest thread
data s1; set s1; n+1; run; filename x temp; data _null_; set s1 end=last; file x; if _n_=1 then put 'proc sql; create table want as select *,case '; condition=catx(' ',fn,fc,fv); put 'when n=' n ' and ' condition ' then "Y"'; if last then put 'else "N" end as flag from s1;quit;'; run;
"I am receiving frequent insufficient memory error as my dataset(s)"
Yes. the code generated by sas is way too huge, you can't running/compile it under limited memory.
"is there any way to store fileref 'X' in library instead of in temp location,"
I don't understand, it is just a plain text file. you can store it at another path like:
filename x temp; ------> filename x "c:\temp\code.sas";
"Could you help if there is any option to optimize this to apply on huge data"
You could split this big table into many small tables and walk through my code one by one by building macro.
And you could remove RESOURCE option to avoid to print the code in LOG.
Here is an example:
data condtns; input F_st_dt :date9. F_end_dt :date9. fn :$ fc$9. fv$18.;format F_st_dt F_end_dt date9.; datalines4; 01jan2023 31dec2023 CN in 'test' 01jan2024 01nov2024 CN not in 'devlop' 01jan2024 01dec2024 CN like 'validate' 01jan2025 31dec2025 CN not like 'check' ;;;; run; data raw_data; input st_dt date9. cn$10. ;format st_dt date9.; datalines4; 01jan2023 test 01jan2024 test 01jan2024 devlop 01jan2024 check 01jan2024 validate 01jan2025 validate ;;;; run; data condtns; length fv $ 200; set condtns; if cmiss(fn,fc,fv) then delete; fv=cats('(',fv,')'); run; proc sql; create table s1 as select a.*,b.* from raw_data a , condtns b where a.st_dt between b.F_st_dt and b.F_end_dt ; quit; data s1; set s1; n+1; run; %let n=10 ; *split dataset S1 into 10 sub-tables; %macro go_through; proc surveyselect data=s1 out=_s1 groups=&n. seed=123; run; filename x temp; %do i=1 %to &n.; data x; set _s1(where=(GroupID=&i.)); run; data _null_; set x end=last; file x; if _n_=1 then put "proc sql; create table _want&i. as select *,case "; condition=catx(' ',fn,fc,fv); put 'when n=' n ' and ' condition ' then "Y"'; if last then put 'else "N" end as flag from x;quit;'; run; %include x; %end; data want; set _want:; run; /*proc sort data=want;by n;run;*/ %mend; %go_through
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.