DATA Step, Macro, Functions and more

Need help with logic

Reply
Regular Contributor
Regular Contributor
Posts: 166

Need help with logic

Guys,
Need some help with a logic that i am trying to create.

The data that i am trying to work on is as below

data test;
infile datalines delimiter = '|';
input key: $5. order: 8. stage: $5. field_required : 8.;
datalines;
AA|1|A|1
AA|2|B|1
AA|3|C|1
AA|4|B|2
AA|5|D|2
AA|6|C|2
AA|7|D|3
AA|8|E|3
AB|9|B|1
AB|10|C|1
AB|11|D|1
AB|12|C|2
AB|13|D|2
AB|14|C|3
AB|15|D|3
;
RUN;


I am trying to derive the Field_required column . My logic is as follows.

I wish to find out the flow of my stage variable for each key.
In the key AA the first 3 stage A,B,C are distinct hence they get the value 1,
next the stage B repeats hence it get the Value 2 .
Now whatever follows B will get the value 2 until a repetition of stage is found under the value 2 . EX: Stage D occurs in value 2 hence the New stage D gets the value 3 and so on.

Note: In the Table you would see that the stage C (i.e order 7) gets a value 2 even though it is present in Value 1 this is because we only look at repetitions in the current value and not in previous values

Note: the ORDER variable would always be distinct.


I did try using the below macro to solve this. But i am sure that there is a simpler way of doing this.


If someone could Please Guide


%macro test(case);
proc sql;
select count(distinct order) into : cnt
from test
where KEY = "&case";
run;
%let cnt = %cmpres(&cnt);
proc sql;
select distinct order, stage
into : st1 -: st&cnt,
: cur1 -: cur&cnt
from test
where key= "&case"
order by order;
run;

%do i = 1 %to &cnt;
%if &i = 1 %then %do;
data test1;
set test;
format id 8.;
id = 1;
where key = "&CASE" AND order= &&st&i;
run;
%let iid = 1;
%end;

%if &i >1 %then %do;
proc sql noprint;
select count(stage)
into : check
from test1
where key = "&CASE" AND stage = "&&cur&i" and id = &iid;
run;
%if &check > 0 %then %do;
%let iid = %cmpres(&iid + 1);
%end;
data test2;
set test;
id = &iid;
where order = &&st&i;
run;
data test1;
set test1 test2;
run;
%end;
%end;

%mend;

%macro case_check;
PROC SQL NOPRINT;
SELECT COUNT(DISTINCT key)
INTO : CNTT
FROM TEST
RUN;
%LET CNTT = %CMPRES(&CNTT);
PROC SQL NOPRINT;
SELECT DISTINCT key
INTO : CAS1 -: CAS&CNTT
FROM TEST
;
RUN;

%DO J = 1 %TO &CNTT;
%test(CASE = &&CAS&J)
%if j = 1 %then %do;
DATA final;
SET TEST1;
RUN;
%end;
%if j>1 %then %do;
DATA final;
SET final TEST1;
RUN;


%END;
%END;
%MEND;
%CASE_CHECK
Super Contributor
Super Contributor
Posts: 3,174

Re: Need help with logic

Given your reference to "unique" and "repeat", review the topic BY GROUP PROCESSING within a DATA step (on a sorted input file). You will want to consider how to use IF FIRST. and/or LAST. to identify unique or repeat occurrences within your data, which is combined with a BY statement. Honestly, I have no clue why you embarked on the MACRO route to start....

Also, search the archives as we as SAS.COM on the topic for code examples and explanations -- this topic has been discussed before.

And, for self-initiated DATA step debugging, look at using PUTLOG _ALL_; where you will see the "1" and "0" conditions being set by SAS for the FIRST/LAST conditions when you code the BY statement with your "key" variables.

Scott Barry
SBBWorks, Inc.

Google advanced argument, this topic / post:

by group processing site:sas.com Message was edited by: sbb
Respected Advisor
Posts: 4,173

Re: Need help with logic

Hi NN

Another approach would be to use a dynamically built hash table.

You start with an empty hash table.

In each iteration of the data step you check if the value of "stage" also exists in the hash table. If the value of "stage" is not there then you add it to the hash table (first occurence), if the value of "stage" is already in the hash table then you add 1 to your fiel_required variable - and you empty the hash table by re-creating it.

Also if the key changes in your base data set you re-create the hash table and set the value for field_required to 1.

See code below:


data test;
infile datalines delimiter = '|';
input key: $5. order: 8. stage: $5. field_required : 8.;
datalines;
AA|1|A|1
AA|2|B|1
AA|3|C|1
AA|4|B|2
AA|5|D|2
AA|6|C|2
AA|7|D|3
AA|8|E|3
AB|9|B|1
AB|10|C|1
AB|11|D|1
AB|12|C|2
AB|13|D|2
AB|14|C|3
AB|15|D|3
;
RUN;


data want;
set test;
by key;
if first.key then field_required_2=1;

if first.key or lag(field_required_2) ne field_required_2 then do;
declare hash h ();
rc = h.defineKey('stage');
rc = h.defineDone();
end;

if (h.check() ne 0) then
do;
h.add();
end;
else
do;
field_required_2+1;
end;

run;

proc print data=want;
run;


The one thing I'm not sure about:
When executing a second declare statement with the same hash name does this destroy the previous hash or only disconnect the link to the first hash in memory.
If it only unlinks then memory will get more and more clogged with "zombies".

This shouldn't be a big issue with the code above if you're not processing millions of records as there is only one key loaded into the hash.

Does anyone know the answer? Is there some kind of a destroy method (haven't found it)?

Thanks
Patrick
Regular Contributor
Regular Contributor
Posts: 166

Re: Need help with logic

Hi Patrick,
Your use of hash definitely looks as per my need. But i have never used hash hence i would have to read about it before i use it in my query.

A samll point in the logic you have given. In the output for (order = 14) the Field_required and Field_required_2 are different. Maybe you could help correct this.

Thanks....


Scott,
Thanks for the suggestion ,
My lack of knowledge of Data step had forced me to use macros for this Job .
I am currently looking at By group Processing with Retain to try and solve this. Hope it would work.

Thanks ....
Super User
Posts: 10,035

Re: Need help with logic

There will always have a solution in SAS.;-)
Assuming variable 'stage' has five levels such as(A B C D E).

Hi. Mr.Patrick .Can you give me some documentation or files or HyperLink about hash table which I am not famillar with,I need to learn it ,just thought it would be really useful.Thx.

[pre]
data test;
infile datalines delimiter = '|';
input key : $5. order : 8. stage : $5.;
datalines;
AA|1|A
AA|2|B
AA|3|C
AA|4|B
AA|5|D
AA|6|C
AA|7|D
AA|8|E
AB|9|B
AB|10|C
AB|11|D
AB|12|C
AB|13|D
AB|14|C
AB|15|D
;
RUN;


data temp;
set test;
by key;
length join $20.;
retain join count;
if first.key then do;
count=1;
call missing(join);
end;
join=cats(join,stage);
do level='A' ,'B', 'C', 'D' ,'E' ;
num=countc(join,level);
if num ge 2 then do;
count+1;
join=trim( stage );
leave;
end;
end;
keep key order stage count;
run;
proc print;run;

[/pre]


Ksharp

Code is optimized.

Message was edited by: Ksharp Message was edited by: Ksharp
Respected Advisor
Posts: 4,173

Re: Need help with logic

Hi Ksharp

You'll find the hash object well documented in the SAS Online doc. There is also a bunch of good papers around.

Just "googling" fast with keywords "SAS hash table" I found for example this link: http://www2.sas.com/proceedings/sugi30/236-30.pdf
Considering the authors this paper is sure very worthwhile reading.

There have been some improvements to the hash object in SAS 9.2. But to get started the SAS 9.1 functionality is very o.k.

HTH
Patrick
Respected Advisor
Posts: 4,173

Re: Need help with logic

Hi NN

"In the output for (order = 14) the Field_required and Field_required_2 are different."
That's because the value of "stage" is "C" as it was already in row 12. According to "your" logic the value of "field_required_2" must be augmented - and this is what the code does.

"...hence i would have to read about it before i use it in my query."
Yep, that's how all of us improve our skills....

HTH
Patrick
Ask a Question
Discussion stats
  • 6 replies
  • 163 views
  • 0 likes
  • 4 in conversation