I am trying to derive the Field_required column . My logic is as follows.
I wish to find out the flow of my stage variable for each key.
In the key AA the first 3 stage A,B,C are distinct hence they get the value 1,
next the stage B repeats hence it get the Value 2 .
Now whatever follows B will get the value 2 until a repetition of stage is found under the value 2 . EX: Stage D occurs in value 2 hence the New stage D gets the value 3 and so on.
Note: In the Table you would see that the stage C (i.e order 7) gets a value 2 even though it is present in Value 1 this is because we only look at repetitions in the current value and not in previous values
Note: the ORDER variable would always be distinct.
I did try using the below macro to solve this. But i am sure that there is a simpler way of doing this.
If someone could Please Guide
select count(distinct order) into : cnt
where KEY = "&case";
%let cnt = %cmpres(&cnt);
select distinct order, stage
into : st1 -: st&cnt,
: cur1 -: cur&cnt
where key= "&case"
order by order;
%do i = 1 %to &cnt;
%if &i = 1 %then %do;
format id 8.;
id = 1;
where key = "&CASE" AND order= &&st&i;
%let iid = 1;
%if &i >1 %then %do;
proc sql noprint;
into : check
where key = "&CASE" AND stage = "&&cur&i" and id = &iid;
%if &check > 0 %then %do;
%let iid = %cmpres(&iid + 1);
id = &iid;
where order = &&st&i;
set test1 test2;
PROC SQL NOPRINT;
SELECT COUNT(DISTINCT key)
INTO : CNTT
%LET CNTT = %CMPRES(&CNTT);
PROC SQL NOPRINT;
SELECT DISTINCT key
INTO : CAS1 -: CAS&CNTT
%DO J = 1 %TO &CNTT;
%test(CASE = &&CAS&J)
%if j = 1 %then %do;
%if j>1 %then %do;
SET final TEST1;
Given your reference to "unique" and "repeat", review the topic BY GROUP PROCESSING within a DATA step (on a sorted input file). You will want to consider how to use IF FIRST. and/or LAST. to identify unique or repeat occurrences within your data, which is combined with a BY statement. Honestly, I have no clue why you embarked on the MACRO route to start....
Also, search the archives as we as SAS.COM on the topic for code examples and explanations -- this topic has been discussed before.
And, for self-initiated DATA step debugging, look at using PUTLOG _ALL_; where you will see the "1" and "0" conditions being set by SAS for the FIRST/LAST conditions when you code the BY statement with your "key" variables.
Google advanced argument, this topic / post:
by group processing site:sas.com
Message was edited by: sbb
Another approach would be to use a dynamically built hash table.
You start with an empty hash table.
In each iteration of the data step you check if the value of "stage" also exists in the hash table. If the value of "stage" is not there then you add it to the hash table (first occurence), if the value of "stage" is already in the hash table then you add 1 to your fiel_required variable - and you empty the hash table by re-creating it.
Also if the key changes in your base data set you re-create the hash table and set the value for field_required to 1.
if first.key then field_required_2=1;
if first.key or lag(field_required_2) ne field_required_2 then do;
declare hash h ();
rc = h.defineKey('stage');
rc = h.defineDone();
if (h.check() ne 0) then
proc print data=want;
The one thing I'm not sure about:
When executing a second declare statement with the same hash name does this destroy the previous hash or only disconnect the link to the first hash in memory.
If it only unlinks then memory will get more and more clogged with "zombies".
This shouldn't be a big issue with the code above if you're not processing millions of records as there is only one key loaded into the hash.
Does anyone know the answer? Is there some kind of a destroy method (haven't found it)?
Your use of hash definitely looks as per my need. But i have never used hash hence i would have to read about it before i use it in my query.
A samll point in the logic you have given. In the output for (order = 14) the Field_required and Field_required_2 are different. Maybe you could help correct this.
Thanks for the suggestion ,
My lack of knowledge of Data step had forced me to use macros for this Job .
I am currently looking at By group Processing with Retain to try and solve this. Hope it would work.
length join $20.;
retain join count;
if first.key then do;
do level='A' ,'B', 'C', 'D' ,'E' ;
if num ge 2 then do;
join=trim( stage );
keep key order stage count;
Code is optimized.
Message was edited by: Ksharp
Message was edited by: Ksharp
"In the output for (order = 14) the Field_required and Field_required_2 are different."
That's because the value of "stage" is "C" as it was already in row 12. According to "your" logic the value of "field_required_2" must be augmented - and this is what the code does.
"...hence i would have to read about it before i use it in my query."
Yep, that's how all of us improve our skills....