Hello everyone,
I want to replace/change variables values depending on their orders.
I have a sample code as below, I want to replace Order variable values with each other but depending on "DEV" and "REC" values.
If you can examine my followimg sample, you can understand what I try to do. I would also like to point out that "Have" data set can include more than for rows. So Order of "REC" value should replace with latest value of "Order"
Can somebody help me, please?
Data Have;
Length DataSet $ 32 Preffix $ 32 Order 8;
Infile Datalines Missover;
Input DataSet Preffix Order;
Datalines;
Have Mdl 1
Want Dev 2
Last Rec 3
Desired Val 4
;
Run;
Data Want;
Set Have;
If Upcase(Preffix)="DEV" Then Order=1;
If Upcase(Preffix)="REC" Then Order=4;
/*There can be more than four data set
so I need to assign this value to latest "Order" value*/
/*If Upcase(Preffix)="DEV" Then Do;*/
/*Order=TranWrd(Order,2,1);*/
/*Order=TranWrd(Order,1,2);*/
/*End;*/
/*If Upcase(Preffix)="REC" Then Do;*/
/*Order=TranWrd(Order,3,4);*/
/*Order=TranWrd(Order,4,3);*/
/*End;*/
Run;
Desired ->
What if "Have" data set will be as below, how can I replace order of "REC" with latest row?
Data Have;
Length DataSet $ 32 Preffix $ 32 Order 8;
Infile Datalines Missover;
Input DataSet Preffix Order;
Datalines;
Have Mdl 1
Want Dev 2
Last Rec 3
Want2 Val2 4
Desired Val 5
Have2 Mdl2 6
;
Run;
Thank you
It's a mistake to try to do this in one step. Just assign some extreme values:
data want;
set have;
if upcase(preffix)='DEV' then order=-5;
else if upcase(preffix)='REC' then order=99999;
run;
proc sort data=want;
by order;
run;
data want;
set want;
order = _n_;
run;
The data set is small, so take advantage of that. Use multiple easy programming steps.
Yes, this is what I want -> "Are you saying that no matter what the value of the internal order variable is, if the value of Preffix is 'Dev' then the order changes to 1 and if the value is 'Rec', then that is always the Last order???"
Actually, there is no logic, these values into the "Have" data set is an input data. Some person will fill the values then I will order depending on "DEV" and "REC". "DEV" should be first and "REC" should be last.
It's a mistake to try to do this in one step. Just assign some extreme values:
data want;
set have;
if upcase(preffix)='DEV' then order=-5;
else if upcase(preffix)='REC' then order=99999;
run;
proc sort data=want;
by order;
run;
data want;
set want;
order = _n_;
run;
The data set is small, so take advantage of that. Use multiple easy programming steps.
Your reply gives me what I want, Thank you.
To decrease data step count, if I write Proc Sql as below, Is it affect the performance positive or negative?
Also is it possible for me to add your Order=_N_ step into the PROC SQL step?
Proc Sql;
Create Table Want As
Select Dataset
,Preffix
,Case
When Upcase(Preffix)="DEV" Then -5
WHEN Upcase(Preffix)="REC" Then 99999
Else Order
End AS Order
From Have
Order By Order;
Quit;
Data Want2;
Set Want;
Order=_N_;
Run;
Thank you
Yes, assuming your SQL code is working you can cut out a step that way. But these data sets are small. The most efficient method doesn't depend on the number of steps. Use whatever you would find easy to understand when you are reviewing the program a year from now.
Here is an Array way.
I use array to keep the Observation Number(Record ID). Then I exchange the positions of Record IDs depending on 'Dev' and 'Rec'.
But the data set is used 3 times which I think is a limitation unless proved otherwise.
data want;
if _N_ = 1 then do;
array k[7] _temporary_;
do i = 1 by 1 until(eof);
set have end = eof;
k[Order] = i;
end;
end;
do i = 1 by 1 until(last);
set have end = last;
if upcase(Preffix) = 'DEV' then do;
k[dim(k)] = k[1];
k[1] = i;
k[i] = k[dim(k)];
end;
if upcase(Preffix) = 'REC' then do;
k[dim(k)] = k[4];
k[4] = i;
k[i] = k[dim(k)];
end;
end;
if eof;
do i = 1 to dim(k) - 1;
pt = k[i];
set have point = pt;
Order = i;
output;
end;
drop i;
run;
For 6 observations, the array is sized to 7. The 7th cell is used for exchanges. The first do-loop assigns the RID to the array cells. The second do-loop exchanges the RIDs. The third do-loop uses POINT= option to get the desired observation.
We can avoid the 3 passes over the data set by holding other two variables in two separate arrays and do the exchanges on them whenever there is exchange in ORDER.
Anyway this fun with Array.
Thank you very much for your detailed information. But I did not understand what should I do, if there are six observation?
In your method, I need to know count of observations number, is it possible to perform your method without knowing the observation count. Because the observation count can change.
Thank you
A small program get it.
data _null_;
if 0 then set have nobs = n;
call symputx('nplusone', n + 1);
stop;
run;
%put &nplusone;
Place &nplusone wherever 7 appears.
Thank you very much,
If I change data set as below and replace 7 with &nplusone. the result doesn't become what I want.
Data Have;
Length DataSet $ 32 Preffix $ 32 Order 8;
Infile Datalines Missover;
Input DataSet Preffix Order;
Datalines;
Have Mdl 1
Want Dev 2
Last Rec 3
Want2 Val2 4
Desired Val 5
Have2 Mdl2 6
;
Run;
data _null_;
if 0 then set have nobs = n;
call symputx('nplusone', n + 1);
stop;
run;
%put &nplusone;
data want;
if _N_ = 1 then do;
array k[&nplusone.] _temporary_;
do i = 1 by 1 until(eof);
set have end = eof;
k[Order] = i;
end;
end;
do i = 1 by 1 until(last);
set have end = last;
if upcase(Preffix) = 'DEV' then do;
k[dim(k)] = k[1];
k[1] = i;
k[i] = k[dim(k)];
end;
if upcase(Preffix) = 'REC' then do;
k[dim(k)] = k[4];
k[4] = i;
k[i] = k[dim(k)];
end;
end;
if eof;
do i = 1 to dim(k) - 1;
pt = k[i];
set have point = pt;
Order = i;
output;
end;
drop i;
run;
Thank you
This is the output I am getting. Is this OK or not?
Obs DataSet Preffix Order
1 Want Dev 1
2 Have Mdl 2
3 Want2 Val2 3
4 Last Rec 4
5 Desired Val 5
6 Have2 Mdl2 6
Actually, it should be the following table;
According to your response, I would like to ask one more question to you. What if I don't have Order variable in the data set, can I still order the table the way I want?
Thank you
What if have following table;
Data Have;
Length DataSet $ 32 Preffix $ 32;
Infile Datalines Missover;
Input DataSet Preffix ;
Datalines;
Have Mdl
Want Dev
Last Rec
Want2 Val
Desired Val
Have2 Mdl2
;
Run;
Yes.
Replace Order by _N_ in the first do-loop whic is i here.
Edited.....
do i = 1 by 1 until(eof);
set have end = eof;
k[i] = i;
end;
In previous post input data set had 3 passes. In the following program, it needs one pass to store the Record IDs in an array(K) and another pass to directly to go to the required record for output. As records are processed the Preffix variable is checked for 'DEV' and 'REC' and if they found, their RID is saved in another array(P). Once the RIDs are in the array(K), comparing with (P), exchanges are made. Finally the array(K) has the required order of RIDs. The final do-loop uses cell values of K to directly hit the record for the output.
A small modification to the data set is made. No need for the ORDER variable in the input data set. Here is everything:
Data have;
Length DataSet $ 32 Preffix $ 32;
Infile Datalines Missover;
Input DataSet Preffix;
datalines;
Have Mdl
Want2 Val2
Desired Val
Have2 Mdl2
have3 Mdl3
want2 Val3
have4 Mdl4
have5 Mdl5
Last Rec
Want Dev
;
run;
data _null_;
if 0 then set have nobs = n;
call symputx('nplusone', n + 1);
stop;
run;
%put &nplusone;
data want;
if _N_ = 1 then do;
array k[&nplusone] _temporary_;
array p[&nplusone] _temporary_;
do i = 1 by 1 until(eof);
set have end = eof;
k[i] = i;
if upcase(Preffix) = 'DEV' then p[i] = 1;
else if upcase(Preffix) = 'REC' then p[i] = 4;
end;
do i = 1 to dim(k) - 1;
if not missing(p[i]) then do;
k[dim(k)] = p[i];
k[p[i]] = i;
k[i] = k[dim(k)];
end;
end;
end;
if eof;
do i = 1 to dim(k) - 1;
pt = k[i];
set have point = pt;
Order = i;
output;
end;
stop;
drop i;
run;
proc print data = want;
run;
DataSet Preffix Order
Want Dev 1
Want2 Val2 2
Desired Val 3
Last Rec 4
have3 Mdl3 5
want2 Val3 6
have4 Mdl4 7
have5 Mdl5 8
Have2 Mdl2 9
Have Mdl 10
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.