Hello all,
Could anyone help me to figure out this problem? Thanks in advance!
My original dataset is like this:
ID | A | B | C |
---|---|---|---|
1 | 2 | 3 | 4 |
2 | 5 | . | 3 |
3 | 9 | 5 | 1 |
4 | 5 | . | . |
5 | . | . | . |
I'd like to create a new variable that indicates the order pattern from the minimum to the maximum. So the new dataset should look like this:
ID | A | B | C | Pattern |
---|---|---|---|---|
1 | 2 | 3 | 4 | A_B_C |
2 | 5 | . | 3 | C_A |
3 | 9 | 5 | 1 | C_B_A |
4 | 5 | . | . | A |
5 | . | . | . | . |
Any suggestion is appreciated! Thanks!
A couple of details may need some clarification. Is this for only 3 variables or do you need something that works for an abitrary number of values? Are the actual variable names A B C or are you using a placeholder names for example? This is somewhat important as the length of Pattern needs to be set large enough to hold the longest possible string.
Also, rules for tie breaking? What is the result if A and C are the same value? What if all variables have the same value? And do you want an actual period '.' or a missing for pattern when all values are missing?
Ballardw, thanks for bring these to my attention! You are absolutely right. I just made this as a simple example. In my real dataset, I have 8 variables and the actual names are 2-4 letters. To make the 'pattern' variable short, I could use numbers to replace the letters, say if pattern is A_B_C then I could use 1_2_3 instead.
And for the equal value, if A, B, and C are 1, 1, and 2 respectively, then the pattern could be AB_C.
ballardw is right; more info is helpful... sounds like you want to retain names of nonmissing variables; maybe you can build on this...
data have;
input id a b c d;
cards ;
1 2 3 4 .
2 . 5 2 5
3 . . . .
4 . 3 2 .
5 . . 1 7
;
run;
data want;
set have;
array _vars a b c d;
array _varN $ x1-x4;
do over _varN;
if _vars ne . then _varN= VNAME (_vars);
end;
pattern=catx ("_",OF X1-x4);
run;
proc print ;
var id a b c d pattern;
run;
Hello Altal,
Thanks for your answer. It is helpful. The new variable 'pattern' you make indicates the pattern of present/absent. But how to make the 'pattern' variable reflect the order?
There may be a smarter way, but you can use something like this... still some coded is needed for tied values..
data have;
input id a b c d;
cards ;
1 2 3 4 .
2 . 5 2 5
3 . . . .
4 . 3 2 .
5 . . 1 7
6 1 1 . .
7 6 5 4 1
; run;
data want;
set have;
array _vars a b c d;
array _varN $ x1-x4;
do over _varN;
if _vars ne . then _varN= VNAME (_vars);
end;
run;
proc transpose data=want out=temp1;
var a b c d;
by id;
run;
proc transpose data=want out=temp2;
var X1-X4;
by id;
run;
data a;
merge temp1 temp2 (rename=(col1=col2));
by id;
run;
proc sort data=a;
by id col1;
run;
proc transpose data=a out=temp3 (drop=_NAME_) prefix=col;
var col2; by id;
run;
data temp3;
set temp3; pattern=catx ("_",OF col1-col4);
run;
proc print ;run;
Hi Altal,
Your codes are very helpful! I still have one question about the equal value, for instance, id=2 the pattern should be 'c_bd' or 'c_db', because b=d=5. Is there an easier way to deal with that? Thanks!
I would probably take the obvious approach.
1 | 2 | 3 | 4 | . | a_b_c |
2 | . | 5 | 2 | 5 | c_bd |
3 | . | . | . | . | |
4 | . | 3 | 2 | . | c_b |
5 | . | . | 1 | 7 | c_d |
6 | 1 | 1 | . | . | ab |
7 | 6 | 5 | 4 | 1 | d_c_b_a |
Just another approach with a single pass through the data:
data have;
input id a b c d;
cards;
1 2 3 4 .
2 . 5 2 5
3 . . . .
4 . 3 2 .
5 . . 1 7
6 1 1 . .
7 6 5 4 1
;;;;
run;
data want(drop= _:);
set have;
length pattern $32.;
array myarr {*} a b c d;
if _n_=1 then
do;
length _value 8 _varname $32;
dcl hash h (multidata:'y', ordered:'y', hashexp:3);
dcl hiter hit ('h');
_rc=h.defineKey('_value');
_rc=h.defineData('_varname','_value');
_rc=h.defineDone();
end;
_rc=h.clear();
do _i=1 to dim(myarr);
_value=myarr[_i];
if missing(_value) then continue;
_varname=vname(myarr[_i]);
_rc=h.add();
end;
_rc = hit.first();
_value_old=_value;
do while (_rc = 0);
if _value_old=_value then
do;
pattern=cats(pattern,_varname);
end;
else
pattern=cats(pattern,'_',_varname);
_value_old=_value;
_rc = hit.next();
end;
run;
Hi ..
Small update to the @data_null_; code..
data pattern;
do until(last.id);
do until(last.col1);
set ranks;
by id col1;
length p pattern $128;
p = catX('_',p,_name_);/*Updated cats(p,_name_) to catX('_',p,_name_)*/
end;
pattern = catx('_',pattern,p);
call missing(p);
end;
drop col1 _name_ p;
run;
Hope this will helps..
Thanks,Yash.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.