DATA Step, Macro, Functions and more

Data Transformation

Reply
Occasional Contributor
Posts: 5

Data Transformation

Hello all,

As part of my data cleaning routine I would like to convert a matrix in the SAS data set below.

DATA Expected_Interaction;

INFORMAT
expected_interaction $CHAR2.
a $CHAR3.
b $CHAR3.
c $CHAR3.
d $CHAR3.
e $CHAR3.
f $CHAR3.
g $CHAR3.
h $CHAR3.
i $CHAR3.
j $CHAR3.
k $CHAR3.
l $CHAR3.
m $CHAR3.
n $CHAR3.
o $CHAR3.
p $CHAR3.
q $CHAR3.
r $CHAR3.
s $CHAR3.
t $CHAR3.
u $CHAR3.
v $CHAR3.
w $CHAR3.
x $CHAR3.
y $CHAR3.
z $CHAR3.
aa $CHAR3.
ab $CHAR3.
ac $CHAR3.
ad $CHAR3.
ae $CHAR3.
af $CHAR3.
ag $CHAR3.
ah $CHAR3.
aj $CHAR3.
ak $CHAR3.
al $CHAR3.
am $CHAR3.
an $CHAR3.
ao $CHAR3. ;
INFILE DATALINES4
DLM='7F'x
MISSOVER
DSD ;
INPUT
expected_interaction : $CHAR2.
a : $CHAR3.
b : $CHAR3.
c : $CHAR3.
d : $CHAR3.
e : $CHAR3.
f : $CHAR3.
g : $CHAR3.
h : $CHAR3.
i : $CHAR3.
j : $CHAR3.
k : $CHAR3.
l : $CHAR3.
m : $CHAR3.
n : $CHAR3.
o : $CHAR3.
p : $CHAR3.
q : $CHAR3.
r : $CHAR3.
s : $CHAR3.
t : $CHAR3.
u : $CHAR3.
v : $CHAR3.
w : $CHAR3.
x : $CHAR3.
y : $CHAR3.
z : $CHAR3.
aa : $CHAR3.
ab : $CHAR3.
ac : $CHAR3.
ad : $CHAR3.
ae : $CHAR3.
af : $CHAR3.
ag : $CHAR3.
ah : $CHAR3.
aj : $CHAR3.
ak : $CHAR3.
al : $CHAR3.
am : $CHAR3.
an : $CHAR3.
ao : $CHAR3. ;
DATALINES4;
A YES        YESYESYESYESYES YES YES     YESYES      YESYESYESYESYESYESYESYES
BYES         YESYESYES YES YES YES     YES             YESYES
CYESYES YES       YESYESYESYESYES     YES   YES  YES YESYES  YESYESYESYESYES
D YES      YES                      YES       
E        YES                              
F      YESYES          YES   YESYES     YES         YES
G     YES YES                     YES         
H     YESYES           YES   YESYES     YES         YES
I   YESYES    YESYESYESYES YESYESYESYES  YESYES                 
J   YESYES     YESYESYES YESYESYESYES YESYESYES                 
K           YESYESYES YES YES YES    YES  YES           YES
L          YES YESYES YES YES YES    YES  YES           YES
MYESYES        YESYES YESYESYESYESYESYES     YESYESYESYES           YES
N  YESYES        YES YESYESYES   YESYES   YESYES YES YESYESYESYESYESYESYESYESYES
OYES  YESYES     YESYESYESYES YESYES        YESYES     YESYESYESYESYESYESYESYES
P   YESYES     YESYESYESYES   YES YESYESYES  YES  YES       YESYESYESYESYES
QYES  YESYES     YESYESYESYESYESYES YES YES    YESYESYESYES  YES YESYESYESYESYESYESYESYES
R   YESYES     YESYESYES  YES   YESYESYES  YES  YES  YES   YESYESYESYESYES
SYESYES        YESYESYES YES YES       YES             YESYES
TYESYES       YESYESYES     YES  YESYES  YESYESYESYESYES  YES       
U YES YESYES    YES         YES YES  YESYESYESYESYES  YESYESYES     
V YES YESYES    YES         YESYES   YESYESYESYESYES  YESYESYES     
W                  YES    YES     YES         
X                  YES   YES      YES         
YYESYES        YESYES YES   YES YESYESYES   YESYESYESYES  YES       YES
ZYESYES           YESYESYESYESYES  YESYES    YESYESYES YES YESYESYESYESYESYES 
AAYESYES YESYES       YESYES  YES  YESYESYES  YESYES  YES  YES  YESYESYESYES 
ABYESYES YESYES     YESYESYES   YESYES YESYESYES  YESYESYES YES  YES  YESYESYESYES 
ACYESYES      YESYES   YESYESYESYESYES YESYESYES  YESYESYESYES  YES YESYESYESYESYESYES 
AD                                       
AE                               YES       
AF                    YESYES   YES  YES          
AGYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YES  YESYESYESYESYES 
AHYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YES YES YESYESYESYES 
AJYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YESYESYESYES YESYESYES 
AKYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YES YESYESYES YESYES 
ALYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YES YESYESYESYES YES 
AMYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YESYESYESYESYESYESYES  
AN YES                  YESYES        YESYES  YES  YES 
AOYESYES        YESYESYES YES YES YES     YES             YES
;;;;
run;

something a data set with only 2 columns

Cause Impact
product A product A
product A product B
product A product D
. .
. .
. .
product A product Z
product B product A
product B product B
product B product D
. .
. .
. .
product B product Z
product C product A
product C product B
product C product D
. .
. .
. .
product C product Z

Any suggestions to speed up this part of data cleaning would be much appreciated.
Respected Advisor
Posts: 4,173

Re: Data Transformation

Hi

May be this should be posted in the statistical forum?

I sure don't have enough knowledge in this field to interprete your problem.

If it's "only" about data transformation:
Would the cause/impact value pairs only have values for cases where there is a "yes" in the source data (i.e. "product a"/"product b" but not "product a"/"product c")?
Regular Contributor
Posts: 241

Re: Data Transformation

Here is one way. HTH.

[pre]
/* test data */
data mat;
infile cards firstobs=2;
input id $ (a b c) ($);
cards;
a b c
a y y n
b y n n
c n n n
;
run;

/* matrix to edge-list */
data edgelist;
set mat;

length from to $1;
array val[1:3] $ a--c;
retain nodes "a b c";

do idx = 1 to dim(val);
if val[idx] ^= "y" then continue;
from = id;
to = scan(nodes, idx);
output;
end;

keep from to;
run;

/* check */
proc print data=edgelist noobs;
run;
/* on log
from to
a a
a b
b a
*/
[/pre]
Ask a Question
Discussion stats
  • 2 replies
  • 120 views
  • 0 likes
  • 3 in conversation