BookmarkSubscribeRSS Feed
psquares
Calcite | Level 5
Hello all,

As part of my data cleaning routine I would like to convert a matrix in the SAS data set below.

DATA Expected_Interaction;

INFORMAT
expected_interaction $CHAR2.
a $CHAR3.
b $CHAR3.
c $CHAR3.
d $CHAR3.
e $CHAR3.
f $CHAR3.
g $CHAR3.
h $CHAR3.
i $CHAR3.
j $CHAR3.
k $CHAR3.
l $CHAR3.
m $CHAR3.
n $CHAR3.
o $CHAR3.
p $CHAR3.
q $CHAR3.
r $CHAR3.
s $CHAR3.
t $CHAR3.
u $CHAR3.
v $CHAR3.
w $CHAR3.
x $CHAR3.
y $CHAR3.
z $CHAR3.
aa $CHAR3.
ab $CHAR3.
ac $CHAR3.
ad $CHAR3.
ae $CHAR3.
af $CHAR3.
ag $CHAR3.
ah $CHAR3.
aj $CHAR3.
ak $CHAR3.
al $CHAR3.
am $CHAR3.
an $CHAR3.
ao $CHAR3. ;
INFILE DATALINES4
DLM='7F'x
MISSOVER
DSD ;
INPUT
expected_interaction : $CHAR2.
a : $CHAR3.
b : $CHAR3.
c : $CHAR3.
d : $CHAR3.
e : $CHAR3.
f : $CHAR3.
g : $CHAR3.
h : $CHAR3.
i : $CHAR3.
j : $CHAR3.
k : $CHAR3.
l : $CHAR3.
m : $CHAR3.
n : $CHAR3.
o : $CHAR3.
p : $CHAR3.
q : $CHAR3.
r : $CHAR3.
s : $CHAR3.
t : $CHAR3.
u : $CHAR3.
v : $CHAR3.
w : $CHAR3.
x : $CHAR3.
y : $CHAR3.
z : $CHAR3.
aa : $CHAR3.
ab : $CHAR3.
ac : $CHAR3.
ad : $CHAR3.
ae : $CHAR3.
af : $CHAR3.
ag : $CHAR3.
ah : $CHAR3.
aj : $CHAR3.
ak : $CHAR3.
al : $CHAR3.
am : $CHAR3.
an : $CHAR3.
ao : $CHAR3. ;
DATALINES4;
A YES        YESYESYESYESYES YES YES     YESYES      YESYESYESYESYESYESYESYES
BYES         YESYESYES YES YES YES     YES             YESYES
CYESYES YES       YESYESYESYESYES     YES   YES  YES YESYES  YESYESYESYESYES
D YES      YES                      YES       
E        YES                              
F      YESYES          YES   YESYES     YES         YES
G     YES YES                     YES         
H     YESYES           YES   YESYES     YES         YES
I   YESYES    YESYESYESYES YESYESYESYES  YESYES                 
J   YESYES     YESYESYES YESYESYESYES YESYESYES                 
K           YESYESYES YES YES YES    YES  YES           YES
L          YES YESYES YES YES YES    YES  YES           YES
MYESYES        YESYES YESYESYESYESYESYES     YESYESYESYES           YES
N  YESYES        YES YESYESYES   YESYES   YESYES YES YESYESYESYESYESYESYESYESYES
OYES  YESYES     YESYESYESYES YESYES        YESYES     YESYESYESYESYESYESYESYES
P   YESYES     YESYESYESYES   YES YESYESYES  YES  YES       YESYESYESYESYES
QYES  YESYES     YESYESYESYESYESYES YES YES    YESYESYESYES  YES YESYESYESYESYESYESYESYES
R   YESYES     YESYESYES  YES   YESYESYES  YES  YES  YES   YESYESYESYESYES
SYESYES        YESYESYES YES YES       YES             YESYES
TYESYES       YESYESYES     YES  YESYES  YESYESYESYESYES  YES       
U YES YESYES    YES         YES YES  YESYESYESYESYES  YESYESYES     
V YES YESYES    YES         YESYES   YESYESYESYESYES  YESYESYES     
W                  YES    YES     YES         
X                  YES   YES      YES         
YYESYES        YESYES YES   YES YESYESYES   YESYESYESYES  YES       YES
ZYESYES           YESYESYESYESYES  YESYES    YESYESYES YES YESYESYESYESYESYES 
AAYESYES YESYES       YESYES  YES  YESYESYES  YESYES  YES  YES  YESYESYESYES 
ABYESYES YESYES     YESYESYES   YESYES YESYESYES  YESYESYES YES  YES  YESYESYESYES 
ACYESYES      YESYES   YESYESYESYESYES YESYESYES  YESYESYESYES  YES YESYESYESYESYESYES 
AD                                       
AE                               YES       
AF                    YESYES   YES  YES          
AGYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YES  YESYESYESYESYES 
AHYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YES YES YESYESYESYES 
AJYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YESYESYESYES YESYESYES 
AKYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YES YESYESYES YESYES 
ALYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YES YESYESYESYES YES 
AMYESYES           YESYESYESYESYES  YESYES   YESYESYESYES YESYESYESYESYESYESYES  
AN YES                  YESYES        YESYES  YES  YES 
AOYESYES        YESYESYES YES YES YES     YES             YES
;;;;
run;

something a data set with only 2 columns

Cause Impact
product A product A
product A product B
product A product D
. .
. .
. .
product A product Z
product B product A
product B product B
product B product D
. .
. .
. .
product B product Z
product C product A
product C product B
product C product D
. .
. .
. .
product C product Z

Any suggestions to speed up this part of data cleaning would be much appreciated.
2 REPLIES 2
Patrick
Opal | Level 21
Hi

May be this should be posted in the statistical forum?

I sure don't have enough knowledge in this field to interprete your problem.

If it's "only" about data transformation:
Would the cause/impact value pairs only have values for cases where there is a "yes" in the source data (i.e. "product a"/"product b" but not "product a"/"product c")?
chang_y_chung_hotmail_com
Obsidian | Level 7
Here is one way. HTH.

[pre]
/* test data */
data mat;
infile cards firstobs=2;
input id $ (a b c) ($);
cards;
a b c
a y y n
b y n n
c n n n
;
run;

/* matrix to edge-list */
data edgelist;
set mat;

length from to $1;
array val[1:3] $ a--c;
retain nodes "a b c";

do idx = 1 to dim(val);
if val[idx] ^= "y" then continue;
from = id;
to = scan(nodes, idx);
output;
end;

keep from to;
run;

/* check */
proc print data=edgelist noobs;
run;
/* on log
from to
a a
a b
b a
*/
[/pre]

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 722 views
  • 0 likes
  • 3 in conversation