BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TomiKong
Fluorite | Level 6

Hi all,

I'm performing one study about social network analysis. Before analysis, I need to transform my data in the appropriate format. My raw data are formatted like this:

TomJack
JackMike
TomHellen
DoraHellen
RossJack
WillPaul
PaulHoward
PaulAndy
AndyKim
AndyTom

One row means the two guys in the two columns are directly conneceted  (No matter left-right connected or right-left connected). I need to transform my data like this( if the two guys are directly connectted, the content in the matrix are 1. Finally,one symmetric matrxi will be created):

namesAndyDoraHellenHowardJackKimMikePaulRossTomWill
Andy10000101010
Dora01000000000
Hellen00100000010
Howard00010001000
Jack00001010110
Kim10000100000
Mike00001010000
Paul10010001000
Ross00001000100
Tom10101000010
Will00000000001

How to achieve this? Thanks in advance.

Tom

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Obviously this is matrix problem, so Proc IML would probably be my first choice as well. However, Data step can also address this problem using Hash(). This may look a little bit complex, but if you want to build a social network deeper than just one-layer direct connection, then Hash() will  get  very handy quickly.

data have;

input var1 $ var2 $;

cards;

Tom   Jack

Jack  Mike

Tom   Hellen

Dora  Hellen

Ross  Jack

Will  Paul

Paul  Howard

Paul  Andy

Andy  Kim

Andy  Tom

;

data h1;

  set have;

    var=var1;output;

      var=var2;output;

run;

proc sql;

  select distinct var into :a_var separated by ' ' from h1;

    select distinct var into :c_var separated by ', ' from h1;

  select distinct quote(cats(var)) into :h_var separated by ',' from h1;

  quit;

data _null_;

  if _n_=1 then do;

    if 0 then set have;

      declare hash v1(dataset:'have', multidata:'y');

      v1.definekey('var1');

      v1.definedata(all:'y');

      v1.definedone();

            declare hash v2(dataset:'have', multidata:'y');

      v2.definekey('var2');

      v2.definedata(all:'y');

      v2.definedone();

   declare hash want(ordered:'a');

   want.definekey('var');

   want.definedata('var',&h_var.);

   want.definedone();

   end;

   set h1 end=last;

    array _m  &a_var ;

      retain &a_var 0;

      if _n_>1 then do over _m;

_m=0;

       end;

      rc=want.find();

      rc=v1.find(key:var);

         do rc=0 by 0 while (rc=0);

do over _m;

_m=ifn(var=vname(_m) or var2=vname(_m),1,_m);

end;

rc=v1.find_next(key:var);

            end;

      

      rc=v2.find(key:var);

         do rc=0 by 0 while (rc=0);

do over _m;

_m=ifn(var=vname(_m) or var1=vname(_m),1,_m);

end;

rc=v2.find_next(key:var);

            end;

            rc=want.replace();

       if last then rc=want.output(dataset:'want');

      run;


Haikuo

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

Here is a post on this that uses IML.

http://nutrinjen.wordpress.com/2009/09/15/adjacency-matrix-using-sasiml/

The idea is really the same in a SAS datastep.

TomiKong
Fluorite | Level 6

Thanks. But it seems I have different data format as the wirter.....

Tom

Fugue
Quartz | Level 8

Your desired output does not seem to match the sample input data that you have provided.

For example, Andy is connected to Kim, Tom (and Paul, if you go right-to-left) according to your input data. But, your desired output has a match for Hellen even though there is no row in the input data for Andy--> Helen (or vice versa).

But Hellen is "connected" to Tom (who is connected to Andy). So does this imply that you want a match if there is a link through distant connections? So, Andy is "connected" to Tom who is "connected" to Hellen; therefore, Andy and Hellen have a 1 in the matrix? Are only right--> left connections allowed? Or, two-way? If indirect connections are permitted, how many levels are allowed?

The method that Tom linked re: adjacency matrix will work with some modifications (assuming name1 in your input is the "ego" and name2 is the "alter"), especially if you associate a unique ID with every individual.

I've attached some relatively crude code that will mostly accomplish what you want (ASSUMING only left--> right matches, only matching rows in the input file are considered as a match (not downstream matches), and you always want a 1 when the person is the same as the column name).

TomiKong
Fluorite | Level 6

Thanks. Sorry I made a mistake. Only direct connections are permitted. So there still is one problem about the codes you attached. Finally, one symmetric matrix will be created. So, on "andy" column, "kim" and "tom" rows will also be 1.




Tom

Fugue
Quartz | Level 8

The code I sent you will transform the data you supplied into a matrix. You probably missed that I set a dummy variable = 1 in the initial data step (reading in the initial values).

TomiKong
Fluorite | Level 6

Done. Thanks.

proc iml;

use havesorted;

read all var _NUM_ into m;

n_row=nrow(m);

n_col=ncol(m);

do i=1 to n_col;

   do j=1 to n_row;

      if m[j,i]=1 then m[i,j]=1;

   end;

end;

print m;

create MyData from m;

append from m;

close MyData;

quit;

Scott_Mitchell
Quartz | Level 8

I don't understand how you are defining your matrix.  You are specifying the creation of a numeric matrix, but you don't have any numeric variables.

Haikuo
Onyx | Level 15

Obviously this is matrix problem, so Proc IML would probably be my first choice as well. However, Data step can also address this problem using Hash(). This may look a little bit complex, but if you want to build a social network deeper than just one-layer direct connection, then Hash() will  get  very handy quickly.

data have;

input var1 $ var2 $;

cards;

Tom   Jack

Jack  Mike

Tom   Hellen

Dora  Hellen

Ross  Jack

Will  Paul

Paul  Howard

Paul  Andy

Andy  Kim

Andy  Tom

;

data h1;

  set have;

    var=var1;output;

      var=var2;output;

run;

proc sql;

  select distinct var into :a_var separated by ' ' from h1;

    select distinct var into :c_var separated by ', ' from h1;

  select distinct quote(cats(var)) into :h_var separated by ',' from h1;

  quit;

data _null_;

  if _n_=1 then do;

    if 0 then set have;

      declare hash v1(dataset:'have', multidata:'y');

      v1.definekey('var1');

      v1.definedata(all:'y');

      v1.definedone();

            declare hash v2(dataset:'have', multidata:'y');

      v2.definekey('var2');

      v2.definedata(all:'y');

      v2.definedone();

   declare hash want(ordered:'a');

   want.definekey('var');

   want.definedata('var',&h_var.);

   want.definedone();

   end;

   set h1 end=last;

    array _m  &a_var ;

      retain &a_var 0;

      if _n_>1 then do over _m;

_m=0;

       end;

      rc=want.find();

      rc=v1.find(key:var);

         do rc=0 by 0 while (rc=0);

do over _m;

_m=ifn(var=vname(_m) or var2=vname(_m),1,_m);

end;

rc=v1.find_next(key:var);

            end;

      

      rc=v2.find(key:var);

         do rc=0 by 0 while (rc=0);

do over _m;

_m=ifn(var=vname(_m) or var1=vname(_m),1,_m);

end;

rc=v2.find_next(key:var);

            end;

            rc=want.replace();

       if last then rc=want.output(dataset:'want');

      run;


Haikuo

TomiKong
Fluorite | Level 6

Thanks, Hai.kuo. If I have 100 thousands of IDs, the length of a_var and c_var will exceed the maximal length of one sas string. How to deal with this situation?

Tom

Vince28_Statcan
Quartz | Level 8

@Hai.Kuo,

If I recall, graph theory dictates you can rebuild chain links by simply putting your matrix ^n where n would be either the chain length you consider a chain link to still be a link or is the dimension of your matrix. You will then simply need to convert all values >1 to 1s to return to a binary matrix (it can be optimized, computationally, by using binary "or" but I am not sure that SAS supports it quite well).

@TomiKong

With regards to what Hai.Kuo wrote, a "simple" way around exceeding the 65k default length of macro variables, would be to load your var names into multiple macro variables and nest the steps using the macro variables into another macro and loop on those. This could be done as follow:

break down

proc sql;

  select distinct var into :a_var separated by ' ' from h1;

    select distinct var into :c_var separated by ', ' from h1;

  select distinct quote(cats(var)) into :h_var separated by ',' from h1;

  quit;

into

proc sql;

   

  select distinct var into :a_var1-:a_var99999999 from h1;

    select distinct var into :c_var1-:c_var99999999 from h1;

quit;

%let loopend1=&sqlobs;

proc sql;

select distinct quote(cats(var)) into :h_var1-:h_var99999999 from h1;

quit;

%let loopend2=&sqlobs;

Then nesting the rest of the code into %macro something(); %mend; you can use the fact that the following block is equivalent to &a_var in Hai.Kuo's Code.

&a_var1

%do i=2 %to &loopend1;

, &&a_var&i

%end;

On a side note, thanks for this alternative option Hai.Kuo. Everytime I needed to double quote contents using proc sql macro variable allocation, I used to use separated by "," but its less natural to have to add the starting and ending double quotes each time I deref the macro variable. I never even remotely considered the quote function as a way around.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1730 views
  • 6 likes
  • 6 in conversation