BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Deester
Obsidian | Level 7

I have two datasets. One of the dataset contains 10 ID number for each observation (a total of 1000 observation). Hence the dataset is a 1000 by 10 matrix. Each ID number is an ID number for a location that contains a set of information, such as housing price, school quality, share of green space. It looks like:

 

                  c1       c2    c3     c4  ...  c10       (columes are 10 IDs corresponding to each observations)

obs1          91      72      4     87         54

obs2          34      76     95     8         100

:

:

obs1000    1         3      90     78        64

 

Another dataset has all the information for all the locations (ID is the indicator which location it is). The dataset looks like:

         housing price      school quality index    share of greenspace

id1    1000,000,000                  2                               50%

id2      

:

:

id100   40,000,000                  5                                15%

 

I would like to construct a unified dataset by combining information from the two datasets. It will look like:

                                                                                                                          

          housing price     school quaility index     share of greenspace  |   housing price     school quaility index     share of greenspace

obs1              info for location with ID91                                                            info for location with ID72

obs2              info for location with ID34                                                            info for location with ID76

:

:

obs100

 

Therefore the final dataset is a 100 (number of observation) by 10 (10 ID for each observation )*3 (number of variables correspond to each ID) matrix. 

 

I am having trouble to find a way to achieve the final goal. Any help will be much appreciated!

 

Thank you!!

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Not sure what you are trying to do regarding the random selection part. Since I've never liked doing double transposes (which is what you have to do to get from wide to wider), I always do that kind of transposition with the macro you can download for free at:

http://www.sascommunity.org/wiki/A_Better_Way_to_Flip_(Transpose)_a_SAS_Dataset

 

That said, while I only tested this on a 5 id case (like you presented data for in your example), the following appears to do what you want:

 

data obs;
  input obs id1-id5;
  cards;
1          91      72      4     87         54
2          34      76     95     8         100
1000    1         3      90     78        64
;

proc transpose data=obs out=tobs (rename=(col1=id));
  var id1-id5;
  by obs;
run;

data tobs (drop=_:);
  set tobs;
  pref=input(substr(_name_,3),8.);
run;

data locs;
  input id housing_price school_quality_index    share_of_greenspace;
  cards;
91    1000000000                  2                               50
4      150000000               1                                35
76   40000000                  5                                15
;

proc sql;
  create table need as
    select * from
      tobs a
        left join
          locs b
            on a.id eq b.id
              order by obs,pref
  ;
quit;

%transpose(data=need, out=want, by=obs, id=pref, guessingrows=1000,
 delimiter=_, var=housing_price school_quality_index share_of_greenspace)

HTH,

Art, CEO, AnalystFinder.com

 

View solution in original post

7 REPLIES 7
Reeza
Super User

Transpose dataset1 to a long format and then join it to the second dataset. You can then make it wide again with a transpose, but the long format is likely to be more useful. 

 

Use PROC TRANSPOSE to flip the datasets. 

Deester
Obsidian | Level 7

After transpose data1 from wide to long format, each column has 10 IDs an observation chooses. The 10 IDs are randomly chosen from 1 to 100 IDs in the second datasets.

So if I want to merge the first dataset with the second dataset, I need to do it for each column.

That might work. 

 

art297
Opal | Level 21

Actually, sounds like you want a 1000 X 10 X 4 matrix .. unless you don't need to keep track of the IDs in which case it would be a 1000 X 10 X3 matrix.  Are you sure that's what you want? Do you need to keep track of the IDs?

 

Art, CEO, AnalystFinder.com

 

Deester
Obsidian | Level 7

I don't think I need to track the IDs. 

As long as the variables for each ID enter into the corresponding columes for each observation in the final dataset, It is OK to not have IDs.

So the first three colums in the final dataset are the housing price, school quality, and share of greenspace for ID 92, because that is the first choice for observation 1, The 3-6 columns are housing price, school quality, and share of green space for ID 72 (the second choice for observation 1). The last three columns will be housing price, school quality, and share of green space for ID 54 for observation 1.

 

 

art297
Opal | Level 21

Not sure what you are trying to do regarding the random selection part. Since I've never liked doing double transposes (which is what you have to do to get from wide to wider), I always do that kind of transposition with the macro you can download for free at:

http://www.sascommunity.org/wiki/A_Better_Way_to_Flip_(Transpose)_a_SAS_Dataset

 

That said, while I only tested this on a 5 id case (like you presented data for in your example), the following appears to do what you want:

 

data obs;
  input obs id1-id5;
  cards;
1          91      72      4     87         54
2          34      76     95     8         100
1000    1         3      90     78        64
;

proc transpose data=obs out=tobs (rename=(col1=id));
  var id1-id5;
  by obs;
run;

data tobs (drop=_:);
  set tobs;
  pref=input(substr(_name_,3),8.);
run;

data locs;
  input id housing_price school_quality_index    share_of_greenspace;
  cards;
91    1000000000                  2                               50
4      150000000               1                                35
76   40000000                  5                                15
;

proc sql;
  create table need as
    select * from
      tobs a
        left join
          locs b
            on a.id eq b.id
              order by obs,pref
  ;
quit;

%transpose(data=need, out=want, by=obs, id=pref, guessingrows=1000,
 delimiter=_, var=housing_price school_quality_index share_of_greenspace)

HTH,

Art, CEO, AnalystFinder.com

 

Deester
Obsidian | Level 7

Thank you very much Art! I will try it out and see how it works out!!

 

 

Deester
Obsidian | Level 7

Thank you Art. Your method worked. 

I really appreciate your help!!!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1047 views
  • 0 likes
  • 3 in conversation