Solved: Re: Construct a dataset using information from two separate datasets

Deester · Posted 03-11-2017 02:26 PM

I have two datasets. One of the dataset contains 10 ID number for each observation (a total of 1000 observation). Hence the dataset is a 1000 by 10 matrix. Each ID number is an ID number for a location that contains a set of information, such as housing price, school quality, share of green space. It looks like:

c1 c2 c3 c4 ... c10 (columes are 10 IDs corresponding to each observations)

obs1 91 72 4 87 54

obs2 34 76 95 8 100

:

obs1000 1 3 90 78 64

Another dataset has all the information for all the locations (ID is the indicator which location it is). The dataset looks like:

housing price school quality index share of greenspace

id1 1000,000,000 2 50%

id2

:

id100 40,000,000 5 15%

I would like to construct a unified dataset by combining information from the two datasets. It will look like:

housing price school quaility index share of greenspace | housing price school quaility index share of greenspace

obs1 info for location with ID91 info for location with ID72

obs2 info for location with ID34 info for location with ID76

:

obs100

Therefore the final dataset is a 100 (number of observation) by 10 (10 ID for each observation )*3 (number of variables correspond to each ID) matrix.

I am having trouble to find a way to achieve the final goal. Any help will be much appreciated!

Thank you!!

art297 · Posted 03-11-2017 03:31 PM

Not sure what you are trying to do regarding the random selection part. Since I've never liked doing double transposes (which is what you have to do to get from wide to wider), I always do that kind of transposition with the macro you can download for free at:

http://www.sascommunity.org/wiki/A_Better_Way_to_Flip_(Transpose)_a_SAS_Dataset

That said, while I only tested this on a 5 id case (like you presented data for in your example), the following appears to do what you want:

data obs;
  input obs id1-id5;
  cards;
1          91      72      4     87         54
2          34      76     95     8         100
1000    1         3      90     78        64
;

proc transpose data=obs out=tobs (rename=(col1=id));
  var id1-id5;
  by obs;
run;

data tobs (drop=_:);
  set tobs;
  pref=input(substr(_name_,3),8.);
run;

data locs;
  input id housing_price school_quality_index    share_of_greenspace;
  cards;
91    1000000000                  2                               50
4      150000000               1                                35
76   40000000                  5                                15
;

proc sql;
  create table need as
    select * from
      tobs a
        left join
          locs b
            on a.id eq b.id
              order by obs,pref
  ;
quit;

%transpose(data=need, out=want, by=obs, id=pref, guessingrows=1000,
 delimiter=_, var=housing_price school_quality_index share_of_greenspace)

HTH,

Art, CEO, AnalystFinder.com

View solution in original post

Reeza · Posted 03-11-2017 02:48 PM

Transpose dataset1 to a long format and then join it to the second dataset. You can then make it wide again with a transpose, but the long format is likely to be more useful.

Use PROC TRANSPOSE to flip the datasets.

Deester · Posted 03-11-2017 03:17 PM

After transpose data1 from wide to long format, each column has 10 IDs an observation chooses. The 10 IDs are randomly chosen from 1 to 100 IDs in the second datasets.

So if I want to merge the first dataset with the second dataset, I need to do it for each column.

That might work.

art297 · Posted 03-11-2017 02:53 PM

Actually, sounds like you want a 1000 X 10 X 4 matrix .. unless you don't need to keep track of the IDs in which case it would be a 1000 X 10 X3 matrix. Are you sure that's what you want? Do you need to keep track of the IDs?

Art, CEO, AnalystFinder.com

Deester · Posted 03-11-2017 03:08 PM

I don't think I need to track the IDs.

As long as the variables for each ID enter into the corresponding columes for each observation in the final dataset, It is OK to not have IDs.

So the first three colums in the final dataset are the housing price, school quality, and share of greenspace for ID 92, because that is the first choice for observation 1, The 3-6 columns are housing price, school quality, and share of green space for ID 72 (the second choice for observation 1). The last three columns will be housing price, school quality, and share of green space for ID 54 for observation 1.

art297 · Posted 03-11-2017 03:31 PM

Not sure what you are trying to do regarding the random selection part. Since I've never liked doing double transposes (which is what you have to do to get from wide to wider), I always do that kind of transposition with the macro you can download for free at:

http://www.sascommunity.org/wiki/A_Better_Way_to_Flip_(Transpose)_a_SAS_Dataset

That said, while I only tested this on a 5 id case (like you presented data for in your example), the following appears to do what you want:

data obs;
  input obs id1-id5;
  cards;
1          91      72      4     87         54
2          34      76     95     8         100
1000    1         3      90     78        64
;

proc transpose data=obs out=tobs (rename=(col1=id));
  var id1-id5;
  by obs;
run;

data tobs (drop=_:);
  set tobs;
  pref=input(substr(_name_,3),8.);
run;

data locs;
  input id housing_price school_quality_index    share_of_greenspace;
  cards;
91    1000000000                  2                               50
4      150000000               1                                35
76   40000000                  5                                15
;

proc sql;
  create table need as
    select * from
      tobs a
        left join
          locs b
            on a.id eq b.id
              order by obs,pref
  ;
quit;

%transpose(data=need, out=want, by=obs, id=pref, guessingrows=1000,
 delimiter=_, var=housing_price school_quality_index share_of_greenspace)

HTH,

Art, CEO, AnalystFinder.com

Deester · Posted 03-11-2017 03:35 PM

Thank you very much Art! I will try it out and see how it works out!!

Deester · Posted 03-11-2017 10:15 PM

Thank you Art. Your method worked.

I really appreciate your help!!!

SAS Innovate 2025: Call for Content