Construct a dataset using information from two separate datasets

Solved
Occasional Contributor
Posts: 19

Construct a dataset using information from two separate datasets

I have two datasets. One of the dataset contains 10 ID number for each observation (a total of 1000 observation). Hence the dataset is a 1000 by 10 matrix. Each ID number is an ID number for a location that contains a set of information, such as housing price, school quality, share of green space. It looks like:

c1       c2    c3     c4  ...  c10       (columes are 10 IDs corresponding to each observations)

obs1          91      72      4     87         54

obs2          34      76     95     8         100

:

:

obs1000    1         3      90     78        64

Another dataset has all the information for all the locations (ID is the indicator which location it is). The dataset looks like:

housing price      school quality index    share of greenspace

id1    1000,000,000                  2                               50%

id2

:

:

id100   40,000,000                  5                                15%

I would like to construct a unified dataset by combining information from the two datasets. It will look like:

housing price     school quaility index     share of greenspace  |   housing price     school quaility index     share of greenspace

obs1              info for location with ID91                                                            info for location with ID72

obs2              info for location with ID34                                                            info for location with ID76

:

:

obs100

Therefore the final dataset is a 100 (number of observation) by 10 (10 ID for each observation )*3 (number of variables correspond to each ID) matrix.

I am having trouble to find a way to achieve the final goal. Any help will be much appreciated!

Thank you!!

Accepted Solutions
Solution
‎03-11-2017 10:14 PM
PROC Star
Posts: 8,150

Re: Construct a dataset using information from two separate datasets

Not sure what you are trying to do regarding the random selection part. Since I've never liked doing double transposes (which is what you have to do to get from wide to wider), I always do that kind of transposition with the macro you can download for free at:

http://www.sascommunity.org/wiki/A_Better_Way_to_Flip_(Transpose)_a_SAS_Dataset

That said, while I only tested this on a 5 id case (like you presented data for in your example), the following appears to do what you want:

```data obs;
input obs id1-id5;
cards;
1          91      72      4     87         54
2          34      76     95     8         100
1000    1         3      90     78        64
;

proc transpose data=obs out=tobs (rename=(col1=id));
var id1-id5;
by obs;
run;

data tobs (drop=_:);
set tobs;
pref=input(substr(_name_,3),8.);
run;

data locs;
input id housing_price school_quality_index    share_of_greenspace;
cards;
91    1000000000                  2                               50
4      150000000               1                                35
76   40000000                  5                                15
;

proc sql;
create table need as
select * from
tobs a
left join
locs b
on a.id eq b.id
order by obs,pref
;
quit;

%transpose(data=need, out=want, by=obs, id=pref, guessingrows=1000,
delimiter=_, var=housing_price school_quality_index share_of_greenspace)
```

HTH,

Art, CEO, AnalystFinder.com

All Replies
Super User
Posts: 23,342

Re: Construct a dataset using information from two separate datasets

Transpose dataset1 to a long format and then join it to the second dataset. You can then make it wide again with a transpose, but the long format is likely to be more useful.

Use PROC TRANSPOSE to flip the datasets.

Occasional Contributor
Posts: 19

Re: Construct a dataset using information from two separate datasets

[ Edited ]

After transpose data1 from wide to long format, each column has 10 IDs an observation chooses. The 10 IDs are randomly chosen from 1 to 100 IDs in the second datasets.

So if I want to merge the first dataset with the second dataset, I need to do it for each column.

That might work.

PROC Star
Posts: 8,150

Re: Construct a dataset using information from two separate datasets

Actually, sounds like you want a 1000 X 10 X 4 matrix .. unless you don't need to keep track of the IDs in which case it would be a 1000 X 10 X3 matrix.  Are you sure that's what you want? Do you need to keep track of the IDs?

Art, CEO, AnalystFinder.com

Occasional Contributor
Posts: 19

Re: Construct a dataset using information from two separate datasets

I don't think I need to track the IDs.

As long as the variables for each ID enter into the corresponding columes for each observation in the final dataset, It is OK to not have IDs.

So the first three colums in the final dataset are the housing price, school quality, and share of greenspace for ID 92, because that is the first choice for observation 1, The 3-6 columns are housing price, school quality, and share of green space for ID 72 (the second choice for observation 1). The last three columns will be housing price, school quality, and share of green space for ID 54 for observation 1.

Solution
‎03-11-2017 10:14 PM
PROC Star
Posts: 8,150

Re: Construct a dataset using information from two separate datasets

Not sure what you are trying to do regarding the random selection part. Since I've never liked doing double transposes (which is what you have to do to get from wide to wider), I always do that kind of transposition with the macro you can download for free at:

http://www.sascommunity.org/wiki/A_Better_Way_to_Flip_(Transpose)_a_SAS_Dataset

That said, while I only tested this on a 5 id case (like you presented data for in your example), the following appears to do what you want:

```data obs;
input obs id1-id5;
cards;
1          91      72      4     87         54
2          34      76     95     8         100
1000    1         3      90     78        64
;

proc transpose data=obs out=tobs (rename=(col1=id));
var id1-id5;
by obs;
run;

data tobs (drop=_:);
set tobs;
pref=input(substr(_name_,3),8.);
run;

data locs;
input id housing_price school_quality_index    share_of_greenspace;
cards;
91    1000000000                  2                               50
4      150000000               1                                35
76   40000000                  5                                15
;

proc sql;
create table need as
select * from
tobs a
left join
locs b
on a.id eq b.id
order by obs,pref
;
quit;

%transpose(data=need, out=want, by=obs, id=pref, guessingrows=1000,
delimiter=_, var=housing_price school_quality_index share_of_greenspace)
```

HTH,

Art, CEO, AnalystFinder.com

Occasional Contributor
Posts: 19

Re: Construct a dataset using information from two separate datasets

Thank you very much Art! I will try it out and see how it works out!!

Occasional Contributor
Posts: 19

Re: Construct a dataset using information from two separate datasets

Thank you Art. Your method worked.