BookmarkSubscribeRSS Feed
chenyiwen1717
Fluorite | Level 6

I have 10 SAS data set with different columns (Some columns are same, and some are different). All these data sets are without identical IDs. My goal is to combine these 10 SAS data sets to 1 single master data set with identical IDs. By doing this, I will be able to retrieve any records with their original columns from master data set by select the unique IDs.

Like say, these 10 data set are from year 1990 to 2000.(e.g. DSN90 : DSN00), and I already created IDs '9000001','0900002'...for data set DSN90, and "9100001','9100002' for data set DSN91 and so on.

What I want is:

If I select ID='9000001' from master data set, I will get the first record of data set DSN90 with the its columns.

If I select ID='9100001' from master data set, I will get the first record of data set DSN91 with the its columns.


However, the columns from DSN90 is different with DSN91. What my master data set should look like? Please advise. Thanks in advance.

3 REPLIES 3
ballardw
Super User

If you combine datasets as described for your example ID='9000001' selection result will have the columns associated with all of the other data sets as well but with all missing values for those that weren't in DSN90. If you don't want those extra columns you would have to post process the data to remove any column with all missing data. If you don't mind the blank columns then something like

data want;

     set dsn90 dsn91 <etc>;

run;

I

Either that or write a very complex extraction program that would require knowing which original data set an id is from.

I have a very hard time with the "advantage" of this structure may be.

I would be very tempted to consider a data set of ID and Source data set and use that for a control to pull records from the original

Reeza
Super User

chenyiwen1717 wrote:

Some columns are same, and some are different)

That is your main issue. How do you want the output to look like for the column names that are the same? Where the columns are different?

If you already have ID's, you can use a Data Step to combine your 10 datasets relatively easily as well as to identify the source data set.

See the INDSNAME option highlighted by Rick yesterday:

http://blogs.sas.com/content/iml/2015/08/03/indsname-option.html

ndp
Quartz | Level 8 ndp
Quartz | Level 8

this is where macro/macro variables com in handy. You can change following code to suit your needs but it should help you subset data and variables.

data dsn90;
id='9000001';
var1="zzz";
output;
run;

data dsn91;
id='9100001';
var2="zzz";
output;
run;

data all; ** will include both variables var1+var2;
set dsn90 dsn91;
run;

%macro subset(_id=9100001);

/* macro variable for yr */
%let _yr=%substr(&_id,1,2);

/* macro variable for list of variables in original data for that year*/
proc sql noprint;
select distinct(name) into: _varlst separated by " "
from sashelp.vcolumn
where upcase(libname)="WORK" and upcase(memname)="DSN&_YR.";
quit;

data subset;
set all;
where id="&_id.";
keep &_varlst.;
run;

%mend;

options symbolgen mlogic mprint;
%subset;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 670 views
  • 0 likes
  • 4 in conversation