BookmarkSubscribeRSS Feed
bollibompa
Quartz | Level 8

I have a library with data from four years (2010-2014) each dataset includes two variables: id and source

library:

data_2010

data_2011

data_2012

data_2013

I now want to append these years and in the append step (or what step the best for doing this...) create a variable that tells you which year the data comes from. Like this:

id     source     year (new)

1      a             2010     

2      a             2010

1      b            2011

2      b           2011

and so on...

I'd rather not create the variable in each dataset because they are huge

Is it possible to create this variable in the append step?

/Thomas


6 REPLIES 6
LinusH
Tourmaline | Level 20

You could create views which contains the years, have proc append read from those instead.

Data never sleeps
BrunoMueller
SAS Super FREQ

Hi bollibompa

See code sample below, it uses some techniques available since SAS9.2 where you can specify data sets lists and the INDSNAME= option on the SET statement.

Bruno

data data_2010 data_2011  data_2012 data_2013;
  do id = 1 to 5;
    source = char(
"ABCDEFG", id);
    output;
 
end;
run;

data data_want;
  set data_2010 - data_2013 indsname=_dsname;
  length year $ 4;
  year = scan(_dsname,
2, "_");
run;
Tom
Super User Tom
Super User

INDSNAME option on the SET statement is what you want.

But you might also want to look into using a BY statement with your SET so that the final dataset will still be sorted by ID.  And if you order the datasets in the SET statement properly then the final dataset will also be sorted by YEAR within ID.

data want ;

  length indsname $41 ;

  set data_2010-data_2013 indsname=indsname ;

  by id source ;

  year = input(scan(idsname,-1,'_'),4.);

run;

stat_sas
Ammonite | Level 13

How about this one

proc sql;

insert into all

select id,source, 2010 as source_file

from  data_2010

UNION

select id,source, 2011 as source_file

from  data_2011

UNION

select id,source, 2012 as source_file

from  data_2012

UNION

select id,source, 2013 as source_file

from  data_2013;

quit;

bollibompa
Quartz | Level 8

Thank you all for quick  replies!

INDSNAME was exactly what I was looking for, works perfect!

Thomas

ArunSingapore
Calcite | Level 5

Hi,

 

I am reading huge xml files using xmlv2 libname and xmlmapper file.

After reading the each file, I need to add a source filename to the output dataset.

I can not use INDSNAME option because I must use PROC APPEND.

 

Because my source file has missing columns or additional new columns every time.

This can be done only using PROC APPEND with FORCE option.

Hence kindly guide me how can I add a new column in PROC APPEND?

 

Thanks

Arun

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1647 views
  • 0 likes
  • 6 in conversation