Help using Base SAS procedures

Add dataset-name as new variable and repeat for a large number of datasets

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Add dataset-name as new variable and repeat for a large number of datasets

Hello, 

 

I am stuck on a procedure that I really need to automize, but I am very new to writing own code in SAS.

My problem could be summarized as follows:

 

input > 500 datasets (named dataset_name) having X variables.

output = 1 dataset having X+1 variables ("+1" = "name" in dataset_name).

 

What this means is that I have a library that contains over 500 datasets that I'd like to consolidate. All files are named with the same prefix, followed by the date, e.g.

 

data_2017-01-06

data_2017-01-18

data_2017-02-08,

 

and so on. What I need to do is to create a new dataset containing all observations from all datasets, while I also need to be able to identify at which date the observation was made (i.e. the date-part of the dataset name). I would like to store only the YYYY-MM-DD part of the name as observations (preferably as characters).

 

Is there any friendly soul out there who could help me to solve this problem?


Accepted Solutions
Solution
‎02-08-2017 08:49 AM
Super User
Posts: 5,081

Re: Add dataset-name as new variable and repeat for a large number of datasets

SAS actually contains very helpful tools for this.  Assuming that you want all data set names that begin with data_2017 ...

 

data want;

set lib.data_2017: indsname=complete_name;

date_id = scan(complete_name, 2, '_');

run;

 

The colon in data_2017: gets you all data set names that begin with those characters.  INDSNAME= creates a variable holding the name of the incoming data set.

View solution in original post


All Replies
Solution
‎02-08-2017 08:49 AM
Super User
Posts: 5,081

Re: Add dataset-name as new variable and repeat for a large number of datasets

SAS actually contains very helpful tools for this.  Assuming that you want all data set names that begin with data_2017 ...

 

data want;

set lib.data_2017: indsname=complete_name;

date_id = scan(complete_name, 2, '_');

run;

 

The colon in data_2017: gets you all data set names that begin with those characters.  INDSNAME= creates a variable holding the name of the incoming data set.

Community Manager
Posts: 2,761

Re: Add dataset-name as new variable and repeat for a large number of datasets

@Astounding beat me to it.  I was just having to use this technique myself!  Here's what I did with my collection of daily data sets that have the name pattern "GA_DAILYyyyymmdd".  Use the colon operator to match the name pattern, and the INDSNAME= option to capture the input data set name.

 

data consolidated;
  length source $ 32;
  set ga.ga_daily: indsname=in;
  source = in;
run;

Result:

 

indsname.png

Occasional Contributor
Posts: 9

Re: Add dataset-name as new variable and repeat for a large number of datasets

Hi ChrisHemedinger! Thank you so much for your help! The code worked almost perfect, the only thing was that it stored the entire file-name, but that is really a trifle. Thank you, and thank you again!
Occasional Contributor
Posts: 9

Re: Add dataset-name as new variable and repeat for a large number of datasets

Hi Astounding! Waow, I really did not expect help that fast! Your code works PERFEKT! Thank you so much for your help, I could not have figured this out myself in time!
Occasional Contributor
Posts: 9

Re: Add dataset-name as new variable and repeat for a large number of datasets

Thank you @Astounding @ChrisHemedinger, I am forever grateful to your quick support!

Occasional Contributor
Posts: 9

Re: Add dataset-name as new variable and repeat for a large number of datasets

Hi again, and sorry for bothering about this topic again.

 

The solution worked fine as long as the dataset-formats were equal. When running over al my sets, I noticed that in some -not all- of my files there exists a variable X. Sometimes this one is defined as numeric and sometimes as character, resulting in an error: 

 

Variable X has been defined as both character and numeric

 

I tried to solve it using 

 

set lib.data_2017: (rename=X) or (drop=X),

 

and so on, but this won't work since the variable is not in all datasets. Any ideas on how to proceed? 

 

 

 

 

 

Super User
Posts: 5,081

Re: Add dataset-name as new variable and repeat for a large number of datasets

[ Edited ]

There's only one quick solution that I know of  ... and I'm not 100% sure it would work.  It would only apply if X is the one and only variable name that begins with "X" in your data set(s).  In that case, you could try ( drop=x: )

 

If there are other variable names that begin with "X", however, this approach would drop those other variables as well.  More cumbersome methods are required.

 

If this does work, it might generate a warning.

 

Regardless, the final solution will not be able to combine the data sets if a variable is character in one data set and numeric in another.

Occasional Contributor
Posts: 9

Re: Add dataset-name as new variable and repeat for a large number of datasets

Again, thank you @Astounding for your help. That solution you proposed I already tried, and it does unfortuntly not work. I solved it very very badly but at least I can continue. Since all files have almost the same name (what differs is the date), I made 31 single conversion for each day in the month: 

 

data want_YYYY_MM_DD;
set have._YYYY_MM_DD (rename=(TROUBLE_VARIABLE=TEMP));
TROUBLE_VARIABLE = put(TEMP, 7.);
drop TEMP;
run;

 

Next by doing "find and replace" on the YYYY and MM numbers, I could repeat the procedure for every month and every year in my sample period. As said, this was very non-fancy, but it worked! 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 254 views
  • 4 likes
  • 3 in conversation