Solved: Re: Add dataset-name as new variable and repeat for a large number of datasets

MiniRadde · Posted 02-08-2017 08:12 AM

Hello,

I am stuck on a procedure that I really need to automate, but I am very new to writing my own code in SAS.

My problem could be summarized as follows:

input > 500 datasets (named dataset_name) having X variables.

output = 1 dataset having X+1 variables ("+1" = "name" in dataset_name).

What this means is that I have a library that contains over 500 datasets that I'd like to consolidate. All files are named with the same prefix, followed by the date, e.g.

data_2017-01-06

data_2017-01-18

data_2017-02-08,

and so on. What I need to do is create a new dataset containing all observations from all datasets, while also being able to identify on which date each observation was made (i.e. the date part of the dataset name). I would like to store only the YYYY-MM-DD part of the name as the value of a new variable (preferably as character).

Is there any friendly soul out there who could help me to solve this problem?

Astounding · Posted 02-08-2017 08:23 AM

SAS actually contains very helpful tools for this. Assuming that you want all data set names that begin with data_2017 ...

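data want;
  /* The colon reads every data set in LIB whose name starts with data_2017; */
  /* INDSNAME= holds the name of the data set the current row came from.     */
  set lib.data_2017: indsname=complete_name;
  /* Keep the part of the name that follows the first underscore. */
  date_id = scan(complete_name, 2, '_');
run;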

The colon in data_2017: gets you all data set names that begin with those characters. INDSNAME= creates a variable holding the name of the incoming data set.
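A minimal, self-contained sketch of the same technique may make the behavior easier to check. It assumes made-up data sets in WORK whose names use underscores instead of hyphens (data_2017_01_06 and so on), since hyphens are not accepted in member names under the default naming rules. With underscores in the date, scan(complete_name, 2, '_') would return only the year, so this sketch cuts off the DATA_ prefix instead. The INDSNAME= variable is temporary and is not written to the output data set, which is why its value is copied into date_id.

```sas
/* Two small example data sets (names and values are invented) */
data work.data_2017_01_06;
  x = 1; output;
  x = 2; output;
run;

data work.data_2017_01_18;
  x = 3; output;
run;

data want;
  length date_id $ 10;
  /* The colon picks up every WORK data set whose name starts with data_2017 */
  set work.data_2017: indsname=complete_name;
  /* complete_name looks like WORK.DATA_2017_01_06:                 */
  /* take the member name, skip the 5 characters of DATA_,          */
  /* and turn the remaining underscores into hyphens.               */
  date_id = translate(substr(scan(complete_name, 2, '.'), 6), '-', '_');
run;

proc print data=want;
run;
```

Under these assumptions, the two rows from the first data set should carry date_id 2017-01-06 and the row from the second 2017-01-18.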

ChrisHemedinger · Posted 02-08-2017 08:30 AM

@Astounding beat me to it. I was just having to use this technique myself! Here's what I did with my collection of daily data sets that have the name pattern "GA_DAILYyyyymmdd". Use the colon operator to match the name pattern, and the INDSNAME= option to capture the input data set name.

data consolidated;
  length source $ 32;
  set ga.ga_daily: indsname=in;
  source = in;
run;

MiniRadde · Posted 02-08-2017 08:46 AM

Hi ChrisHemedinger! Thank you so much for your help! The code worked almost perfectly. The only thing was that it stored the entire data set name, but that is really a trifle. Thank you, and thank you again!
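If only the date part of the name is wanted with the GA_DAILYyyyymmdd pattern, one possible approach is to read the last eight characters of the INDSNAME= value as a date and write it back as YYYY-MM-DD text. The sketch below is an illustration only: the WORK data sets, the variable visits, and the names dsname, date_num, and date_id are all assumptions, not part of the code posted above.

```sas
/* Invented example data following the GA_DAILYyyyymmdd naming pattern */
data work.ga_daily20170207; visits = 10; run;
data work.ga_daily20170208; visits = 12; run;

data consolidated;
  length source $ 41 date_id $ 10;
  set work.ga_daily: indsname=dsname;
  /* Full name, e.g. WORK.GA_DAILY20170207 */
  source = dsname;
  /* The last 8 characters of the name are yyyymmdd */
  date_num = input(substr(dsname, length(dsname) - 7), yymmdd8.);
  /* Store the date as character text in YYYY-MM-DD form */
  date_id = put(date_num, yymmdd10.);
  drop date_num;
run;
```

Keeping date_num (with a date format attached) instead of dropping it would be an alternative if date arithmetic is needed later.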

MiniRadde · Posted 02-08-2017 08:49 AM

Hi Astounding! Wow, I really did not expect help that fast! Your code works PERFECTLY! Thank you so much for your help, I could not have figured this out myself in time!

MiniRadde · Posted 02-08-2017 08:53 AM

Thank you @Astounding and @ChrisHemedinger, I am forever grateful for your quick support!

MiniRadde · Posted 02-13-2017 04:09 AM

Hi again, and sorry to bother you about this topic again.

The solution worked fine as long as the datasets had the same structure. When running it over all my datasets, I noticed that some (not all) of my files contain a variable X. Sometimes it is defined as numeric and sometimes as character, resulting in an error:

Variable X has been defined as both character and numeric

I tried to solve it using

set lib.data_2017: (rename=X) or (drop=X),

and so on, but this won't work since the variable is not in all datasets. Any ideas on how to proceed?

Astounding · Posted 02-13-2017 04:53 PM

There's only one quick solution that I know of ... and I'm not 100% sure it would work. It would only apply if X is the one and only variable name that begins with "X" in your data set(s). In that case, you could try ( drop=x: )

If there are other variable names that begin with "X", however, this approach would drop those other variables as well. More cumbersome methods are required.

If this does work, it might generate a warning.

Regardless, the final solution will not be able to combine the data sets if a variable is character in one data set and numeric in another.
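Before choosing a workaround, it may help to see which data sets hold X as numeric, which hold it as character, and which lack it. One way to check, sketched here with invented WORK data sets and an assumed DATA_2017 prefix, is to query DICTIONARY.COLUMNS in PROC SQL. Library and member names are stored there in uppercase.

```sas
/* Invented example data: X is numeric, character, or absent */
data work.data_2017_01_06; x = 1;   y = 1; run;
data work.data_2017_01_18; x = 'a'; y = 2; run;
data work.data_2017_02_08;          y = 3; run;

proc sql;
  /* One row per data set that contains X, with its type (num or char) */
  select memname, type, length
    from dictionary.columns
    where libname = 'WORK'
      and index(memname, 'DATA_2017') = 1
      and upcase(name) = 'X'
    order by memname;
quit;
```

Data sets that do not contain X simply do not appear in the listing.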

MiniRadde · Posted 02-16-2017 07:27 AM

Again, thank you @Astounding for your help. I had already tried the solution you proposed, and unfortunately it does not work. I solved it in a very crude way, but at least I can continue. Since all files have almost the same name (only the date differs), I wrote 31 separate conversion steps, one for each day of the month:

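data want_YYYY_MM_DD;
  /* Read the variable under a temporary name so it can be re-created as character */
  set have._YYYY_MM_DD (rename=(TROUBLE_VARIABLE=TEMP));
  /* Convert the numeric value to a 7-character string */
  TROUBLE_VARIABLE = put(TEMP, 7.);
  drop TEMP;
run;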

Next, by doing "find and replace" on the YYYY and MM numbers, I could repeat the procedure for every month and every year in my sample period. As I said, this was not fancy at all, but it worked!
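The manual find-and-replace could probably be avoided by letting SAS build the SET list itself. The sketch below is one possible way, not a tested solution for the original library: it looks up the type of the variable in DICTIONARY.COLUMNS, renames it to temp_n where it is numeric and to temp_c where it is character, and rebuilds a single character variable while concatenating. The WORK data sets, the DATA_2017 prefix, the underscore-based names, the helper names temp_n, temp_c, setlist, and dsname, and the $ 32 length are all assumptions made for illustration.

```sas
/* Invented example data: the variable is numeric, character, or absent */
data work.data_2017_01_06; trouble_variable = 123;   y = 1; run;
data work.data_2017_01_18; trouble_variable = 'abc'; y = 2; run;
data work.data_2017_02_08;                           y = 3; run;

/* Build the SET list, renaming the variable according to its type */
proc sql noprint;
  select case c.type
           when 'num'  then catx(' ', cats('work.', t.memname),
                                 '(rename=(trouble_variable=temp_n))')
           when 'char' then catx(' ', cats('work.', t.memname),
                                 '(rename=(trouble_variable=temp_c))')
           else cats('work.', t.memname)
         end
    into :setlist separated by ' '
    from dictionary.tables as t
         left join dictionary.columns as c
           on  c.libname = t.libname
           and c.memname = t.memname
           and upcase(c.name) = 'TROUBLE_VARIABLE'
    where t.libname = 'WORK'
      and index(t.memname, 'DATA_2017') = 1;
quit;

data want;
  /* $ 32 is an assumed upper bound for the character values */
  length trouble_variable temp_c $ 32 temp_n 8 date_id $ 10;
  set &setlist indsname=dsname;
  /* temp_n and temp_c are reset to missing whenever a new data set starts, */
  /* so only the one that exists in the current data set carries a value.   */
  if not missing(temp_n) then trouble_variable = left(put(temp_n, 7.));
  else trouble_variable = temp_c;
  /* Date part of the member name, e.g. 2017-01-06 */
  date_id = translate(substr(scan(dsname, 2, '.'), 6), '-', '_');
  drop temp_n temp_c;
run;
```

With around 500 data sets the generated list should still fit in one macro variable, but joining the dictionary tables can be slow when many libraries are assigned.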
