Hello,
I am stuck on a procedure that I really need to automize, but I am very new to writing own code in SAS.
My problem could be summarized as follows:
input > 500 datasets (named dataset_name) having X variables.
output = 1 dataset having X+1 variables ("+1" = "name" in dataset_name).
What this means is that I have a library that contains over 500 datasets that I'd like to consolidate. All files are named with the same prefix, followed by the date, e.g.
data_2017-01-06
data_2017-01-18
data_2017-02-08,
and so on. What I need to do is to create a new dataset containing all observations from all datasets, while I also need to be able to identify at which date the observation was made (i.e. the date-part of the dataset name). I would like to store only the YYYY-MM-DD part of the name as observations (preferably as characters).
Is there any friendly soul out there who could help me to solve this problem?
SAS actually contains very helpful tools for this. Assuming that you want all data set names that begin with data_2017 ...
data want;
set lib.data_2017: indsname=complete_name;
date_id = scan(complete_name, 2, '_');
run;
The colon in data_2017: gets you all data set names that begin with those characters. INDSNAME= creates a variable holding the name of the incoming data set.
SAS actually contains very helpful tools for this. Assuming that you want all data set names that begin with data_2017 ...
data want;
set lib.data_2017: indsname=complete_name;
date_id = scan(complete_name, 2, '_');
run;
The colon in data_2017: gets you all data set names that begin with those characters. INDSNAME= creates a variable holding the name of the incoming data set.
@Astounding beat me to it. I was just having to use this technique myself! Here's what I did with my collection of daily data sets that have the name pattern "GA_DAILYyyyymmdd". Use the colon operator to match the name pattern, and the INDSNAME= option to capture the input data set name.
data consolidated;
length source $ 32;
set ga.ga_daily: indsname=in;
source = in;
run;
Result:
Thank you @Astounding and @ChrisHemedinger, I am forever grateful to your quick support!
Hi again, and sorry for bothering about this topic again.
The solution worked fine as long as the dataset-formats were equal. When running over al my sets, I noticed that in some -not all- of my files there exists a variable X. Sometimes this one is defined as numeric and sometimes as character, resulting in an error:
Variable X has been defined as both character and numeric
I tried to solve it using
set lib.data_2017: (rename=X) or (drop=X),
and so on, but this won't work since the variable is not in all datasets. Any ideas on how to proceed?
There's only one quick solution that I know of ... and I'm not 100% sure it would work. It would only apply if X is the one and only variable name that begins with "X" in your data set(s). In that case, you could try ( drop=x: )
If there are other variable names that begin with "X", however, this approach would drop those other variables as well. More cumbersome methods are required.
If this does work, it might generate a warning.
Regardless, the final solution will not be able to combine the data sets if a variable is character in one data set and numeric in another.
Again, thank you @Astounding for your help. That solution you proposed I already tried, and it does unfortuntly not work. I solved it very very badly but at least I can continue. Since all files have almost the same name (what differs is the date), I made 31 single conversion for each day in the month:
data want_YYYY_MM_DD;
set have._YYYY_MM_DD (rename=(TROUBLE_VARIABLE=TEMP));
TROUBLE_VARIABLE = put(TEMP, 7.);
drop TEMP;
run;
Next by doing "find and replace" on the YYYY and MM numbers, I could repeat the procedure for every month and every year in my sample period. As said, this was very non-fancy, but it worked!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.