BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MiniRadde
Obsidian | Level 7

Hello, 

 

I am stuck on a procedure that I really need to automize, but I am very new to writing own code in SAS.

My problem could be summarized as follows:

 

input > 500 datasets (named dataset_name) having X variables.

output = 1 dataset having X+1 variables ("+1" = "name" in dataset_name).

 

What this means is that I have a library that contains over 500 datasets that I'd like to consolidate. All files are named with the same prefix, followed by the date, e.g.

 

data_2017-01-06

data_2017-01-18

data_2017-02-08,

 

and so on. What I need to do is to create a new dataset containing all observations from all datasets, while I also need to be able to identify at which date the observation was made (i.e. the date-part of the dataset name). I would like to store only the YYYY-MM-DD part of the name as observations (preferably as characters).

 

Is there any friendly soul out there who could help me to solve this problem?

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

SAS actually contains very helpful tools for this.  Assuming that you want all data set names that begin with data_2017 ...

 

data want;

set lib.data_2017: indsname=complete_name;

date_id = scan(complete_name, 2, '_');

run;

 

The colon in data_2017: gets you all data set names that begin with those characters.  INDSNAME= creates a variable holding the name of the incoming data set.

View solution in original post

8 REPLIES 8
Astounding
PROC Star

SAS actually contains very helpful tools for this.  Assuming that you want all data set names that begin with data_2017 ...

 

data want;

set lib.data_2017: indsname=complete_name;

date_id = scan(complete_name, 2, '_');

run;

 

The colon in data_2017: gets you all data set names that begin with those characters.  INDSNAME= creates a variable holding the name of the incoming data set.

ChrisHemedinger
Community Manager

@Astounding beat me to it.  I was just having to use this technique myself!  Here's what I did with my collection of daily data sets that have the name pattern "GA_DAILYyyyymmdd".  Use the colon operator to match the name pattern, and the INDSNAME= option to capture the input data set name.

 

data consolidated;
  length source $ 32;
  set ga.ga_daily: indsname=in;
  source = in;
run;

Result:

 

indsname.png

Register for SAS Innovate 2025!! The premier event for SAS users, May 6-9 in Orlando FL. Sign up now for the best deals!
MiniRadde
Obsidian | Level 7
Hi ChrisHemedinger! Thank you so much for your help! The code worked almost perfect, the only thing was that it stored the entire file-name, but that is really a trifle. Thank you, and thank you again!
MiniRadde
Obsidian | Level 7
Hi Astounding! Waow, I really did not expect help that fast! Your code works PERFEKT! Thank you so much for your help, I could not have figured this out myself in time!
MiniRadde
Obsidian | Level 7

Thank you @Astounding @ChrisHemedinger, I am forever grateful to your quick support!

MiniRadde
Obsidian | Level 7

Hi again, and sorry for bothering about this topic again.

 

The solution worked fine as long as the dataset-formats were equal. When running over al my sets, I noticed that in some -not all- of my files there exists a variable X. Sometimes this one is defined as numeric and sometimes as character, resulting in an error: 

 

Variable X has been defined as both character and numeric

 

I tried to solve it using 

 

set lib.data_2017: (rename=X) or (drop=X),

 

and so on, but this won't work since the variable is not in all datasets. Any ideas on how to proceed? 

 

 

 

 

 

Astounding
PROC Star

There's only one quick solution that I know of  ... and I'm not 100% sure it would work.  It would only apply if X is the one and only variable name that begins with "X" in your data set(s).  In that case, you could try ( drop=x: )

 

If there are other variable names that begin with "X", however, this approach would drop those other variables as well.  More cumbersome methods are required.

 

If this does work, it might generate a warning.

 

Regardless, the final solution will not be able to combine the data sets if a variable is character in one data set and numeric in another.

MiniRadde
Obsidian | Level 7

Again, thank you @Astounding for your help. That solution you proposed I already tried, and it does unfortuntly not work. I solved it very very badly but at least I can continue. Since all files have almost the same name (what differs is the date), I made 31 single conversion for each day in the month: 

 

data want_YYYY_MM_DD;
set have._YYYY_MM_DD (rename=(TROUBLE_VARIABLE=TEMP));
TROUBLE_VARIABLE = put(TEMP, 7.);
drop TEMP;
run;

 

Next by doing "find and replace" on the YYYY and MM numbers, I could repeat the procedure for every month and every year in my sample period. As said, this was very non-fancy, but it worked! 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 15948 views
  • 6 likes
  • 3 in conversation