BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MiniRadde
Obsidian | Level 7

Hello, 

 

I am stuck on a procedure that I really need to automize, but I am very new to writing own code in SAS.

My problem could be summarized as follows:

 

input > 500 datasets (named dataset_name) having X variables.

output = 1 dataset having X+1 variables ("+1" = "name" in dataset_name).

 

What this means is that I have a library that contains over 500 datasets that I'd like to consolidate. All files are named with the same prefix, followed by the date, e.g.

 

data_2017-01-06

data_2017-01-18

data_2017-02-08,

 

and so on. What I need to do is to create a new dataset containing all observations from all datasets, while I also need to be able to identify at which date the observation was made (i.e. the date-part of the dataset name). I would like to store only the YYYY-MM-DD part of the name as observations (preferably as characters).

 

Is there any friendly soul out there who could help me to solve this problem?

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

SAS actually contains very helpful tools for this.  Assuming that you want all data set names that begin with data_2017 ...

 

data want;

set lib.data_2017: indsname=complete_name;

date_id = scan(complete_name, 2, '_');

run;

 

The colon in data_2017: gets you all data set names that begin with those characters.  INDSNAME= creates a variable holding the name of the incoming data set.

View solution in original post

8 REPLIES 8
Astounding
PROC Star

SAS actually contains very helpful tools for this.  Assuming that you want all data set names that begin with data_2017 ...

 

data want;

set lib.data_2017: indsname=complete_name;

date_id = scan(complete_name, 2, '_');

run;

 

The colon in data_2017: gets you all data set names that begin with those characters.  INDSNAME= creates a variable holding the name of the incoming data set.

ChrisHemedinger
Community Manager

@Astounding beat me to it.  I was just having to use this technique myself!  Here's what I did with my collection of daily data sets that have the name pattern "GA_DAILYyyyymmdd".  Use the colon operator to match the name pattern, and the INDSNAME= option to capture the input data set name.

 

data consolidated;
  length source $ 32;
  set ga.ga_daily: indsname=in;
  source = in;
run;

Result:

 

indsname.png

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
MiniRadde
Obsidian | Level 7
Hi ChrisHemedinger! Thank you so much for your help! The code worked almost perfect, the only thing was that it stored the entire file-name, but that is really a trifle. Thank you, and thank you again!
MiniRadde
Obsidian | Level 7
Hi Astounding! Waow, I really did not expect help that fast! Your code works PERFEKT! Thank you so much for your help, I could not have figured this out myself in time!
MiniRadde
Obsidian | Level 7

Thank you @Astounding @ChrisHemedinger, I am forever grateful to your quick support!

MiniRadde
Obsidian | Level 7

Hi again, and sorry for bothering about this topic again.

 

The solution worked fine as long as the dataset-formats were equal. When running over al my sets, I noticed that in some -not all- of my files there exists a variable X. Sometimes this one is defined as numeric and sometimes as character, resulting in an error: 

 

Variable X has been defined as both character and numeric

 

I tried to solve it using 

 

set lib.data_2017: (rename=X) or (drop=X),

 

and so on, but this won't work since the variable is not in all datasets. Any ideas on how to proceed? 

 

 

 

 

 

Astounding
PROC Star

There's only one quick solution that I know of  ... and I'm not 100% sure it would work.  It would only apply if X is the one and only variable name that begins with "X" in your data set(s).  In that case, you could try ( drop=x: )

 

If there are other variable names that begin with "X", however, this approach would drop those other variables as well.  More cumbersome methods are required.

 

If this does work, it might generate a warning.

 

Regardless, the final solution will not be able to combine the data sets if a variable is character in one data set and numeric in another.

MiniRadde
Obsidian | Level 7

Again, thank you @Astounding for your help. That solution you proposed I already tried, and it does unfortuntly not work. I solved it very very badly but at least I can continue. Since all files have almost the same name (what differs is the date), I made 31 single conversion for each day in the month: 

 

data want_YYYY_MM_DD;
set have._YYYY_MM_DD (rename=(TROUBLE_VARIABLE=TEMP));
TROUBLE_VARIABLE = put(TEMP, 7.);
drop TEMP;
run;

 

Next by doing "find and replace" on the YYYY and MM numbers, I could repeat the procedure for every month and every year in my sample period. As said, this was very non-fancy, but it worked! 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 14590 views
  • 6 likes
  • 3 in conversation