BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cghost
Obsidian | Level 7

I'm working to merge around 25 individual survey data files, in waves of around (but not exactly) 5 at a time, and I'm finding all sorts of issues with user-defined formats, informats, and the Case IDs being used in each file. While some of these issues must be worked out by hand, I'd like to build a list of all the distinct CASE_ID values between all 25 datasets, and create comparison datasets with the formats, values and labels across each of the ~5ish dataset groups. The easiest way to do this, especially the formatting stuff, seems to be with Proc SQL. But I was hoping that I could build a macro that would generate my join statements (for the formats) and union statements (for the case ids) on the fly to save some time. Is it possible to have a macro generate syntax within a proc sql statement that goes beyond listing column/variable names and numbers?

 

I attempted to build something like this to give me a full list of the case IDs from my 25 files (which all begin with the prefix DATA), but it doesn't run the way I expected.  I get an error about expecting a "select" statement after my union statement:

%macro combineids;
proc sql;
select memname into :datasets separated by ' '
from dictionary.tables
where libname = "WORK" and memname like "DATA%";
%let i=1; %let dataset=%scan(&datasets,&i);
create table all_id as
%do i=1 %to %sysfunc(countw(&datasets.));
%if i=1 %then;
select CASE_ID from &dataset.
%else
UNION ALL select CASE_ID from &dataset.
%let i=%eval(&i+1);
%let dataset=%scan(&datasets,&I);
%end;
;
quit;
%mend;

If this is something that's even possible to do, I suspect I may need to wrap the code in a way so that SAS knows the output is executable? 

 

 

 

I realize there may be an easier way to do this with a datastep, but if it's possible, I'd prefer to create SQL syntax since I'd like to use a similar method to generate field selections and join statements to put my grouped formats files together. I haven't fully worked the logic out for that yet, but presume I'm working from a table called fmtcomp that contains all the distinct variable/format/value combinations in my datafiles, and &datasets is a macro variable containing all my proc formats cntlout format datasets. It'd look vaguely like this:

%macro compfmt;
proc sql;
select memname, format, 
%let i=1, %let dataset=%scan(&datasets,&i)
%do i=1 to %sysfunc(countw(%datasets,%str( )));
&dataset.start as start&i., &dataset.label as label&i.,
%end
from fmtcomp
%let i=1, %let dataset=%scan(&datasets,&i)
%do i=1 to %sysfunc(countw(%datasets,%str( )));
join &dataset. on fmtcomp.format = &dataset..format and fmtcomp.start = &dataset..start
%end

Is it possible to generate union and join statements within proc sql syntax like this on the fly, or am I stuck figuring out how to do both of these operations through datasteps instead (and I should note, there are type issues between the datasets that prevent a simple merge)? 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

This:

%if i=1 %then;

is the culprit. You end the %then branch without doing anything.

These two lines should be put into 1 line of macro code:

%if i=1 %then select CASE_ID from &dataset.;

Note that the semicolon ends the macro statement and not the select statement that is handed off to the proc SQL.

 

I would greatly simplify the macro:

%macro combineids;
proc sql;
select memname into :datasets separated by ' '
from dictionary.tables
where libname = "WORK" and memname like "DATA%";
create table all_id as
select CASE_ID from %scan(&datasets,1)
%do i=2 %to %sysfunc(countw(&datasets.));
UNION ALL select CASE_ID from %scan(&datasets,&i)
%end;
;
quit;
%mend;

But the data step solution for a "union" is so much easier that it beats the sql by lightyears:

data all_id;
set work.data: (keep=case_id);
run;

Note that you can use indsname= in the set statement so that you can put statements into the data step that are dependent on which dataset is being read.

For the joins, you're probably better off by using my macro suggestion as a basis for your code.

And do not manipulate the index variable of a %do (or do) loop inside the lop, as that skewers the incrementation, and you'll skip (or repeat) iterations.

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

This:

%if i=1 %then;

is the culprit. You end the %then branch without doing anything.

These two lines should be put into 1 line of macro code:

%if i=1 %then select CASE_ID from &dataset.;

Note that the semicolon ends the macro statement and not the select statement that is handed off to the proc SQL.

 

I would greatly simplify the macro:

%macro combineids;
proc sql;
select memname into :datasets separated by ' '
from dictionary.tables
where libname = "WORK" and memname like "DATA%";
create table all_id as
select CASE_ID from %scan(&datasets,1)
%do i=2 %to %sysfunc(countw(&datasets.));
UNION ALL select CASE_ID from %scan(&datasets,&i)
%end;
;
quit;
%mend;

But the data step solution for a "union" is so much easier that it beats the sql by lightyears:

data all_id;
set work.data: (keep=case_id);
run;

Note that you can use indsname= in the set statement so that you can put statements into the data step that are dependent on which dataset is being read.

For the joins, you're probably better off by using my macro suggestion as a basis for your code.

And do not manipulate the index variable of a %do (or do) loop inside the lop, as that skewers the incrementation, and you'll skip (or repeat) iterations.

cghost
Obsidian | Level 7

Perfect! This works fantastically for my ID union situation. I'll try this for my joins when I get to them, and report back.

ChrisNZ
Tourmaline | Level 20

For gathering all the CASE_IDs, I would just use:


data CASE_IDS;
  length CASE_ID $80;
  set %do i=1 %to %sysfunc(countw(&datasets.));
         %scan(&datasets,&i)(keep=CASE_ID)
       %end;
  ;  
  if CASE_ID ne lag(CASE_ID);
run;

 

This will keep all the values regardless of their original type, and you can then summarise.

You can probably do something similar for the formats using the metadata output tables, I am unsure I followed your train of thought.

 

 

cghost
Obsidian | Level 7

This is great! I was wondering how to navigate my type issue, and it seems the answer is that the loop tells it just to keep that single variable from each dataset in the merge. Thanks!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 4991 views
  • 2 likes
  • 3 in conversation