- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello. Noob here.
I have data that looks like this:
Tech SubTech
T1 AB11
T1 JDE1
T1 LEIO
T2 AJNC
T2 JUN1
T2 PLNL
T3 TNEVE
T3 NED0
I would like to use two loops, the outside to loop through the three technologies, and the inner one for each subtech within, to create macro variables something like this:
subs1 = "JDE1, LEI0"
subs2 = "AJNC, JUN1, PLNL"
subs3="TNEVE, NED0"
via subs&i = (correct command here)
Something like:
data _null_;
set techs;
if first.Tech = 1 then do until last.Tech = 1
subs&i = ;
<second loop over j in here>
subs&i = subs&i.||<AppropriateSubTech>; *(trying to paste together a list here like "JDE1, LEI0")
end;
end;
run;
Two questions:
1) How do I define the inside loop?
2) Also, I feel like the outside loop will terminate one too early if I use last.Tech = 1 because I want to include that term in the loop as well. How do I appropriately define the index for the outer loop?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That isn't how you've shown your data. you've only shown values in a single column, not multiple variables.
select ABCD, ABCE, ABCF, DCAB, ... from table1 -> you only showed two variables, Tech and SubTech, where do variables ABCD and ABCE come from? If those are values you're not understanding what I'm trying to suggest.
And, Why for the first set are you missing the first entry?
subs1 = "JDE1, LEI0" -> where's the first one?
So you're trying to create subsets for each tech group based on the subtech groups? You can add a WHERE to the select statement, but I don't understand why you don't just join them and find the values you need.
or use an approach like this.
proc sql;
create table tech1set as
select *
from dataset2
where subtech in (select subtech from table1 where tech='T1');
quit;
There's some ideas here on how to create those lists but SAS doesn't loop the way you're thinking, there's already a data step loop that you need to take advantage of, as well as the BY group processing that's supported.
See this example here:
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
You can run it, and I suggest you do and see how it works. If you really, really want the macro variables at the point, its a very easy addition, it would be something like the following at each of the last.group values.
if last.group then call symputx(group_name, variable_value);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The point of this exercise is to subset another dataset that doesn't have the Tech column but has the SubTech column.
Eg,
dataset2 is like:
SubTech Data
AB11 0.102
AB11 -1.22
LEI0 0.73
data Tech1Set;
set dataset2;
if SubTech IN (&subs1);
run;
etc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Change your process:
proc sql;
create table tech1set as
select *
from dataset2
where subtech in (select subtech from table1);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your reply.
I don't think that will work as I have 100's of subtechs that I need to automate their listing.
It would be like
proc sql;
create table tech1set as
select *
from dataset2
where subtech in (select ABCD, ABCE, ABCF, DCAB, ... from table1);
quit;
There's no way I would hand-type out those SubTechs, and the list of SubTechs is expanding, which is why I need to pull the most up to date list from the first dataset that gets updated with all active subtechs.
If I just literally "select subtech from tabel1" then it wouldn't subset the data between the major tech groups.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That isn't how you've shown your data. you've only shown values in a single column, not multiple variables.
select ABCD, ABCE, ABCF, DCAB, ... from table1 -> you only showed two variables, Tech and SubTech, where do variables ABCD and ABCE come from? If those are values you're not understanding what I'm trying to suggest.
And, Why for the first set are you missing the first entry?
subs1 = "JDE1, LEI0" -> where's the first one?
So you're trying to create subsets for each tech group based on the subtech groups? You can add a WHERE to the select statement, but I don't understand why you don't just join them and find the values you need.
or use an approach like this.
proc sql;
create table tech1set as
select *
from dataset2
where subtech in (select subtech from table1 where tech='T1');
quit;
There's some ideas here on how to create those lists but SAS doesn't loop the way you're thinking, there's already a data step loop that you need to take advantage of, as well as the BY group processing that's supported.
See this example here:
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
You can run it, and I suggest you do and see how it works. If you really, really want the macro variables at the point, its a very easy addition, it would be something like the following at each of the last.group values.
if last.group then call symputx(group_name, variable_value);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think adding the where statement will make it work. Thank you!
Both data sets have two columns, one has Tech and SubTech, and the other has SubTech and Data. The data sets I showed were small examples of very large 500k+ rows of information.
Missing the first entry just due to missing it when providing my example.
The values ABCD, ABCE...etc, are just in the dataset. Those are the fake versions of the subtech names, and are values, not separate variables, which I assume should be the top column if they were supposed to be variables.
I don't just join them because there is some slight complexities I know how to deal with that I'm glossing over with respect to the SubTech column, which actually contains multiple techs, and therefore the two SubTech columns are not identical. I can use the where statement along with an index indicator function to get around that though.
I was trying to take advantage of the internal loop structure of the data step by using a sorted data set and the first/last commands. Perhaps badly. I'll check out that link. Thank you for your help!