- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello all, I was hoping someone could help me with a formula that uses Proc Print on one data set for only certain values which are populated in a separate data set. For example if I have the following two tables:
Unique_Identifier |
---|
1 |
2 |
3 |
Unique_Identifier | Name | Sales |
---|---|---|
1 | Joe | 175 |
2 | Bill | 165 |
3 | John | 150 |
4 | Mark | 200 |
5 | Barkevious | 100 |
How can I write a procedure so that I print of only the values that populate in table 1 under unique identifier?
I have tried just using
proc print data=Table.1; Where Unique_Identifier EQ "1"; run;
proc print data=Table.1; Where Unique_Identifier EQ "1"; run;
proc print data=Table.1; Where Unique_Identifier EQ "1"; run;
And while this works when running a smaller set of data if you try and do this for thousands of variables (which I am hoping to do) it takes to long. Any help I could get would be greatly appreciated, thank you in advance!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, as ballardw said you can use an IN operator for selecting set of values. But you mentioned that to print the observations where uniqueidentiier in first table(lookup) are 35k and sec table is 5million. In that case an efficient method is to use PROC SQL. One method is explained above in my first reply and more efficient will be:
So you have in table1 your set of uniq values which you want to print from table2 which contains a larger set of values.
ODS CSV File="C:\Temp.csv";
proc sql;
select a.*
from table2 as a, table1 as b
where a.uniqident = b.uniqident;
quit;
ODS CSV File="C:\Temp.csv";
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What you are attempting is not clear. Why would you run the exact same code 3 times? Are wanting to create an output table for each value of unique_identifier? That would be BY group processing which would require sorting if not already done.
proc print data= datasetname;
by Unique_identifer;
run;
If this isn't what your looking for provide some examples of desired output.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for the quick reply sorry for not being more clear, I made an error in the example code, it was supposed to look like this:
proc print data=Table.1; Where Unique_Identifier EQ "1"; run;
proc print data=Table.1; Where Unique_Identifier EQ "2"; run;
proc print data=Table.1; Where Unique_Identifier EQ "3"; run;
The idea is that the second table contains many unique identifiers and I only want to print a subset of the data. My goal would be for the output table to simply print the contents of the rows in the second table with the Unique_Identifier value of 1,2 and 3. I am hoping to do this on a large data set where I know the values of ~35k unique identifiers and need to extract them from a second table that contains ~5million rows of unique identifiers.
Hope that clarifies things
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
As ballardw said your query is not clear but what I learned from the query is that you want to compare the values of Unique Identifier in the two tables and print out the selected obs.
For that you can use Proc SQL
Ex: proc sql;
select *
from table2
where select uniqueidentifier in (select uniqueidentifier from table1 where ........);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
%macro print_out();
proc sql;
select distinct count(uniqueidentifier)
from table1
into: distinct_vars;
quit;
proc sort data = table1 nodupkey out = uniq;
by uniqueidentifier;
run;
data _null_;
set uniq;
call symput('varb'||LEFT(PUT(_N_, 4.)), TRIM(uniqueidentifier));
run;
do i = 1 to distinct_vars;
ods html file = "Print_&distinct_vars..xls";
proc print data = table2;
where uniqueidentifier = "&&varb&i";
run;
ods html close;
%mend print_out;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think this should solve the issue....
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm not nearly advanced as you guys and just need to extract the data so here is what I ended up doing, let me know if you have any thoughts on this
ODS CSV File="C:\Temp.csv";
Proc Print data=Table2;
where
Unique_Identifier EQ 1 or
Unique_Identifier EQ 2 or
Unique Identifier EQ 3;
run;
ODS CSV CLOSE;
This worked perfectly for a smaller set of data but I ask for replies because like I said I need to run this on a much larger dataset and am worried about speed/performance. Any help would be much appreciated, thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Shortcut in syntax for a list of variables is to use the IN operator
where Unique_identifier in (1,2,3);
Put the values in quotes if they are character values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, as ballardw said you can use an IN operator for selecting set of values. But you mentioned that to print the observations where uniqueidentiier in first table(lookup) are 35k and sec table is 5million. In that case an efficient method is to use PROC SQL. One method is explained above in my first reply and more efficient will be:
So you have in table1 your set of uniq values which you want to print from table2 which contains a larger set of values.
ODS CSV File="C:\Temp.csv";
proc sql;
select a.*
from table2 as a, table1 as b
where a.uniqident = b.uniqident;
quit;
ODS CSV File="C:\Temp.csv";
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Awesome, thank you for your help!