BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Faruk
Obsidian | Level 7
Hi,

I have table that has 200k obesarvations, that contains sourcecode (source).

i have another table with hardcoded values(hard)

i want to search in the source table for hardcoded values from the hard (10k).

does someone know how i can do this.

i used the following method:

i created macro variabels hardcode_1 till hardcode_99999

then in a macro i use do loop and increase the variable name with 1 and resolve the variable each loop. in the same loop i have also an proc sql with a where statement, in the statement i use the variable to search and append it to another table

issue with this solution is that it is taking to long and i have also issue with values that contains &. i triesld %superq but then i miss quote's.
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Almost certainly no macro variables needed.In fact my first feeling is that macro variables make this harder.

 

You can search whether the text value of a variable occurs as part of the text in another with the INDEX, INDEX, FIND or FINDW functions.

If your "values" are actually numeric you may have a bit more challenge as the "source" is almost certainly character valued and you would have to search for character equivalent of the number. But depending on many styles if your number value is 10 you may have to search in text for 10.0, 10.00, 1E1 or other possible representations of the number. PLUS exactly how it is used. Would you want to match "10" with the 10 in a Length, Format or Informat statement or assignment, an array definition?.

 

Examples and very explicit rules about what will constitute a match and what you expect the result to look like.

 

Here is one example The two data steps create some example data to work with and Proc SQL executes the matches.

data source;
   infile datalines truncover;
   input line $200.;
datalines4;
options nocenter
        nodate
        mcompilenote=all
        nofmterr
        dlcreatedir
        formdlim='-'
        pagesize=max
;
title;

%let centpath=D:\Users\Owner\Documents\Miniatures\Renegade legion\;
libname cent "&centpath.SASData";
libname super "D:\Users\Owner\DATA\Super";
options append=(fmtsearch=cent);
%let outpath=D:\Users\Owner\Documents\;
;;;;

data hard;
   infile datalines truncover;
   input value $15.;
datalines;
fmterr
libname
cent
;

proc sql;
   create table found as
   select a.line, b.value
   from source as a, hard as b
   where findw(a.line,strip(b.value))>0;
quit;

This is searching for matching WORDS so "fmterr" is not a match for "nofmterr" if you were wondering.

The strip is needed because the value variable will have trailing blanks used in the comparison. You may also need to remove leading blanks in the source data depending..

 

If you have not seen the DATALINES4 that is needed to allow SAS to read inline text that contains semicolons. The option requires 4 sequential semicolons to end the datalines text, otherwise the first semicolon ends the data.

The infile option truncover prevents SAS from reading past the end of shorter lines in the when the Input is reading up to 200 characters that would typically result in incomplete data. Run the steps without the option to see the difference.

 

This is not an attempt to resolve any of the issues I mentioned about searching for numeric values but to demonstrate a "proof of principal" that macro variables are likely to be much more complex to work with.

If you have multiple variables in your "hard" data set you could use and "or" in the where to allow searching for multiple values from a record at one time.

 

Note: 200K observations in your first set with the 10K macro variables mentioned means that you were loading that 200K data set 10,000 times if I under the description of your attempt. So yes, it takes some time to run. My demonstration runs pretty quickly. With 200K *10K it will also run for awhile because it will search every record for every value. The output could be pretty large as well depending on the values involved.

View solution in original post

5 REPLIES 5
Reeza
Super User
Sounds like a join or lookup.

If you post a sample of each data set, someone can help with the code.
Otherwise consider this tutorial: https://communities.sas.com/t5/Ask-the-Expert/How-Do-I-Combine-Data-in-SAS-Q-amp-A-Slides-and-On-Dem...
ballardw
Super User

Almost certainly no macro variables needed.In fact my first feeling is that macro variables make this harder.

 

You can search whether the text value of a variable occurs as part of the text in another with the INDEX, INDEX, FIND or FINDW functions.

If your "values" are actually numeric you may have a bit more challenge as the "source" is almost certainly character valued and you would have to search for character equivalent of the number. But depending on many styles if your number value is 10 you may have to search in text for 10.0, 10.00, 1E1 or other possible representations of the number. PLUS exactly how it is used. Would you want to match "10" with the 10 in a Length, Format or Informat statement or assignment, an array definition?.

 

Examples and very explicit rules about what will constitute a match and what you expect the result to look like.

 

Here is one example The two data steps create some example data to work with and Proc SQL executes the matches.

data source;
   infile datalines truncover;
   input line $200.;
datalines4;
options nocenter
        nodate
        mcompilenote=all
        nofmterr
        dlcreatedir
        formdlim='-'
        pagesize=max
;
title;

%let centpath=D:\Users\Owner\Documents\Miniatures\Renegade legion\;
libname cent "&centpath.SASData";
libname super "D:\Users\Owner\DATA\Super";
options append=(fmtsearch=cent);
%let outpath=D:\Users\Owner\Documents\;
;;;;

data hard;
   infile datalines truncover;
   input value $15.;
datalines;
fmterr
libname
cent
;

proc sql;
   create table found as
   select a.line, b.value
   from source as a, hard as b
   where findw(a.line,strip(b.value))>0;
quit;

This is searching for matching WORDS so "fmterr" is not a match for "nofmterr" if you were wondering.

The strip is needed because the value variable will have trailing blanks used in the comparison. You may also need to remove leading blanks in the source data depending..

 

If you have not seen the DATALINES4 that is needed to allow SAS to read inline text that contains semicolons. The option requires 4 sequential semicolons to end the datalines text, otherwise the first semicolon ends the data.

The infile option truncover prevents SAS from reading past the end of shorter lines in the when the Input is reading up to 200 characters that would typically result in incomplete data. Run the steps without the option to see the difference.

 

This is not an attempt to resolve any of the issues I mentioned about searching for numeric values but to demonstrate a "proof of principal" that macro variables are likely to be much more complex to work with.

If you have multiple variables in your "hard" data set you could use and "or" in the where to allow searching for multiple values from a record at one time.

 

Note: 200K observations in your first set with the 10K macro variables mentioned means that you were loading that 200K data set 10,000 times if I under the description of your attempt. So yes, it takes some time to run. My demonstration runs pretty quickly. With 200K *10K it will also run for awhile because it will search every record for every value. The output could be pretty large as well depending on the values involved.

Faruk
Obsidian | Level 7
After i wrote my question i also realised that i am making it unnecessary complex.

My solution works but it gives warning when a data contains special characters.

I tried instead of my solution just a query with find function, just like you're suggestion. i think it worked, but it takes a while to complete.

i will trie this week again with strip.

edit: i am not searching for numeric values, they are organisation names.

Why will fmterr not match with nofmterr. it contains the same string.
Tom
Super User Tom
Super User

@Faruk wrote:

Why will fmterr not match with nofmterr. it contains the same string.

The string "nofmterr" does not contain the word "fmterr".  The string "no fmterr" does contain that word.

ballardw
Super User

@Faruk wrote:

Why will fmterr not match with nofmterr. it contains the same string.

The specific function in the example: FINDW finds "words". Words are separated by delimiters (typically space, comma and other punctuation). Since there are none of the typical delimiters inside the value of nofmterr then fmterr is not a match. The Find and Index functions are to find strings that are not delimited "words" and would consider that a match.

That is why I indicated "very specific rules".

 

The options involved, depending on the function, involve specifying delimiters, using or ignoring case, including/excluding trailing spaces, start position and a few details involved with these.

 

There are also the PRX functions involved for patterns that might be of use.

 

Time: 200K * 10K = 2 billion comparisons at least. If your code does a 1K comparisons per second then that is 2000K seconds. Hope it runs faster than 1K per second. At that rate it is about 24 days.

 

Suggestion: try with 1K of the 200K and see how long that takes. Then you could estimate how long this will take so you can plan accordingly.

It might help to check if you have duplicate lines or values. Consider that if you run this against a typical SAS program you might find lots of "run;" You definitely do not want duplicates in your "hard" data set as that is just plain duplication with an SQL Cartesian join like this.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 281 views
  • 0 likes
  • 4 in conversation