SAS Programming

Kelly_S · Posted 07-28-2019 04:46 PM

Hello,

I need help in building a code for a specific criteria. I have a very huge dataset with lots of variables and observations.

I need to identify the repetitive pair of observations/values in 2 columns by month/year

In this given dataset, the column 1 AND column 2 is a character Variable contains either name or id.

Jan Month have these values get repeated in Feb month,

(a,b) , (c,e or e,c) , (7,p or p,7), (6, u or u,6) .

Also Jan month and April month have a pair (1,2 or 2,1) and so do March..

I need to pick only these pair matches in any of the months and store it in different datasets.

I am trying my code in Proc Sql. Thanks for looking.

Given Dataset:

year Months Column 1- Name or ID Column2- Name or ID

17 Jan a b

17 Jan 1 2

17 Jan c e

17 Jan 5 9

17 Jan g h

17 Jan 7 p

17 Jan u 6

17 feb a b

17 feb 11 2

17 feb e c

17 feb 7 p

17 feb 6 u

17 Mar c e

17 Mar 6 u

17 Apr 2 1

18 Jan 1 2

18 Jan g h

18 feb a b

18 mar 5 b

PGStats · Posted 07-28-2019 06:14 PM

Please provide the desired output for your example data.

PG

Kelly_S · Posted 07-28-2019 07:01 PM

Historically i wanted to see how many times the pair gets repeated for each month . I am still thinking about my output. But I am trying to output somewhat like this,

2017 2018
Jan
Feb
Mar
Apr
May
June

Reeza · Posted 07-28-2019 06:17 PM

Please show the SQL code you've tried. SQL doesn't do well for detecting sequences.

Kelly_S · Posted 07-28-2019 06:53 PM

something like this,
Below code picks the one's that has pairs repeated, but i cant group by month.
I also have another variable called id's, its unique to each pair for each month.

proc sql;
create table want as
select col 1,
col 2,
count(distinct id) as ct
from have
group by
col1,col2
having count(distint id) >=2;run;

PaigeMiller · Posted 07-28-2019 06:17 PM

To identify the matches, I would use PROC FREQ

/**** UNTESTED CODE ****/

proc freq data=have noprint;
    table col1*col2/out=counts;
run;

proc sql;
    create table want as select h.* from
        have as h left join counts as c on h.col1=c.col1 and h.col2=c.col2
        where c.count>1;
quit;

This will create a table of rows that have at least one matching pair of col1 and col2. From there, I leave it up to you to split these into separate data sets (something which is usually not recommended, by the way).

If this does not work for some reason, then you would need to provide your example data as a SAS data step, following these instructions: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

and also clearly state the desired results.

--
Paige Miller

PGStats · Posted 07-28-2019 10:54 PM

Something like this?

data have;
length col1 col2 $8;
input year Month $ Col1 Col2;
date = input(cats("01",month,year), date.);
format date mmyy.;
drop year month;
datalines;
 17 Jan a b
 17 Jan 1 2
 17 Jan c e
 17 Jan 5 9
 17 Jan g h
 17 Jan 7 p
 17 Jan u 6
 17 feb a b
 17 feb 11 2
 17 feb e c
 17 feb 7 p
 17 feb 6 u
 17 mar c e
 17 mar 6 u
 17 Apr 2 1
 18 Jan 1 2
 18 Jan g h
 18 feb a b
 18 mar 5 b
;

data temp;
set have; by date;
call sortc(col1, col2);
run;

proc sort data=temp; by col1 col2 date; run;

data want;
set temp; by col1 col2 date;
if not (first.col2 and last.col2);
run;

proc print data=want; run;

PG

Kurt_Bremser · Posted 07-29-2019 01:43 AM

Create a concatenated key, eg

matchkey = catx('|',col1,col2);

Then sort by that key, year and month.

After that, a double DOW can do it:

data want;
do until (last.matchkey);
  set have;
  by matchkey;
  flag = not first.matchkey;
end;
do until (last.matchkey);
  set have;
  by matchkey;
  if flag then output;
end;
drop flag;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SAS Programming

Identify match in 2 column

Re: Identify match in 2 column

Re: Identify match in 2 column

Re: Identify match in 2 column

Re: Identify match in 2 column

Re: Identify match in 2 column

Re: Identify match in 2 column

Re: Identify match in 2 column

Identifying fuzzy matches within a single variable

SAS Data Studio 2.2: New Transforms for Unique Identifier & Match and ...

How to identify correct treatment

Pattern matching and subset

Need a way to identify non-USA keyboard characters in a string

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...