SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
Kelly_S
Calcite | Level 5

Hello,

 

I need help in building a code for a specific criteria. I have a very huge dataset with lots of variables and observations.

I need to identify the repetitive pair of observations/values in 2 columns by month/year

In this given dataset, the column 1 AND column 2 is a character Variable  contains either name or id.

Jan Month have these values  get repeated in Feb month,

(a,b) , (c,e or e,c) , (7,p or p,7), (6, u or u,6) .

Also Jan month and April month have a pair (1,2 or 2,1) and so do March..

I need to pick only these pair matches in any of the months and store it in different datasets.

I am trying my code in Proc Sql. Thanks for looking.

 

Given Dataset:

year         Months      Column 1- Name or ID          Column2- Name or ID

 17               Jan              a                                            b

 17            Jan              1                                               2

 17             Jan              c                                              e

 17               Jan              5                                            9

 17               Jan              g                                           h

 17               Jan             7                                            p

 17                Jan             u                                            6

 17               feb              a                                            b

 17                feb           11                                            2

 17               feb              e                                            c

 17              feb             7                                              p

 17              feb             6                                              u

 17              Mar            c                                              e

 17              Mar           6                                               u

 17              Apr           2                                               1

 18            Jan              1                                               2

 18               Jan              g                                           h

 18               feb              a                                            b

 18               mar              5                                            b

        

7 REPLIES 7
PGStats
Opal | Level 21

Please provide the desired output for your example data.

PG
Kelly_S
Calcite | Level 5
Historically i wanted to see how many times the pair gets repeated for each month . I am still thinking about my output. But I am trying to output somewhat like this,

2017 2018
Jan
Feb
Mar
Apr
May
June
Reeza
Super User
Please show the SQL code you've tried. SQL doesn't do well for detecting sequences.
Kelly_S
Calcite | Level 5
something like this,
Below code picks the one's that has pairs repeated, but i cant group by month.
I also have another variable called id's, its unique to each pair for each month.

proc sql;
create table want as
select col 1,
col 2,
count(distinct id) as ct
from have
group by
col1,col2
having count(distint id) >=2;run;
PaigeMiller
Diamond | Level 26

To identify the matches, I would use PROC FREQ

 

/**** UNTESTED CODE ****/

proc freq data=have noprint;
    table col1*col2/out=counts;
run;

proc sql;
    create table want as select h.* from
        have as h left join counts as c on h.col1=c.col1 and h.col2=c.col2
        where c.count>1;
quit;

This will create a table of rows that have at least one matching pair of col1 and col2. From there, I leave it up to you to split these into separate data sets (something which is usually not recommended, by the way).

 

If this does not work for some reason, then you would need to provide your example data as a SAS data step, following these instructions: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

and also clearly state the desired results.

--
Paige Miller
PGStats
Opal | Level 21

Something like this?

 

data have;
length col1 col2 $8;
input year Month $ Col1 Col2;
date = input(cats("01",month,year), date.);
format date mmyy.;
drop year month;
datalines;
 17 Jan a b
 17 Jan 1 2
 17 Jan c e
 17 Jan 5 9
 17 Jan g h
 17 Jan 7 p
 17 Jan u 6
 17 feb a b
 17 feb 11 2
 17 feb e c
 17 feb 7 p
 17 feb 6 u
 17 mar c e
 17 mar 6 u
 17 Apr 2 1
 18 Jan 1 2
 18 Jan g h
 18 feb a b
 18 mar 5 b
;

data temp;
set have; by date;
call sortc(col1, col2);
run;

proc sort data=temp; by col1 col2 date; run;

data want;
set temp; by col1 col2 date;
if not (first.col2 and last.col2);
run;

proc print data=want; run; 
PG
Kurt_Bremser
Super User

Create a concatenated key, eg

matchkey = catx('|',col1,col2);

Then sort by that key, year and month. 

After that, a double DOW can do it:

data want;
do until (last.matchkey);
  set have;
  by matchkey;
  flag = not first.matchkey;
end;
do until (last.matchkey);
  set have;
  by matchkey;
  if flag then output;
end;
drop flag;
run;

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 6090 views
  • 1 like
  • 5 in conversation