- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I need help in building a code for a specific criteria. I have a very huge dataset with lots of variables and observations.
I need to identify the repetitive pair of observations/values in 2 columns by month/year
In this given dataset, the column 1 AND column 2 is a character Variable contains either name or id.
Jan Month have these values get repeated in Feb month,
(a,b) , (c,e or e,c) , (7,p or p,7), (6, u or u,6) .
Also Jan month and April month have a pair (1,2 or 2,1) and so do March..
I need to pick only these pair matches in any of the months and store it in different datasets.
I am trying my code in Proc Sql. Thanks for looking.
Given Dataset:
year Months Column 1- Name or ID Column2- Name or ID
17 Jan a b
17 Jan 1 2
17 Jan c e
17 Jan 5 9
17 Jan g h
17 Jan 7 p
17 Jan u 6
17 feb a b
17 feb 11 2
17 feb e c
17 feb 7 p
17 feb 6 u
17 Mar c e
17 Mar 6 u
17 Apr 2 1
18 Jan 1 2
18 Jan g h
18 feb a b
18 mar 5 b
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please provide the desired output for your example data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
2017 2018
Jan
Feb
Mar
Apr
May
June
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Below code picks the one's that has pairs repeated, but i cant group by month.
I also have another variable called id's, its unique to each pair for each month.
proc sql;
create table want as
select col 1,
col 2,
count(distinct id) as ct
from have
group by
col1,col2
having count(distint id) >=2;run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To identify the matches, I would use PROC FREQ
/**** UNTESTED CODE ****/
proc freq data=have noprint;
table col1*col2/out=counts;
run;
proc sql;
create table want as select h.* from
have as h left join counts as c on h.col1=c.col1 and h.col2=c.col2
where c.count>1;
quit;
This will create a table of rows that have at least one matching pair of col1 and col2. From there, I leave it up to you to split these into separate data sets (something which is usually not recommended, by the way).
If this does not work for some reason, then you would need to provide your example data as a SAS data step, following these instructions: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
and also clearly state the desired results.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Something like this?
data have;
length col1 col2 $8;
input year Month $ Col1 Col2;
date = input(cats("01",month,year), date.);
format date mmyy.;
drop year month;
datalines;
17 Jan a b
17 Jan 1 2
17 Jan c e
17 Jan 5 9
17 Jan g h
17 Jan 7 p
17 Jan u 6
17 feb a b
17 feb 11 2
17 feb e c
17 feb 7 p
17 feb 6 u
17 mar c e
17 mar 6 u
17 Apr 2 1
18 Jan 1 2
18 Jan g h
18 feb a b
18 mar 5 b
;
data temp;
set have; by date;
call sortc(col1, col2);
run;
proc sort data=temp; by col1 col2 date; run;
data want;
set temp; by col1 col2 date;
if not (first.col2 and last.col2);
run;
proc print data=want; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Create a concatenated key, eg
matchkey = catx('|',col1,col2);
Then sort by that key, year and month.
After that, a double DOW can do it:
data want;
do until (last.matchkey);
set have;
by matchkey;
flag = not first.matchkey;
end;
do until (last.matchkey);
set have;
by matchkey;
if flag then output;
end;
drop flag;
run;