BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kirtid
Obsidian | Level 7

Hi,

 

I have the below case where I'm unable to find a best approach to implement the solution, could you please advise me here.

 

Input data: (in the below example I have the item and year columns grouped, and based on the values of X1 and X2 , valid and error datasets are created.

ItemYearX1X2
A2018 12
A201810 
B2019 20
B2019 5
C20171 
C201712
C2017  
D2016  
E2016102
E20166 

 

Aggregation Criteria in which the two output datasets are created as below:

  • Valid dataset will always have a unique entry from the group. The non-null values of X1 and X2 in a group is considered to be valid(ex - Item : A)
  • when X1/X2 have same values in a group, then the unique value is picked and added to the valid dataset (ex- item :C)
  • Conflicts in values of X1/X2 in a group are considered to be an error. when X1/X2 values are different in a group then the last value of X1/X2 in the group are picked as a valid. So in this case the valid dataset will have a unique entry for the mentioned group (ex- Item: B ) and the error dataset will save the both the records of B from the input dataset.
  • When there is only in entry in the input, then it goes to the valid dataset (ex - Item: D)

 

Output1 - Valid dataset:

ItemYearX1X2
A20181012
B2019 5
C201712
D2016  
E201662

 

Output2 : Error dataset

ItemYearX1X2
B2019 20
B2019 5
E2016102
E20166 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Kirtid 

Please try this:

 

data have;
	infile datalines dsd missover dlm="09"x;
	input Item $ Year X1 X2;
	datalines;
A	2018	 	12
A	2018	10	 
B	2019	 	20
B	2019	 	5
C	2017	1	 
C	2017	1	2
C	2017	 	 
D	2016	 	 
E	2016	10	2
E	2016	6	 
;

/* Identify record number within each group item-year */
data have2;
	set have;
	by Item Year;
	if first.item then num=0;
	num+1;
run;

/* Create OUTPUT1 */

data output1;
	set have2;
	by item year num;
	retain x1_bis x2_bis;
	if first.year then call missing(x1_bis, x2_bis);
	if not missing(x1) then x1_bis=x1;
	if not missing(x2) then x2_bis=x2;
	if last.year then output;
	drop x1 x2;
	rename x1_bis=x1 x2_bis=x2;
run;

/* Identify conflicts in values of X1/X2 -> OUTPUT2 */
proc sql;
	create table output2 as
	
	select *
	from have2
	where x1 is not missing
	group by item, year
	having count(distinct x1) > 1
	
	union all corr
	
	select *
	from have2
	where x2 is not missing
	group by item, year
	having count(distinct x2) > 1
	
	order by item, year, num;
quit;

proc print data=output1;run;
proc print data=output2;run;

Best,

View solution in original post

6 REPLIES 6
ed_sas_member
Meteorite | Level 14

Hi @Kirtid 

Please try this:

 

data have;
	infile datalines dsd missover dlm="09"x;
	input Item $ Year X1 X2;
	datalines;
A	2018	 	12
A	2018	10	 
B	2019	 	20
B	2019	 	5
C	2017	1	 
C	2017	1	2
C	2017	 	 
D	2016	 	 
E	2016	10	2
E	2016	6	 
;

/* Identify record number within each group item-year */
data have2;
	set have;
	by Item Year;
	if first.item then num=0;
	num+1;
run;

/* Create OUTPUT1 */

data output1;
	set have2;
	by item year num;
	retain x1_bis x2_bis;
	if first.year then call missing(x1_bis, x2_bis);
	if not missing(x1) then x1_bis=x1;
	if not missing(x2) then x2_bis=x2;
	if last.year then output;
	drop x1 x2;
	rename x1_bis=x1 x2_bis=x2;
run;

/* Identify conflicts in values of X1/X2 -> OUTPUT2 */
proc sql;
	create table output2 as
	
	select *
	from have2
	where x1 is not missing
	group by item, year
	having count(distinct x1) > 1
	
	union all corr
	
	select *
	from have2
	where x2 is not missing
	group by item, year
	having count(distinct x2) > 1
	
	order by item, year, num;
quit;

proc print data=output1;run;
proc print data=output2;run;

Best,

Kirtid
Obsidian | Level 7

Thanks for the solution. I have a question here, the solution worked for the provided example, but in original I have around 10 columns(X1... X10) for which this valid check to be performed, in such case using 'union corr all' would it be fine while creating the invalid dataset?, wondering if there will be any performance issues, kindly advise.

Astounding
PROC Star

This is untested but ought to work.  GIven that your data is sorted:

proc summary data=have;
   by item year;
   var a b;
   output out=validity (keep=item year min_a min_b max_a max_b)
   min=min_a min_b max=max_a max_b;
run;
data valid invalid;
   merge have validity;
   by item year;
   if (min_a < max_a) or (min_b < max_b) then output invalid;
   else output valid;
   drop min_a min_b max_a max_b;
run;

Note that the handling of missing values is tricky but accurate.  PROC SUMMARY ignores missing values when calculating statistics.  The only way MIN_A or MIN_B ends up with a missing value is when all the values of the incoming variable are missing for that group of observations.

s_lassen
Meteorite | Level 14

Here is a relatively simple data step solution:

data valid error_keys(keep=Item Year);
  merge 
    have(keep=Item Year)
    have(keep=Item Year X1 where=(X1 is not null) in=in1)
    have(keep=Item Year X2 where=(X2 is not null) in=in2)
    ;
  by item year;
  if first.year then do;
    _x1=x1;
    _x2=x2;
    _error=0;
    end;
  else do;
    if in1 and _x1 ne x1 then do;
      _error=1;
      _x1=x1;
      end;
    if in2 and _x2 ne x2 then do;
      _error=1;
      _x2=x2;
      end;
    end;
  if last.year;
  x1=_x1;
  x2=_x2;
  output valid;
  if _error then output error_keys;
  retain _:;
  drop _:;
run;

data errors;
  merge error_keys(in=error) have;
  by item year;
  if error;
run;
Kirtid
Obsidian | Level 7

Thank you all much for your time and for giving me a different view on the solution approach, it's really interesting for me to note them. Have a great day..

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1203 views
  • 4 likes
  • 5 in conversation