DATA Step, Macro, Functions and more

identify missing prefix variables of same suffix variable group

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

identify missing prefix variables of same suffix variable group

Hello All,

I spent a day on this and I practically give up. I tried loops, joins and now this, but not able to crack this problem.

Background:

I want to identify variables from a dataset that have different prefixes which grouped by the same suffixes. Should a prefix be missing from a suffix group, we are displayed a comment noting the missing prefix group – see table at bottom of this post.

 

My approach:

I used proc contents to generate the list of variables. After some minor changes, I created this table below.

I tried to transpose the data by failed to get what I want.

If this was successful. I would use case when statements and create the comments. However, I don’t think this would work as there could be quite a few variable groups i.e. instead of 3 columns after transpose, I could have 6, and then the commenting system wont work.

 

What I have:

How do I get a dataset as below:

dlv_variables

prefix

suffix

GRP_NOMATCH

GRP_

NOMATCH

INP_PIBESC

INP_

PIBESC

SCO_PIBESC

SCO_

PIBESC

GRP_SCOREAPS

GRP_

SCOREAPS

SCO_SCOREAPS

SCO_

SCOREAPS

GRP_SCOREFAC

GRP_

SCOREFAC

INP_SCOREFAC

INP_

SCOREFAC

SCO_SCOREFAC

SCO_

SCOREFAC

SCO_TEMP

SCO_

TEMP

INP_TRIAL

INP_

TRIAL

data chk8reg1;
input dlv_variables $ prefix $ suffix $;
datalines ;
GRP_NOMATCH GRP_ NOMATCH
INP_PIBESC INP_ PIBESC
SCO_PIBESC SCO_ PIBESC
GRP_SCOREAPS GRP_ SCOREAPS
SCO_SCOREAPS SCO_ SCOREAPS
GRP_SCOREFAC GRP_ SCOREFAC
INP_SCOREFAC INP_ SCOREFAC
SCO_SCOREFAC SCO_ SCOREFAC
SCO_TEMP SCO_ TEMP
INP_TRIAL INP_ TRIAL
;

 

 

 

My failed attempt:

The proc transpose feature using the following code creates the following table:

 

proc transpose data =chk8reg1 out=chk8trans (keep= part2 col:);
var dlv_variables;
by suffix;
run;

 

 

suffix

COL1

COL2

COL3

NOMATCH

GRP_NOMATCH

   

PIBESC

INP_PIBESC

SCO_PIBESC

 

SCOREAPS

GRP_SCOREAPS

SCO_SCOREAPS

 

SCOREFAC

GRP_SCOREFAC

INP_SCOREFAC

SCO_SCOREFAC

TEMP

SCO_TEMP

   

TRIAL

INP_TRIAL

   

 

The issue with this table is that COL1 should only contain distinct groups eg. Variable Part1=”GRP”,

COL2 should only contain Variable PART1=”INP”

And so on and so forth. Order/ sequence does not matter as long as the individual columns consist of individual prefix groups.

 

In essence, the table should look like this - What I want:

I want a table like this:

suffix

COL1 (GRP_)

COL2 (INP_)

COL3(SCO_)

comment

NOMATCH

GRP_NOMATCH

   

INP_NOMATCH and SCO_NOMATCH missing

PIBESC

 

INP_PIBESC

SCO_PIBESC

GRP_PIBESC missing

SCOREAPS

GRP_SCOREAPS

 

SCO_SCOREAPS

INP_SCOREAPS missing

SCOREFAC

GRP_SCOREFAC

INP_SCOREFAC

SCO_SCOREFAC

 

TEMP

   

SCO_TEMP

GRP_TEMP and INP_TEMP missing

TRIAL

 

INP_TRIAL

 

GRP_TRIAL and SCO_TRIAL missing

If there are any missing variables COL1, COL2, COL3, then the result should display the missing variables as shown..

 

 

If you have any other way to approach this problem, please do let me know.

Many thanks.

Regards,

Seb


Accepted Solutions
Solution
‎12-18-2015 03:39 AM
Respected Advisor
Posts: 4,920

Re: identify missing prefix variables of same suffix variable group

Posted in reply to sebster24

Try this:

 

proc sql;
create table ps as
select * 
from 
    (select distinct prefix from chk8reg1),
    (select distinct suffix from chk8reg1);
create table missingPrefix as
    select * from ps
    except
    select prefix, suffix from chk8reg1
order by suffix, prefix;
quit;

data comments;
length comment $200;
do until(last.suffix);
    set missingPrefix; by suffix;
    comment = catx(" and ", comment, cats(prefix,suffix));
    end;
if not missing(comment) then comment = catt(comment, " missing");
keep suffix comment;
run;

proc sql;
create table wantList as
select a.suffix, a.prefix, a.dlv_variables, b.comment
from chk8reg1 as a inner join comments as b
    on a.suffix=b.suffix
order by suffix, comment;
quit;

proc transpose data=wantList out=want(drop=_name_);
by suffix comment;
var dlv_variables;
id prefix;
run;
PG

View solution in original post


All Replies
Trusted Advisor
Posts: 1,117

Re: identify missing prefix variables of same suffix variable group

Posted in reply to sebster24

Try this:

proc freq data=chk8reg1 noprint;
tables suffix*prefix / sparse out=frq;
run;

data result;
length varnm $32
       missv $200;
set frq;
if count then varnm=cats(prefix, suffix);
else missv=cats(prefix, suffix);
run;

data allmissv;
length comment $200;
do until(last.suffix);
  set result;
  by suffix;
  comment=catx(' ', comment, missv);
end;
keep suffix comment;
run;

proc transpose data=result out=trans(drop=_name_);
by suffix;
id prefix;
var varnm;
run;

data want;
merge trans
      allmissv;
by suffix;
if countw(comment)>1 then comment=tranwrd(trim(comment), ' ', ' and ');
if comment ne ' ' then comment=catx(' ', comment, 'missing');
run;

proc print data=want noobs;
run;

Please increase the $200 lengths of character variables if your variable lists or comments are longer.

Contributor
Posts: 44

Re: identify missing prefix variables of same suffix variable group

Posted in reply to FreelanceReinhard
Thank you FreelanceReinhard for the solution.
THis solution works well too Smiley Happy
Have a good new year celebration.
Solution
‎12-18-2015 03:39 AM
Respected Advisor
Posts: 4,920

Re: identify missing prefix variables of same suffix variable group

Posted in reply to sebster24

Try this:

 

proc sql;
create table ps as
select * 
from 
    (select distinct prefix from chk8reg1),
    (select distinct suffix from chk8reg1);
create table missingPrefix as
    select * from ps
    except
    select prefix, suffix from chk8reg1
order by suffix, prefix;
quit;

data comments;
length comment $200;
do until(last.suffix);
    set missingPrefix; by suffix;
    comment = catx(" and ", comment, cats(prefix,suffix));
    end;
if not missing(comment) then comment = catt(comment, " missing");
keep suffix comment;
run;

proc sql;
create table wantList as
select a.suffix, a.prefix, a.dlv_variables, b.comment
from chk8reg1 as a inner join comments as b
    on a.suffix=b.suffix
order by suffix, comment;
quit;

proc transpose data=wantList out=want(drop=_name_);
by suffix comment;
var dlv_variables;
id prefix;
run;
PG
Respected Advisor
Posts: 3,156

Re: identify missing prefix variables of same suffix variable group

[ Edited ]
Posted in reply to sebster24

Another possibility:

 

 
data chk8reg1;
	input (dlv_variables  prefix  suffix) (:$20.);
	datalines;
GRP_NOMATCH GRP_ NOMATCH
INP_PIBESC INP_ PIBESC
SCO_PIBESC SCO_ PIBESC
GRP_SCOREAPS GRP_ SCOREAPS
SCO_SCOREAPS SCO_ SCOREAPS
GRP_SCOREFAC GRP_ SCOREFAC
INP_SCOREFAC INP_ SCOREFAC
SCO_SCOREFAC SCO_ SCOREFAC
SCO_TEMP SCO_ TEMP
INP_TRIAL INP_ TRIAL
;

PROC SORT
	DATA=WORK.CHK8REG1(KEEP=dlv_variables prefix suffix)
	OUT=_1
;
	BY suffix;
RUN;

PROC TRANSPOSE DATA=_1
	OUT=_2(drop=source)
	NAME=Source
	PREFIX=CoL_
;
	BY suffix;
	ID prefix;
	VAR dlv_variables;
RUN;

QUIT;

DATA WANT;
	SET _2;
	LENGTH _TEMP COMMENTS $ 200;
	ARRAY COL COL:;
	DO OVER COL;
		IF MISSING(COL) THEN
			_TEMP=CATS(SCAN(VNAME(COL),2,'_'),'_',SUFFIX);
		ELSE _TEMP='';
		COMMENTS=CATX('/',COMMENTS,_TEMP);
	END;

	IF NOT MISSING(COMMENTS) THEN
		COMMENTS=CATX(' ',COMMENTS,'MISSING');
	DROP _TEMP;
RUN;
Contributor
Posts: 44

Re: identify missing prefix variables of same suffix variable group

Thank you Haikuo for the solution.

Got another weapon in my SAS utility belt Smiley Happy

Have a good new year celebration.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 300 views
  • 0 likes
  • 4 in conversation