Solved: Re: identify missing prefix variables of same suffix variable group

sebster24 · Posted 12-17-2015 01:20 PM

Hello All,

I spent a day on this and I practically give up. I tried loops, joins and now this, but not able to crack this problem.

Background:

I want to identify variables from a dataset that have different prefixes which grouped by the same suffixes. Should a prefix be missing from a suffix group, we are displayed a comment noting the missing prefix group – see table at bottom of this post.

My approach:

I used proc contents to generate the list of variables. After some minor changes, I created this table below.

I tried to transpose the data by failed to get what I want.

If this was successful. I would use case when statements and create the comments. However, I don’t think this would work as there could be quite a few variable groups i.e. instead of 3 columns after transpose, I could have 6, and then the commenting system wont work.

What I have:

How do I get a dataset as below:

dlv_variables	prefix	suffix
GRP_NOMATCH	GRP_	NOMATCH
INP_PIBESC	INP_	PIBESC
SCO_PIBESC	SCO_	PIBESC
GRP_SCOREAPS	GRP_	SCOREAPS
SCO_SCOREAPS	SCO_	SCOREAPS
GRP_SCOREFAC	GRP_	SCOREFAC
INP_SCOREFAC	INP_	SCOREFAC
SCO_SCOREFAC	SCO_	SCOREFAC
SCO_TEMP	SCO_	TEMP
INP_TRIAL	INP_	TRIAL

data chk8reg1;
input dlv_variables $ prefix $ suffix $;
datalines ;
GRP_NOMATCH GRP_ NOMATCH
INP_PIBESC INP_ PIBESC
SCO_PIBESC SCO_ PIBESC
GRP_SCOREAPS GRP_ SCOREAPS
SCO_SCOREAPS SCO_ SCOREAPS
GRP_SCOREFAC GRP_ SCOREFAC
INP_SCOREFAC INP_ SCOREFAC
SCO_SCOREFAC SCO_ SCOREFAC
SCO_TEMP SCO_ TEMP
INP_TRIAL INP_ TRIAL
;

My failed attempt:

The proc transpose feature using the following code creates the following table:

proc transpose data =chk8reg1 out=chk8trans (keep= part2 col:);
var dlv_variables;
by suffix;
run;

suffix	COL1	COL2	COL3
NOMATCH	GRP_NOMATCH
PIBESC	INP_PIBESC	SCO_PIBESC
SCOREAPS	GRP_SCOREAPS	SCO_SCOREAPS
SCOREFAC	GRP_SCOREFAC	INP_SCOREFAC	SCO_SCOREFAC
TEMP	SCO_TEMP
TRIAL	INP_TRIAL

The issue with this table is that COL1 should only contain distinct groups eg. Variable Part1=”GRP”,

COL2 should only contain Variable PART1=”INP”

And so on and so forth. Order/ sequence does not matter as long as the individual columns consist of individual prefix groups.

In essence, the table should look like this - What I want:

I want a table like this:

suffix	COL1 (GRP_)	COL2 (INP_)	COL3(SCO_)	comment
NOMATCH	GRP_NOMATCH			INP_NOMATCH and SCO_NOMATCH missing
PIBESC		INP_PIBESC	SCO_PIBESC	GRP_PIBESC missing
SCOREAPS	GRP_SCOREAPS		SCO_SCOREAPS	INP_SCOREAPS missing
SCOREFAC	GRP_SCOREFAC	INP_SCOREFAC	SCO_SCOREFAC
TEMP			SCO_TEMP	GRP_TEMP and INP_TEMP missing
TRIAL		INP_TRIAL		GRP_TRIAL and SCO_TRIAL missing

If there are any missing variables COL1, COL2, COL3, then the result should display the missing variables as shown..

If you have any other way to approach this problem, please do let me know.

Many thanks.

Regards,

Seb

PGStats · Posted 12-17-2015 03:57 PM

Try this:

proc sql;
create table ps as
select * 
from 
    (select distinct prefix from chk8reg1),
    (select distinct suffix from chk8reg1);
create table missingPrefix as
    select * from ps
    except
    select prefix, suffix from chk8reg1
order by suffix, prefix;
quit;

data comments;
length comment $200;
do until(last.suffix);
    set missingPrefix; by suffix;
    comment = catx(" and ", comment, cats(prefix,suffix));
    end;
if not missing(comment) then comment = catt(comment, " missing");
keep suffix comment;
run;

proc sql;
create table wantList as
select a.suffix, a.prefix, a.dlv_variables, b.comment
from chk8reg1 as a inner join comments as b
    on a.suffix=b.suffix
order by suffix, comment;
quit;

proc transpose data=wantList out=want(drop=_name_);
by suffix comment;
var dlv_variables;
id prefix;
run;

PG

View solution in original post

FreelanceReinh · Posted 12-17-2015 02:42 PM

Try this:

proc freq data=chk8reg1 noprint;
tables suffix*prefix / sparse out=frq;
run;

data result;
length varnm $32
       missv $200;
set frq;
if count then varnm=cats(prefix, suffix);
else missv=cats(prefix, suffix);
run;

data allmissv;
length comment $200;
do until(last.suffix);
  set result;
  by suffix;
  comment=catx(' ', comment, missv);
end;
keep suffix comment;
run;

proc transpose data=result out=trans(drop=_name_);
by suffix;
id prefix;
var varnm;
run;

data want;
merge trans
      allmissv;
by suffix;
if countw(comment)>1 then comment=tranwrd(trim(comment), ' ', ' and ');
if comment ne ' ' then comment=catx(' ', comment, 'missing');
run;

proc print data=want noobs;
run;

Please increase the $200 lengths of character variables if your variable lists or comments are longer.

sebster24 · Posted 12-18-2015 03:43 AM

Thank you FreelanceReinhard for the solution.
THis solution works well too 🙂
Have a good new year celebration.

PGStats · Posted 12-17-2015 03:57 PM

Try this:

proc sql;
create table ps as
select * 
from 
    (select distinct prefix from chk8reg1),
    (select distinct suffix from chk8reg1);
create table missingPrefix as
    select * from ps
    except
    select prefix, suffix from chk8reg1
order by suffix, prefix;
quit;

data comments;
length comment $200;
do until(last.suffix);
    set missingPrefix; by suffix;
    comment = catx(" and ", comment, cats(prefix,suffix));
    end;
if not missing(comment) then comment = catt(comment, " missing");
keep suffix comment;
run;

proc sql;
create table wantList as
select a.suffix, a.prefix, a.dlv_variables, b.comment
from chk8reg1 as a inner join comments as b
    on a.suffix=b.suffix
order by suffix, comment;
quit;

proc transpose data=wantList out=want(drop=_name_);
by suffix comment;
var dlv_variables;
id prefix;
run;

PG

Haikuo · Posted 12-17-2015 05:18 PM

Another possibility:

data chk8reg1;
	input (dlv_variables  prefix  suffix) (:$20.);
	datalines;
GRP_NOMATCH GRP_ NOMATCH
INP_PIBESC INP_ PIBESC
SCO_PIBESC SCO_ PIBESC
GRP_SCOREAPS GRP_ SCOREAPS
SCO_SCOREAPS SCO_ SCOREAPS
GRP_SCOREFAC GRP_ SCOREFAC
INP_SCOREFAC INP_ SCOREFAC
SCO_SCOREFAC SCO_ SCOREFAC
SCO_TEMP SCO_ TEMP
INP_TRIAL INP_ TRIAL
;

PROC SORT
	DATA=WORK.CHK8REG1(KEEP=dlv_variables prefix suffix)
	OUT=_1
;
	BY suffix;
RUN;

PROC TRANSPOSE DATA=_1
	OUT=_2(drop=source)
	NAME=Source
	PREFIX=CoL_
;
	BY suffix;
	ID prefix;
	VAR dlv_variables;
RUN;

QUIT;

DATA WANT;
	SET _2;
	LENGTH _TEMP COMMENTS $ 200;
	ARRAY COL COL:;
	DO OVER COL;
		IF MISSING(COL) THEN
			_TEMP=CATS(SCAN(VNAME(COL),2,'_'),'_',SUFFIX);
		ELSE _TEMP='';
		COMMENTS=CATX('/',COMMENTS,_TEMP);
	END;

	IF NOT MISSING(COMMENTS) THEN
		COMMENTS=CATX(' ',COMMENTS,'MISSING');
	DROP _TEMP;
RUN;

sebster24 · Posted 12-18-2015 03:41 AM

Thank you Haikuo for the solution.

Got another weapon in my SAS utility belt 🙂

Have a good new year celebration.

Classroom Training Available!