Hello All,
I spent a day on this and I practically give up. I tried loops, joins and now this, but not able to crack this problem.
Background:
I want to identify variables from a dataset that have different prefixes which grouped by the same suffixes. Should a prefix be missing from a suffix group, we are displayed a comment noting the missing prefix group – see table at bottom of this post.
My approach:
I used proc contents to generate the list of variables. After some minor changes, I created this table below.
I tried to transpose the data by failed to get what I want.
If this was successful. I would use case when statements and create the comments. However, I don’t think this would work as there could be quite a few variable groups i.e. instead of 3 columns after transpose, I could have 6, and then the commenting system wont work.
What I have:
How do I get a dataset as below:
dlv_variables |
prefix |
suffix |
GRP_NOMATCH |
GRP_ |
NOMATCH |
INP_PIBESC |
INP_ |
PIBESC |
SCO_PIBESC |
SCO_ |
PIBESC |
GRP_SCOREAPS |
GRP_ |
SCOREAPS |
SCO_SCOREAPS |
SCO_ |
SCOREAPS |
GRP_SCOREFAC |
GRP_ |
SCOREFAC |
INP_SCOREFAC |
INP_ |
SCOREFAC |
SCO_SCOREFAC |
SCO_ |
SCOREFAC |
SCO_TEMP |
SCO_ |
TEMP |
INP_TRIAL |
INP_ |
TRIAL |
data chk8reg1; input dlv_variables $ prefix $ suffix $; datalines ; GRP_NOMATCH GRP_ NOMATCH INP_PIBESC INP_ PIBESC SCO_PIBESC SCO_ PIBESC GRP_SCOREAPS GRP_ SCOREAPS SCO_SCOREAPS SCO_ SCOREAPS GRP_SCOREFAC GRP_ SCOREFAC INP_SCOREFAC INP_ SCOREFAC SCO_SCOREFAC SCO_ SCOREFAC SCO_TEMP SCO_ TEMP INP_TRIAL INP_ TRIAL ;
My failed attempt:
The proc transpose feature using the following code creates the following table:
proc transpose data =chk8reg1 out=chk8trans (keep= part2 col:); var dlv_variables; by suffix; run;
suffix |
COL1 |
COL2 |
COL3 |
NOMATCH |
GRP_NOMATCH |
||
PIBESC |
INP_PIBESC |
SCO_PIBESC |
|
SCOREAPS |
GRP_SCOREAPS |
SCO_SCOREAPS |
|
SCOREFAC |
GRP_SCOREFAC |
INP_SCOREFAC |
SCO_SCOREFAC |
TEMP |
SCO_TEMP |
||
TRIAL |
INP_TRIAL |
The issue with this table is that COL1 should only contain distinct groups eg. Variable Part1=”GRP”,
COL2 should only contain Variable PART1=”INP”
And so on and so forth. Order/ sequence does not matter as long as the individual columns consist of individual prefix groups.
In essence, the table should look like this - What I want:
I want a table like this:
suffix |
COL1 (GRP_) |
COL2 (INP_) |
COL3(SCO_) |
comment |
NOMATCH |
GRP_NOMATCH |
INP_NOMATCH and SCO_NOMATCH missing |
||
PIBESC |
INP_PIBESC |
SCO_PIBESC |
GRP_PIBESC missing |
|
SCOREAPS |
GRP_SCOREAPS |
SCO_SCOREAPS |
INP_SCOREAPS missing |
|
SCOREFAC |
GRP_SCOREFAC |
INP_SCOREFAC |
SCO_SCOREFAC |
|
TEMP |
SCO_TEMP |
GRP_TEMP and INP_TEMP missing |
||
TRIAL |
INP_TRIAL |
GRP_TRIAL and SCO_TRIAL missing |
If there are any missing variables COL1, COL2, COL3, then the result should display the missing variables as shown..
If you have any other way to approach this problem, please do let me know.
Many thanks.
Regards,
Seb
Try this:
proc sql;
create table ps as
select *
from
(select distinct prefix from chk8reg1),
(select distinct suffix from chk8reg1);
create table missingPrefix as
select * from ps
except
select prefix, suffix from chk8reg1
order by suffix, prefix;
quit;
data comments;
length comment $200;
do until(last.suffix);
set missingPrefix; by suffix;
comment = catx(" and ", comment, cats(prefix,suffix));
end;
if not missing(comment) then comment = catt(comment, " missing");
keep suffix comment;
run;
proc sql;
create table wantList as
select a.suffix, a.prefix, a.dlv_variables, b.comment
from chk8reg1 as a inner join comments as b
on a.suffix=b.suffix
order by suffix, comment;
quit;
proc transpose data=wantList out=want(drop=_name_);
by suffix comment;
var dlv_variables;
id prefix;
run;
Try this:
proc freq data=chk8reg1 noprint;
tables suffix*prefix / sparse out=frq;
run;
data result;
length varnm $32
missv $200;
set frq;
if count then varnm=cats(prefix, suffix);
else missv=cats(prefix, suffix);
run;
data allmissv;
length comment $200;
do until(last.suffix);
set result;
by suffix;
comment=catx(' ', comment, missv);
end;
keep suffix comment;
run;
proc transpose data=result out=trans(drop=_name_);
by suffix;
id prefix;
var varnm;
run;
data want;
merge trans
allmissv;
by suffix;
if countw(comment)>1 then comment=tranwrd(trim(comment), ' ', ' and ');
if comment ne ' ' then comment=catx(' ', comment, 'missing');
run;
proc print data=want noobs;
run;
Please increase the $200 lengths of character variables if your variable lists or comments are longer.
Try this:
proc sql;
create table ps as
select *
from
(select distinct prefix from chk8reg1),
(select distinct suffix from chk8reg1);
create table missingPrefix as
select * from ps
except
select prefix, suffix from chk8reg1
order by suffix, prefix;
quit;
data comments;
length comment $200;
do until(last.suffix);
set missingPrefix; by suffix;
comment = catx(" and ", comment, cats(prefix,suffix));
end;
if not missing(comment) then comment = catt(comment, " missing");
keep suffix comment;
run;
proc sql;
create table wantList as
select a.suffix, a.prefix, a.dlv_variables, b.comment
from chk8reg1 as a inner join comments as b
on a.suffix=b.suffix
order by suffix, comment;
quit;
proc transpose data=wantList out=want(drop=_name_);
by suffix comment;
var dlv_variables;
id prefix;
run;
Another possibility:
data chk8reg1;
input (dlv_variables prefix suffix) (:$20.);
datalines;
GRP_NOMATCH GRP_ NOMATCH
INP_PIBESC INP_ PIBESC
SCO_PIBESC SCO_ PIBESC
GRP_SCOREAPS GRP_ SCOREAPS
SCO_SCOREAPS SCO_ SCOREAPS
GRP_SCOREFAC GRP_ SCOREFAC
INP_SCOREFAC INP_ SCOREFAC
SCO_SCOREFAC SCO_ SCOREFAC
SCO_TEMP SCO_ TEMP
INP_TRIAL INP_ TRIAL
;
PROC SORT
DATA=WORK.CHK8REG1(KEEP=dlv_variables prefix suffix)
OUT=_1
;
BY suffix;
RUN;
PROC TRANSPOSE DATA=_1
OUT=_2(drop=source)
NAME=Source
PREFIX=CoL_
;
BY suffix;
ID prefix;
VAR dlv_variables;
RUN;
QUIT;
DATA WANT;
SET _2;
LENGTH _TEMP COMMENTS $ 200;
ARRAY COL COL:;
DO OVER COL;
IF MISSING(COL) THEN
_TEMP=CATS(SCAN(VNAME(COL),2,'_'),'_',SUFFIX);
ELSE _TEMP='';
COMMENTS=CATX('/',COMMENTS,_TEMP);
END;
IF NOT MISSING(COMMENTS) THEN
COMMENTS=CATX(' ',COMMENTS,'MISSING');
DROP _TEMP;
RUN;
Thank you Haikuo for the solution.
Got another weapon in my SAS utility belt 🙂
Have a good new year celebration.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.