Hey,
I have data like this:
data have;
input entity $ con1 $ con2 con3 $;
datalines;
a y 1 d
a y 2 i
a y 3 d
a y 1 d
a n 2 d
a n 3 d
b y 1 di
b y 1 di
b n 1 ii
b n 3 i
b n 3 .
b n 3 .
c y 2 .
c n 2 .
c y 3 di
c n 3 .
d y 1 di
d y 2 ii
d y 3 i
d n 2 i
d y 1 d
d n 2 d
d y 3 .
d n 1 i
;
There are more variables and more entities, this is the simplied version to protect the innocent.
I want to make an output file like this:
entity | con1-y | con1-n | con2-1 | con2-2 | con2-3 | con3-d | con3-i | con3-di | con3-ii |
a | 4 | 2 | 2 | 2 | 2 | 5 | 1 | 0 | 0 |
b | 2 | 4 | 3 | 0 | 3 | 0 | 1 | 2 | 1 |
c | 2 | 2 | 0 | 2 | 2 | 0 | 0 | 1 | 0 |
d | 5 | 3 | 3 | 3 | 2 | 2 | 3 | 1 | 1 |
I want that output file to serve as a source for a mailmerge and a communication that can be all pretty.
I can generate the output with this SQL:
proc sql;
select distinct entity as entity,
(select count(con1)
from have
where con1 = 'y' and entity='a') as con1_y,
(select count(con1)
from have
where con1 = 'n' and entity='a') as con1_n,
(select count(con2)
from have
where con2 = 1 and entity='a') as con2_1,
(select count(con2)
from have
where con2 = 2 and entity='a') as con2_2,
(select count(con2)
from have
where con2 = 3 and entity='a') as con2_3,
(select count(con3)
from have
where con3 = 'd' and entity='a') as con3_d,
(select count(con3)
from have
where con3 = 'i' and entity='a') as con3_i,
(select count(con3)
from have
where con3 = 'di' and entity='a') as con3_di,
(select count(con3)
from have
where con3 = 'ii' and entity='a') as con3_ii
from have
where entity = 'a';
where I repeat that block once for each entity and append to an Excel file with the result.
That strikes me as not very efficient, certainly not elegant, and not very good programming. This process started with about 8 entities, it's now about 40 entities, and I'm involved because it could go to 200 entities and I'm trying to avert disaster in which a code block like this is repeated 200 times and the output is individually formatted by hand before sending (as it is now).
What I envision is some sort of looping deal, but while I can write the above SQL, I don't know how to code a loop like this in SAS.
I think I want something like:
for each entity
for each variable
for each possible value
count
next value
next variable
next entity
Maybe load arrays of possible entities, possible variable, and possible values and then walk them in some sort of nesting? Just not sure. There is probably cool SAS stuff that does this that I don't know about.
I want a loop of code that doesn't change (or changes very little) regardless of the number of input entities. The reporting variables will stay the same. Currently, adding a new entity to the reporting mix means adding a new input source on the front end of processing and then a couple of reporting blocks on the back end. The front end input I don't mind, but I want to avoid 200 repeating output blocks. It screams loop, I just don't know how to do it.
Help?
How about:
data have; input entity $ con1 $ con2 con3 $; datalines; a y 1 d a y 2 i a y 3 d a y 1 d a n 2 d a n 3 d b y 1 di b y 1 di b n 1 ii b n 3 i b n 3 . b n 3 . c y 2 . c n 2 . c y 3 di c n 3 . d y 1 di d y 2 ii d y 3 i d n 2 i d y 1 d d n 2 d d y 3 . d n 1 i ; run; data temp(keep=entity name ); set have; length name $ 10; name=catx('_','con1',con1); output; name=catx('_','con2',con2); output; name=catx('_','con3',con3); output; run; proc freq data=temp noprint; tables entity*name/list out=x(drop=percent) nocum nopercent; run; options missing=0; proc transpose data=x out=want(drop=_:); by entity; id name; var count; run;
Ksharp
If you're willing to accept a similar report, instead of exactly the same, this would do it:
proc tabulate data=have;
class entity con1-con3;
tables entity, (con1-con3)*n=' ';
run;
It's simple enough, but is the output good enough? Some things you can tweak, some you can't.
They are currently using a PROC FREQ to make a report. PROC TABULATE, PROC FREQ. Regaardless of which is used, I think the output is still not going to be pretty enough but more importantly it will still be 40 or 200 or however many different files.
That PROC TABULATE blows up anyway.
1561 proc tabulate data=have;
1563 tables entity, (con1-con3)*n=' ';
-
22
----
202
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, ), *, <, =, [, {.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
1564 run;
I'd like to make one output file that contains the info and can be used as a source as desired. Like I say, I can do it using the same approach of repeating a block of code endlessly, but that's paving a cow path and I'm looking to make a proper road.
1562 class entity con1-con3;
Well, just like Astounding said, it is simple enough to just get the output like that. Will there be alternative way to obtain a dataset other than the SQL you have used? Yes. Will it be better? you 'd be the judge.
options missing='';
data have;
input entity $ con1 $ con2 con3 $;
datalines;
a y 1 d
a y 2 i
a y 3 d
a y 1 d
a n 2 d
a n 3 d
b y 1 di
b y 1 di
b n 1 ii
b n 3 i
b n 3 .
b n 3 .
c y 2 .
c n 2 .
c y 3 di
c n 3 .
d y 1 di
d y 2 ii
d y 3 i
d n 2 i
d y 1 d
d n 2 d
d y 3 .
d n 1 i
;
ODS LISTING CLOSE;
ods output table=have1;
proc tabulate data=have;
class entity con1 con2 con3;
table entity ,
con1 con 2 con3;
run;
ods output close;
ODS LISTING ;
proc sort data=have1 out=have2 (keep=entity con1-con3 n);
by entity;
run;
data have3 (drop=con2 rename=con4=con2);
set have2;
con4=put (con2,1.);
run;
data have4;
set have3;
array v con1-con3;
array vid $8. id1-id3;
do i=1 to dim(v);
if not missing(v(i)) then
vid(i)=catx('_',vname(v(i)),v(i));
end;
id=coalescec(id1,id2,id3);
run;
options missing=0;
proc transpose data=have4 out=want (drop=_:) ;
by entity;
var n;
id id;
run;
proc print;run;
Haikuo
That PROC TABULATE blows up as well.
That is odd. It runs through smoothly on my machine. Anyway, let's make it simpler, replace the ods part of code with the following:
ODS LISTING CLOSE;
ods output table=have1;
proc tabulate data=have;
class entity con1 con2 con3;
table entity ,
con1 con2 con3 / misstext='0.00';
run;
ods output close;
ODS LISTING ;
BTW, when you do copy paste between your web browser and SAS, it will produce empty lines, which sometimes will mess up with your raw data. Get rid of them before running the code.
On my machine, you can abbrevate con1-con3 on the CLASS stateme nt, but it blows up as part of the TABLES statement. So replace this:
con1-con3
Use this instead:
con1 con2 con3
If the overall form is good enough, we can look at using ODS to get Excel-ready output.
Astounding wrote:
If the overall form is good enough, we can look at using ODS to get Excel-ready output.
That's an extremely intriguing prospect.
This PROC TABULATE runs:
proc tabulate data=have;
class entity con1 con2 con3;
table entity, con1*n con2*n con3*n;
run;
and it does replicate my Excel table, but I don't think I can dump the PROC TABULATE output into a data file and ultimately an Excel file for use a merge source. Can I? If I could, this might be a lot easier than I thought.
Yes, you can, by running my code.
I'm still staring at that, trying to figure out what it does.
Just updated the code. con1-con3 as well as other non-essential parts have been replaced.
The original version blew up.
2276 proc transpose data=have4 out=want (drop=_:) ;
2277 by entity;
2278 var n;
2279 id id;
2280 run;
ERROR: The ID value "con2__" occurs twice in the same BY group.
NOTE: The above message was for the following by-group:
entity=a
ERROR: The ID value "con2__" occurs twice in the same BY group.
ERROR: The ID value "con2__" occurs twice in the same BY group.
NOTE: The above message was for the following by-group:
entity=b
ERROR: The ID value "con2__" occurs twice in the same BY group.
ERROR: The ID value "con2__" occurs twice in the same BY group.
ERROR: The ID value "con2__" occurs twice in the same BY group.
NOTE: The above message was for the following by-group:
entity=d
WARNING: 3 BY groups omitted due to earlier errors.
NOTE: There were 26 observations read from the data set WORK.HAVE4.
NOTE: The data set WORK.WANT has 1 observations and 7 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.07 seconds
cpu time 0.01 seconds
Running the new version now.
Still blows up on the PROC TRANSPOSE. Same errors.
HB,
This is a data-related problem. It indicates that con2 takes on some values that are longer than 1 digit. Try replacing this statement:
con4 = put(con2,1.);
with this combination:
length con4 $ 5;
con4 = left(put(con2,5.));
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.