BookmarkSubscribeRSS Feed
jdserbon1
Calcite | Level 5

I have a workable code that imputes the field name into a single var per row of data if the field has an indicator equal to 1. There are 190 values for the X_IND field and 190 associated MODEL vars. Not sure where to start here, but can this be automated/"macroized" - at least the MODEL vars? The Model vars are then concatenated and counted to get the number of all unique combinations.

 

In essence, what the code does is below - basic imputation if an indicator is present - this is the analytic dataset that will be used to tally combinations.

 

ID  A_IND   B_IND   C_IND    D_IND   AA     Model1    Model2    Model3    Model4    Combo

X         1            0           1           0          1           A                          C                             AC

Y         0            1            1          0          1                           B          C                             BC

Z         0            0            1           1          1                                       C               D           DC

 

 

 

 

 

 

 

 

12 REPLIES 12
PaigeMiller
Diamond | Level 26

Macros do not seem to be a solution here. Maybe ARRAYs would work better. But why do this? If the goal is to "this is the analytic dataset that will be used to tally combinations", I think PROC SUMMARY does this for you without even creating these specially derived columns, and saves you some pretty complicated programming. But explain that "tally combinations" part further, please.

--
Paige Miller
jdserbon1
Calcite | Level 5

So, this is a request to produce a more automated approach as opposed to potentially copying and pasting the indicators into the code, as the names of the indicators will change each quarter (they are masked in the code, aliases). The current code works just fine. The tally portion is the third step in the code, it simply produces a count for the unique combinations.

PaigeMiller
Diamond | Level 26

@jdserbon1 wrote:

The tally portion is the third step in the code, it simply produces a count for the unique combinations.


Either PROC FREQ or PROC SUMMARY ought to do this for you without creating these combinations of variables, all you need is A_IND through D_IND.

--
Paige Miller
ballardw
Super User

@jdserbon1 wrote:

So, this is a request to produce a more automated approach as opposed to potentially copying and pasting the indicators into the code, as the names of the indicators will change each quarter (they are masked in the code, aliases). The current code works just fine. The tally portion is the third step in the code, it simply produces a count for the unique combinations.


If the "names of the indicators will change each quarter " is because the date is part of the variable name then this is an indication of poor data structuring, perhaps another  attempt to force spreadsheet thinking to a different approach to working with data. If this is the case you are better off with a VARIABLE that indicates the time period and keeping all of the "model" variables the same. Then when doing the model you filter the data on the desired period by selecting with a WHERE clause or even using BY group processing to get results for each "quarter".

ed_sas_member
Meteorite | Level 14

Hi @jdserbon1 

Here is an approach to automatize steps1 and 2.

The idea is to leverage the use of VNAME(), which reference the variable name in an array:

%let variables = A_IND B_IND C_IND D_IND; /* To be adapted */
%let count_variables = %sysfunc(countw(&variables.));

data test1;
	set test;
	array _ind (*) &variables;
	array MODEL(&count_variables) $;
	do i=1 to dim(_ind);
		if AA=1 and _ind(i)=1 then MODEL(i) = substr(vname(_ind(i)),1,1);
	end;
	COMBO=CATX(',', OF MODEL:);
	MOD_NUM=COUNTW(COMBO,',');
	drop i &variables.;
run;

PROC SQL;
	CREATE TABLE Test3 AS SELECT
	COMBO,
	MOD_NUM,
	COUNT(*) AS AA_COUNT
	FROM Test1
	WHERE (COMBO NE ' ' AND MOD_NUM GE 2)
	GROUP BY 1, 2
	ORDER BY 3 DESC;
QUIT;
Kurt_Bremser
Super User

Once again, a case for Maxim 33: Intelligent Data Makes for Intelligent Programs, and Maxim 12: Long Beats Wide.

data have;
input ID $ A_IND B_IND C_IND D_IND;
datalines;
X 1 0 1 0   
Y 0 1 1 0   
Z 0 0 1 1 
;

proc transpose data=have out=long;
by id;
var _numeric_;
run;

data want;
set long (where=(col1 = 1));
by id;
length combo $10;
retain combo;
if first.id then combo = "";
combo = cats(combo,ifc(indexc(substr(_name_,1,1),combo),'',substr(_name_,1,1)));
if last.id;
keep id combo;
run;

The final data step makes only two assumptions: that the first character of the original variable name is the indicator, and there's maximally 10 observations. No literal names need to be written, no arrays, no macro.

PeterClemmensen
Tourmaline | Level 20
data have;
input ID $ A_IND B_IND C_IND D_IND;
datalines;
X 1 0 1 0
Y 0 1 1 0
Z 0 0 1 1
;

data want(drop=i);
    set have;
    array in {*} A_IND--D_IND;
    array Model {4} $; 
    do i = 1 to dim(in);
        if in[i] then Model[i] = char(vname(in[i]), 1);
    end;
    Combo = cats(of Model[*]);
run;
yabwon
Onyx | Level 15
If I may, since there are 190 columns `xx_IND` it seems to be more than alphabet letters, maybe a `scan(vname(in[i]), 1, "_");` would work better?
Bart
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Kurt_Bremser
Super User

Missed the "190 columns" part. (Insert another comment about proper data modeling here)

What if there's columns A_IND, B_IND and AB_IND?

We surely need more information about the real column names to come up with viable suggestions.

RichardDeVen
Barite | Level 11
How many rows does the data have ?
RichardDeVen
Barite | Level 11

Use VNAME to obtain the variable corresponding to an array reference.  If the arrayed variables have a consistent naming convention *_ID use SCAN to obtain the value to place in the corresponding model array element.

 

Create sample data (100k rows, 191 columns)

Spoiler
* create 190 identifiers (like excel sheet column names);
data ids(keep=id); length id $2; do code1 = 65 to 90; id = byte(code1); k+1; output; end; do code1 = 65 to 90; do code2 = 65 to 90; id = byte(code1)||byte(code2); k+1; output; if k = 190 then stop; end; end; run; * create variable names for the ids;
data names(keep=name); set ids; length name $32; name = cats(id,'_ID'); run; * pivot to wide;
proc transpose data=names out=havebase(drop=_name_); id name; run; * fill the wide structure with some data; * upto 7 random ids are 1;
data have(keep=id a_id--gh_id); length id 8; if 0 then set have havebase; call streaminit(2020); do id = 1 to 100000; array ids a_id--gh_id; call missing (of ids(*)); do candidate = 1 to rand('integer',1,7); ids(rand('integer',1,dim(ids))) = 1; end; output; end; stop; run;

Sample code:

* compute csv string that lists 1 flagged ids as combo value;

data stage1; set have; array ids a_id--gh_id; length model1-model190 $10; array models model1-model190; do over ids; name = vname(ids); drop name; if length(name)-3 > lengthc(models) then do; put 'ERROR: Model needs more length ' name=; abort cancel; end; if ids then models=scan(vname(ids),1,'_'); end; combo = catx(',', of models(*)); run;
* frequency table for combos;
proc sql; create table combo_freq as select combo, count(*) as freq from stage1 group by combo order by freq desc ; quit;
Reeza
Super User

I would recommend transposing your data to a long format. Processing it as desired and then transposing back to your wide format for reporting if needed. In general wide data sets are clunky and a bad design for this exact reason. It's infinitely easier to add rows to a data set than columns. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1399 views
  • 5 likes
  • 9 in conversation