BookmarkSubscribeRSS Feed
benjamin_2018
Fluorite | Level 6

I use SAS Enterprise Guide 7.1.

I have a dataset of Car Models by Year and 38 character variables that identify particular characteristics of a model in a given year. I want a total sum of changes for each variable and for each year-model.

 

Here is what I think the logic should be: 

 

I want to compare all variables using year-model as a by group, and then, say, if there is 1 level of variable (no change), set a counting variable = 1. If there is a change and there are 2 levels for that by group, set a counting variable = 2. Then, sum the counting variables for each variable, then divide by the total number of observations (missing values are valid) weighted by the number of year-models (2). The remainder will equal the total variable changes.

 

So a formula could be mod(Total Variable Levels/(Total Obs / Year-Models per by group)).

 

7 REPLIES 7
benjamin_2018
Fluorite | Level 6
data cars;
INPUT Car $ 1-13 A $ 15-16 B $ 18-19 C $ 21-22 D $ 24-27 E $ 29-30 F $ 32-33 G $ 34-35 H $ 36-37;
datalines;
2229024038001 A3 B2	C3 924 	E1 .  .	H4
2229024038002 A3 B2	C1 1288 E1 .  .	H4
2229024039002 A3 B3	C1 800	E1 .  .	.
2229024039003 A3 B2	C1 1288	E1 .  .	H4
2763003042001 A1 B2	C2 . 	.  .  .	H4
2763003042002 A1 B3	C2 .	.  .  .	H4
595143-004001 A3 B3	C1 886	E1 .  .	.
595143-004002 A3 B3	C1 1000	E1 .  .	.
6062623013003 A3 B4	C1 .	E1 .  .	H4
6062623013004 A3 B4	C1 .	E1 .  .	H4
;
run;
PaigeMiller
Diamond | Level 26

Can you show us a small example, with the desired output ?

--
Paige Miller
benjamin_2018
Fluorite | Level 6

Here is the desired output for the data given above.

 

Total Obs                             10 10 10 10 10 10 10 10
Year-model by car (weight)  2    2    2  2   2   2   2   2
Total Variable Levels            5    7   6   8  5   5   5   6
Total Variable Changes        0    2   1   3  0  0   0    1

benjamin_2018
Fluorite | Level 6

I modified the input code to make the by-group more clear. 

data cars;
INPUT Car $10. Year $4. A $3. B $3. C $3. D $5. E $3. F $3. G $3. H $3.;
datalines;
2229024038 001 A3 B2	C3 924 	E1 	    H4
2229024038 002 A3 B2	C1 1288 E1      H4
2229024039 002 A3 B3	C1 800	E1        
2229024039 003 A3 B2	C1 1288	E1      H4
2763003042 001 A1 B2	C2              H4
2763003042 002 A1 B3	C2              H4
595143-004 001 A3 B3	C1 886	E1        
595143-004 002 A3 B3	C1 1000	E1        
6062623013 003 A3 B4	C1      E1      H4
6062623013 004 A3 B4	C1      E1      H4
;
run;
Tom
Super User Tom
Super User

@benjamin_2018 wrote:

Here is the desired output for the data given above.

 

Total Obs                             10 10 10 10 10 10 10 10
Year-model by car (weight)  2    2    2  2   2   2   2   2
Total Variable Levels            5    7   6   8  5   5   5   6
Total Variable Changes        0    2   1   3  0  0   0    1


That seems like a very strange result.  First it appears to be rotated. It looks like has 8 observations across the page and 4 variables going down.  Normally data is presented in the other orientation, like in your example input.  Second there is nothing to explain what the different columns (observations) mean or how the rows (variables) relate to the variables in the sample input.

benjamin_2018
Fluorite | Level 6

Okay I will try to explain further. Sorry for the confusion and I appreciate any offer to help. 

 

There are 8 variables that I am trying to analyze - A -- H. This is just a subset of the variables, there are 38 in total. In this example there are 8. 

 

Total observations is the sum of all non-missing and missing obs for all variables (10 for all). 

Year-model by car is the number of observations in the by group. I think in the input data I use Ver, and by this I meant "version" or in this case year-model. 

Total Variable levels is the sum of variable levels by by-group (car). So for variable A (a unique feature of a car, in this case) there were no changes between any year-model, which is the result in the last row of the desired output. For variable B, there were 2 changes - for cars 2229024039 and 2763003042 - between year-models (or model-year might be a better term). 

 

 

 

 

benjamin_2018
Fluorite | Level 6

Taking from other topics in SAS community and a nice dose of SAS help pages/examples, I came up with this solution. Let me know if this solution makes sense or if there is any improvement. 

 

%let data = work.cars;

proc freq data = &data (drop= year) nlevels;
	by Car;
	ods exclude onewayfreqs;
    ods output nlevels = out_n_levels
				(keep = tablevar nlevels rename  = (tablevar = name
                 nlevels  = n_levels));
				 *to match  out_contents;
				 run;

proc sort data=work.out_n_levels;
by name;
run;

ods trace on;

proc means data=work.out_n_levels n nmiss sum;
class name;
var n_levels;
ods output Summary=Variable_Levels;
run;

ods trace off;

data work.spec_changes;
set work.variable_levels;
Var_Ch=mod(n_levels_Sum,NObs);
run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1618 views
  • 0 likes
  • 3 in conversation