I use SAS Enterprise Guide 7.1.
I have a dataset of Car Models by Year and 38 character variables that identify particular characteristics of a model in a given year. I want a total sum of changes for each variable and for each year-model.
Here is what I think the logic should be:
I want to compare all variables using year-model as a by group, and then, say, if there is 1 level of variable (no change), set a counting variable = 1. If there is a change and there are 2 levels for that by group, set a counting variable = 2. Then, sum the counting variables for each variable, then divide by the total number of observations (missing values are valid) weighted by the number of year-models (2). The remainder will equal the total variable changes.
So a formula could be mod(Total Variable Levels/(Total Obs / Year-Models per by group)).
data cars; INPUT Car $ 1-13 A $ 15-16 B $ 18-19 C $ 21-22 D $ 24-27 E $ 29-30 F $ 32-33 G $ 34-35 H $ 36-37; datalines; 2229024038001 A3 B2 C3 924 E1 . . H4 2229024038002 A3 B2 C1 1288 E1 . . H4 2229024039002 A3 B3 C1 800 E1 . . . 2229024039003 A3 B2 C1 1288 E1 . . H4 2763003042001 A1 B2 C2 . . . . H4 2763003042002 A1 B3 C2 . . . . H4 595143-004001 A3 B3 C1 886 E1 . . . 595143-004002 A3 B3 C1 1000 E1 . . . 6062623013003 A3 B4 C1 . E1 . . H4 6062623013004 A3 B4 C1 . E1 . . H4 ; run;
I modified the input code to make the by-group more clear.
data cars; INPUT Car $10. Year $4. A $3. B $3. C $3. D $5. E $3. F $3. G $3. H $3.; datalines; 2229024038 001 A3 B2 C3 924 E1 H4 2229024038 002 A3 B2 C1 1288 E1 H4 2229024039 002 A3 B3 C1 800 E1 2229024039 003 A3 B2 C1 1288 E1 H4 2763003042 001 A1 B2 C2 H4 2763003042 002 A1 B3 C2 H4 595143-004 001 A3 B3 C1 886 E1 595143-004 002 A3 B3 C1 1000 E1 6062623013 003 A3 B4 C1 E1 H4 6062623013 004 A3 B4 C1 E1 H4 ; run;
Here is the desired output for the data given above.
Total Obs 10 10 10 10 10 10 10 10
Year-model by car (weight) 2 2 2 2 2 2 2 2
Total Variable Levels 5 7 6 8 5 5 5 6
Total Variable Changes 0 2 1 3 0 0 0 1
That seems like a very strange result. First it appears to be rotated. It looks like has 8 observations across the page and 4 variables going down. Normally data is presented in the other orientation, like in your example input. Second there is nothing to explain what the different columns (observations) mean or how the rows (variables) relate to the variables in the sample input.
Okay I will try to explain further. Sorry for the confusion and I appreciate any offer to help.
There are 8 variables that I am trying to analyze - A -- H. This is just a subset of the variables, there are 38 in total. In this example there are 8.
Total observations is the sum of all non-missing and missing obs for all variables (10 for all).
Year-model by car is the number of observations in the by group. I think in the input data I use Ver, and by this I meant "version" or in this case year-model.
Total Variable levels is the sum of variable levels by by-group (car). So for variable A (a unique feature of a car, in this case) there were no changes between any year-model, which is the result in the last row of the desired output. For variable B, there were 2 changes - for cars 2229024039 and 2763003042 - between year-models (or model-year might be a better term).
Taking from other topics in SAS community and a nice dose of SAS help pages/examples, I came up with this solution. Let me know if this solution makes sense or if there is any improvement.
%let data = work.cars; proc freq data = &data (drop= year) nlevels; by Car; ods exclude onewayfreqs; ods output nlevels = out_n_levels (keep = tablevar nlevels rename = (tablevar = name nlevels = n_levels)); *to match out_contents; run; proc sort data=work.out_n_levels; by name; run; ods trace on; proc means data=work.out_n_levels n nmiss sum; class name; var n_levels; ods output Summary=Variable_Levels; run; ods trace off; data work.spec_changes; set work.variable_levels; Var_Ch=mod(n_levels_Sum,NObs); run;
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.