Hi everyone,
I am struggling to combine values within the SUB1 variable. I want to combine alcohol, barbiturates and tranquilizers. into a new variable. I'm using SAS on Demand
Thank you
I would probably hard code it, since you don't seem super familiar with formats/recoding.
data tedsa_puf_2021_combined;
set tedsa_puf_2021;
length sub1_combined$100.;
if sub1 in (2, 14, 15) then sub1_combined = 'Alcohol, Barbiturates, Other Tranquilizers';
else sub1_combined= put(sub1, sub.);
run;
proc freq data=tedsa_puf_2021_combined;
table sub1_combined*region;
run;
The suggestion from @Astounding is the best path forward here. Excellent advice.
I got this error
This is my data set
You cannot put PART of a VALUE statement from PROC FORMAT into the middle of DATA step. And even worse into the middle of the DATA statement of that step.
You have to run the whole PROC FORMAT step separately from the DATA step or the PROC FREQ step.
May i have an example of this please?
Run the code as provided by @Astounding :
proc format;
value $mycat
"Alcohol", "Barbiturates", "Other Tranquilizers" = "Combined ABT" ;
run;
so the log has no errors but the results are unchanged.
The first post by @Astounding explains that you have to modify PROC FREQ and gives the specific command to do so.
The log here shows that you ran PROC FORMAT successfully. Having done that, the next step would be to actually apply the format you have created. For example:
proc freq data=work.tedsa_puf_2021;
tables sub1 * gender;
format sub1 $mycat.;
run;
Try that, and see if it starts to make sense.
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 proc cimport
70
71 infile="/home/u63544247/curiosity cup/TEDS_A_2021-data-sas.stc" lib=WORK;
72
73 run;
NOTE: PROC CIMPORT begins to create/update catalog WORK.FORMATS
NOTE: Entry ADMYR.FORMAT has been imported.
NOTE: Entry AGE.FORMAT has been imported.
NOTE: Entry ALCD.FORMAT has been imported.
NOTE: Entry ARRESTS.FORMAT has been imported.
NOTE: Entry CBSA.FORMAT has been imported.
NOTE: Entry DAYWAIT.FORMAT has been imported.
NOTE: Entry DETCRIM.FORMAT has been imported.
NOTE: Entry DETNLF.FORMAT has been imported.
NOTE: Entry DIVISION.FORMAT has been imported.
NOTE: Entry DSMCRIT.FORMAT has been imported.
NOTE: Entry EDUC.FORMAT has been imported.
NOTE: Entry EMPLOY.FORMAT has been imported.
NOTE: Entry ETH.FORMAT has been imported.
NOTE: Entry FLAG.FORMAT has been imported.
NOTE: Entry FREQ.FORMAT has been imported.
NOTE: Entry FRSTUSE.FORMAT has been imported.
NOTE: Entry GENDER.FORMAT has been imported.
NOTE: Entry HLTHINS.FORMAT has been imported.
NOTE: Entry IDU.FORMAT has been imported.
NOTE: Entry LIVARAG.FORMAT has been imported.
NOTE: Entry MARSTAT.FORMAT has been imported.
NOTE: Entry METHUSE.FORMAT has been imported.
NOTE: Entry NOPRIOR.FORMAT has been imported.
NOTE: Entry PREG.FORMAT has been imported.
NOTE: Entry PRIMINC.FORMAT has been imported.
NOTE: Entry PRIMPAY.FORMAT has been imported.
NOTE: Entry PSOURCE.FORMAT has been imported.
NOTE: Entry PSYPROB.FORMAT has been imported.
NOTE: Entry RACE.FORMAT has been imported.
NOTE: Entry REGION.FORMAT has been imported.
NOTE: Entry ROUTE.FORMAT has been imported.
NOTE: Entry SELFHELP.FORMAT has been imported.
NOTE: Entry SERVSETA.FORMAT has been imported.
NOTE: Entry STFIPS.FORMAT has been imported.
NOTE: Entry SUB.FORMAT has been imported.
NOTE: Entry VET.FORMAT has been imported.
NOTE: Total number of entries processed in catalog WORK.FORMATS: 36
NOTE: PROC CIMPORT begins to create/update data set WORK.TEDSA_PUF_2021
NOTE: Data set contains 62 variables and 1482543 observations.
Logical record length is 496
NOTE: Compressing data set WORK.TEDSA_PUF_2021 decreased size by 60.42 percent.
Compressed is 2223 pages; un-compressed would require 5616 pages.
NOTE: PROCEDURE CIMPORT used (Total process time):
real time 6.69 seconds
user cpu time 6.46 seconds
system cpu time 0.23 seconds
memory 2467.43k
OS Memory 22692.00k
Timestamp 02/12/2024 07:30:55 PM
Step Count 24 Switch Count 39
Page Faults 0
Page Reclaims 695
Page Swaps 0
Voluntary Context Switches 249
Involuntary Context Switches 19
Block Input Operations 0
Block Output Operations 569480
74 Proc contents data=work.tedsa_puf_2021;
75
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.06 seconds
user cpu time 0.06 seconds
system cpu time 0.01 seconds
memory 4030.59k
OS Memory 23720.00k
Timestamp 02/12/2024 07:30:55 PM
Step Count 25 Switch Count 0
Page Faults 0
Page Reclaims 1043
Page Swaps 0
Voluntary Context Switches 3
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 40
76 proc freq data=work.tedsa_puf_2021;
77 tables SUB1*GENDER;
78 run;
NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
NOTE: PROCEDURE FREQ used (Total process time):
real time 1.41 seconds
user cpu time 1.33 seconds
system cpu time 0.09 seconds
memory 3366.31k
OS Memory 24748.00k
Timestamp 02/12/2024 07:30:56 PM
Step Count 26 Switch Count 20
Page Faults 0
Page Reclaims 739
Page Swaps 0
Voluntary Context Switches 66
Involuntary Context Switches 3
Block Input Operations 0
Block Output Operations 568
79 proc freq data=work.tedsa_puf_2021;
80 tables SUB1*Region;
NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
NOTE: PROCEDURE FREQ used (Total process time):
real time 1.50 seconds
user cpu time 1.43 seconds
system cpu time 0.07 seconds
memory 3086.84k
OS Memory 24748.00k
Timestamp 02/12/2024 07:30:58 PM
Step Count 27 Switch Count 9
Page Faults 0
Page Reclaims 469
Page Swaps 0
Voluntary Context Switches 36
Involuntary Context Switches 6
Block Input Operations 0
Block Output Operations 576
81 proc format;
82 value $mycat
83 "Alcohol", "Barbiturates", "Other Tranquilizers" = "Combined ABT" ;
NOTE: Format $MYCAT has been output.
84 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 306.87k
OS Memory 22688.00k
Timestamp 02/12/2024 07:30:58 PM
Step Count 28 Switch Count 0
Page Faults 0
Page Reclaims 66
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
85 proc freq data=work.tedsa_puf_2021;
86 tables sub1 * gender;
87 format sub1 $mycat.;
ERROR: You are trying to use the character format $MYCAT with the numeric variable SUB1 in data set WORK.TEDSA_PUF_2021.
88 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 2383.37k
OS Memory 24228.00k
Timestamp 02/12/2024 07:30:58 PM
Step Count 29 Switch Count 0
Page Faults 0
Page Reclaims 332
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
89
90 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
100 i get this error
The FREQ Procedure
Substance use (primary)SUB1 Frequency Percent CumulativeFrequency CumulativePercentMissing/unknown/not collected/invalidNoneAlcoholCocaine/crackMarijuana/hashishHeroinNon-prescription methadoneOther opiates and syntheticsPCPHallucinogensMethamphetamine/speedOther amphetaminesOther stimulantsBenzodiazepinesOther tranquilizersBarbituratesOther sedatives or hypnoticsInhalantsOver-the-counter medicationsOther drugs
| 219380 | 14.80 | 219380 | 14.80 |
| 37871 | 2.55 | 257251 | 17.35 |
| 439755 | 29.66 | 697006 | 47.01 |
| 70456 | 4.75 | 767462 | 51.77 |
| 129343 | 8.72 | 896805 | 60.49 |
| 255401 | 17.23 | 1152206 | 77.72 |
| 1509 | 0.10 | 1153715 | 77.82 |
| 113764 | 7.67 | 1267479 | 85.49 |
| 3911 | 0.26 | 1271390 | 85.76 |
| 1989 | 0.13 | 1273379 | 85.89 |
| 170220 | 11.48 | 1443599 | 97.37 |
| 10678 | 0.72 | 1454277 | 98.09 |
| 2606 | 0.18 | 1456883 | 98.27 |
| 14008 | 0.94 | 1470891 | 99.21 |
| 111 | 0.01 | 1471002 | 99.22 |
| 731 | 0.05 | 1471733 | 99.27 |
| 1635 | 0.11 | 1473368 | 99.38 |
| 678 | 0.05 | 1474046 | 99.43 |
| 602 | 0.04 | 1474648 | 99.47 |
| 7895 | 0.53 | 1482543 | 100.00 |
The FREQ Procedure
Substance use (primary)SUB1 Frequency Percent CumulativeFrequency CumulativePercent-912345678910111213141516171819
| 219380 | 14.80 | 219380 | 14.80 |
| 37871 | 2.55 | 257251 | 17.35 |
| 439755 | 29.66 | 697006 | 47.01 |
| 70456 | 4.75 | 767462 | 51.77 |
| 129343 | 8.72 | 896805 | 60.49 |
| 255401 | 17.23 | 1152206 | 77.72 |
| 1509 | 0.10 | 1153715 | 77.82 |
| 113764 | 7.67 | 1267479 | 85.49 |
| 3911 | 0.26 | 1271390 | 85.76 |
| 1989 | 0.13 | 1273379 | 85.89 |
| 170220 | 11.48 | 1443599 | 97.37 |
| 10678 | 0.72 | 1454277 | 98.09 |
| 2606 | 0.18 | 1456883 | 98.27 |
| 14008 | 0.94 | 1470891 | 99.21 |
| 111 | 0.01 | 1471002 | 99.22 |
| 731 | 0.05 | 1471733 | 99.27 |
| 1635 | 0.11 | 1473368 | 99.38 |
| 678 | 0.05 | 1474046 | 99.43 |
| 602 | 0.04 | 1474648 | 99.47 |
| 7895 | 0.53 | 1482543 | 100.00 |
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 proc freq data=work.tedsa_puf_2021;
70 tables sub1;
71 run;
NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
NOTE: PROCEDURE FREQ used (Total process time):
real time 1.27 seconds
user cpu time 1.21 seconds
system cpu time 0.08 seconds
memory 3527.18k
OS Memory 24744.00k
Timestamp 02/13/2024 12:50:32 AM
Step Count 40 Switch Count 3
Page Faults 0
Page Reclaims 488
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 2
Block Input Operations 0
Block Output Operations 272
72 proc freq data=tedsa_puf_2021;
73 tables sub1;
74 format sub1;
75 run;
NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
NOTE: PROCEDURE FREQ used (Total process time):
real time 1.25 seconds
user cpu time 1.18 seconds
system cpu time 0.07 seconds
memory 2778.93k
OS Memory 24744.00k
Timestamp 02/13/2024 12:50:34 AM
Step Count 41 Switch Count 9
Page Faults 0
Page Reclaims 403
Page Swaps 0
Voluntary Context Switches 32
Involuntary Context Switches 3
Block Input Operations 0
Block Output Operations 280
76
77 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
87
Good job running the program I requested. The results are a bit difficult to read. Do they appear the same before you posted them here?
You have two tables, with the same numbers in them. That is good. The idea now is to take find the match between the displayed values, such as "Other tranqualizers" and the codes that the data uses to represent those values. Compare the two tables, and see if it's obvious which codes match which categories.
If the two tables can't be compared in that way, we'll devise another way. But for now see if it is obvious to you how to match up the codes and the categories.
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 proc cimport
70
71 infile="/home/u63544247/curiosity cup/TEDS_A_2021-data-sas.stc" lib=WORK;
72
73 run;
NOTE: PROC CIMPORT begins to create/update catalog WORK.FORMATS
NOTE: Entry ADMYR.FORMAT has been imported.
NOTE: Entry AGE.FORMAT has been imported.
NOTE: Entry ALCD.FORMAT has been imported.
NOTE: Entry ARRESTS.FORMAT has been imported.
NOTE: Entry CBSA.FORMAT has been imported.
NOTE: Entry DAYWAIT.FORMAT has been imported.
NOTE: Entry DETCRIM.FORMAT has been imported.
NOTE: Entry DETNLF.FORMAT has been imported.
NOTE: Entry DIVISION.FORMAT has been imported.
NOTE: Entry DSMCRIT.FORMAT has been imported.
NOTE: Entry EDUC.FORMAT has been imported.
NOTE: Entry EMPLOY.FORMAT has been imported.
NOTE: Entry ETH.FORMAT has been imported.
NOTE: Entry FLAG.FORMAT has been imported.
NOTE: Entry FREQ.FORMAT has been imported.
NOTE: Entry FRSTUSE.FORMAT has been imported.
NOTE: Entry GENDER.FORMAT has been imported.
NOTE: Entry HLTHINS.FORMAT has been imported.
NOTE: Entry IDU.FORMAT has been imported.
NOTE: Entry LIVARAG.FORMAT has been imported.
NOTE: Entry MARSTAT.FORMAT has been imported.
NOTE: Entry METHUSE.FORMAT has been imported.
NOTE: Entry NOPRIOR.FORMAT has been imported.
NOTE: Entry PREG.FORMAT has been imported.
NOTE: Entry PRIMINC.FORMAT has been imported.
NOTE: Entry PRIMPAY.FORMAT has been imported.
NOTE: Entry PSOURCE.FORMAT has been imported.
NOTE: Entry PSYPROB.FORMAT has been imported.
NOTE: Entry RACE.FORMAT has been imported.
NOTE: Entry REGION.FORMAT has been imported.
NOTE: Entry ROUTE.FORMAT has been imported.
NOTE: Entry SELFHELP.FORMAT has been imported.
NOTE: Entry SERVSETA.FORMAT has been imported.
NOTE: Entry STFIPS.FORMAT has been imported.
NOTE: Entry SUB.FORMAT has been imported.
NOTE: Entry VET.FORMAT has been imported.
NOTE: Total number of entries processed in catalog WORK.FORMATS: 36
NOTE: PROC CIMPORT begins to create/update data set WORK.TEDSA_PUF_2021
NOTE: Data set contains 62 variables and 1482543 observations.
Logical record length is 496
NOTE: Compressing data set WORK.TEDSA_PUF_2021 decreased size by 60.42 percent.
Compressed is 2223 pages; un-compressed would require 5616 pages.
NOTE: PROCEDURE CIMPORT used (Total process time):
real time 6.64 seconds
user cpu time 6.35 seconds
system cpu time 0.28 seconds
memory 2420.28k
OS Memory 25252.00k
Timestamp 02/13/2024 04:13:37 AM
Step Count 79 Switch Count 39
Page Faults 0
Page Reclaims 324
Page Swaps 0
Voluntary Context Switches 247
Involuntary Context Switches 6
Block Input Operations 0
Block Output Operations 569704
74 Proc contents data=work.tedsa_puf_2021;
75
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.06 seconds
user cpu time 0.06 seconds
system cpu time 0.00 seconds
memory 3713.53k
OS Memory 25256.00k
Timestamp 02/13/2024 04:13:37 AM
Step Count 80 Switch Count 0
Page Faults 0
Page Reclaims 383
Page Swaps 0
Voluntary Context Switches 4
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 48
76 proc freq data=work.tedsa_puf_2021;
77 tables SUB1*GENDER;
78 run;
NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
NOTE: PROCEDURE FREQ used (Total process time):
real time 1.39 seconds
user cpu time 1.30 seconds
system cpu time 0.10 seconds
memory 3293.81k
OS Memory 25772.00k
Timestamp 02/13/2024 04:13:38 AM
Step Count 81 Switch Count 23
Page Faults 0
Page Reclaims 471
Page Swaps 0
Voluntary Context Switches 71
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 560
79 proc freq data=work.tedsa_puf_2021;
80 tables SUB1*Region;
NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
NOTE: PROCEDURE FREQ used (Total process time):
real time 1.41 seconds
user cpu time 1.34 seconds
system cpu time 0.07 seconds
memory 3133.78k
OS Memory 25772.00k
Timestamp 02/13/2024 04:13:40 AM
Step Count 82 Switch Count 8
Page Faults 0
Page Reclaims 470
Page Swaps 0
Voluntary Context Switches 35
Involuntary Context Switches 1
Block Input Operations 0
Block Output Operations 576
81 proc format;
82 value mycat
83 "Alcohol", "Barbiturates", "Other Tranquilizers" = "Combined ABT" ;
ERROR: The quoted string 'Alcohol' is not acceptable to a numeric format or informat.
NOTE: The previous statement has been deleted.
84 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 230.84k
OS Memory 23712.00k
Timestamp 02/13/2024 04:13:40 AM
Step Count 83 Switch Count 0
Page Faults 0
Page Reclaims 14
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
NOTE: The SAS System stopped processing this step because of errors.
85 proc freq data=work.tedsa_puf_2021;
86 tables sub1 * gender;
87 format sub1 mycat.;
88 run;
NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
NOTE: PROCEDURE FREQ used (Total process time):
real time 1.39 seconds
user cpu time 1.30 seconds
system cpu time 0.09 seconds
memory 3068.93k
OS Memory 25772.00k
Timestamp 02/13/2024 04:13:41 AM
Step Count 84 Switch Count 4
Page Faults 0
Page Reclaims 452
Page Swaps 0
Voluntary Context Switches 21
Involuntary Context Switches 2
Block Input Operations 0
Block Output Operations 560
89
90 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
100
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Ready to level-up your skills? Choose your own adventure.