Hi everyone,
I am struggling to combine values within the SUB1 variable. I want to combine alcohol, barbiturates and tranquilizers. into a new variable. I'm using SAS on Demand
Thank you
I would probably hard code it, since you don't seem super familiar with formats/recoding.
data tedsa_puf_2021_combined;
set tedsa_puf_2021;
length sub1_combined$100.;
if sub1 in (2, 14, 15) then sub1_combined = 'Alcohol, Barbiturates, Other Tranquilizers';
else sub1_combined= put(sub1, sub.);
run;
proc freq data=tedsa_puf_2021_combined;
table sub1_combined*region;
run;
The suggestion from @Astounding is the best path forward here. Excellent advice.
I got this error
This is my data set
You cannot put PART of a VALUE statement from PROC FORMAT into the middle of DATA step. And even worse into the middle of the DATA statement of that step.
You have to run the whole PROC FORMAT step separately from the DATA step or the PROC FREQ step.
May i have an example of this please?
Run the code as provided by @Astounding :
proc format;
value $mycat
"Alcohol", "Barbiturates", "Other Tranquilizers" = "Combined ABT" ;
run;
so the log has no errors but the results are unchanged.
The first post by @Astounding explains that you have to modify PROC FREQ and gives the specific command to do so.
The log here shows that you ran PROC FORMAT successfully. Having done that, the next step would be to actually apply the format you have created. For example:
proc freq data=work.tedsa_puf_2021;
tables sub1 * gender;
format sub1 $mycat.;
run;
Try that, and see if it starts to make sense.
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 proc cimport 70 71 infile="/home/u63544247/curiosity cup/TEDS_A_2021-data-sas.stc" lib=WORK; 72 73 run; NOTE: PROC CIMPORT begins to create/update catalog WORK.FORMATS NOTE: Entry ADMYR.FORMAT has been imported. NOTE: Entry AGE.FORMAT has been imported. NOTE: Entry ALCD.FORMAT has been imported. NOTE: Entry ARRESTS.FORMAT has been imported. NOTE: Entry CBSA.FORMAT has been imported. NOTE: Entry DAYWAIT.FORMAT has been imported. NOTE: Entry DETCRIM.FORMAT has been imported. NOTE: Entry DETNLF.FORMAT has been imported. NOTE: Entry DIVISION.FORMAT has been imported. NOTE: Entry DSMCRIT.FORMAT has been imported. NOTE: Entry EDUC.FORMAT has been imported. NOTE: Entry EMPLOY.FORMAT has been imported. NOTE: Entry ETH.FORMAT has been imported. NOTE: Entry FLAG.FORMAT has been imported. NOTE: Entry FREQ.FORMAT has been imported. NOTE: Entry FRSTUSE.FORMAT has been imported. NOTE: Entry GENDER.FORMAT has been imported. NOTE: Entry HLTHINS.FORMAT has been imported. NOTE: Entry IDU.FORMAT has been imported. NOTE: Entry LIVARAG.FORMAT has been imported. NOTE: Entry MARSTAT.FORMAT has been imported. NOTE: Entry METHUSE.FORMAT has been imported. NOTE: Entry NOPRIOR.FORMAT has been imported. NOTE: Entry PREG.FORMAT has been imported. NOTE: Entry PRIMINC.FORMAT has been imported. NOTE: Entry PRIMPAY.FORMAT has been imported. NOTE: Entry PSOURCE.FORMAT has been imported. NOTE: Entry PSYPROB.FORMAT has been imported. NOTE: Entry RACE.FORMAT has been imported. NOTE: Entry REGION.FORMAT has been imported. NOTE: Entry ROUTE.FORMAT has been imported. NOTE: Entry SELFHELP.FORMAT has been imported. NOTE: Entry SERVSETA.FORMAT has been imported. NOTE: Entry STFIPS.FORMAT has been imported. NOTE: Entry SUB.FORMAT has been imported. NOTE: Entry VET.FORMAT has been imported. NOTE: Total number of entries processed in catalog WORK.FORMATS: 36 NOTE: PROC CIMPORT begins to create/update data set WORK.TEDSA_PUF_2021 NOTE: Data set contains 62 variables and 1482543 observations. Logical record length is 496 NOTE: Compressing data set WORK.TEDSA_PUF_2021 decreased size by 60.42 percent. Compressed is 2223 pages; un-compressed would require 5616 pages. NOTE: PROCEDURE CIMPORT used (Total process time): real time 6.69 seconds user cpu time 6.46 seconds system cpu time 0.23 seconds memory 2467.43k OS Memory 22692.00k Timestamp 02/12/2024 07:30:55 PM Step Count 24 Switch Count 39 Page Faults 0 Page Reclaims 695 Page Swaps 0 Voluntary Context Switches 249 Involuntary Context Switches 19 Block Input Operations 0 Block Output Operations 569480 74 Proc contents data=work.tedsa_puf_2021; 75 NOTE: PROCEDURE CONTENTS used (Total process time): real time 0.06 seconds user cpu time 0.06 seconds system cpu time 0.01 seconds memory 4030.59k OS Memory 23720.00k Timestamp 02/12/2024 07:30:55 PM Step Count 25 Switch Count 0 Page Faults 0 Page Reclaims 1043 Page Swaps 0 Voluntary Context Switches 3 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 40 76 proc freq data=work.tedsa_puf_2021; 77 tables SUB1*GENDER; 78 run; NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021. NOTE: PROCEDURE FREQ used (Total process time): real time 1.41 seconds user cpu time 1.33 seconds system cpu time 0.09 seconds memory 3366.31k OS Memory 24748.00k Timestamp 02/12/2024 07:30:56 PM Step Count 26 Switch Count 20 Page Faults 0 Page Reclaims 739 Page Swaps 0 Voluntary Context Switches 66 Involuntary Context Switches 3 Block Input Operations 0 Block Output Operations 568 79 proc freq data=work.tedsa_puf_2021; 80 tables SUB1*Region; NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021. NOTE: PROCEDURE FREQ used (Total process time): real time 1.50 seconds user cpu time 1.43 seconds system cpu time 0.07 seconds memory 3086.84k OS Memory 24748.00k Timestamp 02/12/2024 07:30:58 PM Step Count 27 Switch Count 9 Page Faults 0 Page Reclaims 469 Page Swaps 0 Voluntary Context Switches 36 Involuntary Context Switches 6 Block Input Operations 0 Block Output Operations 576 81 proc format; 82 value $mycat 83 "Alcohol", "Barbiturates", "Other Tranquilizers" = "Combined ABT" ; NOTE: Format $MYCAT has been output. 84 run; NOTE: PROCEDURE FORMAT used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 306.87k OS Memory 22688.00k Timestamp 02/12/2024 07:30:58 PM Step Count 28 Switch Count 0 Page Faults 0 Page Reclaims 66 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 85 proc freq data=work.tedsa_puf_2021; 86 tables sub1 * gender; 87 format sub1 $mycat.; ERROR: You are trying to use the character format $MYCAT with the numeric variable SUB1 in data set WORK.TEDSA_PUF_2021. 88 run; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE FREQ used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 2383.37k OS Memory 24228.00k Timestamp 02/12/2024 07:30:58 PM Step Count 29 Switch Count 0 Page Faults 0 Page Reclaims 332 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 89 90 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 100
i get this error
The FREQ Procedure
Substance use (primary)SUB1 Frequency Percent CumulativeFrequency CumulativePercentMissing/unknown/not collected/invalidNoneAlcoholCocaine/crackMarijuana/hashishHeroinNon-prescription methadoneOther opiates and syntheticsPCPHallucinogensMethamphetamine/speedOther amphetaminesOther stimulantsBenzodiazepinesOther tranquilizersBarbituratesOther sedatives or hypnoticsInhalantsOver-the-counter medicationsOther drugs
219380 | 14.80 | 219380 | 14.80 |
37871 | 2.55 | 257251 | 17.35 |
439755 | 29.66 | 697006 | 47.01 |
70456 | 4.75 | 767462 | 51.77 |
129343 | 8.72 | 896805 | 60.49 |
255401 | 17.23 | 1152206 | 77.72 |
1509 | 0.10 | 1153715 | 77.82 |
113764 | 7.67 | 1267479 | 85.49 |
3911 | 0.26 | 1271390 | 85.76 |
1989 | 0.13 | 1273379 | 85.89 |
170220 | 11.48 | 1443599 | 97.37 |
10678 | 0.72 | 1454277 | 98.09 |
2606 | 0.18 | 1456883 | 98.27 |
14008 | 0.94 | 1470891 | 99.21 |
111 | 0.01 | 1471002 | 99.22 |
731 | 0.05 | 1471733 | 99.27 |
1635 | 0.11 | 1473368 | 99.38 |
678 | 0.05 | 1474046 | 99.43 |
602 | 0.04 | 1474648 | 99.47 |
7895 | 0.53 | 1482543 | 100.00 |
The FREQ Procedure
Substance use (primary)SUB1 Frequency Percent CumulativeFrequency CumulativePercent-912345678910111213141516171819
219380 | 14.80 | 219380 | 14.80 |
37871 | 2.55 | 257251 | 17.35 |
439755 | 29.66 | 697006 | 47.01 |
70456 | 4.75 | 767462 | 51.77 |
129343 | 8.72 | 896805 | 60.49 |
255401 | 17.23 | 1152206 | 77.72 |
1509 | 0.10 | 1153715 | 77.82 |
113764 | 7.67 | 1267479 | 85.49 |
3911 | 0.26 | 1271390 | 85.76 |
1989 | 0.13 | 1273379 | 85.89 |
170220 | 11.48 | 1443599 | 97.37 |
10678 | 0.72 | 1454277 | 98.09 |
2606 | 0.18 | 1456883 | 98.27 |
14008 | 0.94 | 1470891 | 99.21 |
111 | 0.01 | 1471002 | 99.22 |
731 | 0.05 | 1471733 | 99.27 |
1635 | 0.11 | 1473368 | 99.38 |
678 | 0.05 | 1474046 | 99.43 |
602 | 0.04 | 1474648 | 99.47 |
7895 | 0.53 | 1482543 | 100.00 |
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 proc freq data=work.tedsa_puf_2021; 70 tables sub1; 71 run; NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021. NOTE: PROCEDURE FREQ used (Total process time): real time 1.27 seconds user cpu time 1.21 seconds system cpu time 0.08 seconds memory 3527.18k OS Memory 24744.00k Timestamp 02/13/2024 12:50:32 AM Step Count 40 Switch Count 3 Page Faults 0 Page Reclaims 488 Page Swaps 0 Voluntary Context Switches 13 Involuntary Context Switches 2 Block Input Operations 0 Block Output Operations 272 72 proc freq data=tedsa_puf_2021; 73 tables sub1; 74 format sub1; 75 run; NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021. NOTE: PROCEDURE FREQ used (Total process time): real time 1.25 seconds user cpu time 1.18 seconds system cpu time 0.07 seconds memory 2778.93k OS Memory 24744.00k Timestamp 02/13/2024 12:50:34 AM Step Count 41 Switch Count 9 Page Faults 0 Page Reclaims 403 Page Swaps 0 Voluntary Context Switches 32 Involuntary Context Switches 3 Block Input Operations 0 Block Output Operations 280 76 77 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 87
Good job running the program I requested. The results are a bit difficult to read. Do they appear the same before you posted them here?
You have two tables, with the same numbers in them. That is good. The idea now is to take find the match between the displayed values, such as "Other tranqualizers" and the codes that the data uses to represent those values. Compare the two tables, and see if it's obvious which codes match which categories.
If the two tables can't be compared in that way, we'll devise another way. But for now see if it is obvious to you how to match up the codes and the categories.
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 proc cimport 70 71 infile="/home/u63544247/curiosity cup/TEDS_A_2021-data-sas.stc" lib=WORK; 72 73 run; NOTE: PROC CIMPORT begins to create/update catalog WORK.FORMATS NOTE: Entry ADMYR.FORMAT has been imported. NOTE: Entry AGE.FORMAT has been imported. NOTE: Entry ALCD.FORMAT has been imported. NOTE: Entry ARRESTS.FORMAT has been imported. NOTE: Entry CBSA.FORMAT has been imported. NOTE: Entry DAYWAIT.FORMAT has been imported. NOTE: Entry DETCRIM.FORMAT has been imported. NOTE: Entry DETNLF.FORMAT has been imported. NOTE: Entry DIVISION.FORMAT has been imported. NOTE: Entry DSMCRIT.FORMAT has been imported. NOTE: Entry EDUC.FORMAT has been imported. NOTE: Entry EMPLOY.FORMAT has been imported. NOTE: Entry ETH.FORMAT has been imported. NOTE: Entry FLAG.FORMAT has been imported. NOTE: Entry FREQ.FORMAT has been imported. NOTE: Entry FRSTUSE.FORMAT has been imported. NOTE: Entry GENDER.FORMAT has been imported. NOTE: Entry HLTHINS.FORMAT has been imported. NOTE: Entry IDU.FORMAT has been imported. NOTE: Entry LIVARAG.FORMAT has been imported. NOTE: Entry MARSTAT.FORMAT has been imported. NOTE: Entry METHUSE.FORMAT has been imported. NOTE: Entry NOPRIOR.FORMAT has been imported. NOTE: Entry PREG.FORMAT has been imported. NOTE: Entry PRIMINC.FORMAT has been imported. NOTE: Entry PRIMPAY.FORMAT has been imported. NOTE: Entry PSOURCE.FORMAT has been imported. NOTE: Entry PSYPROB.FORMAT has been imported. NOTE: Entry RACE.FORMAT has been imported. NOTE: Entry REGION.FORMAT has been imported. NOTE: Entry ROUTE.FORMAT has been imported. NOTE: Entry SELFHELP.FORMAT has been imported. NOTE: Entry SERVSETA.FORMAT has been imported. NOTE: Entry STFIPS.FORMAT has been imported. NOTE: Entry SUB.FORMAT has been imported. NOTE: Entry VET.FORMAT has been imported. NOTE: Total number of entries processed in catalog WORK.FORMATS: 36 NOTE: PROC CIMPORT begins to create/update data set WORK.TEDSA_PUF_2021 NOTE: Data set contains 62 variables and 1482543 observations. Logical record length is 496 NOTE: Compressing data set WORK.TEDSA_PUF_2021 decreased size by 60.42 percent. Compressed is 2223 pages; un-compressed would require 5616 pages. NOTE: PROCEDURE CIMPORT used (Total process time): real time 6.64 seconds user cpu time 6.35 seconds system cpu time 0.28 seconds memory 2420.28k OS Memory 25252.00k Timestamp 02/13/2024 04:13:37 AM Step Count 79 Switch Count 39 Page Faults 0 Page Reclaims 324 Page Swaps 0 Voluntary Context Switches 247 Involuntary Context Switches 6 Block Input Operations 0 Block Output Operations 569704 74 Proc contents data=work.tedsa_puf_2021; 75 NOTE: PROCEDURE CONTENTS used (Total process time): real time 0.06 seconds user cpu time 0.06 seconds system cpu time 0.00 seconds memory 3713.53k OS Memory 25256.00k Timestamp 02/13/2024 04:13:37 AM Step Count 80 Switch Count 0 Page Faults 0 Page Reclaims 383 Page Swaps 0 Voluntary Context Switches 4 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 48 76 proc freq data=work.tedsa_puf_2021; 77 tables SUB1*GENDER; 78 run; NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021. NOTE: PROCEDURE FREQ used (Total process time): real time 1.39 seconds user cpu time 1.30 seconds system cpu time 0.10 seconds memory 3293.81k OS Memory 25772.00k Timestamp 02/13/2024 04:13:38 AM Step Count 81 Switch Count 23 Page Faults 0 Page Reclaims 471 Page Swaps 0 Voluntary Context Switches 71 Involuntary Context Switches 1 Block Input Operations 0 Block Output Operations 560 79 proc freq data=work.tedsa_puf_2021; 80 tables SUB1*Region; NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021. NOTE: PROCEDURE FREQ used (Total process time): real time 1.41 seconds user cpu time 1.34 seconds system cpu time 0.07 seconds memory 3133.78k OS Memory 25772.00k Timestamp 02/13/2024 04:13:40 AM Step Count 82 Switch Count 8 Page Faults 0 Page Reclaims 470 Page Swaps 0 Voluntary Context Switches 35 Involuntary Context Switches 1 Block Input Operations 0 Block Output Operations 576 81 proc format; 82 value mycat 83 "Alcohol", "Barbiturates", "Other Tranquilizers" = "Combined ABT" ; ERROR: The quoted string 'Alcohol' is not acceptable to a numeric format or informat. NOTE: The previous statement has been deleted. 84 run; NOTE: PROCEDURE FORMAT used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 230.84k OS Memory 23712.00k Timestamp 02/13/2024 04:13:40 AM Step Count 83 Switch Count 0 Page Faults 0 Page Reclaims 14 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 NOTE: The SAS System stopped processing this step because of errors. 85 proc freq data=work.tedsa_puf_2021; 86 tables sub1 * gender; 87 format sub1 mycat.; 88 run; NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021. NOTE: PROCEDURE FREQ used (Total process time): real time 1.39 seconds user cpu time 1.30 seconds system cpu time 0.09 seconds memory 3068.93k OS Memory 25772.00k Timestamp 02/13/2024 04:13:41 AM Step Count 84 Switch Count 4 Page Faults 0 Page Reclaims 452 Page Swaps 0 Voluntary Context Switches 21 Involuntary Context Switches 2 Block Input Operations 0 Block Output Operations 560 89 90 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 100
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Ready to level-up your skills? Choose your own adventure.