BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Jacob6
Calcite | Level 5

Hi everyone,

I am struggling to combine values within the SUB1 variable. I want to combine alcohol, barbiturates and tranquilizers. into a new variable. I'm using SAS on Demand

Thank you

 

Jacob6_0-1707539693681.pngJacob6_1-1707539749897.pngJacob6_2-1707539847082.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

I would probably hard code it, since you don't seem super familiar with formats/recoding.

 

data tedsa_puf_2021_combined;
set tedsa_puf_2021;


length sub1_combined$100.;

if sub1 in (2, 14, 15) then sub1_combined = 'Alcohol, Barbiturates, Other Tranquilizers';
else sub1_combined= put(sub1, sub.);
run;

proc freq data=tedsa_puf_2021_combined;
table sub1_combined*region;
run;

 

View solution in original post

32 REPLIES 32
Astounding
PROC Star
Here's an approach that assumes SUB1 is a character variable. If that's not the case, we'll need more details about what is actually in your data.

Don't change the data. Instead, create a format that tells PROC FREQ how to combine categories. For example you might code:
proc format;
value $mycat "Alcohol", "Barbiturates", "Other Tranquilizers" = "Combined ABT" :
run;

Type in the full list of any values that should be combined. Call the combined category anything you would like. "Combined ABT" is just an arbitrary name that i invented.

Spelling is crucial. Capitalization is crucial.

Once this is ready, group the data by adding the following statement after each TABLE statement:

format sub1 $mycat ;
PaigeMiller
Diamond | Level 26

The suggestion from @Astounding is the best path forward here. Excellent advice.

--
Paige Miller
Jacob6
Calcite | Level 5

Jacob6_0-1707589724551.png

I got this error 

Jacob6_1-1707589839007.png

This is my data set

Tom
Super User Tom
Super User

You cannot put PART of a VALUE statement from PROC FORMAT into the middle of DATA step.  And even worse into the middle of the DATA statement of that step.

 

You have to run the whole PROC FORMAT step separately from the DATA step or the PROC FREQ step.

Jacob6
Calcite | Level 5

May i have an example of this please?

SASKiwi
PROC Star

Run the code as provided by @Astounding :

proc format;
  value $mycat 
  "Alcohol", "Barbiturates", "Other Tranquilizers" = "Combined ABT" ;
run;
Jacob6
Calcite | Level 5

Jacob6_0-1707692241802.png

so the log has no errors but the results are unchanged.

PaigeMiller
Diamond | Level 26

The first post by @Astounding explains that you have to modify PROC FREQ and gives the specific command to do so.

--
Paige Miller
Astounding
PROC Star

The log here shows that you ran PROC FORMAT successfully.  Having done that, the next step would be to actually apply the format you have created.  For example:

proc freq data=work.tedsa_puf_2021;
   tables sub1 * gender;
   format sub1 $mycat.;
run;

Try that, and see if it starts to make sense.

Jacob6
Calcite | Level 5
1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 68         
 69         proc cimport
 70         
 71            infile="/home/u63544247/curiosity cup/TEDS_A_2021-data-sas.stc" lib=WORK;
 72         
 73         run;
 
 NOTE: PROC CIMPORT begins to create/update catalog WORK.FORMATS
 NOTE: Entry ADMYR.FORMAT has been imported.
 NOTE: Entry AGE.FORMAT has been imported.
 NOTE: Entry ALCD.FORMAT has been imported.
 NOTE: Entry ARRESTS.FORMAT has been imported.
 NOTE: Entry CBSA.FORMAT has been imported.
 NOTE: Entry DAYWAIT.FORMAT has been imported.
 NOTE: Entry DETCRIM.FORMAT has been imported.
 NOTE: Entry DETNLF.FORMAT has been imported.
 NOTE: Entry DIVISION.FORMAT has been imported.
 NOTE: Entry DSMCRIT.FORMAT has been imported.
 NOTE: Entry EDUC.FORMAT has been imported.
 NOTE: Entry EMPLOY.FORMAT has been imported.
 NOTE: Entry ETH.FORMAT has been imported.
 NOTE: Entry FLAG.FORMAT has been imported.
 NOTE: Entry FREQ.FORMAT has been imported.
 NOTE: Entry FRSTUSE.FORMAT has been imported.
 NOTE: Entry GENDER.FORMAT has been imported.
 NOTE: Entry HLTHINS.FORMAT has been imported.
 NOTE: Entry IDU.FORMAT has been imported.
 NOTE: Entry LIVARAG.FORMAT has been imported.
 NOTE: Entry MARSTAT.FORMAT has been imported.
 NOTE: Entry METHUSE.FORMAT has been imported.
 NOTE: Entry NOPRIOR.FORMAT has been imported.
 NOTE: Entry PREG.FORMAT has been imported.
 NOTE: Entry PRIMINC.FORMAT has been imported.
 NOTE: Entry PRIMPAY.FORMAT has been imported.
 NOTE: Entry PSOURCE.FORMAT has been imported.
 NOTE: Entry PSYPROB.FORMAT has been imported.
 NOTE: Entry RACE.FORMAT has been imported.
 NOTE: Entry REGION.FORMAT has been imported.
 NOTE: Entry ROUTE.FORMAT has been imported.
 NOTE: Entry SELFHELP.FORMAT has been imported.
 NOTE: Entry SERVSETA.FORMAT has been imported.
 NOTE: Entry STFIPS.FORMAT has been imported.
 NOTE: Entry SUB.FORMAT has been imported.
 NOTE: Entry VET.FORMAT has been imported.
 NOTE: Total number of entries processed in catalog WORK.FORMATS: 36
 
 NOTE: PROC CIMPORT begins to create/update data set WORK.TEDSA_PUF_2021
 NOTE: Data set contains 62 variables and 1482543 observations. 
       Logical record length is 496
 
 NOTE: Compressing data set WORK.TEDSA_PUF_2021 decreased size by 60.42 percent. 
       Compressed is 2223 pages; un-compressed would require 5616 pages.
 NOTE: PROCEDURE CIMPORT used (Total process time):
       real time           6.69 seconds
       user cpu time       6.46 seconds
       system cpu time     0.23 seconds
       memory              2467.43k
       OS Memory           22692.00k
       Timestamp           02/12/2024 07:30:55 PM
       Step Count                        24  Switch Count  39
       Page Faults                       0
       Page Reclaims                     695
       Page Swaps                        0
       Voluntary Context Switches        249
       Involuntary Context Switches      19
       Block Input Operations            0
       Block Output Operations           569480
       
 
 74         Proc contents data=work.tedsa_puf_2021;
 75         
 
 NOTE: PROCEDURE CONTENTS used (Total process time):
       real time           0.06 seconds
       user cpu time       0.06 seconds
       system cpu time     0.01 seconds
       memory              4030.59k
       OS Memory           23720.00k
       Timestamp           02/12/2024 07:30:55 PM
       Step Count                        25  Switch Count  0
       Page Faults                       0
       Page Reclaims                     1043
       Page Swaps                        0
       Voluntary Context Switches        3
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           40
       
 
 76         proc freq data=work.tedsa_puf_2021;
 77         tables SUB1*GENDER;
 78         run;
 
 NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           1.41 seconds
       user cpu time       1.33 seconds
       system cpu time     0.09 seconds
       memory              3366.31k
       OS Memory           24748.00k
       Timestamp           02/12/2024 07:30:56 PM
       Step Count                        26  Switch Count  20
       Page Faults                       0
       Page Reclaims                     739
       Page Swaps                        0
       Voluntary Context Switches        66
       Involuntary Context Switches      3
       Block Input Operations            0
       Block Output Operations           568
       
 
 79         proc freq data=work.tedsa_puf_2021;
 80         tables SUB1*Region;
 
 NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           1.50 seconds
       user cpu time       1.43 seconds
       system cpu time     0.07 seconds
       memory              3086.84k
       OS Memory           24748.00k
       Timestamp           02/12/2024 07:30:58 PM
       Step Count                        27  Switch Count  9
       Page Faults                       0
       Page Reclaims                     469
       Page Swaps                        0
       Voluntary Context Switches        36
       Involuntary Context Switches      6
       Block Input Operations            0
       Block Output Operations           576
       
 
 81         proc format;
 82         value $mycat
 83          "Alcohol", "Barbiturates", "Other Tranquilizers" = "Combined ABT" ;
 NOTE: Format $MYCAT has been output.
 84         run;
 
 NOTE: PROCEDURE FORMAT used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              306.87k
       OS Memory           22688.00k
       Timestamp           02/12/2024 07:30:58 PM
       Step Count                        28  Switch Count  0
       Page Faults                       0
       Page Reclaims                     66
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 
 85         proc freq data=work.tedsa_puf_2021;
 86            tables sub1 * gender;
 87            format sub1 $mycat.;
 ERROR: You are trying to use the character format $MYCAT with the numeric variable SUB1 in data set WORK.TEDSA_PUF_2021.
 88         run;
 
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              2383.37k
       OS Memory           24228.00k
       Timestamp           02/12/2024 07:30:58 PM
       Step Count                        29  Switch Count  0
       Page Faults                       0
       Page Reclaims                     332
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 89         
 90         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 100        

i get this error

Astounding
PROC Star
This type of error can easily occur when you import data. You don't really know what is in the data, and so you need to explore to find out.

Run this test as the initial exploratory step:
proc freq data=have;
tables sub1;
run;
proc freq data=have;
tables sub1;
format sub1;
run;
Of course you would not use the word "have" as your data set name. Use the actual name of your imported data set.
Jacob6
Calcite | Level 5

The FREQ Procedure

 Substance use (primary)SUB1 Frequency Percent CumulativeFrequency CumulativePercentMissing/unknown/not collected/invalidNoneAlcoholCocaine/crackMarijuana/hashishHeroinNon-prescription methadoneOther opiates and syntheticsPCPHallucinogensMethamphetamine/speedOther amphetaminesOther stimulantsBenzodiazepinesOther tranquilizersBarbituratesOther sedatives or hypnoticsInhalantsOver-the-counter medicationsOther drugs

21938014.8021938014.80
378712.5525725117.35
43975529.6669700647.01
704564.7576746251.77
1293438.7289680560.49
25540117.23115220677.72
15090.10115371577.82
1137647.67126747985.49
39110.26127139085.76
19890.13127337985.89
17022011.48144359997.37
106780.72145427798.09
26060.18145688398.27
140080.94147089199.21
1110.01147100299.22
7310.05147173399.27
16350.11147336899.38
6780.05147404699.43
6020.04147464899.47
78950.531482543100.00

The FREQ Procedure

 Substance use (primary)SUB1 Frequency Percent CumulativeFrequency CumulativePercent-912345678910111213141516171819

21938014.8021938014.80
378712.5525725117.35
43975529.6669700647.01
704564.7576746251.77
1293438.7289680560.49
25540117.23115220677.72
15090.10115371577.82
1137647.67126747985.49
39110.26127139085.76
19890.13127337985.89
17022011.48144359997.37
106780.72145427798.09
26060.18145688398.27
140080.94147089199.21
1110.01147100299.22
7310.05147173399.27
16350.11147336899.38
6780.05147404699.43
6020.04147464899.47
78950.531482543100.00
1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 68         
 69         proc freq data=work.tedsa_puf_2021;
 70         tables sub1;
 71         run;
 
 NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           1.27 seconds
       user cpu time       1.21 seconds
       system cpu time     0.08 seconds
       memory              3527.18k
       OS Memory           24744.00k
       Timestamp           02/13/2024 12:50:32 AM
       Step Count                        40  Switch Count  3
       Page Faults                       0
       Page Reclaims                     488
       Page Swaps                        0
       Voluntary Context Switches        13
       Involuntary Context Switches      2
       Block Input Operations            0
       Block Output Operations           272
       
 
 72         proc freq data=tedsa_puf_2021;
 73         tables sub1;
 74         format sub1;
 75         run;
 
 NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           1.25 seconds
       user cpu time       1.18 seconds
       system cpu time     0.07 seconds
       memory              2778.93k
       OS Memory           24744.00k
       Timestamp           02/13/2024 12:50:34 AM
       Step Count                        41  Switch Count  9
       Page Faults                       0
       Page Reclaims                     403
       Page Swaps                        0
       Voluntary Context Switches        32
       Involuntary Context Switches      3
       Block Input Operations            0
       Block Output Operations           280
       
 
 76         
 77         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 87         
Astounding
PROC Star

Good job running the program I requested.  The results are a bit difficult to read.  Do they appear the same before you posted them here?  

 

You have two tables, with the same numbers in them.  That is good.  The idea now is to take find the match between the displayed values, such as "Other tranqualizers" and the codes that the data uses to represent those values.  Compare the two tables, and see if it's obvious which codes match which categories. 

 

If the two tables can't be compared in that way, we'll devise another way.  But for now see if it is obvious to you how to match up the codes and the categories.

Jacob6
Calcite | Level 5
1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 68         
 69         proc cimport
 70         
 71            infile="/home/u63544247/curiosity cup/TEDS_A_2021-data-sas.stc" lib=WORK;
 72         
 73         run;
 
 NOTE: PROC CIMPORT begins to create/update catalog WORK.FORMATS
 NOTE: Entry ADMYR.FORMAT has been imported.
 NOTE: Entry AGE.FORMAT has been imported.
 NOTE: Entry ALCD.FORMAT has been imported.
 NOTE: Entry ARRESTS.FORMAT has been imported.
 NOTE: Entry CBSA.FORMAT has been imported.
 NOTE: Entry DAYWAIT.FORMAT has been imported.
 NOTE: Entry DETCRIM.FORMAT has been imported.
 NOTE: Entry DETNLF.FORMAT has been imported.
 NOTE: Entry DIVISION.FORMAT has been imported.
 NOTE: Entry DSMCRIT.FORMAT has been imported.
 NOTE: Entry EDUC.FORMAT has been imported.
 NOTE: Entry EMPLOY.FORMAT has been imported.
 NOTE: Entry ETH.FORMAT has been imported.
 NOTE: Entry FLAG.FORMAT has been imported.
 NOTE: Entry FREQ.FORMAT has been imported.
 NOTE: Entry FRSTUSE.FORMAT has been imported.
 NOTE: Entry GENDER.FORMAT has been imported.
 NOTE: Entry HLTHINS.FORMAT has been imported.
 NOTE: Entry IDU.FORMAT has been imported.
 NOTE: Entry LIVARAG.FORMAT has been imported.
 NOTE: Entry MARSTAT.FORMAT has been imported.
 NOTE: Entry METHUSE.FORMAT has been imported.
 NOTE: Entry NOPRIOR.FORMAT has been imported.
 NOTE: Entry PREG.FORMAT has been imported.
 NOTE: Entry PRIMINC.FORMAT has been imported.
 NOTE: Entry PRIMPAY.FORMAT has been imported.
 NOTE: Entry PSOURCE.FORMAT has been imported.
 NOTE: Entry PSYPROB.FORMAT has been imported.
 NOTE: Entry RACE.FORMAT has been imported.
 NOTE: Entry REGION.FORMAT has been imported.
 NOTE: Entry ROUTE.FORMAT has been imported.
 NOTE: Entry SELFHELP.FORMAT has been imported.
 NOTE: Entry SERVSETA.FORMAT has been imported.
 NOTE: Entry STFIPS.FORMAT has been imported.
 NOTE: Entry SUB.FORMAT has been imported.
 NOTE: Entry VET.FORMAT has been imported.
 NOTE: Total number of entries processed in catalog WORK.FORMATS: 36
 
 NOTE: PROC CIMPORT begins to create/update data set WORK.TEDSA_PUF_2021
 NOTE: Data set contains 62 variables and 1482543 observations. 
       Logical record length is 496
 
 NOTE: Compressing data set WORK.TEDSA_PUF_2021 decreased size by 60.42 percent. 
       Compressed is 2223 pages; un-compressed would require 5616 pages.
 NOTE: PROCEDURE CIMPORT used (Total process time):
       real time           6.64 seconds
       user cpu time       6.35 seconds
       system cpu time     0.28 seconds
       memory              2420.28k
       OS Memory           25252.00k
       Timestamp           02/13/2024 04:13:37 AM
       Step Count                        79  Switch Count  39
       Page Faults                       0
       Page Reclaims                     324
       Page Swaps                        0
       Voluntary Context Switches        247
       Involuntary Context Switches      6
       Block Input Operations            0
       Block Output Operations           569704
       
 
 74         Proc contents data=work.tedsa_puf_2021;
 75         
 
 NOTE: PROCEDURE CONTENTS used (Total process time):
       real time           0.06 seconds
       user cpu time       0.06 seconds
       system cpu time     0.00 seconds
       memory              3713.53k
       OS Memory           25256.00k
       Timestamp           02/13/2024 04:13:37 AM
       Step Count                        80  Switch Count  0
       Page Faults                       0
       Page Reclaims                     383
       Page Swaps                        0
       Voluntary Context Switches        4
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           48
       
 
 76         proc freq data=work.tedsa_puf_2021;
 77         tables SUB1*GENDER;
 78         run;
 
 NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           1.39 seconds
       user cpu time       1.30 seconds
       system cpu time     0.10 seconds
       memory              3293.81k
       OS Memory           25772.00k
       Timestamp           02/13/2024 04:13:38 AM
       Step Count                        81  Switch Count  23
       Page Faults                       0
       Page Reclaims                     471
       Page Swaps                        0
       Voluntary Context Switches        71
       Involuntary Context Switches      1
       Block Input Operations            0
       Block Output Operations           560
       
 
 79         proc freq data=work.tedsa_puf_2021;
 80         tables SUB1*Region;
 
 NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           1.41 seconds
       user cpu time       1.34 seconds
       system cpu time     0.07 seconds
       memory              3133.78k
       OS Memory           25772.00k
       Timestamp           02/13/2024 04:13:40 AM
       Step Count                        82  Switch Count  8
       Page Faults                       0
       Page Reclaims                     470
       Page Swaps                        0
       Voluntary Context Switches        35
       Involuntary Context Switches      1
       Block Input Operations            0
       Block Output Operations           576
       
 
 81         proc format;
 82           value mycat
 83           "Alcohol", "Barbiturates", "Other Tranquilizers" = "Combined ABT" ;
 ERROR: The quoted string 'Alcohol' is not acceptable to a numeric format or informat.
 NOTE: The previous statement has been deleted.
 84         run;
 
 NOTE: PROCEDURE FORMAT used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              230.84k
       OS Memory           23712.00k
       Timestamp           02/13/2024 04:13:40 AM
       Step Count                        83  Switch Count  0
       Page Faults                       0
       Page Reclaims                     14
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 NOTE: The SAS System stopped processing this step because of errors.
 
 
 85         proc freq data=work.tedsa_puf_2021;
 86            tables sub1 * gender;
 87            format sub1 mycat.;
 88         run;
 
 NOTE: There were 1482543 observations read from the data set WORK.TEDSA_PUF_2021.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           1.39 seconds
       user cpu time       1.30 seconds
       system cpu time     0.09 seconds
       memory              3068.93k
       OS Memory           25772.00k
       Timestamp           02/13/2024 04:13:41 AM
       Step Count                        84  Switch Count  4
       Page Faults                       0
       Page Reclaims                     452
       Page Swaps                        0
       Voluntary Context Switches        21
       Involuntary Context Switches      2
       Block Input Operations            0
       Block Output Operations           560
       
 
 89         
 90         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 100        

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 32 replies
  • 1540 views
  • 1 like
  • 7 in conversation