BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JackoNewbie
Calcite | Level 5

Hi,

 

I have a core dataset which I use to sum up the data for my assignments. The code I use currently is good for pulling out information on students who have the same first name, their age groups and the total number.

 

Core dataset:

Forename    Surname           Age

Tom                   Mayo             32

Rupi                   Ketchup        39

Harry                 Mustard         20

etc etc.

 

How I currently use the data:

 

Forename         Age<21         Age>21         Total       

Tom                        8                   10               18

Rupi                        7                   13               20

Harry                      14                  3                17

 

Want:

Forename         Age<21         Age>21         Total   

TomRupi                15                 23               38

etc etc

 

How would I write piece of code (macro i think) which would automatically work out the Age groups and total combining two students together? e.g. Tom and Rupi, Tom and Harry, Harry and Rupi etc. The key thing I want to do here is automate the code so I would not have to type the combination myself in the code. SAS would ideally look at variable 1 and variable 2 then variable 1 and variable 3 etc. To keep it simple-ish the three Forenames in the core dataset are Tom, Rupi and Harry.

 

Really sorry if this is too vague. I am happy to provide more information before any help is given.

 

Thanks

Jacko

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

This works:

data HAVE;
input FORENAME $ SURNAME $ AGE ;
cards;
Tom Mayo 32
Rupi Ketchup 39
Harry Mustard 20
run;

proc sql;
  create table WANT as
  select  
        catx('-',a.FORENAME, b.FORENAME)
      , sum((a.AGE<22)+(b.AGE<22)) as AGEUNDER21
      , sum((a.AGE>21)+(b.AGE>21)) as AGEOVER21
      , sum(calculated AGEUNDER21, calculated AGEOVER21) as TOTAL
  from HAVE a, HAVE b
  where a.FORENAME > b.FORENAME
  group by 1;
quit;
  AGEUNDER21 AGEOVER21 TOTAL
Rupi-Harry 1 1 2
Tom-Harry 1 1 2
Tom-Rupi 0 2 2

 

View solution in original post

6 REPLIES 6
ballardw
Super User

You actually want to combine names??? Why?

 

HOW are you currently creating that output? And are you looking for a data set for further processing or a report for people to read?

How do you want to indicate which values to pull?

 

Variable1 and variable2, Variable1 and Variable3??? Please actually use variable names.

Reeza
Super User

Can you please post a full example? Include your input data and EXACTLY what you expect as output from that input data.

 

I suspect generating a custom multilabel format is the quickest solution here. You can use PROC MEANS/TABULATE to get the output. 

 

1. Generate a format for age

2. Generate a format for names, a multilabel format with each 2 way combination of names

3. Use PROC TABULATE to apply the formats and calculate your summaries.

 

You don't need a macro for this.

 

Sorry, don't have time to fully work through this, but here's a start. Pretty sure the format part is right, but not the tabulate part.

proc format;
value age_fmt
low - 14 = '<14'
14 - high = '14+';
run;

proc sql;
create table name_list_fmt as
select t1.name as start, catx(" - ", t1.name,  t2.name) as label, 'C' as type, 'M' as HLO, "name_fmt" as fmtname
from sashelp.class as t1, sashelp.class as t2
where t1.name>t2.name;
quit;

proc format cntlin=name_list_fmt;
run;

proc tabulate data=sashelp.class;
class age;
class name / mlf;
format age age_fmt. name $name_fmt.;
table name, age*N / printmiss;
run;

 

 

 

 

 

Reeza
Super User

@JackoNewbie wrote:

Hi,

 

I have a core dataset which I use to sum up the data for my assignments. The code I use currently is good for pulling out information on students who have the same first name, their age groups and the total number.

 

Core dataset:

Forename    Surname           Age

Tom                   Mayo             32

Rupi                   Ketchup        39

Harry                 Mustard         20

etc etc.

 

The key thing I want to do here is automate the code so I would not have to type the combination myself in the code. SAS would ideally look at variable 1 and variable 2 then variable 1 and variable 3 etc. To keep it simple-ish the three Forenames in the core dataset are Tom, Rupi and Harry.

 

 


You've only shown name and age and how they're used. Where do these other variables come from? How are they related?

JackoNewbie
Calcite | Level 5

So sorry. 

 

The surname doesn't come into this as it is just part of my core dataset. 

 

data person;
   infile datalines delimiter=','; 
   input Forename $ Surname $ Age;
   datalines;                      
Tom,Mayo,32
Rupi,Ketchup,39
Harry,Mustard,20
Tom,Blog,8
Rupi,Juniper,14
Harry,Potter,18
Tom,Peach,12
Rupi,Apple,49
Harry,Maguire,1

;

proc sql;
create table final_assignment as
select 
distinct forename,
sum(case when Age < 22 then 1 else 0 end) as AgeUnder21,
sum(case when Age > 21 then 1 else 0 end) as AgeOver21,
count (age) as total

from person
group by 1
;quit;

Want:

Forenames		AgeUnder21		AgeOver21		Total
Harry - Rupi			4				2			6
Harry - Tom			5				1			6
Rupi - Tom			3				3			6
ChrisNZ
Tourmaline | Level 20

This works:

data HAVE;
input FORENAME $ SURNAME $ AGE ;
cards;
Tom Mayo 32
Rupi Ketchup 39
Harry Mustard 20
run;

proc sql;
  create table WANT as
  select  
        catx('-',a.FORENAME, b.FORENAME)
      , sum((a.AGE<22)+(b.AGE<22)) as AGEUNDER21
      , sum((a.AGE>21)+(b.AGE>21)) as AGEOVER21
      , sum(calculated AGEUNDER21, calculated AGEOVER21) as TOTAL
  from HAVE a, HAVE b
  where a.FORENAME > b.FORENAME
  group by 1;
quit;
  AGEUNDER21 AGEOVER21 TOTAL
Rupi-Harry 1 1 2
Tom-Harry 1 1 2
Tom-Rupi 0 2 2

 

ballardw
Super User

And this bit shows exactly why I find the combined names to be a very questionable item for a report:

Forenames		AgeUnder21		AgeOver21		Total
Harry - Rupi			4				2			6
Harry - Tom			5				1			6
Rupi - Tom			3				3			6

Even occasional duplicates of the name appearing in a list makes it questionable on how to interpret the data.

I might make sense to use a unique personal identifier and combine those so it is somewhat more obvious that there data is not just plain inconsistent.

 

So, please describe exactly what helpful information the combine forenames adds to that report. Or how the output data will actually be used.

 

Your particular example above would really make me believe that "Harry-Rupi" should be a combined version of "Harry-Tom" and "Rupi-Tom" in some form.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 792 views
  • 4 likes
  • 4 in conversation