Hi,
I have a core dataset which I use to sum up the data for my assignments. The code I use currently is good for pulling out information on students who have the same first name, their age groups and the total number.
Core dataset:
Forename Surname Age
Tom Mayo 32
Rupi Ketchup 39
Harry Mustard 20
etc etc.
How I currently use the data:
Forename Age<21 Age>21 Total
Tom 8 10 18
Rupi 7 13 20
Harry 14 3 17
Want:
Forename Age<21 Age>21 Total
TomRupi 15 23 38
etc etc
How would I write piece of code (macro i think) which would automatically work out the Age groups and total combining two students together? e.g. Tom and Rupi, Tom and Harry, Harry and Rupi etc. The key thing I want to do here is automate the code so I would not have to type the combination myself in the code. SAS would ideally look at variable 1 and variable 2 then variable 1 and variable 3 etc. To keep it simple-ish the three Forenames in the core dataset are Tom, Rupi and Harry.
Really sorry if this is too vague. I am happy to provide more information before any help is given.
Thanks
Jacko
This works:
data HAVE;
input FORENAME $ SURNAME $ AGE ;
cards;
Tom Mayo 32
Rupi Ketchup 39
Harry Mustard 20
run;
proc sql;
create table WANT as
select
catx('-',a.FORENAME, b.FORENAME)
, sum((a.AGE<22)+(b.AGE<22)) as AGEUNDER21
, sum((a.AGE>21)+(b.AGE>21)) as AGEOVER21
, sum(calculated AGEUNDER21, calculated AGEOVER21) as TOTAL
from HAVE a, HAVE b
where a.FORENAME > b.FORENAME
group by 1;
quit;
You actually want to combine names??? Why?
HOW are you currently creating that output? And are you looking for a data set for further processing or a report for people to read?
How do you want to indicate which values to pull?
Variable1 and variable2, Variable1 and Variable3??? Please actually use variable names.
Can you please post a full example? Include your input data and EXACTLY what you expect as output from that input data.
I suspect generating a custom multilabel format is the quickest solution here. You can use PROC MEANS/TABULATE to get the output.
1. Generate a format for age
2. Generate a format for names, a multilabel format with each 2 way combination of names
3. Use PROC TABULATE to apply the formats and calculate your summaries.
You don't need a macro for this.
Sorry, don't have time to fully work through this, but here's a start. Pretty sure the format part is right, but not the tabulate part.
proc format;
value age_fmt
low - 14 = '<14'
14 - high = '14+';
run;
proc sql;
create table name_list_fmt as
select t1.name as start, catx(" - ", t1.name, t2.name) as label, 'C' as type, 'M' as HLO, "name_fmt" as fmtname
from sashelp.class as t1, sashelp.class as t2
where t1.name>t2.name;
quit;
proc format cntlin=name_list_fmt;
run;
proc tabulate data=sashelp.class;
class age;
class name / mlf;
format age age_fmt. name $name_fmt.;
table name, age*N / printmiss;
run;
@JackoNewbie wrote:
Hi,
I have a core dataset which I use to sum up the data for my assignments. The code I use currently is good for pulling out information on students who have the same first name, their age groups and the total number.
Core dataset:
Forename Surname Age
Tom Mayo 32
Rupi Ketchup 39
Harry Mustard 20
etc etc.
The key thing I want to do here is automate the code so I would not have to type the combination myself in the code. SAS would ideally look at variable 1 and variable 2 then variable 1 and variable 3 etc. To keep it simple-ish the three Forenames in the core dataset are Tom, Rupi and Harry.
You've only shown name and age and how they're used. Where do these other variables come from? How are they related?
So sorry.
The surname doesn't come into this as it is just part of my core dataset.
data person; infile datalines delimiter=','; input Forename $ Surname $ Age; datalines; Tom,Mayo,32 Rupi,Ketchup,39 Harry,Mustard,20 Tom,Blog,8 Rupi,Juniper,14 Harry,Potter,18 Tom,Peach,12 Rupi,Apple,49 Harry,Maguire,1 ; proc sql; create table final_assignment as select distinct forename, sum(case when Age < 22 then 1 else 0 end) as AgeUnder21, sum(case when Age > 21 then 1 else 0 end) as AgeOver21, count (age) as total from person group by 1 ;quit; Want: Forenames AgeUnder21 AgeOver21 Total Harry - Rupi 4 2 6 Harry - Tom 5 1 6 Rupi - Tom 3 3 6
This works:
data HAVE;
input FORENAME $ SURNAME $ AGE ;
cards;
Tom Mayo 32
Rupi Ketchup 39
Harry Mustard 20
run;
proc sql;
create table WANT as
select
catx('-',a.FORENAME, b.FORENAME)
, sum((a.AGE<22)+(b.AGE<22)) as AGEUNDER21
, sum((a.AGE>21)+(b.AGE>21)) as AGEOVER21
, sum(calculated AGEUNDER21, calculated AGEOVER21) as TOTAL
from HAVE a, HAVE b
where a.FORENAME > b.FORENAME
group by 1;
quit;
And this bit shows exactly why I find the combined names to be a very questionable item for a report:
Forenames AgeUnder21 AgeOver21 Total Harry - Rupi 4 2 6 Harry - Tom 5 1 6 Rupi - Tom 3 3 6
Even occasional duplicates of the name appearing in a list makes it questionable on how to interpret the data.
I might make sense to use a unique personal identifier and combine those so it is somewhat more obvious that there data is not just plain inconsistent.
So, please describe exactly what helpful information the combine forenames adds to that report. Or how the output data will actually be used.
Your particular example above would really make me believe that "Harry-Rupi" should be a combined version of "Harry-Tom" and "Rupi-Tom" in some form.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.