Solved: Macro - Automatic Combination

JackoNewbie · Posted 06-01-2020 11:03 AM

Hi,

I have a core dataset which I use to sum up the data for my assignments. The code I use currently is good for pulling out information on students who have the same first name, their age groups and the total number.

Core dataset:

Forename Surname Age

Tom Mayo 32

Rupi Ketchup 39

Harry Mustard 20

etc etc.

How I currently use the data:

Forename Age<21 Age>21 Total

Tom 8 10 18

Rupi 7 13 20

Harry 14 3 17

Want:

Forename Age<21 Age>21 Total

TomRupi 15 23 38

etc etc

How would I write piece of code (macro i think) which would automatically work out the Age groups and total combining two students together? e.g. Tom and Rupi, Tom and Harry, Harry and Rupi etc. The key thing I want to do here is automate the code so I would not have to type the combination myself in the code. SAS would ideally look at variable 1 and variable 2 then variable 1 and variable 3 etc. To keep it simple-ish the three Forenames in the core dataset are Tom, Rupi and Harry.

Really sorry if this is too vague. I am happy to provide more information before any help is given.

Thanks

Jacko

ChrisNZ · Posted 06-02-2020 03:01 AM

This works:

data HAVE;
input FORENAME $ SURNAME $ AGE ;
cards;
Tom Mayo 32
Rupi Ketchup 39
Harry Mustard 20
run;

proc sql;
  create table WANT as
  select  
        catx('-',a.FORENAME, b.FORENAME)
      , sum((a.AGE<22)+(b.AGE<22)) as AGEUNDER21
      , sum((a.AGE>21)+(b.AGE>21)) as AGEOVER21
      , sum(calculated AGEUNDER21, calculated AGEOVER21) as TOTAL
  from HAVE a, HAVE b
  where a.FORENAME > b.FORENAME
  group by 1;
quit;

	AGEUNDER21	AGEOVER21	TOTAL
Rupi-Harry	1	1	2
Tom-Harry	1	1	2
Tom-Rupi	0	2	2

High-Performance SAS Coding - Third Edition

View solution in original post

ballardw · Posted 06-01-2020 11:21 AM

You actually want to combine names??? Why?

HOW are you currently creating that output? And are you looking for a data set for further processing or a report for people to read?

How do you want to indicate which values to pull?

Variable1 and variable2, Variable1 and Variable3??? Please actually use variable names.

Reeza · Posted 06-01-2020 11:24 AM

Can you please post a full example? Include your input data and EXACTLY what you expect as output from that input data.

I suspect generating a custom multilabel format is the quickest solution here. You can use PROC MEANS/TABULATE to get the output.

1. Generate a format for age

2. Generate a format for names, a multilabel format with each 2 way combination of names

3. Use PROC TABULATE to apply the formats and calculate your summaries.

You don't need a macro for this.

Sorry, don't have time to fully work through this, but here's a start. Pretty sure the format part is right, but not the tabulate part.

proc format;
value age_fmt
low - 14 = '<14'
14 - high = '14+';
run;

proc sql;
create table name_list_fmt as
select t1.name as start, catx(" - ", t1.name,  t2.name) as label, 'C' as type, 'M' as HLO, "name_fmt" as fmtname
from sashelp.class as t1, sashelp.class as t2
where t1.name>t2.name;
quit;

proc format cntlin=name_list_fmt;
run;

proc tabulate data=sashelp.class;
class age;
class name / mlf;
format age age_fmt. name $name_fmt.;
table name, age*N / printmiss;
run;

Reeza · Posted 06-01-2020 11:32 AM

@JackoNewbie wrote:

Hi,

I have a core dataset which I use to sum up the data for my assignments. The code I use currently is good for pulling out information on students who have the same first name, their age groups and the total number.

Core dataset:

Forename Surname Age

Tom Mayo 32

Rupi Ketchup 39

Harry Mustard 20

etc etc.

The key thing I want to do here is automate the code so I would not have to type the combination myself in the code. SAS would ideally look at variable 1 and variable 2 then variable 1 and variable 3 etc. To keep it simple-ish the three Forenames in the core dataset are Tom, Rupi and Harry.

You've only shown name and age and how they're used. Where do these other variables come from? How are they related?

JackoNewbie · Posted 06-02-2020 02:20 AM

So sorry.

The surname doesn't come into this as it is just part of my core dataset.

data person;
   infile datalines delimiter=','; 
   input Forename $ Surname $ Age;
   datalines;                      
Tom,Mayo,32
Rupi,Ketchup,39
Harry,Mustard,20
Tom,Blog,8
Rupi,Juniper,14
Harry,Potter,18
Tom,Peach,12
Rupi,Apple,49
Harry,Maguire,1

;

proc sql;
create table final_assignment as
select 
distinct forename,
sum(case when Age < 22 then 1 else 0 end) as AgeUnder21,
sum(case when Age > 21 then 1 else 0 end) as AgeOver21,
count (age) as total

from person
group by 1
;quit;

Want:

Forenames		AgeUnder21		AgeOver21		Total
Harry - Rupi			4				2			6
Harry - Tom			5				1			6
Rupi - Tom			3				3			6

ChrisNZ · Posted 06-02-2020 03:01 AM

This works:

data HAVE;
input FORENAME $ SURNAME $ AGE ;
cards;
Tom Mayo 32
Rupi Ketchup 39
Harry Mustard 20
run;

proc sql;
  create table WANT as
  select  
        catx('-',a.FORENAME, b.FORENAME)
      , sum((a.AGE<22)+(b.AGE<22)) as AGEUNDER21
      , sum((a.AGE>21)+(b.AGE>21)) as AGEOVER21
      , sum(calculated AGEUNDER21, calculated AGEOVER21) as TOTAL
  from HAVE a, HAVE b
  where a.FORENAME > b.FORENAME
  group by 1;
quit;

	AGEUNDER21	AGEOVER21	TOTAL
Rupi-Harry	1	1	2
Tom-Harry	1	1	2
Tom-Rupi	0	2	2

High-Performance SAS Coding - Third Edition

ballardw · Posted 06-02-2020 12:25 PM

And this bit shows exactly why I find the combined names to be a very questionable item for a report:

Forenames		AgeUnder21		AgeOver21		Total
Harry - Rupi			4				2			6
Harry - Tom			5				1			6
Rupi - Tom			3				3			6

Even occasional duplicates of the name appearing in a list makes it questionable on how to interpret the data.

I might make sense to use a unique personal identifier and combine those so it is somewhat more obvious that there data is not just plain inconsistent.

So, please describe exactly what helpful information the combine forenames adds to that report. Or how the output data will actually be used.

Your particular example above would really make me believe that "Harry-Rupi" should be a combined version of "Harry-Tom" and "Rupi-Tom" in some form.

Macro - Automatic Combination

Re: Macro - Automatic Combination

Re: Macro - Automatic Combination

Re: Macro - Automatic Combination

Re: Macro - Automatic Combination

Re: Macro - Automatic Combination

Re: Macro - Automatic Combination

Re: Macro - Automatic Combination

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away