So, what I'm asking is not exactly a transpose (from my knowledge) and I'm finding it very tricky! Any help will be immensely appreciated.
I have some sample data:
DATA Unclean ;
INFILE DATALINES ;
input ID VAR & $19. CLASSIFICATION $ ;
DATALINES;
201 PepsiCo XYZ
201 111 main street XYZ
201 Charity XYZ
331 CocaCola XYZ
331 1823 unicorn street XYZ
331 Charity XYZ
331 Nike XYZ
331 123 brock avenue XYZ
331 Charity XYZ
331 Adidas XYZ
331 14 Market street XYZ
331 Charity XYZ
354 McDonalds XYZ
354 111 Obesity road XYZ
354 Charity XYZ
;
RUN;
And my goal would be to get an output like this:
DATA CLEAN;
INFILE DATALINES ;
input ID Name & $10. Address & $19. Classification $ ;
DATALINES;
201 PepsiCo 111 main street Charity
331 CocaCola 1823 unicorn street Charity
331 Nike 123 brock avenue Charity
331 Adidas 14 Market street Charity
354 McDonalds 111 Obesity road Charity
;
RUN;
Specifically, there are a few challenges I am having.
1) The IDs have uneven companies attached to them
2) The value "XYZ" needs to be changed to reflect things like Name, Address, and Classification.
Regarding 2) My fix has been that whenever we see an observation "Charity", the next observation is always a "Name".
Is there another way, more flexible way to approach this?
3) What would be the best function or command to achieve this?
Any help is very much appreciated!
You should have placed a SAMPLE Data Set that represent your real Data SET. I have given a solution based on your sample Data Set.
Now you say there will be varying number of records - 2 records instead of 3. I hope you don't come with newer issues with your original Data Set.
The solution is based on first creating triplets with adding "NONE" when "Charity" is missing. Next the same program without using LAST.ID. We can simply check for either "NONE" or "Charity" to output a record at a time.
data source;
INFILE DATALINES ;
input ID VAR & $19. CLASSIFICATION $ ;
DATALINES;
201 PepsiCo XYZ
201 111 main street XYZ
201 Charity XYZ
331 CocaCola XYZ
331 1823 unicorn street XYZ
331 Charity XYZ
331 Nike XYZ
331 123 brock avenue XYZ
331 Charity XYZ
331 Adidas XYZ
331 14 Market street XYZ
331 Charity XYZ
354 McDonalds XYZ
354 111 Obesity road XYZ
354 Charity XYZ
354 KFC XYZ
354 111 Fat Avenue XYZ
441 Burger King XYZ
441 112 Skinny ROAD XYZ
441 Adidas XYZ
441 Socks Blvd XYZ
441 Tiege Henley XYZ
441 Skincare Road XYZ
441 Charity XYZ
;
RUN;
proc print data = source;
run;
data have;
set source;
if Var = "Charity" then i = 0;
else i+1;
if i = 3 and Var ^= "Charity" then do;
temp = Var;
Var = "NONE";
output;
Var = temp;
output;
temp =' ';
i = 1;
end;
else output;
drop temp i;
run;
proc print data = have;
run;
The Data Set, HAVE, will have triplets either with "NONE" or "Charity".
The next the old program without LAST.ID checks.
data clean;
length ID 8 Name $10 Address $19 Classification $8 Var $19;
do i = 1 by 1 until(VAR in ("NONE","Charity"));
set have;
select(mod(i,3));
when(1) Name=Var;
when(2) Address = Var;
when(0) Classification = Var;
otherwise;
end;
end;
drop i Var;
run;
proc print data = clean;
run;
this link should assist you with understanding how to transpose data:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
It looks like the x = row 1, y = row 2, z = row 3
output your record.
start you count over for x,y.z;
basically looks like 3 records per final record.
yes you may need to hold record 1 and record 2 and output after obtaining record 3
proc transpose data=have out=want(drop=_name_);
by id;
var var;
run;
you will need to pass the var-names to the columns
For now the adding the correct var names are not too important, so I can address that later.
But using a PROC TRANSPOSE gives me something like this:
It fails to re-iterate values for ID 331, instead just lumping them into the same row.
you need to clean your data as @Reeza has noted above first. Then use the proc transpose.
with cleaned up data based on @Reeza datastep
data source;
INFILE DATALINES ;
input ID VAR & $19. CLASSIFICATION $ ;
DATALINES;
201 PepsiCo XYZ
201 111 main street XYZ
201 Charity XYZ
331 CocaCola XYZ
331 1823 unicorn street XYZ
331 Charity XYZ
331 Nike XYZ
331 123 brock avenue XYZ
331 Charity XYZ
331 Adidas XYZ
331 14 Market street XYZ
331 Charity XYZ
354 McDonalds XYZ
354 111 Obesity road XYZ
354 Charity XYZ
;
RUN;
Data have;
length class $7;
Set source;
X=mod(_n_, 3);
If x=1 then class="name";
Else if x =2 then class = "address";
Else if x=0 then class = "type";
Retain record;
If x=1 then record+1;
Run;
Proc transpose data=have(drop=record _name_) out=want;
By Id record;
Var var;
Id class;
Run;
I suppose you don't insist on PROC TRANSPOSE. In one Data Step you will get the required output. Assumed that the Data Set is sorted by ID.
The sequential records provide one of the three Variables where LASTID can be used(except for ID = 331). For ID = 331, the alternate check with "Charity" breaks them into several records. The use of MOD(i,3) picks the relevant Value for Variables.
data clean;
length ID 8 Name $10 Address $19 Classification $8 Var $19;
do i = 1 by 1 until(last.id | Var = 'Charity');
set unclean;
by id notsorted;
select(mod(i,3));
when(1) Name=Var;
when(2) Address = Var;
when(0) Classification = Var;
otherwise;
end;
end;
drop i Var;
run;
Hey datasp,
So one thing I just realised was that the code is not quite flexible enough; I was wondering if you had a fix.
Some of my data is organised in a way that the first 1/4 might have the recurring pattern for 9 rows, the 2nd quarter for 8 rows, the third quarter for 9 rows again and the final quarter for 7 rows.
I understand it is a little vague since I don't have a sample of the dataset here (which I can provide if it's needed), but do you have an alternate strategy for the case above?
Would it my best bet be to break the data into separate pieces and then merge it again later, or is there another way?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.