BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Demographer
Pyrite | Level 9

Hello,

I have this super huge crosstable (attached) in an Excel format that represents age/region (line) by sex/education (row). Is it possible to tranpose it quickly into a hierarchical format, such as (I do not need the total)

 

Region Age Sex Education Number
Agadez 0 Male Religious school XXX
Agadez 0 Male Kindergarden XXX
Agadez 0 Male CI XXX
..  
Niamey 95 Female CFDC XXX

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Worked perfectly for me! I've attached the copy of the csv file I ran it on. Possibly your file is corrupt or has a different end of record marker.

 

Art, CEO, AnalystFinder.com

 

View solution in original post

13 REPLIES 13
Vish33
Lapis Lazuli | Level 10
Hi,

I am unable to download your file seems vey big, please post the input sample data with 10 lines of input data and how you need the output.. that helps other users as well to understand your requirement.
Demographer
Pyrite | Level 9

The first 10 lines would not be sufficient. Here is the file:

 

https://www.dropbox.com/s/c5oqybp05mpiatf/File.xlsx

 

 

Vish33
Lapis Lazuli | Level 10

this looks like created cross-tabulation report from existing data. first of all it's not structured data to transpose here and it would need  huge effort if you want to code this and import data from this Excel and transpose as you need. Try to find the source of this cross tab report that would be more helpful to you, if not found then you have to manually structure the data (at least column level) from excel then Import data in SAS.

Demographer
Pyrite | Level 9

Unfortunately, I do not have access to microdata. I'll try something with VBA.

ballardw
Super User

First step: SAVE to a delimited text file format. I might pick a less common delimiter such as a |

Second step. Write a data step to read the file.

If the file had any structure you can test first 1, 2, 3, 4 or how ever many needed to set rules for reading additional columns.

 

I have done this as the only surviving files were reports. It sucks. It is generally routine and tedious coding.

You might get some help with proc import to generate a an example data step to read the TEXT version of the file to let you know how many columns and what the widths might be. Use guessingrows=max with the proc import to have chance.

 

Good luck.

art297
Opal | Level 21

Note: updated 2:31pm, May 7, 2018, to correct for a coding error:

 

Here is one way to achieve what you want. I only wrote the code for the first nine education (?) levels, but I think you can figure out how to expand the code to capture the rest of the levels:

proc import datafile="/folders/myfolders/file.xlsx"
            out=main replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$A5:A923';
run;

data main (keep=Region Age Sex);
  set main;
  retain Region;
  length Age 8.;
  length Sex $6;
  if A eq 'ND' then do;
    call missing(age);
    Sex='Male';
    output;
    Sex='Female';
    output;
  end;
  else if anydigit(A) then do;
    Age=input(compress(A,,'kd'),8.);
    Sex='Male';
    output;
    Sex='Female';
    output;
  end;
  else if A eq 'Total' then do;
    Age=99;
    Sex='Male';
    output;
    Sex='Female';
    output;
  end;
  else Region=A;
run;

proc import datafile="/folders/myfolders/file.xlsx"
            out=first replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$B5:C923';
run;

data first (keep=Education Number);
  set first;
  length Education $20;
  retain Education 'Aucun Niveau';
  if not missing(Masculin) and not missing(F__minin) then do;
    Number=Masculin;
    output;
    sex='Female';
    Number=F__minin;
    output;
  end;
run;

data first;
  set main;
  set first;
run;

proc import datafile="/folders/myfolders/file.xlsx"
            out=second replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$E5:F923';
run;

data second (keep=Education Number);
  set second;
  length Education $20;
  retain Education 'Ecole Coranique';
  if not missing(Masculin) and not missing(F__minin) then do;
    Number=Masculin;
    output;
    sex='Female';
    Number=F__minin;
    output;
  end;
run;

data second;
  set main;
  set second;
run;

proc import datafile="/folders/myfolders/file.xlsx"
            out=third replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$H5:I923';
run;

data third (keep=Education Number);
  set third;
  length Education $20;
  retain Education 'Prescolaire';
  if not missing(Masculin) and not missing(F__minin) then do;
    Number=Masculin;
    output;
    sex='Female';
    Number=F__minin;
    output;
  end;
run;

data third;
  set main;
  set third;
run;

proc import datafile="/folders/myfolders/file.xlsx"
            out=fourth replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$K5:L923';
run;

data fourth (keep=Education Number);
  set fourth;
  length Education $20;
  retain Education 'CI';
  if not missing(Masculin) and not missing(F__minin) then do;
    Number=Masculin;
    output;
    sex='Female';
    Number=F__minin;
    output;
  end;
run;

data fourth;
  set main;
  set fourth;
run;

proc import datafile="/folders/myfolders/file.xlsx"
            out=fifth replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$N5:O923';
run;

data fifth (keep=Education Number);
  set fifth;
  length Education $20;
  retain Education 'CP';
  if not missing(Masculin) and not missing(F__minin) then do;
    Number=Masculin;
    output;
    sex='Female';
    Number=F__minin;
    output;
  end;
run;

data fifth;
  set main;
  set fifth;
run;

proc import datafile="/folders/myfolders/file.xlsx"
            out=sixth replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$Q5:R923';
run;

data sixth (keep=Education Number);
  set sixth;
  length Education $20;
  retain Education 'CE1';
  if not missing(Masculin) and not missing(F__minin) then do;
    Number=Masculin;
    output;
    sex='Female';
    Number=F__minin;
    output;
  end;
run;

data sixth;
  set main;
  set sixth;
run;

proc import datafile="/folders/myfolders/file.xlsx"
            out=seventh replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$T5:U923';
run;

data seventh (keep=Education Number);
  set seventh;
  length Education $20;
  retain Education 'CE2';
  if not missing(Masculin) and not missing(F__minin) then do;
    Number=Masculin;
    output;
    sex='Female';
    Number=F__minin;
    output;
  end;
run;

data seventh;
  set main;
  set seventh;
run;

proc import datafile="/folders/myfolders/file.xlsx"
            out=eighth replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$W5:X923';
run;

data eighth (keep=Education Number);
  set eighth;
  length Education $20;
  retain Education 'CM1';
  if not missing(Masculin) and not missing(F__minin) then do;
    Number=Masculin;
    output;
    sex='Female';
    Number=F__minin;
    output;
  end;
run;

data eighth;
  set main;
  set eighth;
run;

proc import datafile="/folders/myfolders/file.xlsx"
            out=ninth replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$Z5:AA923';
run;

data ninth (keep=Education Number);
  set ninth;
  length Education $20;
  retain Education 'CM1';
  if not missing(Masculin) and not missing(F__minin) then do;
    Number=Masculin;
    output;
    sex='Female';
    Number=F__minin;
    output;
  end;
run;

data ninth;
  set main;
  set ninth;
run;

data want;
  set first;
  set second;
  set third;
  set fourth;
  set fifth;
  set sixth;
  set seventh;
  set eighth;
  set ninth;
run;

The code simply gets region, age and sex first, then imports the male and female numbers for each education level and, finally, appends all of the levels together. I coded total to equal age 99 which, if you don't want those rows, can easily delete.

 

Art, CEO, AnalystFinder.com

 

art297
Opal | Level 21

I thought of an easier way to read your Workbook. The following, I think, captures all of the data you wanted:

 

proc import datafile="/folders/myfolders/file.xlsx"
            out=have replace dbms=xlsx;
  getnames=yes;
  range='Instruction_3ansplus$A5:EW923';
run;

data want (keep=Region Age Sex Education Number);
  set have;
  length Region $20;
  retain Region;
  length Age 8.;
  length Education $23;
  retain Education;
  length Sex $6;
  if A ne 'ND' and anydigit(A) eq 0 then do;
    if A ne 'Total' then Region=A;
  end;
  else if A eq 'ND' or anydigit(A) then do;
    if A eq 'ND' then call missing(age);
    else if anydigit(A) then Age=input(compress(A,,'kd'),8.);
    Education='Aucun Niveau';
    Sex='Male';
    Number=Masculin;
    output;
    Number=F__minin;
    Sex='Female';
    output;
    
    Education='Ecole Coranique';
    Sex='Male';
    Number=Masculin_1;
    output;
    Number=F__minin_1;
    Sex='Female';
    output;
    
    Education='Prescolaire';
    Sex='Male';
    Number=Masculin_2;
    output;
    Number=F__minin_2;
    Sex='Female';
    output;
    
    Education='CI';
    Sex='Male';
    Number=Masculin_3;
    output;
    Number=F__minin_3;
    Sex='Female';
    output;
    
    Education='CP';
    Sex='Male';
    Number=Masculin_4;
    output;
    Number=F__minin_4;
    Sex='Female';
    output;
    
    Education='CE1';
    Sex='Male';
    Number=Masculin_5;
    output;
    Number=F__minin_5;
    Sex='Female';
    output;
    
    Education='CE2';
    Sex='Male';
    Number=Masculin_6;
    output;
    Number=F__minin_6;
    Sex='Female';
    output;
    
    Education='CM1';
    Sex='Male';
    Number=Masculin_7;
    output;
    Number=F__minin_7;
    Sex='Female';
    output;
    
    Education='CM2';
    Sex='Male';
    Number=Masculin_8;
    output;
    Number=F__minin_8;
    Sex='Female';
    output;
    
    Education='NP_Primaire';
    Sex='Male';
    Number=Masculin_9;
    output;
    Number=F__minin_9;
    Sex='Female';
    output;
    
    Education='6eme';
    Sex='Male';
    Number=Masculin_10;
    output;
    Number=F__minin_10;
    Sex='Female';
    output;
    
    Education='6eme';
    Sex='Male';
    Number=Masculin_11;
    output;
    Number=F__minin_11;
    Sex='Female';
    output;
    
    Education='4eme';
    Sex='Male';
    Number=Masculin_12;
    output;
    Number=F__minin_12;
    Sex='Female';
    output;
    
    Education='3eme';
    Sex='Male';
    Number=Masculin_13;
    output;
    Number=F__minin_13;
    Sex='Female';
    output;
    
    Education='NP Sec1';
    Sex='Male';
    Number=Masculin_14;
    output;
    Number=F__minin_14;
    Sex='Female';
    output;
    
    Education='Seconde';
    Sex='Male';
    Number=Masculin_15;
    output;
    Number=F__minin_15;
    Sex='Female';
    output;
    
    Education='Premiere';
    Sex='Male';
    Number=Masculin_16;
    output;
    Number=F__minin_16;
    Sex='Female';
    output;
    
    Education='Terminale';
    Sex='Male';
    Number=Masculin_17;
    output;
    Number=F__minin_17;
    Sex='Female';
    output;
    
    Education='NP Sec2';
    Sex='Male';
    Number=Masculin_18;
    output;
    Number=F__minin_18;
    Sex='Female';
    output;
    
    Education='EP Sec1 1';
    Sex='Male';
    Number=Masculin_19;
    output;
    Number=F__minin_19;
    Sex='Female';
    output;
    
    Education='EP Sec1 2';
    Sex='Male';
    Number=Masculin_20;
    output;
    Number=F__minin_20;
    Sex='Female';
    output;
    
    Education='EP Sec1 3';
    Sex='Male';
    Number=Masculin_21;
    output;
    Number=F__minin_21;
    Sex='Female';
    output;
    
    Education='NP EP Sec1';
    Sex='Male';
    Number=Masculin_22;
    output;
    Number=F__minin_22;
    Sex='Female';
    output;
    
    Education='EP Sec2 1';
    Sex='Male';
    Number=Masculin_23;
    output;
    Number=F__minin_23;
    Sex='Female';
    output;
    
    Education='EP Sec2 2';
    Sex='Male';
    Number=Masculin_24;
    output;
    Number=F__minin_24;
    Sex='Female';
    output;
    
    Education='EP Sec2 3';
    Sex='Male';
    Number=Masculin_25;
    output;
    Number=F__minin_25;
    Sex='Female';
    output;
    
    Education='NP EP Sec2';
    Sex='Male';
    Number=Masculin_26;
    output;
    Number=F__minin_26;
    Sex='Female';
    output;
    
    Education='EP Sup1';
    Sex='Male';
    Number=Masculin_27;
    output;
    Number=F__minin_27;
    Sex='Female';
    output;
    
    Education='EP Sup2';
    Sex='Male';
    Number=Masculin_28;
    output;
    Number=F__minin_28;
    Sex='Female';
    output;
    
    Education='EP Sup3';
    Sex='Male';
    Number=Masculin_29;
    output;
    Number=F__minin_29;
    Sex='Female';
    output;
    
    Education='EP Sup4';
    Sex='Male';
    Number=Masculin_30;
    output;
    Number=F__minin_30;
    Sex='Female';
    output;
    
    Education='EP Sup5';
    Sex='Male';
    Number=Masculin_31;
    output;
    Number=F__minin_31;
    Sex='Female';
    output;
    
    Education='EP Sup6';
    Sex='Male';
    Number=Masculin_32;
    output;
    Number=F__minin_32;
    Sex='Female';
    output;
    
    Education='EP Sup7';
    Sex='Male';
    Number=Masculin_33;
    output;
    Number=F__minin_33;
    Sex='Female';
    output;
    
    Education='EP Sup8';
    Sex='Male';
    Number=Masculin_34;
    output;
    Number=F__minin_34;
    Sex='Female';
    output;
    
    Education='NP EP Sup';
    Sex='Male';
    Number=Masculin_35;
    output;
    Number=F__minin_35;
    Sex='Female';
    output;
    
    Education='Sup1';
    Sex='Male';
    Number=Masculin_36;
    output;
    Number=F__minin_36;
    Sex='Female';
    output;
    
    Education='Sup2';
    Sex='Male';
    Number=Masculin_37;
    output;
    Number=F__minin_37;
    Sex='Female';
    output;
    
    Education='Sup3';
    Sex='Male';
    Number=Masculin_38;
    output;
    Number=F__minin_38;
    Sex='Female';
    output;
    
    Education='Sup4';
    Sex='Male';
    Number=Masculin_39;
    output;
    Number=F__minin_39;
    Sex='Female';
    output;
    
    Education='Sup5';
    Sex='Male';
    Number=Masculin_40;
    output;
    Number=F__minin_40;
    Sex='Female';
    output;
    
    Education='Sup6';
    Sex='Male';
    Number=Masculin_41;
    output;
    Number=F__minin_41;
    Sex='Female';
    output;
    
    Education='Sup7';
    Sex='Male';
    Number=Masculin_42;
    output;
    Number=F__minin_42;
    Sex='Female';
    output;
    
    Education='Sup8';
    Sex='Male';
    Number=Masculin_43;
    output;
    Number=F__minin_43;
    Sex='Female';
    output;
    
    Education='NP Sup';
    Sex='Male';
    Number=Masculin_44;
    output;
    Number=F__minin_44;
    Sex='Female';
    output;
    
    Education="Cours d'adulte";
    Sex='Male';
    Number=Masculin_45;
    output;
    Number=F__minin_45;
    Sex='Female';
    output;
    
    Education='Ecole de seconde chance';
    Sex='Male';
    Number=Masculin_46;
    output;
    Number=F__minin_46;
    Sex='Female';
    output;
    
    Education='classe passerelle';
    Sex='Male';
    Number=Masculin_47;
    output;
    Number=F__minin_47;
    Sex='Female';
    output;
    
    Education='classe alternative';
    Sex='Male';
    Number=Masculin_48;
    output;
    Number=F__minin_48;
    Sex='Female';
    output;
    
    Education='CFDC';
    Sex='Male';
    Number=Masculin_49;
    output;
    Number=F__minin_49;
    Sex='Female';
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Demographer
Pyrite | Level 9
Thank you, but I have an error message saying that "ERROR: Numeric length cannot be used with character variable AGE". I guess it's because the column for age and region is the same.
art297
Opal | Level 21

My initial guess is that you didn't use the range statement I had in my suggested code, when you ran the code I proposed.

 

If you had, SAS would never have seen the string AGE.

 

Art, CEO, AnalystFinder.com

 

Demographer
Pyrite | Level 9
You are right. I don't know why, but when I try to import an xlsx file, it says the format is not supported by SAS. So I converted my file into a csv (and then I cannot use the range statement).
art297
Opal | Level 21

You can accomplish the same thing as range, with a CSV file, by using the datarow option. e.g.:

FILENAME REFFILE '/folders/myfolders/File.csv';
PROC IMPORT DATAFILE=REFFILE DBMS=CSV OUT=WORK.HAVE REPLACE; GETNAMES=NO; DATAROW=6; RUN; data want (keep=Region Age Sex Education Number); set have; length Region $20; retain Region; length Age 8.; length Education $23; retain Education; length Sex $6; if VAR1 ne 'ND' and anydigit(VAR1) eq 0 then do; if VAR1 ne 'Total' then Region=VAR1; end; else if VAR1 eq 'ND' or anydigit(VAR1) then do; if VAR1 eq 'ND' then call missing(age); else if anydigit(VAR1) then Age=input(compress(VAR1,,'kd'),8.); Education='Aucun Niveau'; Sex='Male'; Number=VAR2; output; Number=VAR3; Sex='Female'; output; Education='Ecole Coranique'; Sex='Male'; Number=VAR5; output; Number=VAR6; Sex='Female'; output; Education='Prescolaire'; Sex='Male'; Number=VAR8; output; Number=VAR9; Sex='Female'; output; Education='CI'; Sex='Male'; Number=VAR11; output; Number=VAR12; Sex='Female'; output; Education='CP'; Sex='Male'; Number=VAR14; output; Number=VAR15; Sex='Female'; output; Education='CE1'; Sex='Male'; Number=VAR17; output; Number=VAR18; Sex='Female'; output; Education='CE2'; Sex='Male'; Number=VAR20; output; Number=VAR21; Sex='Female'; output; Education='CM1'; Sex='Male'; Number=VAR23; output; Number=VAR24; Sex='Female'; output; Education='CM2'; Sex='Male'; Number=VAR26; output; Number=VAR27; Sex='Female'; output; Education='NP_Primaire'; Sex='Male'; Number=VAR29; output; Number=VAR30; Sex='Female'; output; Education='6eme'; Sex='Male'; Number=VAR32; output; Number=VAR33; Sex='Female'; output; Education='6eme'; Sex='Male'; Number=VAR35; output; Number=VAR36; Sex='Female'; output; Education='4eme'; Sex='Male'; Number=VAR38; output; Number=VAR39; Sex='Female'; output; Education='3eme'; Sex='Male'; Number=VAR41; output; Number=VAR42; Sex='Female'; output; Education='NP Sec1'; Sex='Male'; Number=VAR44; output; Number=VAR45; Sex='Female'; output; Education='Seconde'; Sex='Male'; Number=VAR47; output; Number=VAR48; Sex='Female'; output; Education='Premiere'; Sex='Male'; Number=VAR50; output; Number=VAR51; Sex='Female'; output; Education='Terminale'; Sex='Male'; Number=VAR53; output; Number=VAR54; Sex='Female'; output; Education='NP Sec2'; Sex='Male'; Number=VAR56; output; Number=VAR57; Sex='Female'; output; Education='EP Sec1 1'; Sex='Male'; Number=VAR59; output; Number=VAR60; Sex='Female'; output; Education='EP Sec1 2'; Sex='Male'; Number=VAR62; output; Number=VAR63; Sex='Female'; output; Education='EP Sec1 3'; Sex='Male'; Number=VAR65; output; Number=VAR66; Sex='Female'; output; Education='NP EP Sec1'; Sex='Male'; Number=VAR68; output; Number=VAR69; Sex='Female'; output; Education='EP Sec2 1'; Sex='Male'; Number=VAR71; output; Number=VAR72; Sex='Female'; output; Education='EP Sec2 2'; Sex='Male'; Number=VAR74; output; Number=VAR75; Sex='Female'; output; Education='EP Sec2 3'; Sex='Male'; Number=VAR77; output; Number=VAR78; Sex='Female'; output; Education='NP EP Sec2'; Sex='Male'; Number=VAR80; output; Number=VAR81; Sex='Female'; output; Education='EP Sup1'; Sex='Male'; Number=VAR83; output; Number=VAR84; Sex='Female'; output; Education='EP Sup2'; Sex='Male'; Number=VAR86; output; Number=VAR87; Sex='Female'; output; Education='EP Sup3'; Sex='Male'; Number=VAR89; output; Number=VAR90; Sex='Female'; output; Education='EP Sup4'; Sex='Male'; Number=VAR92; output; Number=VAR93; Sex='Female'; output; Education='EP Sup5'; Sex='Male'; Number=VAR95; output; Number=VAR96; Sex='Female'; output; Education='EP Sup6'; Sex='Male'; Number=VAR98; output; Number=VAR99; Sex='Female'; output; Education='EP Sup7'; Sex='Male'; Number=VAR101; output; Number=VAR102; Sex='Female'; output; Education='EP Sup8'; Sex='Male'; Number=VAR104; output; Number=VAR105; Sex='Female'; output; Education='NP EP Sup'; Sex='Male'; Number=VAR107; output; Number=VAR108; Sex='Female'; output; Education='Sup1'; Sex='Male'; Number=VAR110; output; Number=VAR111; Sex='Female'; output; Education='Sup2'; Sex='Male'; Number=VAR113; output; Number=VAR114; Sex='Female'; output; Education='Sup3'; Sex='Male'; Number=VAR116; output; Number=VAR117; Sex='Female'; output; Education='Sup4'; Sex='Male'; Number=VAR119; output; Number=VAR120; Sex='Female'; output; Education='Sup5'; Sex='Male'; Number=VAR122; output; Number=VAR123; Sex='Female'; output; Education='Sup6'; Sex='Male'; Number=VAR125; output; Number=VAR126; Sex='Female'; output; Education='Sup7'; Sex='Male'; Number=VAR128; output; Number=VAR129; Sex='Female'; output; Education='Sup8'; Sex='Male'; Number=VAR131; output; Number=VAR132; Sex='Female'; output; Education='NP Sup'; Sex='Male'; Number=VAR134; output; Number=VAR135; Sex='Female'; output; Education="Cours d'adulte"; Sex='Male'; Number=VAR137; output; Number=VAR138; Sex='Female'; output; Education='Ecole de seconde chance'; Sex='Male'; Number=VAR140; output; Number=VAR141; Sex='Female'; output; Education='classe passerelle'; Sex='Male'; Number=VAR143; output; Number=VAR144; Sex='Female'; output; Education='classe alternative'; Sex='Male'; Number=VAR146; output; Number=VAR147; Sex='Female'; output; Education='CFDC'; Sex='Male'; Number=VAR149; output; Number=VAR150; Sex='Female'; output; end; run;

Art, CEO, AnalystFinder.com

 

Demographer
Pyrite | Level 9

It almost worked. The only problem I see is that there is no value in the column "Region" of the file "Want". While importing the csv file, I got several of this message: "NOTE: Invalid data for VAR1 in line XXXXXX."

art297
Opal | Level 21

Worked perfectly for me! I've attached the copy of the csv file I ran it on. Possibly your file is corrupt or has a different end of record marker.

 

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 1457 views
  • 0 likes
  • 4 in conversation