BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Wub_SAS
Obsidian | Level 7

Hi Everyone,

 

I have a character variable (p1q1_living) for which data was entered inconsistently and some respondents have picked two age ranges (sample SAS dataset attached). I need to find a way to extract the start and last years of the reported year ranges and put them together using either hyphen or "to". I couldn't come up with a proper string function. So, I tried the attached manual conversion but it at least didn't work for people who reported more than one year ranges.

 

Thank you for your help in advance.

Have.PNGWant.PNG

Data PRISM.p1q1_rawdata4; 
	Set PRISM.p1q1_rawdata3;
	If p1q1_living = "18 to 27 years" then p1q1_living= "18-27"; If p1q1_living = "19 to 35 years" then p1q1_living= "19-35";
 	If p1q1_living = "19 to 42 years" then p1q1_living= "19-42"; If p1q1_living = "19 to 50 years" then p1q1_living= "19-50"; 
	If p1q1_living = "20 to 32 years" then p1q1_living= "20-32"; If p1q1_living = "20 to 39 years" then p1q1_living= "20-39"; 
	If p1q1_living = "20 to 40 years" then p1q1_living= "20-40"; If p1q1_living = "21 to 30 years" then p1q1_living= "21-30"; 

	If p1q1_living = "19-33 years, 36-52 years" then p1q1_living= "19-33" AND p1q1_living= "36-52";
	If p1q1_living = "22-39, 39-41" then p1q1_living= "22-39" AND p1q1_living= "39-41";
	
	If p1q1_living = "21 to 39 years" then p1q1_living= "21-39"; If p1q1_living = "22 to 46 years" then p1q1_living= "22-46"; 
	If p1q1_living = "23 to 46 years" then p1q1_living= "23-46"; If p1q1_living = "24 to 38 years" then p1q1_living= "24-38"; 
	If p1q1_living = "24 to 43 years" then p1q1_living= "24-43"; If p1q1_living = "25 to 35 years" then p1q1_living= "25-35"; 
	If p1q1_living = "26 to 40 years" then p1q1_living= "26-40"; If p1q1_living = "26 to 56 years" then p1q1_living= "26-56"; 
	If p1q1_living = "27 to 46 years" then p1q1_living= "27-46"; If p1q1_living = "29 to 41 years" then p1q1_living= "29-41"; 
	If p1q1_living = "29 to 49 years" then p1q1_living= "29-49"; If p1q1_living = "32 to 40 years" then p1q1_living= "32-40"; 
	If p1q1_living = "32 to 43 years" then p1q1_living= "32-43"; If p1q1_living = "32 to 47 years" then p1q1_living= "32-47"; 
	If p1q1_living = "32 to 48 years" then p1q1_living= "32-48"; If p1q1_living = "34 to 43 years" then p1q1_living= "34-43"; 
	If p1q1_living = "35 to 45 years" then p1q1_living= "35-45"; If p1q1_living = "45 to 62 years" then p1q1_living= "45-62"; 
	If p1q1_living = "18 to 28" then p1q1_living= "18-28";	If p1q1_living = "22 to 42" then p1q1_living= "22-42";
	If p1q1_living = "23 to 43" then p1q1_living= "23-43";	If p1q1_living = "24 to 38" then p1q1_living= "24-38";
	If p1q1_living = "25 to 37" then p1q1_living= "25-37";	If p1q1_living = "28 to 43" then p1q1_living= "28-43";
	If p1q1_living = "30 to 43" then p1q1_living= "30-43";	If p1q1_living = "31 to 51" then p1q1_living= "31-51";
	If p1q1_living = "ASKU" or p1q1_living= "NA" OR p1q1_living= "REF" then p1q1_living= " "; 

Run;

 

ID

p1q1_living

        

12007c

17-35

        

12024c

18 to 27 years

        

12025c

18-25

        

12032c

18-30

        

12033c

18-45

        

12043c

19 to 50 years

        

12044

19-27

        

12045c

19-33 years, 36-52 years

        

12048c

20 to 40 years

        

12049c

20-34

        

12314c

21 to 39 years

        

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
/*Same idea*/
data have;
  infile datalines truncover dlm='|';
  input ID:$10. p1q1_living:$30.;
  datalines;
12007c|17-35
12024x|
12024c|18 to 27 years
12025c|18-25
12032c|18-30
12033c|18-45
12043c|19 to 50 years
12044|19-27
12045c|19-33 years, 36-52 years
12048c|20 to 40 years
12049c|20-34
12314c|21 to 39 years
;

data want;
 set have;
 n=countw(p1q1_living,,'kd');
 if mod(n,2)=0 and n ne 0 then do;
 do i=1 to n by 2;
  want=catx('-',scan(p1q1_living,i,,'kd'),scan(p1q1_living,i+1,,'kd'));
  output;
 end;
 end;
 else output;
 
drop n i;
run;

proc print noobs;run;

Ksharp_0-1678885269316.png

 

View solution in original post

6 REPLIES 6
Wub_SAS
Obsidian | Level 7

Hi Everyone,

 

I have a character variable (p1q1_living) for which data was entered inconsistently and some respondents have picked two age ranges (sample SAS dataset attached). I need to find a way to extract the start and last years of the reported year ranges and put them together using either hyphen or "to". I couldn't come up with a proper string function. So, I tried the attached manual conversion but it at least didn't work for people who reported more than one year ranges.

 

Thank you for your help in advance.

 

Have.PNGWant.PNG

 

 

 

Data p1q1_rawdata4; 
	Set PRISM.p1q1_rawdata3;
	If p1q1_living = "18 to 27 years" then p1q1_living= "18-27"; If p1q1_living = "19 to 35 years" then p1q1_living= "19-35";
 	If p1q1_living = "19 to 42 years" then p1q1_living= "19-42"; If p1q1_living = "19 to 50 years" then p1q1_living= "19-50"; 
	If p1q1_living = "20 to 32 years" then p1q1_living= "20-32"; If p1q1_living = "20 to 39 years" then p1q1_living= "20-39"; 
	If p1q1_living = "20 to 40 years" then p1q1_living= "20-40"; If p1q1_living = "21 to 30 years" then p1q1_living= "21-30"; 

	If p1q1_living = "19-33 years, 36-52 years" then p1q1_living= "19-33" AND p1q1_living= "36-52";
	If p1q1_living = "22-39, 39-41" then p1q1_living= "22-39" AND p1q1_living= "39-41";
	
	If p1q1_living = "21 to 39 years" then p1q1_living= "21-39"; If p1q1_living = "22 to 46 years" then p1q1_living= "22-46"; 
	If p1q1_living = "23 to 46 years" then p1q1_living= "23-46"; If p1q1_living = "24 to 38 years" then p1q1_living= "24-38"; 
	If p1q1_living = "24 to 43 years" then p1q1_living= "24-43"; If p1q1_living = "25 to 35 years" then p1q1_living= "25-35"; 
	If p1q1_living = "26 to 40 years" then p1q1_living= "26-40"; If p1q1_living = "26 to 56 years" then p1q1_living= "26-56"; 
	If p1q1_living = "27 to 46 years" then p1q1_living= "27-46"; If p1q1_living = "29 to 41 years" then p1q1_living= "29-41"; 
	If p1q1_living = "29 to 49 years" then p1q1_living= "29-49"; If p1q1_living = "32 to 40 years" then p1q1_living= "32-40"; 
	If p1q1_living = "32 to 43 years" then p1q1_living= "32-43"; If p1q1_living = "32 to 47 years" then p1q1_living= "32-47"; 
	If p1q1_living = "32 to 48 years" then p1q1_living= "32-48"; If p1q1_living = "34 to 43 years" then p1q1_living= "34-43"; 
	If p1q1_living = "35 to 45 years" then p1q1_living= "35-45"; If p1q1_living = "45 to 62 years" then p1q1_living= "45-62"; 
	If p1q1_living = "18 to 28" then p1q1_living= "18-28";	If p1q1_living = "22 to 42" then p1q1_living= "22-42";
	If p1q1_living = "23 to 43" then p1q1_living= "23-43";	If p1q1_living = "24 to 38" then p1q1_living= "24-38";
	If p1q1_living = "25 to 37" then p1q1_living= "25-37";	If p1q1_living = "28 to 43" then p1q1_living= "28-43";
	If p1q1_living = "30 to 43" then p1q1_living= "30-43";	If p1q1_living = "31 to 51" then p1q1_living= "31-51";
	If p1q1_living = "ASKU" or p1q1_living= "NA" OR p1q1_living= "REF" then p1q1_living= " "; 

Run;
ballardw
Super User

Example data as data step, or at least plain text so we can create a data set.

You do not say what you expect to have for output for the "people who reported more than one year ranges".

Doesn't work is awful vague.

Are there errors in the log?: Post the code and log in a code box opened with the "</>" to maintain formatting of error messages.

No output? Post any log in a code box.

Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the "</>" icon or attached as text to show exactly what you have and that we can test code against.

 

I bet that you have some conversion of numeric to character notes. Probably from this line as an example which is why request logs.

If p1q1_living = "19-33 years, 36-52 years" then p1q1_living= "19-33" AND p1q1_living= "36-52";
421  data example;
422     p1q1_living = "19-33 years, 36-52 years";
423     If p1q1_living = "19-33 years, 36-52 years" then p1q1_living= "19-33" AND p1q1_living=
423! "36-52";
424  run;

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).
      423:66
NOTE: Numeric values have been converted to character values at the places given by:
      (Line):(Column).
      423:74
NOTE: Invalid numeric data, '19-33' , at line 423 column 66.
p1q1_living=0 _ERROR_=1 _N_=1
NOTE: The data set WORK.EXAMPLE has 1 observations and 1 variables.

Column 66 is the "19-33" after the "then p1q1_living=" and happens because the AND that follows is treated as a LOGICAL comparison, which in SAS are numeric 1/0 values for true/false. The second conversion at column 74 is at the AND as trying to get the types to match.

This is where you have to describe and better show us what you expect for a result.

 

s_lassen
Meteorite | Level 14

This code

 

	If p1q1_living = "19-33 years, 36-52 years" then p1q1_living= "19-33" AND p1q1_living= "36-52";
	If p1q1_living = "22-39, 39-41" then p1q1_living= "22-39" AND p1q1_living= "39-41";

obviously does not work as intended. You cannot assign two values to the same variable at the same time. What it does is this: It assigns (or tries to assign) a boolean value. On the left side of the AND is the character value "22-39" which SAS tries to convert to a number, returning a missing value. On the right side is the expression  p1q1_living= "39-41", which evaluates to 0 (false). So the expression evaluates to . AND 0, which again evaluates to 0. Your variable finally gets the value "0", as the result is converted to character.

 

What do you really want do do? If you want both values, there are two possibilities:

  1. Create a new variable, which contains the second value, e.g.
     If p1q1_living = "22-39, 39-41" then do;
      p1q1_living= "22-39";
      p1q1_living2= "39-41";
      end;
    
    
  2. Output the second value to another observation:
     If p1q1_living = "22-39, 39-41" then do;
      p1q1_living= "22-39";
      output;
      p1q1_living= "39-41";
      end;
    
    if you go with this solution, remember to put another OUTPUT statement at the end of your datastep.

 

But if you just want to get the first age and the last age, (e.g. "19-33 years, 36-52 years" becomes "19-52"), use SCAN:

p1q1_living = cats(scan(p1q1_living,1,' -','A'),'-',scan(p1q1_living,-1,' -,','A'));

(by adding the 'A' modifier to the scan function, we also consider alphabetical characters delimiters, so that only the numbers are considered words)

 

 

Patrick
Opal | Level 21

Below how this could work for the sample data you've provided. 

If there are two date ranges then the code will create two observations. If you want a different treatment for such cases then you need to tell us exactly what the desired result should be.

data have;
  infile datalines truncover dlm='|';
  input ID:$10. p1q1_living:$30.;
  datalines;
12007c|17-35
12024x|
12024c|18 to 27 years
12025c|18-25
12032c|18-30
12033c|18-45
12043c|19 to 50 years
12044|19-27
12045c|19-33 years, 36-52 years
12048c|20 to 40 years
12049c|20-34
12314c|21 to 39 years
;

data want;
  set have;
  length cleansed $10;

  do _i=1 by 1;
    cleansed=scan(p1q1_living,_i,',');
    if _i>1 and missing(cleansed) then leave;
    cleansed=prxchange('s/^(\d+)[^\d]+(\d+).*[^\d]/$1-$2/oi',-1,strip(cleansed));
    output;
  end;
  drop _i;
run;

proc print data=want;
run;

Patrick_0-1678880178243.png

 

yabwon
Amethyst | Level 16

An alternative to @Patrick 's regular experssions:

data have;
  input ID $ 6. p1q1_living $ 8-40;
  dumy_variable= _N_; /* just for obs number */
cards;        
12007c 17-35         
12024c 18 to 27 years         
12025c 18-25         
12032c 18-30         
12033c 18-45         
12043c 19 to 50 years         
12044  19-27         
12045c 19-33 years, 36-52 years         
12048c 20 to 40 years         
12049c 20-34         
12314c 21 to 39 years
;
run;
proc print;
run;

data want;
  set have;
  p1q1_living=compress(p1q1_living,', -',"KD"); /* keep digits, commas, hyphens, and spaces only */
 
  do _N_=1 to countw(p1q1_living,",");
    p1q1_living_NEW=translate(compbl(strip(scan(p1q1_living,_N_,","))),"-"," ");
    output;
  end;
run;
proc print;
run; 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Ksharp
Super User
/*Same idea*/
data have;
  infile datalines truncover dlm='|';
  input ID:$10. p1q1_living:$30.;
  datalines;
12007c|17-35
12024x|
12024c|18 to 27 years
12025c|18-25
12032c|18-30
12033c|18-45
12043c|19 to 50 years
12044|19-27
12045c|19-33 years, 36-52 years
12048c|20 to 40 years
12049c|20-34
12314c|21 to 39 years
;

data want;
 set have;
 n=countw(p1q1_living,,'kd');
 if mod(n,2)=0 and n ne 0 then do;
 do i=1 to n by 2;
  want=catx('-',scan(p1q1_living,i,,'kd'),scan(p1q1_living,i+1,,'kd'));
  output;
 end;
 end;
 else output;
 
drop n i;
run;

proc print noobs;run;

Ksharp_0-1678885269316.png

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1773 views
  • 3 likes
  • 6 in conversation