Solved: Re: How to shift record values?

kevinma28 · Posted 10-18-2021 01:21 PM

Hello, I'm still relatively new to SAS. I got a dataset which needs to be cleaned. The dataset has several rows shifted to the left, so for example the age is in the name column, etc.

Example:

ID	Name	Age	Sex
1001	Marcel	20	M
.	25	M	.

Desired output:

ID	Name	Age	Sex
1001	Marcel	20	M
.	.	25	M

How can I shift the affected rows to the right? Missing values are allowed. Thanks before.

Edit:

I attached the Listings.csv dataset that I was given to be cleaned up and analyzed. I noticed some strange values in neighbourhood_group column and after looking at the records, I recognized that those records are shifted. The reason I want to shift these records is to retrieve any valuable data for analysis from these records to get a more accurate result.

Kurt_Bremser · Posted 10-19-2021 08:58 AM

I ran this code after uploading the file to my ODA account:

data want;
infile '~/Listings.csv' dlm="," dsd truncover firstobs=2;
input
  id :$10.
  name :$50.
  host_id :$10.
  host_name :$20.
  neighbourhood_group :$20.
  neighbourhood :$20.
  latitude
  longitude
  room_type :$20.
  price
  minimum_nights
  number_of_reviews
  last_review :mmddyy10.
  reviews_per_month
  calculated_host_listings_count
  availability_365
;
format last_review yymmdd10.;
run;

I found 12 problematic rows.

The offending rows are caused by CR/LF/CRLF characters in the name column. You need to clean those out first before you read the data. Search the communities for "line breaks in text files" or similar; you will find a suitable method for removing those unwanted line terminators. To know what you need to filter out, use an advanced text editor like Notepad++ that can show you the characters.

Edit: TERMSTR=CRLF, as suggested by @Ksharp fixes it. So you have single CR or LF characters in a Windows-style terminated text file.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

PeterClemmensen · Posted 10-18-2021 01:29 PM

How do you know what observations are affected?

The DATA to DATA Step Macro
Blog: SASnrd

AMSAS · Posted 10-18-2021 01:34 PM

What @PeterClemmensen said, how do you know?
It may seem obvious to us humans (at least this specific example) but not so obvious to the computer
Where's the data coming from? It probably needs correcting at the source, or maybe you have a coding issue when reading it in

ballardw · Posted 10-18-2021 01:58 PM

@kevinma28 wrote:

Hello, I'm still relatively new to SAS. I got a dataset which needs to be cleaned. The dataset has several rows shifted to the left, so for example the age is in the name column, etc.

Example:

ID Name Age Sex

1001 Marcel 20 M

. 25 M .

Desired output:

ID Name Age Sex

1001 Marcel 20 M

. . 25 M

How can I shift the affected rows to the right? Missing values are allowed. Thanks before.

I am going to say that your desired output should be:

ID	Name	Age	Sex
1001	Marcel	20	M
1001	Marcel	25	M

Records without all identification values are just waiting to be a serious problem. Sort on ID? then that Age and Sex no longer have any meaning. Sort on Name, same thing. Attempt to match records to another data set on ID and you have a failure, or incomplete data without the ID values on every record.

One suspects the culprit is how the data was read into SAS. Were you responsible for that? If so, show the code used. Best would be to provide an example of the data source file. If you did not read the data, go talk to whoever did read the data as this type of problem indicates a process failure.

Having seen such things in the past one suspects that a report type file was read and may have many other missing elements or incorrect data. For instance, if your AGE is supposed to be numeric then you have an issue with this data as it must be Character.

kevinma28 · Posted 10-19-2021 08:24 AM

Thanks for your reply, I've edited my question and attached the data source file.

Kurt_Bremser · Posted 10-19-2021 08:58 AM

I ran this code after uploading the file to my ODA account:

data want;
infile '~/Listings.csv' dlm="," dsd truncover firstobs=2;
input
  id :$10.
  name :$50.
  host_id :$10.
  host_name :$20.
  neighbourhood_group :$20.
  neighbourhood :$20.
  latitude
  longitude
  room_type :$20.
  price
  minimum_nights
  number_of_reviews
  last_review :mmddyy10.
  reviews_per_month
  calculated_host_listings_count
  availability_365
;
format last_review yymmdd10.;
run;

I found 12 problematic rows.

The offending rows are caused by CR/LF/CRLF characters in the name column. You need to clean those out first before you read the data. Search the communities for "line breaks in text files" or similar; you will find a suitable method for removing those unwanted line terminators. To know what you need to filter out, use an advanced text editor like Notepad++ that can show you the characters.

Edit: TERMSTR=CRLF, as suggested by @Ksharp fixes it. So you have single CR or LF characters in a Windows-style terminated text file.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

kevinma28 · Posted 10-19-2021 09:09 AM

Ah, I see. Thank you for the solution.

Patrick · Posted 10-18-2021 02:32 PM

@kevinma28 How did you create the data set with the "shifted" values? Ideally the issue gets resolved as early in the process as possible.

andreas_lds · Posted 10-19-2021 03:47 AM

Looks like something went wrong during importing data, so please post the code you have used to read the file and the log, so that we can help fixing the problem where it occurred.

kevinma28 · Posted 10-19-2021 08:15 AM

Hello guys, thanks for the replies. I edited my question with more information.

This is what I get if I open the dataset at Excel. I believe there is no problem in my import code as this is a problem in the dataset.

When I filter it by neighbourhood_group, there should have been only 5 regions (Central, West, East, North, Northeast) in the 5th column. However I noticed that the filter have other values and I realized that it is the value from another column besides it. I need to clean the data so I want to shift the columns so at least some values can still be used for analysis.

Ksharp · Posted 10-19-2021 08:39 AM

I didn't see any shift record .everything is ok.

filename x 'c:\temp\listings.csv' encoding='utf8' termstr=crlf;
proc import datafile=x out=have dbms=csv replace;
getnames=yes;
guessingrows=max;
run;

kevinma28 · Posted 10-19-2021 09:10 AM

Thank you for the solution

Registration is open

SAS Training: Just a Click Away