Monday, 21 March 2016

2. DATA STEP

   
Used for: Naming and creating a SAS data set.
In a DATA step the following can be performed:
  •  defining variables
  •  reading input files
  •  assigning values to variables
  •  creating new variables
  •  merging two or more data sets
  •  formatting and labeling variables
  •  assigning missing values



If a variable is used in a SAS program but never assigned a value, SAS automatically sets it to missing.
Numeric missing values are represented by a single period (.). Character missing values are represented by a
blank enclosed in quotes (' ').
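
As a minimal sketch of how missing values behave, the following step can be run on its own. The data set name MISSDEMO and all variable names are made up for illustration.

```sas
DATA MISSDEMO;
   LENGTH CITY $ 10;          /* declared but never assigned: stays blank   */
   AGE = .;                   /* explicit numeric missing value             */
   TOTAL = AGE + 10;          /* arithmetic on a missing value is missing   */
   IF AGE = . THEN AGEFLAG = 'NO AGE';
   IF CITY = ' ' THEN CITYFLAG = 'NO CITY';
RUN;
```

The SAS log should note that CITY is uninitialized and that a missing value was generated for TOTAL.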

Syntax:
DATA <SOMENAME>;

The DATA step starts with the DATA statement.
The data set name must be 1-32 characters long, must begin with a letter or underscore, and may contain only letters, digits, and underscores.
Any text within <> represents optional material or a user defined name or value.

Example:
DATA EMPDAT;

The following SAS statements can be used in a DATA step.

2.1 INFILE AND INPUT STATEMENT

Used for: INFILE identifies the external file to read (such as a mainframe file, text file, or comma-delimited file).
INPUT defines the names, types, and positions of the variables read into the SAS dataset.

Syntax:
INFILE file-specification <options>;
INPUT variable variable_type column(s);

Example 1:

DATA EMPFL;
 INFILE 'c:\emp\external\emp1.dat';
 INPUT
 @001 EMPNO $CHAR6.
 @007 NAME $CHAR15.
 @022 AGE 3.;
RUN;

In this example, the external file “emp1.dat” stored in “c:\emp\external” is read into the SAS dataset EMPFL. Three variables are read: EMPNO (columns 1-6), NAME (columns 7-21) and AGE (columns 22-24).
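
For a comma-delimited file, a sketch along the following lines may work. The file name emp1.csv, the data set name EMPCSV and the presence of a header row are all assumptions, not part of the original example.

```sas
DATA EMPCSV;
   /* Hypothetical CSV file. DSD makes the comma the delimiter and     */
   /* handles quoted values; FIRSTOBS=2 skips an assumed header row.   */
   INFILE 'c:\emp\external\emp1.csv' DSD FIRSTOBS=2;
   /* The colon modifier reads up to the delimiter using the informat. */
   INPUT EMPNO :$6. NAME :$15. AGE;
RUN;
```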

2.2 SET STATEMENT

Used for: Reading one or more existing SAS datasets.
Syntax:

SET <SAS dataset name> <(OBS=n)>;

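Where “n” is the number of the last observation to read from the dataset. When reading starts at the first observation (the default), this is also the number of observations read.
Example 1: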

Copies first 100 records of OLDFILE1 to NEWFILE.

DATA NEWFILE;
 SET OLDFILE1 (OBS=100);
RUN;
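
OBS= can be combined with the FIRSTOBS= data set option to read a range of observations. A small sketch, where NEWFILE2 is a made-up output name:

```sas
DATA NEWFILE2;
   /* Reads observations 51 through 100 of OLDFILE1 (50 observations). */
   SET OLDFILE1 (FIRSTOBS=51 OBS=100);
RUN;
```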

Example 2:

Concatenate the two input SAS datasets NWEST and SWEST into COMMON.
DATA COMMON;
 SET NWEST SWEST;
RUN;
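
When concatenating, it is often useful to record which input dataset each observation came from. One possible sketch uses the IN= data set option. The variable names FROM_NW and REGION are assumptions, and REGION is assumed not to exist in either input.

```sas
DATA COMMON;
   SET NWEST (IN=FROM_NW) SWEST;
   LENGTH REGION $ 5;
   /* FROM_NW is 1 when the current observation was read from NWEST. */
   IF FROM_NW THEN REGION = 'NWEST';
   ELSE REGION = 'SWEST';
RUN;
```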


2.3 IF-THEN/ELSE STATEMENT

Used for: Conditional processing in a DATA step.

Syntax:

IF expression THEN statement;
<ELSE statement;>
Example 1:
IF LANG='Spanish' OR LANG='French' THEN
 NEWLANG='NotEngl';
ELSE
 NEWLANG='English';

In this example ‘Spanish’ is a character constant.

Example 2:

IF status='M' AND type=1 THEN count=count+1;
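
One caveat with a counter like this: a variable created by an assignment is reset to missing at the start of each DATA step iteration, and missing + 1 is still missing. A sum statement avoids this, because it starts the variable at 0 and retains it across observations. A sketch, assuming a hypothetical input dataset RESPONSES that contains STATUS and TYPE but not COUNT:

```sas
DATA COUNTED;
   SET RESPONSES;      /* hypothetical input dataset */
   /* Sum statement: COUNT starts at 0 and is retained across iterations. */
   IF STATUS = 'M' AND TYPE = 1 THEN COUNT + 1;
RUN;
```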

2.4 SUBSETTING “IF” STATEMENT

Used for: Subsetting, that is, keeping only a portion of the data set.

Syntax:

IF expression;

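Where expression is any SAS expression. Observations for which the expression is false (zero or missing) are not written to the output dataset.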
Example:

DATA FORGNER;
 SET TAXPAYER; /* hypothetical input dataset containing LANG and TAX */
 IF LANG = 'ENGLISH' AND TAX >= 20000;
RUN;

In the above example the subsetting IF statement keeps only the observations where LANG equals 'ENGLISH' and
TAX is greater than or equal to 20000.
Notice that values for character variables must be enclosed in quotes and must match exactly, including case.
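
The case-sensitivity point can be tried out with a small self-contained sketch. The dataset names TAXPAYER and ENGTAX and the sample rows are invented for illustration.

```sas
DATA TAXPAYER;
   INPUT NAME $ LANG $ TAX;
   DATALINES;
ANNA ENGLISH 25000
BORIS RUSSIAN 30000
CARL English 40000
DINA ENGLISH 15000
;
RUN;

DATA ENGTAX;
   SET TAXPAYER;
   /* CARL fails the case-sensitive match, DINA fails the TAX test. */
   IF LANG = 'ENGLISH' AND TAX >= 20000;
RUN;
```

With these sample rows only the observation for ANNA satisfies both conditions.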

2.5 LIBNAME STATEMENT

Used for: Associates a libref with a SAS library. A libref is a kind of location pointer.
It is generally used to save a SAS dataset in a permanent location. When a dataset name has no
libref, SAS assumes the dataset is created in or read from the WORK library, which is temporary.

Syntax:

LIBNAME libref <engine> 'SAS-data-library';
LIBNAME libref <engine> ('SAS-data-library-1' <, ...'SAS-data-library-n'>);

Example:
In Windows:

LIBNAME EMPLIB1 'C:\data\SAS\EMPDATA';

The path 'C:\data\SAS\EMPDATA' is assigned to the libref EMPLIB1. If a SAS dataset EMPDAT exists in that
library, it can be accessed as:
DATA EMPFILE1;
 SET EMPLIB1.EMPDAT;
RUN;
In Unix:

LIBNAME CLAIM '/data/CL/CLAIM/INPUT';
DATA CLAIM.LOSS_HISTORY;
 SET LOSS_TEMP;
RUN;
In this example, the temporary dataset LOSS_TEMP is copied to the permanent location '/data/CL/CLAIM/INPUT'
under the new name LOSS_HISTORY.
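
The parenthesized form of the syntax assigns one libref to several libraries at once. A rough sketch, reusing the Windows path from the earlier example; the libref EMPALL, the second path and the dataset name EMPCOPY are hypothetical.

```sas
/* The second path is hypothetical. */
LIBNAME EMPALL ('C:\data\SAS\EMPDATA' 'C:\data\SAS\EMPARCH');

DATA WORK.EMPCOPY;
   /* EMPDAT is taken from the first listed library that contains it. */
   SET EMPALL.EMPDAT;
RUN;

LIBNAME EMPALL CLEAR;   /* release the libref when it is no longer needed */
```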

2.6 MERGE STATEMENT
Used for: Joins corresponding observations from two or more SAS data sets.
Syntax:

DATA new-sasdataset;
 MERGE SAS-dataset-1 SAS-dataset-2 ... SAS-dataset-n;
 BY var-1 var-2 ... var-m;
RUN;

Input data sets must be sorted by the same BY variables before you can merge them.
Example:
Merge the two datasets SURVY and NAMES by the variable NAME.

DATA NEW;
 MERGE NAMES
 SURVY;
 BY NAME;
RUN;
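
Since the inputs must be sorted first, a fuller sketch of the same merge might sort both datasets with PROC SORT and then use the IN= option to keep only matching names. The names MATCHED, IN_NAMES and IN_SURVY are assumptions.

```sas
PROC SORT DATA=NAMES; BY NAME; RUN;
PROC SORT DATA=SURVY; BY NAME; RUN;

DATA MATCHED;
   MERGE NAMES (IN=IN_NAMES) SURVY (IN=IN_SURVY);
   BY NAME;
   /* Keep only the names that appear in both datasets. */
   IF IN_NAMES AND IN_SURVY;
RUN;
```

Without the subsetting IF, the merge keeps every NAME found in either dataset and leaves the variables from the other dataset missing.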


