In this post, we have put together the best SAS interview questions with answers. The questions cover beginner, intermediate, and experienced candidates.

What does PROC GLM do?

PROC GLM fits general linear models. Its uses include analysis of variance, analysis of covariance, multivariate analysis of variance, and repeated measures analysis of variance.

What is Base SAS?

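Base SAS is the core SAS product: the DATA step, the base procedures, and the macro language, usually worked with through a basic, text-based programming interface with an older look. Enterprise Guide (EG) is a more GUI-like IDE with wizards to assist with writing code for various processes.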

Define the STD function.

The STD function returns the standard deviation of its nonmissing arguments.

What is BY-Group processing?

BY-group processing is a method of processing observations that are grouped, ordered, or indexed according to the values of one or more BY variables.
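
A minimal sketch of BY-group processing in a DATA step, assuming the SASHELP.CLASS sample data set that ships with SAS. The output data set names and the counter variable are made up for illustration.

```sas
/* Sort first: BY-group processing in a DATA step expects sorted or grouped input. */
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

/* FIRST.sex and LAST.sex flag the first and last observation of each BY group. */
data work.count_by_sex;
    set work.class_sorted;
    by sex;
    if first.sex then n_students = 0;   /* reset the counter at the start of a group */
    n_students + 1;                     /* sum statement: the value is retained */
    if last.sex then output;            /* write one row per group */
    keep sex n_students;
run;
```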

If a variable contains letters or special characters, can it be numeric data type?

No, it must be character data type.

How would you identify a macro variable?

With the Ampersand (&) sign.

When will you use SELECT construct instead of IF statement?

When we have a long series of mutually exclusive conditions and the comparison is numeric, we use a SELECT construct rather than IF-THEN or IF-THEN/ELSE statements because it reduces CPU time.
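
As an illustration, a SELECT group on a numeric code might look like the sketch below. The data set, variable names, and code values are hypothetical.

```sas
data work.regions;
    input region_code;
    length region $5;
    /* One expression is compared against each WHEN value in turn. */
    select (region_code);
        when (1) region = 'North';
        when (2) region = 'South';
        when (3) region = 'East';
        otherwise region = 'Other';   /* handles any value not listed above */
    end;
    datalines;
1
3
9
;
run;
```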

What are Global Statements in SAS?

Global statements can be used anywhere in a SAS program and remain in effect until they are changed or canceled, or until the SAS session ends. Examples are OPTIONS, TITLE, FOOTNOTE, and LIBNAME.

What are SYMGET and SYMPUT?

SYMPUT puts a value from a data set into a macro variable, whereas SYMGET gets the value from a macro variable back into the data set.
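
A short sketch of the round trip, assuming the SASHELP.CLASS sample data set. The macro variable name n_obs and the output data set are illustrative only.

```sas
/* CALL SYMPUT: copy a DATA step value into a macro variable. */
data _null_;
    set sashelp.class end=last;
    if last then call symput('n_obs', strip(put(_n_, 8.)));
run;

%put NOTE: n_obs resolves to &n_obs;

/* SYMGET: read the macro variable back while the DATA step executes. */
data work.with_count;
    set sashelp.class;
    total_rows = input(symget('n_obs'), 8.);   /* SYMGET returns character text */
run;
```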

What is SAS program?

A SAS program is a sequence of statements executed in order. Every SAS statement ends with a semicolon, and statements can be written in uppercase or lowercase letters.

Full form for SAS?

SAS originally stood for “Statistical Analysis System”.

What are the 3 components in SAS programming?
  • Statements
  • Variables
  • Dataset
Difference between Informat and Format.

Informat – Tells SAS how to read a value, for example that a number should be read in a particular format. Format – Tells SAS how to print or display the variables.

Elucidate the FILECLOSE data set option.

The FILECLOSE= data set option specifies how a tape is positioned when a SAS data set stored on it is closed.

How to limit decimal places for variable using PROC MEANS?

By using MAXDEC= option.
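
For example, assuming the SASHELP.CLASS sample data set, the following sketch limits the printed statistics to one decimal place:

```sas
proc means data=sashelp.class maxdec=1;
    var height weight;
run;
```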

What is Linear Regression?

Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.

Does SAS 'translate' (compile) or 'interpret'?

Compile

How Data Step Merge and PROC SQL handle many-to-many relationship?

DATA step MERGE does not create a Cartesian product in case of a many-to-many relationship, whereas PROC SQL produces a Cartesian product.

Can you explain the process of CALENDAR?

The main purpose of PROC CALENDAR is to display the data from a SAS data set in a monthly calendar format.

How do you use the do loop if you don’t know how many times you should execute the do loop?

We can use ‘do until’ or ‘do while’ to specify the condition.

How many data types are there in SAS?

There are two data types in SAS: character and numeric. Dates are stored as numeric values (the number of days since January 1, 1960), and SAS provides date functions, formats, and informats to work with them.

What is the one statement to set the criteria of data that can be coded in any step?

The WHERE statement sets the criteria for selecting observations and can be used in a DATA step or a PROC step.

What is the difference between PROC MEANS and PROC Summary?

The difference between the two procedures is that PROC MEANS produces a report by default. By contrast, to produce a report in PROC SUMMARY, you must include a PRINT option in the PROC SUMMARY statement.

Explain the use of PROC GPLOT.

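PROC GPLOT, part of SAS/GRAPH, plots the values of one variable against another. The DATA= option in the PROC GPLOT statement identifies the data set that contains the plot variables. It has more options than PROC PLOT and, therefore, can create more colorful and fancier graphics.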

What is the use of $BASE64X?

The $BASE64X format converts character data into ASCII text by using Base64 encoding.

What is PDV?

The Program Data Vector (PDV) is the area of memory where SAS builds a data set, one observation at a time. When a program is executed, an input buffer is created that holds the raw data being read; the values are then moved into the PDV and assigned to their respective variables.

How to redirect SAS user folder?

Use LIBNAME statement to redirect SAS user folder.

What are _N_ and _ERROR_ in SAS?
  • _N_ is an automatic counter variable that indicates the number of times SAS has looped through the DATA step.
  • _ERROR_ is an automatic variable created by SAS during data processing. It is 0 by default and is set to 1 when an error, such as an input data error or a conversion error, occurs.
What is the difference between %LOCAL and %GLOBAL?

%LOCAL declares a macro variable that exists only inside the macro in which it is defined. %GLOBAL declares a macro variable that can be used anywhere, in open code or inside any macro, for the rest of the SAS session.
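
A small sketch of the two scopes. The macro name scope_demo and the variable names inner and outer are hypothetical.

```sas
%macro scope_demo;
    %local inner;            /* exists only while scope_demo runs */
    %global outer;           /* stays available after the macro ends */
    %let inner = local value;
    %let outer = global value;
    %put Inside the macro: inner=&inner outer=&outer;
%mend scope_demo;

%scope_demo

%put After the macro: outer=&outer;
/* A reference to inner at this point would not resolve, because it no longer exists. */
```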

What do you understand by CALL MISSING Routine?

The character or numeric variables that are specified can be assigned missing values through the CALL MISSING routine.

Do you know what the CALL PRXCHANGE routine is?

The CALL PRXCHANGE routine performs a pattern-matching replacement by using a Perl regular expression.

What is the purpose of trailing @ and @@?
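  • The single trailing @ tells the SAS system to “hold the line”: the record stays available to the next INPUT statement in the same iteration of the DATA step.
  • The double trailing @@ tells the SAS system to “hold the line more strongly”: the record is held even across iterations of the DATA step.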
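
The two line-hold specifiers can be sketched as follows. The data set names, variables, and data lines are invented for illustration.

```sas
/* Single trailing @: hold the record so a later INPUT statement in the
   same iteration can keep reading it. */
data work.sales_only;
    input type $ @;             /* read the record type and hold the line */
    if type = 'S' then do;
        input amount;           /* continue reading the held line */
        output;
    end;
    datalines;
S 100
R 20
S 250
;
run;

/* Double trailing @@: hold the record across iterations, so one line
   can supply several observations. */
data work.scores;
    input name $ score @@;
    datalines;
Ann 90 Bob 85 Cy 77
;
run;
```
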
How to count unique values by a grouping variable?

You can use PROC SQL with COUNT(DISTINCT variable_name) to determine the number of unique values for a column.
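
A minimal sketch, assuming the SASHELP.CLASS sample data set, that counts the distinct ages within each value of sex (the column alias is illustrative):

```sas
proc sql;
    select sex,
           count(distinct age) as n_distinct_ages
    from sashelp.class
    group by sex;
quit;
```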

What is the difference between do while and do until?

An important difference between the DO UNTIL and DO WHILE statements is when the condition is tested. The DO WHILE expression is evaluated at the top of the DO loop, so if the expression is false the first time it is evaluated, the DO loop never executes. The DO UNTIL expression is evaluated at the bottom of the loop, so the loop always executes at least once.
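
The difference shows up when the condition is already decided before the loop starts. In this sketch (variable names are arbitrary), only the DO UNTIL body writes a line to the log.

```sas
data _null_;
    i = 10;
    do while (i < 5);      /* tested before the body: the body never runs */
        put 'DO WHILE body, i=' i;
        i = i + 1;
    end;

    j = 10;
    do until (j >= 5);     /* tested after the body: the body runs once */
        put 'DO UNTIL body, j=' j;
        j = j + 1;
    end;
run;
```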

When grouping is in effect, can the WHERE clause be used in PROC SQL to subset data?

Not to subset the groups. WHERE filters individual rows before they are grouped and cannot refer to summary statistics. To subset data after grouping is in effect, the HAVING clause must be used, and its condition normally involves a summary statistic.
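
A sketch of subsetting groups with HAVING, assuming the SASHELP.CLASS sample data set. The threshold of 60 is arbitrary.

```sas
proc sql;
    select sex,
           avg(height) as avg_height
    from sashelp.class
    group by sex
    having avg(height) > 60;   /* keeps only groups whose summary value qualifies */
quit;
```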

What do you mean by the ALTER= Data Set option?

It assigns an ALTER password to a SAS file, which prevents users without the password from replacing, deleting, or altering the file.

What is the work of tranwrd function?

TRANWRD function replaces or removes all occurrences of a pattern of characters within a character string.

How to specify variables to be processed by the FREQ procedure?

By using TABLES Statement.

What are the statements in PROC SQL?

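SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. Strictly speaking, these are the clauses of the SELECT statement, and they must appear in this order.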

Name statements that function at both compile and execution time.

Options, title, footnote

What is APPEND procedure?

PROC APPEND adds the observations from one SAS data set (named in DATA=) to the end of another SAS data set (named in BASE=).

What does ODS stand for?

ODS stands for the Output Delivery System.

Given an unsorted data set, how to read the last observation to a new data set?

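We can read the last observation into a new data set by using the END= option in the SET statement, which names a temporary variable that is set to 1 when the last observation is read.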
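
A minimal sketch using END= on the SET statement, assuming the SASHELP.CLASS sample data set. The names is_last and work.last_obs are illustrative.

```sas
data work.last_obs;
    set sashelp.class end=is_last;   /* is_last is 1 only on the final observation */
    if is_last;                      /* subsetting IF keeps only that row */
run;
```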

Give some examples where PROC REPORT’s defaults are same as PROC PRINT’s defaults?
  • Variables/Columns in position order.
  • Rows ordered as they appear in data set.
What is the difference between using drop = data set option in data statement and set statement?

If you do not want to process certain variables and do not want them to appear in the new data set, specify the DROP= data set option in the SET statement. The variables are never read, so they are not available in the step.

If you want to process certain variables but do not want them to appear in the new data set, specify the DROP= data set option in the DATA statement. The variables are read and can be used, but they are not written to the output.
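
The two placements can be sketched like this, assuming the SASHELP.CLASS sample data set. The output names and the ratio variable are made up for illustration.

```sas
/* DROP= on SET: weight is never read, so it cannot be used in this step. */
data work.no_weight;
    set sashelp.class(drop=weight);
run;

/* DROP= on DATA: weight and height are read and used, then left out of the output. */
data work.ratio_only(drop=weight height);
    set sashelp.class;
    weight_per_inch = weight / height;
run;
```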

How to sort in descending order?

Use the DESCENDING keyword before the variable name in the BY statement of PROC SORT.

Give some examples where PROC REPORT’s defaults are different than PROC PRINT’s defaults?
  • No Record Numbers in Proc Report.
  • Labels (not var names) used as headers in Proc Report.
  • REPORT needs NOWINDOWS option.
How would you define the end of a macro?

The end of a macro is defined by the %MEND statement.

What is PROC UNIVARIATE?

PROC UNIVARIATE performs elementary analysis of numeric variables, producing descriptive statistics, quantiles, and extreme values. It helps you examine how the data are actually distributed.

What is the use of the DIVIDE function?

The DIVIDE function returns the result of dividing its first argument by its second, with special handling of missing values.

Difference between Missover and Truncover.

MISSOVER – When the MISSOVER option is used on the INFILE statement, the INPUT statement does not jump to the next line when reading a short line. Instead, MISSOVER sets the variables that have no values to missing.

TRUNCOVER – Like MISSOVER, it does not jump to the next line, but it also assigns the raw data value to the variable even if the value is shorter than the length that is expected by the INPUT statement.

Briefly explain Input and Put function.

INPUT function – Character to numeric conversion: INPUT(source, informat).

PUT function – Numeric to character conversion: PUT(source, format).
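
A brief sketch of both conversions. The variable names and literal values are arbitrary.

```sas
data work.convert;
    char_amount = '1234.50';
    num_amount  = input(char_amount, 8.);      /* character to numeric: 1234.5 */

    num_id  = 42;
    char_id = put(num_id, z5.);                /* numeric to character: '00042' */

    sas_date  = input('15JAN2024', date9.);    /* informat reads text as a SAS date */
    date_text = put(sas_date, yymmdd10.);      /* format writes it as '2024-01-15' */
run;
```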

Do you know the functions that are used for character handling?

Two commonly used character-handling functions are UPCASE and LOWCASE, which convert a string to uppercase and lowercase.

What is the purpose of _error_?

It has only 2 values, 1 for error and 0 for no error.

What are the default statistics for means procedure?

n-count, mean, standard deviation, minimum, and maximum

What is DATA _NULL_?

DATA _NULL_ is mainly used to create macro variables. It can also be used to write output without creating a data set. The idea of "null" here is that we have a DATA step that doesn't actually create a data set.

Explain BOR function?

It is a bitwise logical function that returns the bitwise logical OR of two arguments.

Explain the COMPRESS= Data set option.

It controls whether the observations in a new output SAS data set are compressed.

What is RUN-Group processing?

RUN-group processing lets you submit groups of statements to a procedure, each ended with a RUN statement, without ending the procedure. The procedure stays active until a QUIT statement or the start of another step.

What are the special Input Delimiters?

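The input delimiter options are DLM= and DSD, both specified on the INFILE statement. DLM= names the delimiter characters, and DSD treats consecutive delimiters as missing values and handles quoted values.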

Which command is used to save logs in the external file?

PROC PRINTTO is used to save the log in an external file.

Is using ‘group’ the only way to define variables in a ‘PROC report’?

Using the 'group' definition isn't the only way to define the variables. PROC REPORT supports several usages: DISPLAY, ORDER, GROUP, ACROSS, ANALYSIS, and COMPUTED.

What are the features of SAS?
  • Business Solutions: SAS provides business analysis that can be used as business products for various companies to use.
  • Analytics: SAS is the market leader in the analytics of various business products and services.
  • Data Access & Management: SAS can also be used as DBMS software.
  • Reporting & Graphics: SAS helps to visualize the analysis in the form of summaries, lists, and graphic reports.
  • Visualization: We can visualize the reports in the form of graphs ranging from simple scatter plots and bar charts to complex multi-page classification panels.
Can you explain about CALL PRXFREE Routine?

CALL PRXFREE frees the memory that was allocated for a Perl regular expression used in character string matching.

Describe CROSSLIST option in TABLES statement

Adding the CROSSLIST option to TABLES statement displays crosstabulation tables in ODS column format.

Difference between SET and MERGE.

SET concatenates the data sets, whereas MERGE matches the observations of the data sets.

Can PROC MEANS analyze ONLY the character variables?

No, Proc Means requires at least one numeric variable.

Difference between NODUP and NODUPKEY Options?

The NODUPKEY option removes duplicate observations where value of a variable listed in BY statement is repeated while NODUP option removes duplicate observations where values in all the variables are repeated (identical observations).
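
A sketch of both options, assuming the SASHELP.CLASS sample data set. The output data set names are illustrative.

```sas
/* NODUPKEY: keep one observation per value of the BY variable. */
proc sort data=sashelp.class out=work.one_per_age nodupkey;
    by age;
run;

/* NODUP: drop observations that are identical in every variable.
   Sorting by _ALL_ places identical rows next to each other. */
proc sort data=sashelp.class out=work.no_dup_rows nodup;
    by _all_;
run;
```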

What is the difference between Order and Group variable in proc report?
  • If the variable is used as group variable, rows that have the same values are collapsed.
  • Group variables produce a summary report, whereas order variables produce a list report in which every row is kept.
How would you include common or reuse code to be processed along with your statements?
  • Using SAS Macros.
  • Using a %include statement
What is the maximum length of the macro variable?

A macro variable name can be up to 32 characters long. Its value can be up to 65,534 characters long.

Which are the statements whose placement in the DATA step is critical?

DATA, INPUT, RUN, CARDS, INFILE, WHERE, LABEL, SELECT, INFORMAT, FORMAT

What is the difference between reading data from an external file and reading data from an existing data set?

The main difference is that when reading an existing data set with the SET statement, SAS retains the values of the variables from one observation to the next, and the variable definitions come from the data set itself. When reading data from an external file, only raw records are read, so the variables have to be declared again, for example in an INPUT statement, if they need to be used.

Describe the VFORMATX function.

The VFORMATX function returns the format that is associated with the value of the specified argument.

What is SAS?
  • SAS is a software suite for advanced analytics, multivariate analyses, business intelligence, data management and predictive analytics
  • It is developed by SAS Institute.
  • SAS provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS language.
For what purpose would you use the RETAIN statement?

A RETAIN statement tells SAS not to set variables to missing when going from the current iteration of the DATA step to the next. Instead, SAS retains the values.
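
A running total is a typical use. This sketch assumes the SASHELP.CLASS sample data set; the variable total_weight is illustrative.

```sas
data work.running_total;
    set sashelp.class;
    retain total_weight 0;                   /* keep the value across iterations */
    total_weight = total_weight + weight;    /* add the current row to the total */
run;
```

Without the RETAIN statement, total_weight would be reset to missing at the start of each iteration, and every sum would be missing.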

What are the default statistics that PROC MEANS produce?

PROC MEANS produces the “default” statistics N, MIN, MAX, MEAN, and STD DEV.

What does the function CATX syntax do?

CATX syntax inserts delimiters, removes trailing and leading blanks, and returns a concatenated character string.

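What is the SCAN function in SAS and how is it used?

The SCAN function extracts the nth word from a character string, where words are separated by delimiters, and puts the value in the target variable. If the target variable has not been given a length, the commonly quoted default length is 200 characters, although this can depend on the SAS release.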

How to debug SAS Macros?

There are some system options that can be used to debug SAS macros: MPRINT, MLOGIC, and SYMBOLGEN.

What is ANYDIGIT function?

The ANYDIGIT function searches a character string for a digit and returns the first position at which one is found.

What is the function of Stop statement in a SAS Program?

Stop statement causes SAS to stop processing the current data step immediately and resume processing statement after the end of current data step.

What are the differences between sum function and using “+” operator?

SUM function returns the sum of non-missing arguments whereas “+” operator returns a missing value if any of the arguments are missing.
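
A quick sketch of the difference, with arbitrary variable names:

```sas
data _null_;
    a = 10;
    b = .;
    c = 5;
    with_function = sum(a, b, c);   /* missing value is ignored: 15 */
    with_operator = a + b + c;      /* missing value propagates: . */
    put with_function= with_operator=;
run;
```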

What is PROC SORT?

PROC SORT orders the observations in a SAS data set by the values of one or more BY variables. It can replace the original data set or, with the OUT= option, write the sorted observations to a new data set for further use.

How to create list output for crosstabulations in proc freq?

To generate list output for crosstabulations, add a slash (/) and the LIST option to the TABLES statement in your PROC FREQ step.

TABLES variable-1*variable-2 <* … variable-n> / LIST;
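
For instance, assuming the SASHELP.CLASS sample data set, a two-way crosstabulation in list form could be requested like this:

```sas
proc freq data=sashelp.class;
    tables sex*age / list;   /* the asterisk crosses the two variables */
run;
```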

What is Debugging?

Debugging is a technique for testing the program logic, and this can be done with the help of Debugger.

What is the function of output statement in a SAS Program?

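In a DATA step, the OUTPUT statement writes the current observation to the output data set at that point rather than at the end of the step. In procedures such as PROC CAPABILITY, you can use the OUTPUT statement to save summary statistics in a SAS data set. This information can then be used to create customized reports or to save historical information about a process.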

You can use options in the OUTPUT statement to

  • Specify the statistics to save in the output data set,
  • Specify the name of the output data set, and
  • Compute and save percentiles not automatically computed by the CAPABILITY procedure.
Where do you use PROC MEANS over PROC FREQ?

We will use PROC MEANS for numeric variables whereas we use PROC FREQ for categorical variables.

Define in detail about the TRANSLATE function?

The TRANSLATE function replaces specific characters in a character string with other characters that you specify.
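
A short sketch contrasting TRANSLATE, which works character by character, with TRANWRD, which works on whole substrings. The sample text and variable names are arbitrary.

```sas
data _null_;
    text = 'cat and catalog';
    by_char = translate(text, 'xy', 'ca');   /* c becomes x, a becomes y: 'xyt ynd xytylog' */
    by_word = tranwrd(text, 'cat', 'dog');   /* every 'cat' becomes 'dog': 'dog and dogalog' */
    put by_char= by_word=;
run;
```

Note the argument order in TRANSLATE: the replacement characters come before the characters being replaced.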

What does the trace option do?

ODS TRACE is used to find the names of the particular output objects when several of them are created by some procedure.

ODS TRACE ON;
ODS TRACE OFF;

Difference between SCAN and SUBSTR.

SCAN extracts words within a value that is marked by delimiters. SUBSTR extracts a portion of the value by stating the specific location. It is best used when we know the exact position of the sub string to extract from a character value.
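
A minimal sketch of the two functions on the same value. The sample name and variable names are invented.

```sas
data _null_;
    full_name  = 'Smith, John A';
    last_name  = scan(full_name, 1, ',');    /* first comma-delimited word: 'Smith' */
    first_name = scan(full_name, 2, ', ');   /* delimiters are comma and blank: 'John' */
    first_two  = substr(full_name, 1, 2);    /* 2 characters from position 1: 'Sm' */
    put last_name= first_name= first_two=;
run;
```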

What is the length assigned to the target variable by the scan function?

200

What is interleaving in SAS?

Interleaving combines individual, sorted SAS data sets into one sorted SAS data set.

Differentiate ‘CEIL’ and ‘FLOOR’.

The CEIL function returns the smallest integer that is greater than or equal to the argument, while FLOOR does the opposite and returns the largest integer that is less than or equal to the argument.
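
A quick sketch with one positive and one negative value (variable names are arbitrary):

```sas
data _null_;
    x = 4.3;
    up   = ceil(x);        /* 5 */
    down = floor(x);       /* 4 */
    y = -4.3;
    up_neg   = ceil(y);    /* -4 */
    down_neg = floor(y);   /* -5 */
    put up= down= up_neg= down_neg=;
run;
```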

What are the statements that are executed only?

INFILE, INPUT, Output, Call routines

Name statements that are recognized at compile time only.

drop, keep, rename, label, format, informat, attrib, where, by, retain, length, array.

Name few SAS functions?

Scan, Substr, trim, Catx, Index, tranwrd, find, Sum.

What are _numeric_ and _character_ and what do they do?
  1. _NUMERIC_ specifies all numeric variables that are already defined in the current DATA step.
  2. _CHARACTER_ specifies all character variables that are currently defined in the current DATA step.
What is BMDP procedure?

PROC BMDP calls a BMDP program to analyze data in a SAS data set, so that BMDP statistical routines can be run from within a SAS job.

What is the difference between SAS functions and procedures?

Functions expect argument values to be supplied across an observation in a SAS data set whereas a procedure expects one variable value per observation.