Index

General

  1. What is the difference between seq(4) and seq_along(4)?
  2. What is power analysis?
  3. What happens when the application object does not handle an event?
  4. Explain the purpose of a UIWindow object.
  5. What is GGobi?
  6. What is the use of the lattice package?
  7. Which data structures are used to perform statistical analysis and create graphs?
  8. What is the use of diagnostic plots?
  9. Define the relaimpo package.
  10. Define the robust package.
  11. Define survival analysis.
  12. What is the use of the MASS package?
  13. What is the use of the forecast package?
  14. What is the full form of CFA?
  15. What is clustering? What is the difference between k-means clustering and hierarchical clustering?
  16. What is k-means clustering?
  17. What is hierarchical clustering?
  18. What is a factor?
  19. How can you load a .csv file in R?
  20. What are the different components of the grammar of graphics?
  21. What is R Markdown? What is it used for?
  22. Name some packages in R which can be used for data imputation.
  23. Name some functions available in the “dplyr” package.
  24. What packages are used for data mining in R?
  25. What is t.test() in R?
  26. What is the use of the subset() and sample() functions in R?
  27. What is the memory limit of R?
  28. How are impossible values represented in R?
  29. Which function is used for sorting in R?


The Questions
General
1. What is the difference between seq(4) and seq_along(4)?

seq(4) returns the sequence from 1 to 4, i.e. c(1, 2, 3, 4), whereas seq_along(4) returns the sequence 1, 2, ..., length(4). Because the single number 4 is a vector of length 1, seq_along(4) returns just 1.
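A quick check in the R console makes the difference concrete; the vector x below is just an illustrative value.

```r
# seq(n) with a single number n builds the sequence 1, 2, ..., n
s4 <- seq(4)
print(s4)             # [1] 1 2 3 4

# seq_along(x) builds 1, 2, ..., length(x); the value 4 is a
# vector of length 1, so the result is just 1
sa4 <- seq_along(4)
print(sa4)            # [1] 1

# seq_along() is normally used to index over the elements of a vector
x <- c(10, 20, 30)
print(seq_along(x))   # [1] 1 2 3
```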

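2. What is power analysis?

Power analysis is used in experimental design. It determines the sample size needed to detect an effect of a given size with a given probability (the power), or, conversely, the power obtained with a given sample size.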
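As a sketch, base R's stats package provides power.t.test() for t-test power calculations. The scenario below (a mean difference of 1 unit, standard deviation 1, 5% significance, 80% power) is an assumed example, not taken from any real study; the one parameter among n, delta, power, sd and sig.level that is left unspecified is the one R solves for.

```r
# Assumed scenario: detect a mean difference of 1 unit when sd = 1,
# at a 5% significance level with 80% power. n is left out, so R solves for it.
res <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.80)
print(res)

# n is the required sample size *per group*; round up to whole subjects
n_per_group <- ceiling(res$n)
n_per_group

# Conversely, fix n and solve for the power actually achieved
power.t.test(n = 10, delta = 1, sd = 1, sig.level = 0.05)$power
```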

3. What happens when the application object does not handle an event?

The event will be dispatched to your delegate for processing.

4. Explain the purpose of a UIWindow object.

A UIWindow object coordinates the presentation of one or more views on the screen.

5. What is GGobi?

GGobi is an open-source visualization program for exploring high-dimensional data.

6. What is the use of the lattice package?

The lattice package aims to improve on base R graphics by providing better defaults and the ability to easily display multivariate relationships.
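A minimal sketch of a conditioned lattice plot, using the built-in mtcars data as an illustrative choice; the formula term after | splits the plot into one panel per group.

```r
library(lattice)  # ships with R as a recommended package

# Fuel economy against weight, with one panel per number of cylinders:
# the "| factor(cyl)" conditioning shows a multivariate relationship in one call
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p)  # lattice plots are drawn when the object is printed
```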

7. Which data structures are used to perform statistical analysis and create graphs?

The main data structures are vectors, matrices, arrays and data frames.
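The sketch below creates each of these structures with made-up values (the object names are arbitrary) and runs a simple summary on a data frame column.

```r
v  <- c(2.5, 3.1, 4.8)                 # vector: 1-D, one type
m  <- matrix(1:6, nrow = 2)            # matrix: 2-D, one type
a  <- array(1:24, dim = c(2, 3, 4))    # array: n-D, one type
df <- data.frame(id = 1:3,             # data frame: columns may differ in type
                 group = c("a", "b", "a"),
                 score = v)

str(df)
mean(df$score)   # a statistical summary computed on one column
```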

8. What is the use of diagnostic plots?

Diagnostic plots are used to check the assumptions of a regression model: normality of the residuals, heteroscedasticity (non-constant variance), and influential observations.
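For a linear model, calling plot() on the fitted object draws the standard diagnostic plots. The model below (mpg against wt on the built-in mtcars data) is only an illustrative choice.

```r
# Fit a simple linear model on a built-in dataset (illustrative choice)
fit <- lm(mpg ~ wt, data = mtcars)

# plot() on an lm object draws four diagnostic plots by default:
#   Residuals vs Fitted   - non-linearity and unequal variance
#   Normal Q-Q            - normality of the residuals
#   Scale-Location        - heteroscedasticity
#   Residuals vs Leverage - influential points (with Cook's distance contours)
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Cook's distance per observation, for a numeric view of influence
head(sort(cooks.distance(fit), decreasing = TRUE), 3)
```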

9. Define the relaimpo package.

The relaimpo package measures the relative importance of each predictor in a linear regression model.

10. Define the robust package.

The robust package provides a library of robust statistical methods, including robust regression.

11. Define survival analysis.

Survival analysis comprises a number of techniques for modeling the time until an event occurs.
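A minimal sketch using the survival package, which ships with R as a recommended package, and its lung dataset: a Kaplan-Meier estimate by sex with survfit(), and a Cox proportional hazards model with coxph(). The choice of covariates is illustrative.

```r
library(survival)  # recommended package shipped with R

# lung: survival times (days) of patients with advanced lung cancer;
# status records whether death was observed or the time is censored
km_fit <- survfit(Surv(time, status) ~ sex, data = lung)
print(km_fit)   # median survival time per group
plot(km_fit, lty = 1:2, xlab = "Days", ylab = "Survival probability")

# Cox proportional hazards model for the effects of sex and age
cox <- coxph(Surv(time, status) ~ sex + age, data = lung)
summary(cox)
```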

12. What is the use of the MASS package?

Among many other functions, the MASS package includes functions that perform linear and quadratic discriminant analysis (lda() and qda()).
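A short sketch of lda() and qda() from MASS on the built-in iris data (an illustrative choice). The accuracy is computed on the training data itself, so it is optimistic.

```r
library(MASS)  # recommended package shipped with R

# Linear discriminant analysis: classify iris species from four measurements
lda_fit  <- lda(Species ~ ., data = iris)
lda_pred <- predict(lda_fit, iris)$class
mean(lda_pred == iris$Species)   # resubstitution accuracy

# Quadratic discriminant analysis allows a separate covariance matrix per class
qda_fit  <- qda(Species ~ ., data = iris)
qda_pred <- predict(qda_fit, iris)$class
table(Predicted = qda_pred, Actual = iris$Species)
```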

13. What is the use of the forecast package?

It provides functions for the automatic selection of ARIMA and exponential smoothing models (for example auto.arima() and ets()).

14. What is the full form of CFA?

CFA stands for Confirmatory Factor Analysis.

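15. What is clustering? What is the difference between k-means clustering and hierarchical clustering?

A cluster is a group of objects that belong to the same class. Clustering is the process of grouping a set of abstract objects into classes of similar objects. K-means clustering partitions the data into a pre-specified number K of flat clusters, whereas hierarchical clustering builds a nested hierarchy of clusters (a dendrogram) that can be cut at any level to obtain a partition.

A good clustering algorithm for data analysis should meet the following requirements: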

  • Scalability − We need highly scalable clustering algorithms to deal with large databases.
  • Ability to deal with different kinds of attributes − Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical and binary data.
  • Discovery of clusters with arbitrary shape − The clustering algorithm should be able to detect clusters of arbitrary shape. It should not be limited to distance measures that tend to find small spherical clusters.
  • High dimensionality − The clustering algorithm should handle not only low-dimensional data but also high-dimensional spaces.
  • Ability to deal with noisy data − Databases contain noisy, missing or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
  • Interpretability − The clustering results should be interpretable, comprehensible and usable.

16. What is k-means clustering?

K-means clustering is a well-known partitioning method. It classifies each object as belonging to one of K groups, so the result is a set of K clusters in which every object of the dataset belongs to exactly one cluster. Each cluster may have a centroid, or cluster representative. For real-valued data, the arithmetic mean of the attribute vectors of all objects in a cluster is an appropriate representative; other types of centroid may be required in other cases.

For example, a cluster of documents can be represented by a list of the keywords that occur in some minimum number of documents within the cluster. If the number of clusters is large, the centroids themselves can be clustered to produce a hierarchy within the dataset. K-means clusters the data samples iteratively: it assigns each object to its nearest centroid, recomputes each centroid from the objects assigned to it, and repeats until the assignments no longer change.
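In R, k-means is available as stats::kmeans(). Its default algorithm is the Hartigan-Wong variant rather than the plain assign-and-update loop, though both iterate toward clusters with small within-cluster variance. The sketch clusters the scaled iris measurements into K = 3 groups; K, the seed and nstart are assumed values chosen for illustration.

```r
set.seed(42)  # k-means starts from random centroids

# Standardise the four numeric iris columns so no variable dominates the distance
X  <- scale(iris[, 1:4])
# nstart = 25 runs 25 random starts and keeps the solution with the lowest
# total within-cluster sum of squares
km <- kmeans(X, centers = 3, nstart = 25)

km$size      # number of objects in each cluster
km$centers   # centroid (mean vector) of each cluster, in scaled units
table(Cluster = km$cluster, Species = iris$Species)
```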

17. What is hierarchical clustering?

This method creates a hierarchical decomposition of the given set of data objects. Hierarchical methods can be classified by how the decomposition is formed. There are two approaches:

Agglomerative Approach:

This approach is also known as the bottom-up approach. It starts with each object forming a separate group, then repeatedly merges the objects or groups that are closest to one another, until all groups are merged into one or a termination condition holds.

Divisive Approach:

This approach is also known as the top-down approach. It starts with all objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters, until each object is in its own cluster or a termination condition holds.

Hierarchical methods are rigid: once a merge or split is done, it can never be undone.
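Base R's hclust() implements the agglomerative approach. The sketch below clusters the built-in USArrests data (an illustrative choice) and cuts the resulting tree into 4 clusters; complete linkage and k = 4 are assumptions. For the divisive approach, diana() in the recommended cluster package is one option.

```r
# Euclidean distances between the (standardised) US states
d  <- dist(scale(USArrests))
# Agglomerative clustering: repeatedly merge the two closest groups
hc <- hclust(d, method = "complete")
plot(hc, cex = 0.6)   # dendrogram of the full hierarchy

# Cut the tree to obtain a flat partition into 4 clusters
groups <- cutree(hc, k = 4)
table(groups)
```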

18. What is a factor?

Conceptually, factors are variables in R that take on a limited number of different values; such variables are often referred to as categorical variables. One of the most important uses of factors is in statistical modeling: since categorical variables enter into statistical models differently from continuous variables, storing data as factors ensures that the modeling functions treat such data correctly.
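A small sketch with made-up values showing how a factor stores its levels, and how lm() turns it into indicator variables.

```r
sizes <- c("small", "large", "medium", "small", "large")
f <- factor(sizes, levels = c("small", "medium", "large"))

levels(f)       # "small" "medium" "large"
table(f)        # counts per category
as.integer(f)   # underlying integer codes: 1 3 2 1 3

# In a model, the factor is expanded into indicator (dummy) variables
y <- c(1.2, 3.4, 2.1, 1.0, 3.9)
coef(lm(y ~ f))  # intercept plus one coefficient per non-reference level
```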

19. How can you load a .csv file in R?

Use the read.csv() function and specify the path of the file. On Windows, write the path with forward slashes (or doubled backslashes):

house <- read.csv("C:/Users/John/Desktop/house.csv")
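To try this without a file of your own, the sketch below writes the built-in mtcars data to a temporary CSV file and reads it back; the temporary path stands in for a real path such as the one above.

```r
# Create a CSV file to read (in practice, use your own file path)
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path, row.names = FALSE)

# header = TRUE (the default) uses the first line as column names
cars <- read.csv(path)
str(cars)
head(cars)
```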

20. What are the different components of the grammar of graphics?

Broadly speaking, the grammar of graphics has the following components:

  • Data layer
  • Aesthetics layer
  • Geometry layer
  • Facet layer
  • Coordinate layer
  • Themes layer

21. What is R Markdown? What is it used for?

R Markdown is a reporting tool for R. With R Markdown you can create high-quality reports that combine your R code, its results and narrative text.

The output format of an R Markdown document can be:

  • HTML
  • PDF
  • Word

22. Name some packages in R which can be used for data imputation.

These are some R packages that can be used for data imputation:

  • mice
  • Amelia
  • missForest
  • Hmisc
  • mi
  • imputeR

23. Name some functions available in the “dplyr” package.

Functions in the dplyr package include:

  • filter()
  • select()
  • mutate()
  • arrange()
  • count()

24. What packages are used for data mining in R?

Some packages used for data mining in R:

  • data.table − provides fast reading of large files.
  • rpart and caret − for machine learning models.
  • arules − for association rule learning.
  • ggplot2 − provides various data visualization plots.
  • tm − to perform text mining.
  • forecast − provides functions for time series analysis.

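25. What is t.test() in R?

The t.test() function performs Student's or Welch's t-test. It is most often used to determine whether the means of two groups are equal, but it can also test a single mean against a given value or compare paired observations.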
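A sketch using the built-in sleep data, the example used in R's own t.test() help page. By default the two-sample test is Welch's (var.equal = FALSE); the one-sample test against mu = 20 is an illustrative extra.

```r
# sleep: extra hours of sleep for 10 patients under each of two drugs
res <- t.test(extra ~ group, data = sleep)   # Welch two-sample t-test
res$statistic
res$p.value   # compare with the chosen significance level, e.g. 0.05

# One-sample test: is the mean miles per gallon equal to 20?
t.test(mtcars$mpg, mu = 20)
```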

26. What is the use of the subset() and sample() functions in R?

subset() selects the observations (rows) and variables (columns) of a data frame that meet a condition, and sample() generates a random sample of size n from a vector, for example from the row indices of a dataset.
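A sketch on the built-in mtcars data; the mpg > 25 condition and the sample size of 5 are arbitrary illustrative values.

```r
# subset(): rows where mpg > 25, keeping only two columns
efficient <- subset(mtcars, mpg > 25, select = c(mpg, wt))
efficient

# sample(): draw 5 random row indices without replacement
set.seed(1)   # make the random draw reproducible
idx <- sample(nrow(mtcars), 5)
random_rows <- mtcars[idx, ]
random_rows
```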

27. What is the memory limit of R?

On a 32-bit system the memory limit is at most 3 GB, and most versions are limited to 2 GB. On a 64-bit system the limit is far higher (8 TB for 64-bit R on Windows).

28. How are impossible values represented in R?

In R, NaN (Not a Number) is used to represent impossible values, such as the result of 0/0.
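A few console examples that produce NaN, and how is.nan() and is.na() treat NaN versus NA:

```r
0 / 0                        # NaN
Inf - Inf                    # NaN
suppressWarnings(sqrt(-1))   # NaN (sqrt() warns "NaNs produced")

is.nan(0 / 0)   # TRUE
is.na(0 / 0)    # TRUE: NaN also counts as missing
is.nan(NA)      # FALSE: NA marks a missing value, not an impossible one
```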

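29. Which function is used for sorting in R?

The sort() function sorts a vector, while the order() function returns the indices that would sort its arguments. order() is the usual way to sort the rows of a data frame by one or more columns.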
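The difference between sort() and order() in a short sketch; the vector x is an arbitrary example.

```r
x <- c(3, 1, 2)
sort(x)                      # 1 2 3
sort(x, decreasing = TRUE)   # 3 2 1
order(x)                     # 2 3 1: positions of the smallest, next, largest

# order() is what you use to sort the rows of a data frame
by_mpg <- mtcars[order(mtcars$mpg), c("mpg", "cyl")]
head(by_mpg, 3)

# Multiple keys: cyl ascending, then mpg descending within each cyl
by_cyl_mpg <- mtcars[order(mtcars$cyl, -mtcars$mpg), c("cyl", "mpg")]
head(by_cyl_mpg, 3)
```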
