Q1 Scaling is not a trivial process. If your data size is not very large, your best bet is to run PredictionIO on a ____

A1 Single Machine

Q2 To make a prediction, ___ collects mainly three types of data: User Data, Item Data, and Behavioral Data

A2 PredictionIO

Q3 Estimates similarity between items based on previous behaviors of users of these items

A3 Mahout’s Item Similarity Collaborative Filtering




Q1 yarn node -list

A1 lists all nodes running

Q2 yarn logs

A2 dumps the container logs

Q3 yarn classpath

A3 Prints the classpath needed to get the Hadoop jar and the required libraries

Q4 yarn version

A4 prints the version of yarn

Q5 yarn resourcemanager

A5 starts the resourcemanager

Q6 yarn nodemanager

A6 starts the nodemanager

Q7 yarn rmadmin

A7 runs the resource manager admin client

Q8 ____ is the central authority that manages resources and schedules applications running on top of YARN

A8 ResourceManager


Linear Algebra

Q1 In a normed vector space a vector whose length is 1 (the unit length)

A1 unit vector

Q2 A unit vector is often denoted by a ___

A2 lowercase letter with a hat î (“i-hat”)

Q3 In Euclidean space, the dot product of two unit vectors is simply the ___ of the angle between them

A3 cosine

Q4  In Euclidean space, the ____ of two unit vectors is simply the cosine of the angle between them

A4 dot product

Q5 the term ___ is sometimes used as a synonym for unit vector

A5 normalized vector
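
Cards Q3 to Q5 can be checked numerically. The following is a minimal sketch in plain Python; the helper names `normalize` and `dot` and the sample vectors are chosen here for illustration only.

```python
import math

def normalize(v):
    """Return v scaled to Euclidean length 1."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [x / length for x in v]

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

u = normalize([3.0, 0.0])  # points along the x axis
w = normalize([2.0, 2.0])  # 45 degrees away from the x axis

# For unit vectors the dot product is the cosine of the angle between them.
cos_angle = dot(u, w)
angle_degrees = math.degrees(math.acos(cos_angle))
```

Here `angle_degrees` comes out at 45 (up to floating-point rounding), matching the angle the two sample vectors were built with.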

Q6 The elements of a ___ are usually chosen to be unit vectors

A6 basis

Q7 In the three dimensional Cartesian coordinate system, the unit vectors codirectional with the x, y, and z axes are sometimes referred to as ___ of the coordinate system

A7 versors




Q1 Full text search can be viewed as doing _____ of the term-document matrix by the query vector (giving a vector over documents where the components are the relevance score)

A1 matrix multiplication

Q2 Computing co-occurrences in a collaborative filtering context (people who viewed X also viewed Y, or ratings-based collaborative filtering) amounts to squaring the ___

A2 user-item interaction matrix
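
As a rough illustration of the card above, "squaring" a user-item matrix A can be read as computing A^T A. The sketch below uses made-up view data and small helper functions (`transpose`, `matmul`) that are assumptions of this example, not part of any library.

```python
# Rows are users, columns are items; 1 means the user viewed the item.
# The data is invented for illustration.
interactions = [
    [1, 1, 0],  # user 0 viewed items 0 and 1
    [1, 1, 1],  # user 1 viewed all three items
    [0, 1, 1],  # user 2 viewed items 1 and 2
]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

# Entry [i][j] of A^T A counts the users who viewed both item i and item j.
cooccurrence = matmul(transpose(interactions), interactions)
# cooccurrence == [[2, 2, 1], [2, 3, 2], [1, 2, 2]]
```

The diagonal holds each item's total view count, and the off-diagonal entries are the "viewed X also viewed Y" counts.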

Q3 calculating users who are k-degrees separated from each other in a social network or web-graph can be found by looking at the k-fold product of the ___

A3 graph adjacency matrix
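
A small sketch of the k-fold product, assuming an undirected path graph 0 - 1 - 2 - 3 invented for this example. Strictly, entry [i][j] of the k-th power counts walks of length k from i to j, so a non-zero entry means j is reachable from i in exactly k steps.

```python
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def matrix_power(m, k):
    """k-fold product of m with itself, for k >= 1."""
    result = m
    for _ in range(k - 1):
        result = matmul(result, m)
    return result

two_steps = matrix_power(adjacency, 2)
three_steps = matrix_power(adjacency, 3)

# Node 2 is two steps from node 0; node 3 needs three steps.
reaches_2_in_two = two_steps[0][2] > 0
reaches_3_in_two = two_steps[0][3] > 0
reaches_3_in_three = three_steps[0][3] > 0
```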

Q4 Sparsity is often problematic because any given two rows (or columns) of the matrix may have ___

A4 zero overlap

Q5 One of the more useful approaches to dealing with huge ___ data sets is dimensionality reduction

A5 sparse

Q6 In a reduced dimensional space, “important” components to distance between points are ___

A6 exaggerated

Q7 In a reduced dimensional space, sparsity of rows is traded for drastically reduced dimensional but dense ___

A7 signatures

Q8 Sparse matrices which don’t fit in ____ need special treatment as far as decomposition is concerned. Parallelizable and/or stream-oriented algorithms are needed.



A9 Singular Value Decomposition

Q10 The ___ algorithm is designed for eigen-decomposition but like any such algorithm getting singular values out of it is immediate

A10 Lanczos

Q11 The Lanczos algorithm is designed for ____ but like any such algorithm getting singular values out of it is immediate

A11 eigen decomposition

Q12 The Lanczos algorithm is designed for eigen-decomposition but like any such algorithm getting ___ out of it is immediate

A12 singular values

Q13 The Lanczos algorithm is designed for eigen-decomposition but like any such algorithm getting singular values out of it is ____

A13  immediate

Q14 <MAHOUT_HOME>/bin/mahout svd  (What’s this do?)

A14 Invokes the DistributedLanczosSolver

Q15 ConvNetJS implements DeepLearning models and learning algorithms as well as nice browser based demos, all in ___

A15 Javascript

Q16 ____ learning models a decision problem with instances or examples of training data deemed important or required to the model

A16 instance based

Q17 k-NearestNeighbor (kNN), Learning Vector Quantization (LVQ), Self-Organizing Map (SOM) are examples of ___ learning models

A17 instance based

Q18 An extension made to another method (typically regression methods) that penalizes models based on their complexity, favoring simpler models that are also better at generalizing.

A18 Regularization Methods

Q19 Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net are examples of ?

A19 Regularization methods
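
To make the "penalizes complexity" idea concrete, here is a minimal sketch of ridge regression for a single feature with no intercept. That setting is a simplifying assumption of this example, as are the function name `ridge_slope` and the data. Minimizing sum((y - w*x)^2) + penalty * w^2 gives w = sum(x*y) / (sum(x^2) + penalty).

```python
def ridge_slope(xs, ys, penalty):
    """Slope w minimizing sum((y - w*x)^2) + penalty * w^2 (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x

unpenalized = ridge_slope(xs, ys, penalty=0.0)  # ordinary least squares: 2.0
penalized = ridge_slope(xs, ys, penalty=14.0)   # shrunk toward zero: 1.0
```

A larger penalty pulls the coefficient further toward zero, which is the "favoring simpler models" behavior described in Q18.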

Q20 Iterative Dichotomiser3 (ID3), C4.5, Chi-squared Automatic Interaction Detection (CHAID), Multivariate Adaptive Regression Splines (MARS), Gradient Boosting Machines (GBM) – These are examples of what type of learning model?

A20 decision tree

Q21 Bayesian methods are those that explicitly apply ___ for problems such as classification and regression

A21 Bayes’ Theorem

Q22 Kernel Methods are best known for the popular method ___

A22 Support Vector Machines

Q23  ___ are best known for the popular method Support Vector Machines

A23 Kernel Methods

Q24 concerned with mapping data into a higher dimensional vector space where some classification or regression problems are easier to model

A24 kernel methods

Q25 kernel methods concerned with mapping data into a ___ vector space where some classification or regression problems are easier to model

A25 higher dimensional

Q26 kernel methods concerned with mapping data into a higher dimensional vector space where some classification or regression problems are ___

A26 easier to model
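
A toy illustration of cards Q24 to Q26, using an explicit feature map x -> (x, x^2) picked for this example. Practical kernel methods such as SVMs usually work with a kernel function rather than an explicit map. The data and helper names are assumptions.

```python
points = [-2.0, -0.5, 0.5, 2.0]
labels = [1, 0, 0, 1]  # outer points are class 1, inner points are class 0

def feature_map(x):
    """Lift a 1-D point into 2-D as (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(values, labels):
    """True if a single threshold on `values` splits the two classes."""
    sorted_labels = [label for _, label in sorted(zip(values, labels))]
    # On a line, the classes are separable when the sorted labels switch at most once.
    changes = sum(1 for a, b in zip(sorted_labels, sorted_labels[1:]) if a != b)
    return changes <= 1

# In the original 1-D space no single threshold separates the classes.
in_original_space = separable_by_threshold(points, labels)

# Along the new x^2 coordinate, a threshold such as x^2 > 1 does.
squared = [feature_map(x)[1] for x in points]
in_lifted_space = separable_by_threshold(squared, labels)
```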

Q27 k-means and Expectation Maximisation (EM) are what method of learning?

A27 clustering

Q28 Apriori algorithm and Eclat algorithm are what method of learning?

A28 Association rule learning

Q29 Principal Component Analysis, Partial Least Squares Regression, Sammon Mapping, Multi-dimensional Scaling, projection pursuit are examples of ?

A29 Dimensionality Reduction

Linear Algebra

Q1 If we accept the ___ every vector space has a basis

A1 axiom of choice

Q2 If we accept the axiom of choice, every vector space has a ___

A2 basis

Q3 If V is finite-dimensional and U is a subspace of V, then dim U ? dim V

A3 <= (less than or equal to)

Q4 A fundamental theorem of linear algebra states that all vector spaces of the same dimension are ____

A4 isomorphic

Q5 The condition that v1, v2, …, vn span V guarantees that each vector v can be assigned ___ whereas the linear independence of v1, v2, …, vn assures that these are ____

A5 coordinates, unique

Q6 once a basis of a vector space V over F has been chosen, V may be identified with the coordinate n-space ____

A6 F^n



Q1 a[href$ = ".jpg"]{color:red;} – What’s this do?

A1 Makes all links to JPEGs display in red

Q2 In Javascript, ___ can be passed around like any other value

A2 functions

Q3 Javascript _____ case sensitive

A3 is

Q4 In Javascript, the semicolon is ___

A4 optional

Q5 When must you use semicolons in Javascript

A5 When you write multiple statements on one line

Q6 JS: local variables must be declared with the “var” keyword, otherwise they will automatically become ____

A6 global variables

Q7 JS: ____ mode does not allow undeclared variables

A7 strict

Q8 JS is ____ typed

A8 loosely

Q9 A JS variable ____ can contain different data types and change its data type

A9 can

Q10 JS : The == comparison operator always converts to ____ before comparison

A10 matching types

Q11 JS: The === forces comparison of values and ____

A11 types

Q12 JS: ___ in names can be confused with subtraction (never use them in names)

A12 hyphens

Q13 Javascript functions are defined with the ____ keyword

A13 function

Q14 JS function declarations _____ usually ended with a semicolon

A14 Are not (because it’s not an executable statement)

Q15 A JS function expression ____ be stored in a variable

A15 can

Q16 function expressions will execute automatically if the expression is followed by ____

A16 ()

Q17 JavaScript functions have both ____ and ____

A17 properties and methods

Q18 The arguments.length property returns ____

A18 the number of arguments received when the function was invoked

Q19 The toString() method returns the function ____

A19 as a string



Q1 An ____ is a reference type, similar to a class, that can contain only constants, method signatures, default methods, static methods, and nested types.

A1 interface

Q2 ___ exist only for default methods and static methods

A2 method bodies

Q3 Interfaces cannot be instantiated – they can only be implemented or ____ by other interfaces

A3 extended

Q4 To use an interface, you write a ____ that implements the interface

A4 class

Q5 When an instantiatable class implements an interface, it provides a ___ for each of the methods declared in the interface

A5 method body



Q1 What’s Oozie?

A1 Tool for managing Hadoop workflow


A2 A realtime loader for streaming your data into Hadoop

Q3 Whirr

A3 Cloud provisioning for Hadoop

Q4 What’s Fuse?

A4 Makes the HDFS system look like a regular file system

Q5 T/F you can use Pig to store data to a DB other than the one it pulled the data from?

A5 T

Q6 Who offers pre-written Pig UDFs?

A6 DataFu

Q7 T/F you can run shell commands from Pig

A7 True

Q8 You can JOIN more than two tables at a time with Pig?

A8 True


AWS CloudFormation

Q1 Options for launching applications via CloudFormation include: Pre-install your application to a custom ____

A1 AMI (amazon machine image)

Q2 Options for launching applications via CloudFormation include: Customize an EC2 instance at launch time using ____

A2 Cloud-Init

Q3 Options for launching applications via CloudFormation include: Use other tools such as Jenkins, Chef, or Puppet, optionally triggering them from inside a ___

A3 CloudFormation template

Q4 You can use ____ to access the application bootstrapping logs

A4 Amazon CloudWatch

Q5 Cloud-init.log, cfn-init.log, cfn-hup.log, cfn-wire.log – What are these files?

A5 application bootstrapping logs

Q6 If you are creating the log group inside the stack you are debugging then remember to disable the ____ while kicking off stack creation. This way, the logs are retained even when stack creation fails. Alternatively, you can create a LogGroup separately and optionally use it for ___

A6 automatic rollback, multiple stacks

Q7 cfn-hup

A7 The CloudFormation daemon to automatically apply subsequent configuration updates

Q8 Puppet and Chef are ?

A8 Configuration Management Tools

Q9 When your web browser or your mobile device makes a TCP connection to an Elastic Load Balancer, the connection is used for the request and the response and then stays open for a short amount of time for possible reuse. This time period is known as the ___  for the Load Balancer and is set to 60 seconds.

A9 Idle Timeout

Q10 An Elastic Load Balancing connection to an EC2 instance also has an ____ of 60 seconds

A10 idle timeout

Q11 $ elb-modify-lb-attributes myTestELB --connection-settings "idletimeout=120" --headers

What’s that do?

A11 Sets Elastic Load Balancer idle timeouts to 120 seconds

Q12 Distributed Random Forest Powered by Velocity software

What’s this?

A12 An AMI


Linear Algebra

Q1 An identity element leaves other elements _______ when combined with them

A1 unchanged

Q2 the term ______ is often shortened to ‘identity’

A2 identity element

Q3 let (S,*) be a set S with a binary operation * on it. Then an element e of S is called a left identity if _______ for all a in S

A3 e*a=a

Q4 let (S,*) be a set S with a binary operation * on it. Then an element e of S is called a right identity if _______ for all a in S

A4 a*e =a

Q5 Let (S,*) be a set S with a binary operation * on it. If e*a=a for all a in S and a*e=a for all a in S, then e is called a two-sided identity, or simply an ____

A5 identity

Q6 An identity with respect to addition is called an additive identity (often denoted as _____)

A6 0

Q7 An identity with respect to multiplication is called a multiplicative identity (often denoted as ____)

A7 1

Q8 There can be several right identities and several left identities, but if there is both a left identity and a right identity then ____

A8 They are equal and there is just a single two sided identity.

Q9 Let (S, *) be a set S with a binary operation *. Is it possible for (S,*) to have no identity element?

A9 yes
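
Cards Q3 to Q9 can be explored by brute force on a small finite set. This is a sketch only; the set `S`, the three sample operations, and the helper names are all made up for illustration.

```python
def left_identities(elements, op):
    """All e with e * a == a for every a."""
    return [e for e in elements if all(op(e, a) == a for a in elements)]

def right_identities(elements, op):
    """All e with a * e == a for every a."""
    return [e for e in elements if all(op(a, e) == a for a in elements)]

S = [0, 1, 2]

def second(a, b):
    """a * b = b: every element is a left identity, none is a right identity."""
    return b

def add_mod3(a, b):
    """Addition modulo 3: 0 is the single two-sided identity."""
    return (a + b) % 3

def constant(a, b):
    """a * b = 0 always: no identity element at all."""
    return 0

several_left = left_identities(S, second)    # [0, 1, 2]
no_right = right_identities(S, second)       # []
two_sided = left_identities(S, add_mod3)     # [0]
no_identity = left_identities(S, constant) + right_identities(S, constant)  # []
```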

Q10 Logistic Regression can also handle ____ classification such as 0,1,2,3,4

A10 multiclass

Q11 PyMVPA stands for?

A11 Multivariate Pattern Analysis in Python


A12 Support Vector Machine

Q13 The ___ of a linear operator L is the set of all operands v for which L(v) = 0

A13 kernel

Q14 If L: V -> W, ker(L) = {v ∈ V : L(v) = 0}, where 0 denotes the null vector in W. How do you say this?

A14 The kernel of a linear operator L is the set of all operands v for which L(v)=0 

Q15 The kernel of a linear operator R^m -> R^n is the same as the ______ of the corresponding n x m matrix

A15 nullspace
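
A quick numeric check of Q13 to Q15, assuming a 2 x 3 matrix (an operator R^3 -> R^2) invented for this example. The helpers `apply_matrix` and `in_kernel` are hypothetical names.

```python
# A 2 x 3 matrix, i.e. a linear operator from R^3 to R^2.
matrix = [
    [1, 2, 3],
    [2, 4, 6],
]

def apply_matrix(m, v):
    """Matrix-vector product m * v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def in_kernel(m, v):
    """True if m maps v to the zero vector, i.e. v lies in the nullspace of m."""
    return all(component == 0 for component in apply_matrix(m, v))

v_in = in_kernel(matrix, [1, 1, -1])  # 1 + 2 - 3 = 0 and 2 + 4 - 6 = 0
v_out = in_kernel(matrix, [1, 0, 0])  # maps to [1, 2], not the zero vector
```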

Q16 A collection of linear equations involving the same set of variables.

A16 System of linear equations (or linear system)

Q17 A ____ to a linear system is an assignment of numbers to the variables such that all the equations are simultaneously satisfied.

A17 solution

Q18 How do you solve a linear system of two equations

A18 Solve the first equation for x, plug the resulting expression in for x in the second equation -> solve for y -> sub y back in, solve for x
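
The substitution steps in A18 can be sketched as follows, assuming the system is written a1*x + b1*y = c1 and a2*x + b2*y = c2 with a1 non-zero and a unique solution. The function name and the sample system are illustrative.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by substitution."""
    a1, b1, c1, a2, b2, c2 = map(Fraction, (a1, b1, c1, a2, b2, c2))
    if a1 == 0:
        raise ValueError("swap the equations so that a1 is non-zero")
    # Step 1: from the first equation, x = (c1 - b1*y) / a1.
    # Step 2: substituting into the second equation gives
    #         y * (b2 - a2*b1/a1) = c2 - a2*c1/a1.
    coefficient = b2 - a2 * b1 / a1
    if coefficient == 0:
        raise ValueError("the system has no unique solution")
    y = (c2 - a2 * c1 / a1) / coefficient
    # Step 3: substitute y back in and solve for x.
    x = (c1 - b1 * y) / a1
    return x, y

# 2x + 3y = 6 and 4x + 9y = 15
x, y = solve_2x2(2, 3, 6, 4, 9, 15)  # x = 3/2, y = 1
```

Exact fractions are used so the result is not blurred by floating-point rounding.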

Q19 Term for the number of vectors in a basis.

A19 Its dimension

Q20 The set of all possible solutions for a linear system is called the ______

A20 solution set

Q21 for a linear system involving two variables, each linear equation determines a ______ on the xy-plane

A21 line

Q22 For a linear system involving three variables, each linear equation determines a _____ in 3D space

A22 plane

Q23 For a linear system involving n variables, each linear equation determines a ______ in n-dimensional space

A23 hyperplane

Q24 For a linear system involving n variables, each linear equation determines a hyperplane in _____ space

A24 n-dimensional

Q25 usually, a system with fewer equations than unknowns has _____ solutions

A25 infinitely many

Q26 Usually a system with the same number of equations and unknowns has _____

A26 a single unique solution

Q27 Usually a system with more equations than unknowns has ____

A27 no solution

Q28 The equations of a linear system are ____ if none of the equations can be derived algebraically from the others

A28 independent

Q29 3x + 2y = 6 and 6x + 4y =12 are not _____? They are the same equations when scaled by a factor of two. They would produce identical graphs.

A29 independent
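
One hedged way to spot the dependence in Q29 is the determinant of the coefficient matrix: a zero determinant means the left-hand sides are multiples of each other. The second system below is an extra example added for contrast; `det2` is a hypothetical helper.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

# 3x + 2y = 6 and 6x + 4y = 12: the second row is twice the first.
dependent = det2(3, 2, 6, 4)    # 3*4 - 2*6 = 0

# 2x + 3y = 6 and 4x + 9y = 15: neither row is a multiple of the other.
independent = det2(2, 3, 4, 9)  # 2*9 - 3*4 = 6
```

A zero determinant alone only covers the coefficients. In Q29 the right-hand sides scale by the same factor of two, which is why the two equations describe the same line.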





are not ____ because the third equation is the sum of the other two

A30 independent

Q31 A linear system is ___ if it has a solution and ___ otherwise

A31 consistent, inconsistent

Q32 It is possible for three linear equations to be ____, even though any two of them are consistent together.

A32 inconsistent

Q33 Two linear systems using the same set of variables are ___ if each of the equations in the second system can be derived algebraically from the equations in the first system and vice-versa

A33 equivalent

Q34 f: X->Y – how is this read?

A34 f is a function from domain X to codomain Y

Q35 The ___ of a function is the set of all outputs of the function

A35 image

Q36 The image of a function is always a subset of the ___

A36 codomain

Q37 the ____ or target set of a function is the set Y into which all of the output of the function is constrained to fall. It is the set Y in the notation f:X->Y

A37 codomain

Q38 In Category theory, one deals with ____ instead of functions

A38 morphisms

Q39 ____ are arrows from one object to another

A39 morphisms

Q40 The ____ in any morphism is the object from which an arrow starts

A40 domain

Q41 A system of linear equations is considered _____ if there are more equations than unknowns.

A41 overdetermined

Q42 A mapping is invertible if and only if the ____ is non-zero

A42 determinant

Q43 Linear Algebra provides the formal setting for the linear combination of equations used in the ____ ?

A43 Gaussian method

Q44 The ____ method is used to determine the best fit line for a set of data

A44 least squares

Q45 The least squares method will minimize the sum of the squares of the ___

A45 residuals
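
A minimal sketch of Q44 and Q45 for a straight line y = slope*x + intercept, using the standard closed-form formulas for simple linear regression. The data points and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]

slope, intercept = fit_line(xs, ys)  # about 0.8 and 1.3
best = sum_squared_residuals(xs, ys, slope, intercept)

# Any other line gives a larger sum of squared residuals.
worse = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
```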

Q46 An algebraic operation that takes two equal-length sequences of numbers and returns a single number

A46 dot product

Q47 Two non-zero vectors a and b are ___ if and only if a · b = 0

A47 orthogonal

Q48 A function that assigns a strictly positive length or size to each vector in a vector space other than the zero vector

A48 norm

Q49 A vector space on which a norm is defined is called a ____

A49 normed vector space

Q50 The absolute value is a ____ on the real numbers

A50 norm

Q51 Combined with ___ linear algebra facilitates the solution of linear systems of differential equations

A51 calculus

Q52 The study of linear algebra first emerged from the study of _____

A52 determinants

Q53 By ___ a theory of linear transformations of finite-dimensional vector spaces had emerged

A53 1900

Q54 The ____ of a matrix A is denoted det(A), det A, or |A|


A54 determinant

Q55 The ___ of a matrix is zero if and only if the column vectors (or the row vectors) of the matrix are linearly dependent.

A55 determinant


