Last modified on 27 Nov 2018

Comp 558

Course Website

Lecture 1 - 2018/09/04

  • Applications of Computer Vision
    • Image editing
    • Object scanning for 3D printing
    • Handwriting recognition
    • Face detection and recognition
    • Image search & retrieval
    • People detection and tracking
    • Navigation systems
    • Medical imaging
    • Biological imaging

Lecture 2 - 2018/09/06

  • 720p ≈ 1 MP; 1080p ≈ 2 MP
  • #f0f magenta; #ff0 yellow; #0ff cyan
  • The human eye has varying sensitivity to RGB at varying wavelengths
    • We are more sensitive to green
  • Bayer pattern - checkerboard pattern of [G, R; B, G]
    • 50% G, 25% R, 25% B
  • IRGB(x) = ∫ CRGB(λ) S(x, λ) dλ
  • ExposureRGB(x) = IRGB(x) * t
  • Analog to digital conversion
    • gain control - adjust overall sensitivity of responses
    • white balance - adjust relative sensitivity of R, G, B
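
A minimal sketch of the Bayer mosaic above in NumPy; the array size and names are illustrative, not from the lecture.

```python
import numpy as np

# Tile the [G, R; B, G] Bayer cell over a small "sensor".
pattern = np.array([["G", "R"],
                    ["B", "G"]])
mosaic = np.tile(pattern, (2, 2))  # a 4 x 4 sensor
print(mosaic)

# Counting entries confirms the 50% G, 25% R, 25% B split.
for c in "RGB":
    print(c, np.mean(mosaic == c))
```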

Lecture 3 - 2018/09/11

  • Image noise - random variation of brightness or color information; usually electronic noise
  • Image smoothing - averaging the intensities of RGBs to reduce noise
  • Local average (1D) - Ismooth(x) = ¼ I(x + 1) + ½ I(x) + ¼ I(x - 1)
  • Smoothing an edge will decrease the slope of steps, which blends the edge colors.
  • Gaussian function: G(x) = (1/(√(2π) σ)) e^(−x²/(2σ²))
    • Integrates to 1
    • Often called normal distribution
    • Can use it for smoothing weights
    • The bigger the σ, the more smoothing there is
  • Local difference: dI(x)/dx ≈ ½ I(x + 1) − ½ I(x − 1)
  • Cross correlation: f(x) ⊗ I(x) ≡ Σu f(u − x) I(u)
    • Sliding a template across an image, and taking the inner product
    • Shows how well a filter matches the image
    • For the local average (1D)
      • f(x) = ¼ when x = +1, -1
      • f(x) = ½ when x = 0
      • f(x) = 0 otherwise
  • I is the image, f is the filter or kernel
  • Convolution: f(x) * I(x) ≡ Σu f(x - u) I(u)
    • Summing weighted, shifted versions of the function f
    • Shows how much image intensity contributes to the filtered output
  • Impulse function: δ(x) = 1 when x = 0, 0 otherwise
    • δ(x) * I(x) = I(x)
    • Convolving a filter with an impulse returns the filter itself, i.e. its impulse response (see the sketch below)
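
A short sketch of the ideas above, assuming NumPy; the signal values are made up.

```python
import numpy as np

# The local-average kernel from above: f(+1) = f(-1) = 1/4, f(0) = 1/2.
f = np.array([0.25, 0.5, 0.25])

# Smoothing a step edge reduces the slope of the step.
I = np.array([0.0, 0.0, 0.0, 10.0, 10.0, 10.0])
print(np.convolve(I, f, mode="same"))  # edge values are blended

# Convolving with an impulse returns the kernel itself: its impulse response.
delta = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
print(np.convolve(delta, f, mode="same"))
```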

Lecture 4 - 2018/09/13

  • Motivation for edge detection
    • Find boundaries of objects
    • Match features in images
  • Edge detection based on threshold
    • I(x + 1) - I(x - 1) > τ
      where τ is the threshold
    • Can also use partial derivatives to account for changes in both x and y
      • Prewitt used gradients [1, 0, -1; 1, 0, -1; 1, 0, -1] and [-1, -1, -1; 0, 0, 0; 1, 1, 1]
      • Sobel used gradients [1, 0, -1; 2, 0, -2; 1, 0, -1] and [-1, -2, -1; 0, 0, 0; 1, 2, 1]
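
A sketch of threshold-based edge detection using the Sobel kernels above; the test image and threshold τ are placeholder choices (assumes SciPy).

```python
import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=float)
sobel_y = sobel_x.T

I = np.zeros((8, 8))
I[:, 4:] = 1.0                        # vertical step edge
gx = convolve2d(I, sobel_x, mode="same")
gy = convolve2d(I, sobel_y, mode="same")
magnitude = np.hypot(gx, gy)          # gradient magnitude
tau = 1.0                             # placeholder threshold
edges = magnitude > tau               # candidate edge pixels
```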

Lecture 5 - 2018/09/18

  • Canny edge detection
    • Mark pixel as candidate edge if gradient magnitude is greater than some threshold τhigh
    • Non-maximum suppression - compare gradient magnitude with neighbours nearest to the direction of that gradient. Eliminate edge pixels if they are not a local maximum
    • Hysteresis thresholding - keep weaker candidate pixels (gradient magnitude above some τlow) only if they connect to a pixel above τhigh
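
If OpenCV is available, cv2.Canny bundles the steps listed above (gradient computation, non-maximum suppression, hysteresis); the image and thresholds here are placeholders.

```python
import numpy as np
import cv2

image = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # placeholder
tau_low, tau_high = 50, 150
edges = cv2.Canny(image, tau_low, tau_high)  # binary edge map
```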

Lecture 6 - 2018/09/20

  • Inliers - points explained by line model
  • Outliers - points not explained by line model
  • Reduce sensitivity to outliers by reducing penalty for large distances
    • Eg distance rather than distance squared
    • Eg constant penalty after certain distance
  • Hough transform
    • A line can be written as r = x cos θ + y sin θ, where r ≥ 0 is its perpendicular distance from the origin and θ ∈ [0, 2π) is the direction of its normal
    • For the Hough transform: for every θ in [0, 2π), compute the r of the line with normal direction θ through point (xi, yi), and vote for that (r, θ). Return the (r, θ) with the maximum count
    • To detect peaks, we can smooth the voting map to remove noise
  • RANSAC - RANdom SAmple Consensus - reduces sensitivity
    • Pick two different points, and fit a line
      • Good model if many other points lie near the line
      • We can find consensus set by counting the number of remaining points that are within τ from the line
    • Do this many times and find the best line model (highest consensus size)
    • Take the best model, find its consensus set, then refit using least squares
    • Generalizes to sampling n points, where n is the minimum # of points needed for an exact model fit
    • Why minimum # of points?
      • Trade off is that more points lead to greater chance of including an outlier
  • Vanishing points
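
A minimal RANSAC sketch for line fitting, following the recipe above; the data, iteration count, and τ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
pts = np.column_stack([x, 2 * x + rng.normal(0, 0.2, 50)])
pts[::10] += rng.normal(0, 5, (5, 2))            # inject some outliers

tau, best = 0.5, None
for _ in range(100):
    p1, p2 = pts[rng.choice(len(pts), 2, replace=False)]
    d = p2 - p1
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal to the line
    dist = np.abs((pts - p1) @ n)                    # point-to-line distances
    inliers = pts[dist < tau]                        # consensus set
    if best is None or len(inliers) > len(best):
        best = inliers

# Refit the best model with least squares on its consensus set.
slope, intercept = np.polyfit(best[:, 0], best[:, 1], 1)
```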

Lecture 7 - 2018/09/25

  • Recall
    • Convolution: f(x, y) * I(x, y) = Σu, v f(x - u, y - v) I(u, v)
    • Zero-mean Gaussian kernel: G(x, y, σ) = 1/(2πσ²) e^(−(x² + y²)/(2σ²))
  • Gaussian scale space - family of images smoothed by Gaussians of increasing σ
  • Heat equation (aka isotropic diffusion): ∂I(x, y, t)/∂t = ΔI(x, y, t) = ∂²I(x, y, t)/∂x² + ∂²I(x, y, t)/∂y²
    • Also equivalent to div(∇I(x, y, t))
  • Perona-Malik introduces a coefficient c(x, y, t), a monotonically decreasing function of the magnitude of the image gradient (see the sketch at the end of this lecture)
    • ∂I(x, y, t)/∂t = div(c(x, y, t) ∇I(x, y, t)) = c(x, y, t) ΔI(x, y, t) + ∇c(x, y, t) · ∇I(x, y, t)
  • It is faster and more efficient to compute with coarser/smaller versions of an input image
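
A sketch of one Perona-Malik diffusion step; the conductance c = exp(−(|∇I|/κ)²) is a common choice (the lecture only requires c to decrease with gradient magnitude), and κ, dt are illustrative.

```python
import numpy as np

def perona_malik_step(I, kappa=0.1, dt=0.2):
    # Differences to the four neighbours approximate directional gradients.
    dn = np.roll(I, -1, axis=0) - I
    ds = np.roll(I, 1, axis=0) - I
    de = np.roll(I, -1, axis=1) - I
    dw = np.roll(I, 1, axis=1) - I
    c = lambda g: np.exp(-(g / kappa) ** 2)  # decreasing in |grad I|
    # Weighted neighbour differences approximate div(c * grad I).
    return I + dt * (c(dn) * dn + c(ds) * ds + c(de) * de + c(dw) * dw)
```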

Lecture 8 - 2018/09/27

  • Gaussian pyramid - smoothing + subsampling
  • Think of ∇I as 2 x 1 matrix (or a column vector)
  • The structure tensor is the 2 x 2 matrix given by ∇I (∇I)ᵀ
  • The eigenvalues & eigenvectors of matrix reveal invariant geometric properties
    • Order the eigenvalues in descending order (λ1 ≥ λ2 ≥ 0)
    • Then e1 is the direction aligned with the local gradient
    • Special case is when λ2 = 0, I is constant along e2
    • If both are around 0, then no dominant gradient direction locally
  • To find corners, find a way to capture locations where I(x0, y0) is different from that of its neighbours
    • Keep in mind
      • We always ‘smooth’ a little bit (convolution with a kernel) so that ∇I is well defined
      • First-order Taylor expansion: I(x0 + Δx, y0 + Δy) ≈ I(x0, y0) + ∇I · (Δx, Δy)
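
A sketch of the structure tensor summed over a small neighbourhood and its eigenvalues; Ix, Iy stand in for precomputed image derivatives.

```python
import numpy as np

Ix = np.random.randn(5, 5)   # placeholder dI/dx over a neighbourhood
Iy = np.random.randn(5, 5)   # placeholder dI/dy

M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
              [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
lam2, lam1 = np.linalg.eigh(M)[0]   # eigh returns ascending eigenvalues
# lam1 >= lam2 >= 0: both large -> corner, one large -> edge, both ~0 -> flat.
```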

From lecture notes (not in class)

  • Corner - region where two edges come together at sharp angle
  • Intensities at a point are locally distinctive if the following “error” is large, relative to distance (Δx, Δy) to local neighbour
    • Σ(x, y) ∈ Nbhd(x0, y0) (I(x, y) − I(x + Δx, y + Δy))²
  • Second moment matrix (M)
    • Σ(x, y) ∈ Nbhd(x0, y0) (∇I)(∇I)ᵀ = [Σ(∂I/∂x)², Σ(∂I/∂x)(∂I/∂y) \\ Σ(∂I/∂x)(∂I/∂y), Σ(∂I/∂y)²]
  • Harris corners
    • Computing eigenvalues involves solving a quadratic equation, which is slow
    • We may note that a matrix with eigenvalues λ1 and λ2 has a determinant of λ1λ2 and a trace of λ1 + λ2
    • With a small constant k, consider λ1λ2 − k(λ1 + λ2)² = det(M) − k trace(M)²:
      • Is negative if one eigenvalue is 0
      • Is small if both eigenvalues are small
    • Trace and determinant can be used to find edges and corners
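
A sketch of the Harris response det(M) − k·trace(M)² over a whole image, avoiding the eigenvalue computation as noted above; k = 0.04 is a conventional choice and the test image is a placeholder (assumes SciPy).

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter

I = np.zeros((32, 32))
I[8:24, 8:24] = 1.0                  # a bright square: four corners
Ix = sobel(I, axis=1)
Iy = sobel(I, axis=0)

# Entries of the second moment matrix, smoothed over the neighbourhood.
Sxx = gaussian_filter(Ix * Ix, sigma=1.0)
Sxy = gaussian_filter(Ix * Iy, sigma=1.0)
Syy = gaussian_filter(Iy * Iy, sigma=1.0)

k = 0.04
R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
# Large positive R -> corner; negative R -> edge; near zero -> flat.
```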

Lecture 9 - 2018/10/02

  • Midterm material ends this week
  • Normalized correlation - dcorrel(H1, H2) = Σi H′1(i) H′2(i) / √(Σi H′1(i)² · Σi H′2(i)²), where H′(i) = H(i) − mean(H)
    • 1 - perfect match
    • -1 - perfect mismatch
    • 0 - no correlation
  • Chi-square - dchi-square(H1, H2) = Σi (H1(i) − H2(i))² / (H1(i) + H2(i))
    • 0 - perfect match
    • ∞ - mismatch
  • Bhattacharyya
    • 0 - perfect match
    • 1 - total mismatch
  • Histogram equalization
  • Histogram of oriented gradients
    • Compute gradients
    • Divide into bins by orientation
    • Count number of entries in each bin
    • Dense computation with chosen cell size and block size
  • SIFT - scale invariant feature transform
    • Feature detector
    • Features are local and sparse; invariant to translation, rotation, scale
    • Matches feature vectors using correlation
    • Translation is easy to handle; most extraction methods are invariant to translation
    • Rotation is harder; needs canonical orientation
      • Uses orientation at peak of smoothed histogram
      • Sometimes multiple peaks pass the threshold; in that case, generate a separate keypoint for each such orientation
    • Scaling is even harder; uses scale-space theory
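
A sketch of the three histogram-comparison measures above (the Bhattacharyya form below is one common variant on normalized histograms); H1, H2 are placeholder histograms.

```python
import numpy as np

H1 = np.array([10.0, 20.0, 30.0, 40.0])
H2 = np.array([12.0, 18.0, 33.0, 37.0])
eps = 1e-10  # guards against division by zero

# Normalized correlation: 1 = match, -1 = mismatch, 0 = no correlation.
Hp1, Hp2 = H1 - H1.mean(), H2 - H2.mean()
correl = np.sum(Hp1 * Hp2) / np.sqrt(np.sum(Hp1**2) * np.sum(Hp2**2))

# Chi-square: 0 = perfect match, grows with mismatch.
chi_square = np.sum((H1 - H2) ** 2 / (H1 + H2 + eps))

# Bhattacharyya on normalized histograms: 0 = match, 1 = total mismatch.
h1, h2 = H1 / H1.sum(), H2 / H2.sum()
bhattacharyya = np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(h1 * h2))))
```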

Lecture 10 - 2018/10/04

See Stanford’s CS231 for more material

  • Convolutional neural networks (CNN)
    • Black box
      • Lots of nodes computing weighted summation of inputs
      • Weighted sums are passed through an activation function to create sparsity
        • Sigmoid - f(z) = 1/(1 + e^(−z))
        • Hyperbolic tan - f(z) = tanh(z) = (e^z − e^(−z))/(e^z + e^(−z))
        • ReLU - f(z) = max(0, z)
        • Softmax - given vector, transforms it so that each entry is in [0, 1] and all entries sum to 1
      • Initial layers are convolutional; later ones are fully connected
  • Propagation
    • Forward - computing layer (l + 1)’s activations from layer l’s activations, iterating layer by layer
    • Backward - propagating gradients in reverse direction to minimize a loss function
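
The activation functions listed above as plain NumPy, a minimal sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()         # entries in [0, 1], summing to 1

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(z), np.tanh(z), relu(z), softmax(z))
```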

Lecture 11 - 2018/10/09

Midterm next week, covering material up to lecture 10

  • CNNs
    • Convolutional layers (MLPs)
    • Activation functions: tanh, sigmoid, ReLU
    • Pooling: average, max, others
  • Backward Propagation
    • Goal is to estimate W and b to minimize a sum-of-squares error loss function
  • Review properties of 2nd moment matrix

Lecture 12 - 2018/10/11

  • RANSAC on midterm
  • Image registration
    • Given two similar images, for each (x0, y0), find (hx, hy) that minimizes the difference
  • Assume that intensity variation is linear
  • Windows need to be “appropriate”; smaller windows -> finer matches
  • Deformation matrix
    • [s, 0 \\ 0, s] - scales by s
    • [cosθ, -sinθ \\ sinθ, cosθ] - CCW rotation by θ
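
A quick sketch applying the two deformation matrices above to a 2D point; the values of s and θ are arbitrary.

```python
import numpy as np

s, theta = 2.0, np.pi / 4
scale = np.array([[s, 0.0],
                  [0.0, s]])
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])  # CCW rotation by theta

p = np.array([1.0, 0.0])
print(scale @ p)  # [2. 0.]
print(rot @ p)    # ~[0.707 0.707]
```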

2018/10/16

Midterm

Lecture 13 - 2018/10/18

  • TODO

Lecture 14 - 2018/10/23

  • Beginning of 3D computer vision
  • Left handed frame (Z goes into canvas vs out of canvas)
  • To go from 3D to 2D, we can just drop the z coordinate (orthographic projection)
  • Perspective projection - takes depth into account by drawing images such that each ray passes through a single center of projection
  • Plane: aX + bY + cZ = d
  • f/Z - scale factor in perspective projection
  • Multiplying the plane equation by f/Z, we can rewrite it as ax + by + cf = fd/Z
    • When Z → ∞, fd/Z → 0, giving ax + by + cf = 0: the line at infinity
  • Ground plane
    • Y = -h
    • y = fY/Z = -fh/Z (when Z → ∞, y → 0)
    • aka ‘horizon’
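
A sketch of perspective projection and the horizon: a ground-plane point with Y = −h projects to y = fY/Z, which goes to 0 as Z grows; f, h, and the sample depths are arbitrary.

```python
import numpy as np

f, h = 1.0, 1.5
X, Y = 1.0, -h                      # a point on the ground plane
for Z in [2.0, 10.0, 100.0, 1e6]:
    x, y = f * X / Z, f * Y / Z     # perspective projection
    print(Z, x, y)                  # y -> 0 (the horizon) as Z -> infinity
```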

Lecture 15 - 2018/10/25

  • TODO

Lecture 16 - 2018/10/30

  • Majority of notes done on paper. Below reflects some bigger points.
  • Rotation: [X(t) \ Y(t) \ Z(t)] = [cos(Ωt) & 0 & sin(Ωt) \ 0 & 1 & 0 \ -sin(Ωt) & 0 & cos(Ωt)] [X0 \ Y0 \ Z0]
  • Rotation about y axis
    • x(t) = f(X(t))/Z(t)
    • y(t) = f(Y(t))/Z(t)
    • Differentiate both at t = 0
    • Vx = fΩ(Z0² + X0²)/Z0² = fΩ(1 + (x/f)²)
    • Vy = fΩX0Y0/Z0² = Ωxy/f
  • Rotation about x axis
    • By symmetry, exchange roles of x and y
    • Vx = Ωxy/f
    • Vy = fΩ(1 + (y/f)²)
  • Rotation about z axis
    • Vx = -Ωy
    • Vy = Ωx
  • Discrete rotation
    • [R] [x0 \ y0 \ z0] = [xr \ yr \ zr]
    • R satisfies:
      • det(R) = 1
      • RRᵀ = RᵀR = I3×3
      • I.e. an orthogonal matrix: Rᵀ = R⁻¹
  • Rotation matrix preserves angle between 2 vectors
  • Rotation matrix has 3 eigenvalues, 2 of which are complex, while the third is 1
    • RP = λP
    • If you choose the third eigenvalue and its associated eigenvector, then RP = P; P is the axis of rotation
  • To simplify the cross product, let [a]×b = a × b, where [a]× = [0 & -a3 & a2 \ a3 & 0 & -a1 \ -a2 & a1 & 0]
  • To map a 3D point to homogeneous coordinates, [x \ y \ z] → [x \ y \ z \ 1]
    • Translation [x + tx \ y + ty \ z + tz \ 1] = [1 & 0 & 0 & tx \ 0 & 1 & 0 & ty \ 0 & 0 & 1 & tz \ 0 & 0 & 0 & 1] [x \ y \ z \ 1]
    • Scaling [σxX \ σyY \ σzZ \ 1] = [σx & 0 & 0 & 0 \ 0 & σy & 0 & 0 \ 0 & 0 & σz & 0 \ 0 & 0 & 0 & 1][x \ y \ z \ 1]
    • Rotation appears in the top-left 3 × 3 block, with all other elements 0 and entry (4, 4) = 1
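
A sketch of the 4 × 4 homogeneous translation, scaling, and rotation (about the y axis) matrices above, applied to one point; all numbers are arbitrary.

```python
import numpy as np

T = np.eye(4)
T[:3, 3] = [1.0, 2.0, 3.0]                 # translation by (tx, ty, tz)
S = np.diag([2.0, 2.0, 2.0, 1.0])          # scaling by (sx, sy, sz)
th = np.pi / 2
R = np.eye(4)
R[:3, :3] = [[np.cos(th), 0, np.sin(th)],  # rotation about the y axis
             [0, 1, 0],
             [-np.sin(th), 0, np.cos(th)]]

p = np.array([1.0, 0.0, 0.0, 1.0])         # homogeneous 3D point
print(T @ p, S @ p, R @ p)
```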

Lecture 17 - 2018/11/01

  • TODO

Lecture 18 - 2018/11/06

  • TODO

Lecture 19 - 2018/11/08

  • TODO

Lecture 20 - 2018/11/13

  • Homography - invertible 3 x 3 matrix that maps pixels in one camera’s image to pixels in another camera’s image
    • Case 1 - look at 3D scene plane
      • Projection of location on 3D plane
    • Case 2 - multiple cameras
    • Case 3 - scene no longer planar, but camera undergoes pure rotation

Case 1 (plane)

  • We assume that (s, t) are 2D coordinates on scene plane
  • Bottom left hand corner of plane has coordinates (x0, y0, z0) in a world coordinate frame
  • Unit vectors in the directions of the axes respectively in world coordinates
    • a = (ax, ay, az)
    • b = (bx, by, bz)
  • Coordinates of point (s, t) on the plane in 3D
    • (x, y, z) = (x0, y0, z0) + s a + t b
    • Component-wise: x = x0 + s ax + t bx; y = y0 + s ay + t by; z = z0 + s az + t bz
  • To get pixel coordinates, pre-multiply the homogeneous 3D point by the 3 x 4 projection matrix P (see the sketch below)
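
A sketch of Case 1 end to end: plane coordinates (s, t) to a 3D point to a pixel; the plane origin, axes, and projection matrix P are toy stand-ins.

```python
import numpy as np

x0 = np.array([0.0, 0.0, 5.0])   # bottom-left corner of the plane
a = np.array([1.0, 0.0, 0.0])    # unit vector along the plane's s axis
b = np.array([0.0, 1.0, 0.0])    # unit vector along the plane's t axis
P = np.hstack([np.eye(3), np.zeros((3, 1))])  # toy 3 x 4 projection matrix

s, t = 2.0, 3.0
X = x0 + s * a + t * b           # 3D point on the plane
u = P @ np.append(X, 1.0)        # project the homogeneous point
pixel = u[:2] / u[2]             # perspective divide
```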

Lecture 21 - 2018/11/15

Case 2 (stereo (2) cameras)

  • Rectification - transform (warp) both cameras’ images such that their axes are in alignment
    • T is written with respect to camera 1’s coordinate frame
      • Eg T = (1, 0, 0) implies that it is aligned with camera 1’s x axis
    • Rrect is a 3 x 3 matrix which, when used to pre-multiply T, rotates it so that you end up with an axis system for camera 1 where the X axis is aligned with T
    • To align camera 2, use Rrect R1 R2⁻¹
    • To warp camera one - K1 Rrect K1⁻¹
    • To warp camera two - K1 Rrect R1 R2⁻¹ K2⁻¹ (see the sketch at the end of this lecture)
  • 3D scene point X1 = (X1, Y1, Z1)ᵀ in coordinates w.r.t. camera 1
  • Line of sight from one camera’s optical center to the second camera’s center intersects the image planes at the two epipoles, e1, e2
  • A unique 3D scene point defines an epipolar plane π
  • This epipolar plane π intersects the 2 camera image planes at conjugate epipolar lines
  • X1 (above), T1 = (Tx, Ty, Tz)ᵀ, and X1 − T1 all lie in the epipolar plane
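
A sketch composing the two rectification warps above, H1 = K1 Rrect K1⁻¹ and H2 = K1 Rrect R1 R2⁻¹ K2⁻¹; all of the matrices below are illustrative stand-ins.

```python
import numpy as np

f = 500.0
K1 = K2 = np.array([[f, 0.0, 320.0],
                    [0.0, f, 240.0],
                    [0.0, 0.0, 1.0]])  # toy intrinsics
Rrect = np.eye(3)    # would rotate camera 1's x axis onto the baseline T
R1 = R2 = np.eye(3)  # each camera's rotation

H1 = K1 @ Rrect @ np.linalg.inv(K1)
H2 = K1 @ Rrect @ R1 @ np.linalg.inv(R2) @ np.linalg.inv(K2)
```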

Lecture 22 - 2018/11/20

TODO

Lecture 23 - 2018/11/22

TODO

Lecture 24 - 2018/11/27

  • Lambertian shading - surface whose reflected image obeys the equation (see the sketch after this list):
    • I(X) = N(X) · L
  • Classical “shape from shading”
    • I(x, y) = N(x, y) · L
    • Too simplistic
    • Most surfaces do not have Lambertian reflectance
    • Cannot solve for single N without further constraints
  • Hard to distinguish shading and shadows
  • Scene colours depend on the illumination colour; computer and human vision models typically normalize the image by assuming a global colour bias
  • Some non Lambertian surface materials
    • Plastic
    • Polished fruit
    • Metal
    • Wet surfaces
  • Focal length is inversely related to the field-of-view angle: fov = 2 arctan(w/(2f)), where w is the sensor width
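
A sketch of Lambertian shading I = N · L on a sphere; the light direction and resolution are arbitrary choices.

```python
import numpy as np

y, x = np.mgrid[-1:1:128j, -1:1:128j]
z2 = 1.0 - x**2 - y**2
mask = z2 > 0                          # pixels covered by the unit sphere

N = np.zeros((128, 128, 3))            # unit surface normals
N[mask] = np.stack([x[mask], y[mask], np.sqrt(z2[mask])], axis=-1)

L = np.array([0.0, 0.0, 1.0])          # light pointing toward the camera
I = np.clip(N @ L, 0.0, None)          # clamp attached-shadow negatives
```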