10/29: Week 6 strings (remaining) & Week 7 Functions / Gradient Descent / Classification

Gradient Descent and Classification

Week 6 strings

Earthquakes Example

date_express <- "^[0-9]{4}/[0-9]{2}/[0-9]{2}"
head(grep(quakes, pattern = date_express))
# returns the indices (line numbers) of the elements that match

head(grep(quakes, pattern = date_express, value = TRUE))
# value = TRUE ==> return the matching lines themselves instead of their indices
# equivalent to quakes[grep(quakes, pattern = date_express)]
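A small self-contained check of the two calls above. The lines in quakes_demo are made up for illustration; they only stand in for the real quakes data from the lecture, whose exact layout is not shown in these notes.

```r
# Hypothetical sample lines (NOT the real quakes data)
quakes_demo <- c(
  "Earthquake catalog header",
  "2011/03/11 05:46:23 38.2970 142.3730",
  "not a data line",
  "2010/02/27 06:34:13 -36.1220 -72.8980"
)
date_express <- "^[0-9]{4}/[0-9]{2}/[0-9]{2}"

# Indices of the lines that start with a yyyy/mm/dd date
idx <- grep(quakes_demo, pattern = date_express)

# The matching lines themselves
vals <- grep(quakes_demo, pattern = date_express, value = TRUE)

# Same result by indexing with the positions
vals_by_index <- quakes_demo[idx]
```

Here idx is 2 4, and vals and vals_by_index hold the same two date lines.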

grep( ) Family

grep() returns the indices of the elements that match the pattern (which rows contain it). grepl() is the variant that returns a logical for each element.
regexpr() returns the location of the first match in each string, with attributes such as the length of the match.
gregexpr() works like regexpr(), but returns all matching locations in each string. 'g' for global.
regmatches() takes strings and the output of regexpr() or gregexpr() and returns the actual matching substrings.

# Is there a match?
# check which strings contain the pattern (returns 1 here: element 1 matches)
grep("a[a-z]", "Alabama")

# Information about the first match.
# for each string, find where the pattern first shows up
regexpr("a[a-z]", "Alabama")
[1] 3
# (plus attributes; match.length is 2, i.e. the match is "ab")

# Information on all matches.
# find every position where the pattern shows up
gregexpr("a[a-z]", "Alabama")
[[1]]
[1] 3 5
# first match ("ab") starts at position 3, second ("am") at position 5

# What are the matches?
# use the gregexpr()/regexpr() output to pull out the substrings that match the pattern
regmatches("Alabama", gregexpr("a[a-z]", "Alabama"))
[[1]]
[1] "ab" "am"
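
The same four functions on a vector with more than one string, as a sketch of how the results line up per element. The vector x is made up for illustration; grepl() is the logical variant of grep().

```r
x   <- c("Alabama", "Ohio", "Alaska")   # illustrative input
pat <- "a[a-z]"

idx <- grep(pat, x)        # which elements match: 1 3
hit <- grepl(pat, x)       # one logical per element: TRUE FALSE TRUE

first <- regexpr(pat, x)   # start of first match per element; -1 means no match
all_m <- gregexpr(pat, x)  # list with all match starts per element

# regmatches() with regexpr() output drops elements that have no match
first_txt <- regmatches(x, first)   # "ab" "as"

# regmatches() with gregexpr() output keeps one list entry per element
all_txt <- regmatches(x, all_m)     # c("ab", "am"), character(0), "as"
```

Note that first_txt has length 2 while x has length 3, so it cannot be paired with x by position without going through hit or idx.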

Earthquakes Example

# optional "-", one or more digits, a literal ".",
# followed by 4 digits
coord_exp <- "-?[0-9]+\\.[0-9]{4}"
full_exp <- paste(coord_exp, "\\s+", coord_exp, sep = "")

# .   ==> any single character
# \\. ==> a literal "." (escaped)
# \\s ==> a whitespace character (space, tab, ...); \\s+ means one or more
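
A quick sketch of why the escape matters and what the combined pattern picks up. It assumes the pattern is written with a doubled backslash inside the R string, as in the comments above; line_demo is a made-up line, not taken from the real data.

```r
# Unescaped "." matches any character; "\\." matches only a literal dot
dot_any     <- grepl("1.5",   "1x5")   # TRUE
dot_literal <- grepl("1\\.5", "1x5")   # FALSE

coord_exp <- "-?[0-9]+\\.[0-9]{4}"
full_exp  <- paste(coord_exp, "\\s+", coord_exp, sep = "")

# Hypothetical line with two coordinates separated by several spaces
line_demo <- "2010/02/27 06:34:13 -36.1220   -72.8980 35.0"

# Extract the "latitude whitespace longitude" chunk
coords <- regmatches(line_demo, regexpr(full_exp, line_demo))
```

Here coords is the single string "-36.1220   -72.8980", with the original spacing kept.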

If regmatches() gives back the result as a list and the 2 values sit in the SAME string, we can use sapply() with strsplit() to split the string.

Also, as in Lab 3, if the 2 values for one line are in the same list element but in DIFFERENT strings, we can unlist() them, or use sapply() and transpose the result with t() (rows to columns or the reverse).

coords_split <- sapply(coords, strsplit, split = "\\s+")

p1char_num <- data.frame(t(sapply(regmatches(p1char_group, p1char_findnum),
                                  as.numeric)))
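
Both cases sketched end to end on made-up input. The vectors coords_demo and p1_demo, and the column names lat and lon, are assumptions for illustration only; the real coords, p1char_group and p1char_findnum come from the lecture and Lab 3.

```r
# Case 1: both values in the SAME string -> split on whitespace
coords_demo  <- c("-36.1220 -72.8980", "38.2970   142.3730")  # hypothetical
coords_split <- sapply(coords_demo, strsplit, split = "\\s+")  # list of 2 vectors

# One row per string, one column per value
coords_num <- t(sapply(unname(coords_split), as.numeric))
coords_df  <- data.frame(lat = coords_num[, 1], lon = coords_num[, 2])

# Case 2: values come back as SEPARATE strings in each list element
p1_demo  <- c("x=1 y=20", "x=3 y=40")                 # hypothetical
found    <- gregexpr("[0-9]+", p1_demo)
num_list <- regmatches(p1_demo, found)                # list: c("1","20"), c("3","40")

# sapply() gives one column per string, so t() turns that into one row per string
nums_df <- data.frame(t(sapply(num_list, as.numeric)))
```

This relies on every string yielding the same number of values; otherwise sapply() returns a list rather than a matrix and t() will not work.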

Web Scraping

see lecture slides