R - Checking HTML for Formatting Tags (Bold, Italics, etc.)
I am using edgarWebR to parse 10-K (SEC EDGAR) filings. I am trying to write an algorithm to deduce whether each HTML element is normal text, a subheading, or a heading by checking h
Solution 1:
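I think all you need is to check whether a particular string contains HTML markup indicating that some of its text should be bold and/or italic. That markup can be a tag (`<b>`, `<strong>`, `<i>`, `<em>`) or an inline CSS declaration inside a `style` attribute (`font-weight: bold`, `font-style: italic`).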
S <- '<p style="margin-top:18px;margin-bottom:0px"><font style="font-family:ARIAL" size="2"><b><i>Our quarterly operating results have fluctuated in the past and might continue to fluctuate, causing the value of our common stock to decline substantially. </i></b></font></p>'
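# Bold: a <b> or <strong> tag (with or without attributes), or inline CSS font-weight: bold
grepl("<(b|strong)(\\s[^>]*)?>|font-weight\\s*:\\s*bold", S, ignore.case = TRUE)
# [1] TRUE
# Italic: an <i> or <em> tag (with or without attributes), or inline CSS font-style: italic
grepl("<(i|em)(\\s[^>]*)?>|font-style\\s*:\\s*italic", S, ignore.case = TRUE)
# [1] TRUE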
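Since `grepl` is vectorised, the same check can be run over every element at once. Here is a minimal sketch, assuming the parsed filing is available as a character vector of raw HTML strings; the helper name `detect_format` and the sample vector `x` are made up for illustration and are not part of edgarWebR. Treating numeric weights of 600 and above as bold is also my own assumption.

```r
# Hypothetical helper (not from edgarWebR): flag bold/italic markup
# in each element of a character vector of HTML strings.
detect_format <- function(html) {
  # <b> or <strong>, optionally with attributes; or inline CSS font-weight.
  # Numeric weights 600-900 are treated as bold here (an assumption).
  bold_pat <- "<(b|strong)(\\s[^>]*)?>|font-weight\\s*:\\s*(bold|[6-9]00)"
  # <i> or <em>, optionally with attributes; or inline CSS font-style.
  italic_pat <- "<(i|em)(\\s[^>]*)?>|font-style\\s*:\\s*italic"
  data.frame(
    bold   = grepl(bold_pat,   html, ignore.case = TRUE, perl = TRUE),
    italic = grepl(italic_pat, html, ignore.case = TRUE, perl = TRUE)
  )
}

# Made-up sample input, one string per HTML element
x <- c(
  '<p><b>Risk Factors</b></p>',
  '<p style="font-style: italic">note</p>',
  '<p>plain<br>text</p>'
)
res <- detect_format(x)
# res$bold   is TRUE, FALSE, FALSE  (<br> is not mistaken for <b>)
# res$italic is FALSE, TRUE, FALSE
```

Keep in mind this is a regex heuristic: it only sees markup present in the string itself, so formatting inherited from a parent element or set in a stylesheet will be missed. If that matters, a real HTML parser is likely the safer route.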