Thursday, October 6, 2022

Regex - Common useful ones Part 1

Extracting All the Words from a Given Sentence

The + character is a special character in regex. It matches one or more repetitions of the preceding expression or character class, which in our case is [a-z]. So the pattern [a-z]+ matches every run of one or more lowercase letters, and re.findall returns those runs as a list. To match runs of both lowercase and uppercase letters, we can create the pattern as follows:

words_pattern = '[a-zA-Z]+'
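As a rough sketch of the difference between the two character classes, the snippet below runs both patterns over a made-up sentence. The variable names `sentence`, `lower_only` and `all_words`, and the sentence itself, are assumptions for illustration only.

```python
import re

# Hypothetical sample sentence, chosen only for illustration
sentence = "Regex makes Text processing easy"

# Lowercase-only class: an uppercase letter is not matched, so it is dropped
lower_only = re.findall('[a-z]+', sentence)

# Both cases: each run of letters comes back whole
words_pattern = '[a-zA-Z]+'
all_words = re.findall(words_pattern, sentence)

print(lower_only)  # ['egex', 'makes', 'ext', 'processing', 'easy']
print(all_words)   # ['Regex', 'makes', 'Text', 'processing', 'easy']
```

Note that neither pattern matches digits, apostrophes or hyphens, so a word like "don't" would be split in two.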

Extracting Words That Follow a Specific Pattern

Let’s assume that our usernames can only contain letters, and that any run of letters immediately following an '@', with no space in between, is a username.

comment = "This is a great article @Bharath. You have explained the complex topic in a very simplistic manner. @Yashwant, you might find this article to be useful."

Let’s create a regex pattern that can be used to find all the usernames tagged in the comment.

import re

username_pattern = '@([a-zA-Z]+)'

re.findall(username_pattern, comment)  # returns ['Bharath', 'Yashwant']
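The parentheses in the pattern matter. The sketch below, which uses the variable names `with_group` and `without_group` purely for illustration, compares the pattern with and without the capturing group to show what re.findall returns in each case.

```python
import re

comment = ("This is a great article @Bharath. You have explained the complex "
           "topic in a very simplistic manner. @Yashwant, you might find this "
           "article to be useful.")

# With a capturing group, findall returns only the captured part (no '@')
with_group = re.findall('@([a-zA-Z]+)', comment)

# Without the group, findall returns the whole match, including the '@'
without_group = re.findall('@[a-zA-Z]+', comment)

print(with_group)     # ['Bharath', 'Yashwant']
print(without_group)  # ['@Bharath', '@Yashwant']
```

Keep in mind that this simple pattern would also pick up the part after the '@' in an email address, so it only suits text where that cannot occur.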


Finding All Words That End in 'ing'

# regex pattern: a whole word that ends with 'ing'

pattern = r"\b\w+ing\b"

# store the results in the list 'result'; 'string' is the input text to search

result = re.findall(pattern, string)
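Here is a minimal sketch of matching words that end in 'ing'. It assumes a word-boundary pattern, r"\b\w+ing\b", which is one possible choice, and a made-up sample text. It also shows a common pitfall: when a pattern contains a capturing group, re.findall returns the group rather than the whole word.

```python
import re

# Hypothetical sample text, for illustration only
string = "Playing outside is interesting, but the king kept singing"

# \b marks a word boundary; \w+ needs at least one character before 'ing'
pattern = r"\b\w+ing\b"
result = re.findall(pattern, string)
print(result)  # ['Playing', 'interesting', 'king', 'singing']

# Pitfall: the group (ing) is what findall returns, not the full match
grouped = re.findall(r".(ing){1,}", string)
print(grouped)  # ['ing', 'ing', 'ing', 'ing']
```

Because \w+ requires at least one character before 'ing', the bare word "ing" would not match, while short words such as "king" do.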


References:

https://medium.com/quantrium-tech/extracting-words-from-a-string-in-python-using-regex-dac4b385c1b8
