Any web scrapers here?


Midnightsun

I need to extract titles from an HTML page, but my request is getting rejected.

import requests
from bs4 import BeautifulSoup

job_url = 'https://www.certipedia.com/companies/446478?locale=en'
resp = requests.get(job_url)
soup = BeautifulSoup(resp.content, 'html.parser')

# Print the text of the page's <title> tag
title_tag_text = soup.title.text
print(title_tag_text)

First try it in Postman or another REST test tool. Maybe they are checking the user agent and rejecting the request. Try to mimic the request the way the browser sends it: look at the request in Chrome's network tool and send all the headers it sends. If that works, try removing one header at a time to find which ones they require.
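For the code side, here is a minimal sketch of sending browser-like headers from Python. The names `BROWSER_HEADERS` and `fetch_html` are made up for illustration, and the header values are only typical examples; copy the real ones from Chrome's network tab. The optional `session` argument is an assumption added so the function can be tried with a stand-in object; by default it uses `requests.Session()`.

```python
# Illustrative header values (assumptions): replace them with the ones
# Chrome actually sends, as shown in DevTools > Network > Request Headers.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_html(url, session=None, headers=BROWSER_HEADERS):
    """GET url with browser-like headers and return the response body."""
    if session is None:
        # Imported here so a stand-in session can be used without requests.
        import requests
        session = requests.Session()
    resp = session.get(url, headers=headers, timeout=10)
    resp.raise_for_status()  # raise on 403 and other error statuses
    return resp.text
```

In a Jupyter cell, `html = fetch_html(job_url)` followed by `BeautifulSoup(html, 'html.parser').title.text` would mirror the original snippet. Whether these three headers are enough for this particular site is not confirmed; it may check others.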

Link to comment
Share on other sites

6 minutes ago, NeneRajuNeneManthri said:

First try it in Postman or another REST test tool. Maybe they are checking the user agent and rejecting the request. Try to mimic the request the way the browser sends it: look at the request in Chrome's network tool and send all the headers it sends. If that works, try removing one header at a time to find which ones they require.

I'm using Anaconda / JupyterLab.

Can I pull the data with Postman / REST test tools?

Can you post an example of how headers are sent? I'm still a novice at this web scraping game.


7 minutes ago, Midnightsun said:

I'm using Anaconda / JupyterLab.

Can I pull the data with Postman / REST test tools?

Can you post an example of how headers are sent? I'm still a novice at this web scraping game.

Try the Postman tool and make a GET request to that URL.

In Chrome, right-click and open Inspect mode. Switch to the Network tab, then reload the page. You will see what is sent in the request headers. Do this first, before trying to implement it in code.

Then, in Postman, fill in those request headers and values from Chrome. Google for more details on how to use Postman.
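The same steps can be sketched in Python for a notebook: paste the request headers copied from Chrome into a string, turn them into a dict, then drop one header at a time to see which ones the server needs. `parse_raw_headers`, `find_required_headers`, and `request_ok` are hypothetical names, not part of any library, and the sketch assumes the headers were copied as plain "name: value" lines.

```python
def parse_raw_headers(raw):
    """Turn 'name: value' lines copied from DevTools into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
        if not line or line.startswith(":"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


def find_required_headers(headers, request_ok):
    """Drop one header at a time and keep only those the request needs.

    request_ok is a caller-supplied function (an assumption of this
    sketch): it takes a headers dict, sends the request, and returns
    True if the server accepted it.
    """
    required = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in required.items() if k != name}
        if request_ok(trial):
            required = trial  # still accepted without it, so leave it out
    return required
```

With the requests library, `request_ok` could be something like `lambda h: requests.get(job_url, headers=h, timeout=10).ok`. Each trial sends a real request, so keep the header list short.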


7 minutes ago, NeneRajuNeneManthri said:

Try the Postman tool and make a GET request to that URL.

In Chrome, right-click and open Inspect mode. Switch to the Network tab, then reload the page. You will see what is sent in the request headers. Do this first, before trying to implement it in code.

Then, in Postman, fill in those request headers and values from Chrome. Google for more details on how to use Postman.

Thanks, will look into that.
