How to programmatically log in to a website to screenscape?

You’d make the request as though you’d just filled out the form. Assuming it’s POST for example, you make a POST request with the correct data. Now if you can’t login directly to the same page you want to scrape, you will have to track whatever cookies are set after your login request, and include them in your scraping request to allow you to stay logged in.

It might look like:

HttpWebRequest http = WebRequest.Create(url) as HttpWebRequest;
http.KeepAlive = true;
http.Method = "POST";
http.ContentType = "application/x-www-form-urlencoded";
string postData="FormNameForUserId=" + strUserId + "&FormNameForPassword=" + strPassword;
byte[] dataBytes = UTF8Encoding.UTF8.GetBytes(postData);
http.ContentLength = dataBytes.Length;
using (Stream postStream = http.GetRequestStream())
{
    postStream.Write(dataBytes, 0, dataBytes.Length);
}
HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
// Probably want to inspect the http.Headers here first
http = WebRequest.Create(url2) as HttpWebRequest;
http.CookieContainer = new CookieContainer();
http.CookieContainer.Add(httpResponse.Cookies);
HttpWebResponse httpResponse2 = http.GetResponse() as HttpWebResponse;

Maybe.

Leave a Comment