Forms Authentication Apps./Sites

Crawling and indexing applications that use "Forms Authentication" (.NET 1+), "ASP.NET Membership" (.NET 2+), or any type of custom login mechanism is possible, but a little preparation is required to make it work.

ASP.NET Membership (.NET 2+)

In this example there is one page called 'login.aspx', which shows the user the login form and validates their input to log them in. Typically the validation logic for ASP.NET Membership is inside the "Login" control, but it can also be performed externally in one of two ways:

Use of the Login.Authenticate event

    protected void Login1_Authenticate(object sender, AuthenticateEventArgs e)
    {
        e.Authenticated = myCustomLogic();     
    }

Call to Membership.ValidateUser

    if (Membership.ValidateUser(user, pass))
        FormsAuthentication.RedirectFromLoginPage(user, false);

The latter code does two things: it authenticates the username and password, and then it redirects the user to the desired page (the ReturnUrl, if one was supplied). During the redirection the user is logged in (via a cookie).

To allow the crawler/indexer to log in, we need to modify the code to accept the username and password from GET request parameters, not just from the textboxes (the crawler/indexer is unable to fill in the textboxes itself). To do this we simply change the page to try authenticating from the request parameters as well.

C#
    protected void Page_Load(object sender, EventArgs e)
    {
        //get any supplied username and password for the search engine
        string user = Page.Request.Params["searchusername"];
        string pass = Page.Request.Params["searchpassword"];

        //if it wasn't a postback and we do have a username, then login the user
        if (!IsPostBack && user != null && pass != null)
        {
            if (Membership.ValidateUser(user, pass))
                FormsAuthentication.RedirectFromLoginPage(user, false);
        }
        else if (!IsPostBack && Membership.GetUser() != null && Request.Params["ReturnUrl"] != null)
        {
            //not a postback and we didn't get any user info,
            //so if the user is already logged in, redirect them as usual
            FormsAuthentication.RedirectFromLoginPage(Membership.GetUser().UserName, false);
        }
    }
VB.NET
    Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load
        'get any supplied username for the search engine
        Dim user As String = Page.Request.Params("searchusername")
        Dim pass As String = Page.Request.Params("searchpassword")

        'if it wasn't a postback and we do have a username, then login the user
        If Not IsPostBack AndAlso user IsNot Nothing AndAlso pass IsNot Nothing Then

            If Membership.ValidateUser(user, pass) Then
                FormsAuthentication.RedirectFromLoginPage(user, False)
            End If

        ElseIf Not IsPostBack _
                    AndAlso Membership.GetUser() IsNot Nothing _
                    AndAlso Request.Params("ReturnUrl") IsNot Nothing Then

            'not a postback and we didn't get any user info,
            'so if the user is already logged in, redirect them as usual
            FormsAuthentication.RedirectFromLoginPage(Membership.GetUser().UserName, False)
        End If
    End Sub

Importing

In the Index Management tool, configure the import to start from the login page, passing the required username and password in the URL. For a website import, set this as the start URL; for a filesystem import, use the 'More Options' button to set the login URL.

E.g. the URL is set to

    http://localhost/login.aspx?searchusername=myUserName&searchpassword=myPassword

    [optionally you may also want to specify "&ReturnUrl=someURL.aspx" to make the crawler visit 'someURL.aspx' after logging in.]
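
If the username or password contains characters such as & or =, they need to be URL-encoded in the start URL, otherwise the login page will receive truncated values. A minimal sketch of building such a URL follows; the helper name CrawlerLoginUrl.Build is hypothetical and not part of the product, and only the parameter names come from the example above.

```csharp
using System;

public static class CrawlerLoginUrl
{
    // Hypothetical helper: builds the crawler start URL from a login page
    // address and credentials, URL-encoding each value.
    public static string Build(string loginPage, string user, string pass, string returnUrl = null)
    {
        string url = loginPage
            + "?searchusername=" + Uri.EscapeDataString(user)
            + "&searchpassword=" + Uri.EscapeDataString(pass);

        // ReturnUrl is optional; append it only when one was supplied.
        if (!string.IsNullOrEmpty(returnUrl))
            url += "&ReturnUrl=" + Uri.EscapeDataString(returnUrl);

        return url;
    }
}
```

For example, CrawlerLoginUrl.Build("http://localhost/login.aspx", "myUserName", "myPassword") gives the URL shown above, and the result can be pasted into the Index Management tool.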

Security note: by using the parameters named searchusername and searchpassword, the search engine will automatically remove the username and password from the result URL.

If you have difficulty (e.g. the crawler finds only 1 link), it is useful to debug the login process. This can be done as usual: start the web application as normal, set a breakpoint in the login method (leave the browser open), and start the crawler. The crawler can be treated like any other browser; when it visits the first page, the breakpoint will be hit.

It is advisable not to let the application's 'logout' page log out the search engine user, as this page may be visited during the crawl operation.
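
One way to follow this advice is to have the logout page check who is signed in before signing them out. A minimal sketch, assuming the crawler logs in with a dedicated account; the account name "searchuser" and the LogoutGuard class are hypothetical and only illustrate the idea.

```csharp
using System;

public static class LogoutGuard
{
    // Hypothetical: the account name the crawler logs in with.
    public const string CrawlerUserName = "searchuser";

    // Returns false for the crawler's account, so the logout page
    // can leave that user signed in for the rest of the crawl.
    public static bool ShouldSignOut(string currentUserName)
    {
        return !string.Equals(currentUserName, CrawlerUserName,
                              StringComparison.OrdinalIgnoreCase);
    }
}
```

In the logout page this could be used as: if (LogoutGuard.ShouldSignOut(User.Identity.Name)) FormsAuthentication.SignOut();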

Forms Authentication (.NET 1+)

    void ProcessLogin(object sender, EventArgs e){
        if (FormsAuthentication.Authenticate(txtUser.Text, txtPassword.Text)) {
            FormsAuthentication.RedirectFromLoginPage(txtUser.Text, chkPersistLogin.Checked);
        } else {
            ErrorMessage.InnerHtml = "<b>Invalid username/password combination.</b>";
        }
    }

This code is called by the login button:

    <asp:Button Id="cmdLogin" OnClick="ProcessLogin" Text="Login" runat="server" />

The code does two things: it authenticates the username and password, and then it redirects the user to the desired page (the ReturnUrl, if one was supplied). During the redirection the user is logged in (via a cookie).

To allow the crawler/indexer to log in, we need to modify the code to accept the username and password from GET request parameters, not just from the textboxes (the crawler/indexer is unable to fill in the textboxes itself). To do this we simply change the code to try authenticating from the request parameters as well.

    void ProcessLogin(object sender, EventArgs e){
        string username = txtUser.Text;
        string password = txtPassword.Text;
        if (Request.Params["searchusername"] != null) {
            //we got a searchusername through the GET request
            username = Request.Params["searchusername"];
            password = Request.Params["searchpassword"];
        }
        if (FormsAuthentication.Authenticate(username, password)) {
            FormsAuthentication.RedirectFromLoginPage(username, chkPersistLogin.Checked);
        } else {
            ErrorMessage.InnerHtml = "<b>Invalid username/password combination.</b>";
        }
    }

We then call this method from the page load if we receive the GET parameter, e.g.:

    private void Page_Load(object sender, System.EventArgs e)
    {
        // Put user code to initialize the page here
        if (Request.Params["searchusername"] != null) ProcessLogin(null, null);
    }

Then, in the Index Management tool, configure the crawler to start from this page, passing the required username and password in the URL. E.g. the crawl import URL is set to

    http://localhost/login.aspx?searchusername=myUserName&searchpassword=myPassword

    [optionally you may also want to specify "&ReturnUrl=someURL.aspx" to make the crawler visit 'someURL.aspx' after logging in.]

Security note: by using the parameters named searchusername and searchpassword, the search engine will automatically remove the username and password from the result URL.

If you have difficulty (e.g. the crawler finds only 1 link), it is useful to debug the login process. This can be done as usual: start the web application as normal, set a breakpoint in the login method (leave the browser open), and start the crawler. The crawler can be treated like any other browser; when it visits the first page, the breakpoint will be hit.

It is advisable not to let the application's 'logout' page log out the search engine user, as this page may be visited during the crawl operation.

Example Code With Modified Authentication Method

<%@ Page language="c#" Codebehind="WebForm4.aspx.cs" %>
<%@Import Namespace="System.Web.Security" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >

<script language="C#" runat="server">
private void Page_Load(object sender, System.EventArgs e)
{
    // Put user code to initialize the page here
    if (Request.Params["searchusername"] != null) ProcessLogin(null, null);
}

void ProcessLogin(object sender, EventArgs e){
    string username = txtUser.Text;
    string password = txtPassword.Text;
    if (Request.Params["searchusername"] != null) {
        //we got a searchusername through the GET request
        username = Request.Params["searchusername"];
        password = Request.Params["searchpassword"];
    }
    if (FormsAuthentication.Authenticate(username, password)) {
        FormsAuthentication.RedirectFromLoginPage(username, chkPersistLogin.Checked);
    } else {
        ErrorMessage.InnerHtml = "<b>Invalid username/password combination.</b>";
    }
}
</script>
<html>
<head>
<title>Standard Forms Authentication Login Form</title>
</head>
<body >
<form runat="server" ID="Form1">
<table width="400" border="0" cellspacing="0" cellpadding="0">
    <tr>
        <td width="80">Username : </td>
        <td><asp:TextBox Id="txtUser" width="150" runat="server"/></td>
    </tr>
    <tr>
        <td>Password : </td>
        <td><asp:TextBox Id="txtPassword" width="150" TextMode="Password" runat="server"/></td>
    </tr>
    <tr>
        <td></td>
        <td><asp:CheckBox id="chkPersistLogin" runat="server" />Remember my credentials<br>
        </td>
    </tr>
    <tr>
        <td> </td>
        <td><asp:Button Id="cmdLogin" OnClick="ProcessLogin" Text="Login" runat="server" /></td>
    </tr>
</table>
<br><div id="ErrorMessage" runat="server" />
</form>
</body>
</html>