Posts Tagged ‘process’

Working with Regular Expressions in C#

Saturday, August 23rd, 2008

I’ve been working on a program that needs to parse an HTML file for form data. When I was deciding what method to use, a few popped right into my mind.

The first is a character-by-character search through the string: parsing through the data and flagging sections that fit the signature of what is being searched for.

The second would be automating that by using the built-in string functions to split up the string and drill down until the needed data was extracted.

The third, and the one I chose to use, was regular expressions. This in my mind is the most “poetic” method of the three, and it would allow me to write a robust and reliable function.

While I’ve used regular expressions a lot throughout the years, I NEVER seem to remember enough to construct a decent statement. I had recently bought a pocket reference (link below), so I used that to get a statement constructed. It had a total of about 6 pages for C#, but I pretty much got what I needed from it. Anything else I just searched the Internet for.

Included below is most of the code to extract an unlimited number of forms from an HTML document:

//create a new instance that will be sent back as a reference parameter
//there may be multiple forms, so we have to use a data structure
returnHtmlFormData = new List<HtmlForm>();

//take the initial response text and process it for FORM tags, this can handle an "unlimited" number of them

//a regular expression to extract each form tag; in the group collection, [0] is the whole form and [1] is the action attribute
Regex formExtractor = new Regex(@"<form\b[^>]*action=""?(.*?)[""\s].*?>.*?</form>",
    RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Singleline);

//a regular expression to extract all of the input tags (we want the name and the value of each)
//note: name has to appear before value inside the tag for this to match
Regex inputTagExtractor = new Regex(@"<input\b[^>]*name=""?(.*?)[""\s].*?value=""?(.*?)[""\s].*?/?>",
    RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Singleline);

List<HtmlForm> is a collection of a class that I created. It basically just stores:
1. The action page, which we would need if we want to use this data later to send a POST.
2. All of the fields that were in the form.
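The HtmlForm class itself is not shown in this post, so here is a minimal sketch of what it might look like. Only the ActionPage property and the addInputField method appear in the code further down; the InputFields list and the constructor are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reconstruction of the HtmlForm storage class. Only ActionPage
// and addInputField are taken from the post; everything else is assumed.
public class HtmlForm
{
    // The page named in the form's action attribute (the target of a later POST).
    public string ActionPage { get; set; }

    // Name/value pairs in document order. A list of pairs is used instead of a
    // Dictionary because an HTML form may repeat a field name.
    public List<KeyValuePair<string, string>> InputFields { get; private set; }

    public HtmlForm()
    {
        ActionPage = string.Empty;
        InputFields = new List<KeyValuePair<string, string>>();
    }

    // Method name kept in the camelCase used by the calling code in this post.
    public void addInputField(string name, string value)
    {
        InputFields.Add(new KeyValuePair<string, string>(name, value));
    }
}

public static class HtmlFormDemo
{
    public static void Main()
    {
        // Made-up sample values, just to show how the class is filled.
        HtmlForm form = new HtmlForm();
        form.ActionPage = "login.php";
        form.addInputField("user", "bob");

        Console.WriteLine(form.ActionPage + " has " + form.InputFields.Count + " field(s)");
    }
}
```

Storing the fields as an ordered list keeps them in the order they appeared in the page, which may matter if the data is later replayed as a POST.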

The two Regex instances do all of the work here. The first searches the text for form tags. It allows for a few variations, such as extra whitespace or other attributes in the opening tag. Also, notice the three regex options. IgnoreCase is obvious. Compiled improves execution time at the cost of some extra startup time. Singleline makes the “.” character match every character, including line feeds, so a single match can span multiple lines.

The second searches within each form tag (as shown later on) for all input fields. It uses parentheses to capture certain pieces of the data into groups, which are exposed through a GroupCollection that we will look at later. It also allows for attribute values with or without quotes around them.
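To see what the singleline option changes in practice, here is a small sketch. The pattern is a cut-down version of the form pattern and the sample HTML is made up; the point is only that “.” stops at a line feed unless RegexOptions.Singleline is set.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Returns true when the pattern finds a complete <form>...</form> block.
    public static bool MatchesForm(string html, RegexOptions options)
    {
        return Regex.IsMatch(html, @"<form\b.*?</form>", options);
    }

    public static void Main()
    {
        // Made-up sample input with line breaks inside the form element.
        string html = "<form action=\"save.php\">\n<input name=\"a\" value=\"1\">\n</form>";

        // Without Singleline, "." cannot cross the line feeds, so nothing matches.
        Console.WriteLine(MatchesForm(html, RegexOptions.IgnoreCase)); // False

        // With Singleline, "." also matches "\n" and the whole form is found.
        Console.WriteLine(MatchesForm(html, RegexOptions.IgnoreCase | RegexOptions.Singleline)); // True
    }
}
```

Since real pages almost always spread a form over several lines, the option is effectively required for this approach.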

//attempt to extract out all forms in the passed string
MatchCollection formList = formExtractor.Matches(initialResponseText);

The line above runs the first regular expression against the response text. It returns every match it finds in a MatchCollection object.

//for each form tag that is found, process it
foreach (Match formMatch in formList)
{
    //create a new element in the list data structure so we can fill it with form data
    returnHtmlFormData.Add(new HtmlForm());

    //get the index of the list element we want to be filling with data
    int activeListElement = returnHtmlFormData.Count - 1;

    //extract the regex groups from the result, so we can continue processing
    //every () in the regex statement becomes an element in here
    //the first element, [0], is always the whole match though
    GroupCollection formTagMatchValues = formMatch.Groups;

    //assign the action page value we extracted from the current form element to our data structure
    returnHtmlFormData[activeListElement].ActionPage = formTagMatchValues[1].ToString();

    //attempt to extract all of the names/values for the input tags
    MatchCollection inputTagMatches = inputTagExtractor.Matches(formTagMatchValues[0].ToString());

    //loop through the results (multiple input tags should be returned)
    foreach (Match inputMatch in inputTagMatches)
    {
        GroupCollection inputMatchValues = inputMatch.Groups;

        //save the input field data to our data structure
        returnHtmlFormData[activeListElement].addInputField(inputMatchValues[1].ToString(), inputMatchValues[2].ToString());
    }
}

As you can see above, I use my List collection returnHtmlFormData to hold the instances of my form storage class. I’ll let my code comments above speak for themselves, but basically you start from a MatchCollection and loop through it as single items of class Match. Each Match can then be processed further by reading its GroupCollection for the actual data we wanted extracted. It’s quite an ingenious construct of classes, but it took me a while to figure out…
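For reference, here is a compact, self-contained sketch of the same MatchCollection, Match, GroupCollection chain. It is not the code from my program: the patterns are simplified (attribute values must be double-quoted, and name must come before value), it uses named groups instead of numbered ones, and the class name, method name, and sample HTML are all made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FormExtractionDemo
{
    // Simplified variants of the patterns in this post: attribute values must
    // be double-quoted and name must come before value inside each input tag.
    private static readonly Regex FormPattern = new Regex(
        @"<form\b[^>]*\baction=""(?<action>[^""]*)""[^>]*>(?<body>.*?)</form>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    private static readonly Regex InputPattern = new Regex(
        @"<input\b[^>]*\bname=""(?<name>[^""]*)""[^>]*\bvalue=""(?<value>[^""]*)""[^>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns one line per input field, formatted as "action: name=value".
    public static List<string> ExtractFields(string html)
    {
        List<string> lines = new List<string>();

        // MatchCollection -> Match: one Match per form element.
        foreach (Match formMatch in FormPattern.Matches(html))
        {
            // Match -> GroupCollection: named groups are read by name.
            string action = formMatch.Groups["action"].Value;
            string body = formMatch.Groups["body"].Value;

            // The second pattern only runs against the body of the current form.
            foreach (Match inputMatch in InputPattern.Matches(body))
            {
                lines.Add(action + ": "
                    + inputMatch.Groups["name"].Value + "="
                    + inputMatch.Groups["value"].Value);
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Made-up sample page with one form and two input fields.
        string html =
            "<form action=\"login.php\" method=\"post\">\n" +
            "  <input type=\"text\" name=\"user\" value=\"bob\">\n" +
            "  <input type=\"hidden\" name=\"token\" value=\"abc123\" />\n" +
            "</form>";

        foreach (string line in ExtractFields(html))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // login.php: user=bob
        // login.php: token=abc123
    }
}
```

Named groups cost a few extra characters in the pattern, but they make the code that reads the groups easier to follow than remembering what [1] and [2] stand for.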

Amazon.com Book Link:
Regular Expression Pocket Reference: Regular Expressions for Perl, Ruby, PHP, Python, C, Java and .NET

Drupal: CMS Migration To A Server

Friday, August 10th, 2007

I’ve been modifying Drupal (drupal.org) for a neighborhood organization website. It took a little research as, like most similar open-source projects, the documentation leaves something to be desired. I now have the site to a point where I want to put it up on the Internet. I have some hosting, so I got the domain… and bam, it’s on the web right? Well no.

I feel it is necessary to make a general process overview as it took a good deal of research and tweaking to get it working:
-I have “reseller hosting.” That means I can host multiple domains with the space/bandwidth that comes with the yearly fee. I bought the neighborhood domain, added a special hosting plan (defines storage, bandwidth, and other properties), and tried configuring the DNS information.

-Seeing as I have not done too much with the whole domainname-dns-nameserver-whatever configuration process, I had some trouble figuring it out. What I ended up doing was having my domain name registrar create 2 name servers from the IPs that link to the hosting server(s). Then I used the hosting’s name servers as well: ns1.site.com, ns2.site.com, ns1.hosting.com, ns2.hosting.com… all being able to process the name translation. On the hosting side, under “editing DNS Zones,” I set it up with ns1.nameservice.com as NS records, the IP as an ‘A’ record, the www as a CNAME for site.com… etc. Most of the data was preconfigured by cPanel, but for some reason I had to change a few things from the defaults to get it working properly. This whole aspect will take a lot more research to understand fully.

-Copy by hand or install Drupal from cPanel (assuming your host uses cPanel)

-If you are developing on a Windows machine, make sure you reconfigure your temporary directory from c:\windows\temp to /tmp. Drupal will most likely make a folder for you. I didn’t do this the first time and it made a crappy “c:\windows\www” folder on the server that was a pain to remove.

-Log in to your local mysqladmin (assuming you use that to manage your SQL databases) and make a backup of your local Drupal database. I had to set the compatibility to an earlier version of MySQL. For some reason the export would not work with the default options.

-If you installed from cPanel, drop all of the tables in the pre-created Drupal database before you import the backup file. Otherwise, create a sql user and database for Drupal (make sure to add the user to the database access list). Configure the settings.php file in the Drupal files to reflect the database information (there is a connection string in the php file).

-Copy over any themes or modules that you added to your local Drupal install. I didn’t bother with the hassle that is the sites directory… Just put the modules and theme into the standard directory instead.

-Configure your www-root .htaccess file on the server to handle things like URL rewriting, the default page (index.php), and standardizing the site path (only allowing www.site.com, not site.com, or vice versa). More specifically, look at the directives DirectoryIndex, RewriteEngine, RewriteCond, and RewriteRule.

No surprise, the custom CMS was put on the back burner

Friday, March 2nd, 2007

As I mentioned in the previous post, I had my custom CMS site reasonably functional. Since then, as I worked with the director, I found out that they were using a CMS already. No point in trying to reinvent the wheel, as they say. It’s called Sohoadmin and it works pretty well all things considered. I would not call it that user friendly, but seeing as I am doing volunteer work I don’t want to spend 2-3 months creating my own thing that I would also then have to provide technical support for in the future.

It’s been an interesting process so far getting them where they need to be. The person who they were working with before isn’t very motivated to help them and it was showing (he hosts the server too). I wanted to get Sohoadmin upgraded to the newest version, but when he tried to do that it did not work correctly resulting in me sticking some band aids on the site to get it somewhat functional again.

Now we are at the point where we have bought our own hosting and are in the process of getting the .org name transferred to us. We have the site completely functional with the .net name, but seeing as the .org has been advertised so much it’s best that we use that instead.

Eventually I foresee this organization I am volunteering at being able to be in control of their own web presence. It should save them a lot of time and heartache by cutting out the middle man.

I recently started working on my CMS site again. I want to get it completed just for the fun of it. Right now I have the article functionality about half way there. Next up would be the image gallery functionality. Onward!

Getting a Symfony development environment running on Windows

Thursday, January 11th, 2007

Nothing is ever easy… Anyway, to get a computer ready to start developing with the Symfony PHP framework you have a few options. You can take the long, long route and install Apache, PHP, and a database server separately. With that you have to do a good deal of configuration to get everything working together. You could also download WAMP or XAMPP, which are just prepackaged versions of the same pieces. I haven’t used XAMPP, so I can’t say how you would get that working.

The best reason for using WAMP is that it is self-contained in the c:\wamp folder by default. You can also start and stop all of the server processes (Apache & MySQL) at any time from the tray icon (closing it stops them). It’s great so you don’t have unnecessary processes running all of the time.

The main problem is that WAMP and Symfony don’t work “out of the box.” No surprise there.

This is the process I used to get WAMP working with Symfony:
Download WAMP (1.6.6) and install:
http://www.wampserver.com/en/

Open a browser to http://localhost/
WAMP should display its default webpage with links to phpMyAdmin, etc.
If you are going to use MySQL it might be a good idea to change its root account to have a password. You can use phpMyAdmin to do that, just click the link to it on the WAMP localhost page.

There will be a problem once you change the password. phpMyAdmin won’t be able to reconnect to MySQL until you edit this file:
C:\wamp\phpmyadmin\config.inc.php
Search for the line below in the file and put your password between the quotes:
$cfg['Servers'][$i]['password'] = '';

Extract the Symfony Sandbox to c:\wamp\www for testing later:
http://www.symfony-project.com/get/sf_sandbox.tgz
If that link doesn’t work, just go to http://www.symfony-project.com/ and check out the download page.

Read this tutorial for some needed info:
http://www.symfony-project.com/trac/wiki/SymfonyOnWAMP

It basically said to:
Open c:\wamp\Apache2\bin\php.ini
Search for and remove the comment symbol ‘;’, change ‘On’ to ‘Off’, or just edit the lines to be the same as these below.
extension=php_xsl.dll
magic_quotes_gpc = Off
register_globals = Off
include_path = ".;c:\php\includes;c:\wamp\php\pear"

The magic_quotes_gpc and register_globals parts were not in the tutorial, but WERE NECESSARY for me to change. I was getting a 500 internal server error before I set those values to Off.

Install PEAR for PHP (taken from that Wiki article directly):
1. Start -> Run -> cmd
2. Cd into the PHP directory (e.g. C:\wamp\php)
3. Invoke go-pear.bat. Follow through the options (default should work fine).
If you have a problem when running the bat file like “Warning: Cannot use a scalar value as an array,” it is because the file was broken for Windows in PHP 5.2.0. You can check out this website for more info. You can just download the new version of the file from svn here.

Open C:\wamp\Apache2\conf
Remove the comment character ‘#’ from this line:
LoadModule rewrite_module modules/mod_rewrite.so

Point a browser to http://localhost/sf_sandbox/web/
You should get a page that says something like:
Congratulations!
If you see this page, it means that the creation of your symfony project on this system was successful….

A few more notes to get the Symfony sandbox example working:
http://www.symfony-project.com/tutorial/my_first_project.html
Edit symfony.bat in the sf_sandbox folder that you extracted to c:\wamp\www
Change the line:
set PHP_COMMAND=php.exe
to
set PHP_COMMAND=c:\wamp\php\php.exe

I’m still having some problems with the example, but it is *kind of* working so far. I had to edit both php.ini files with all of the previous modifications.



