CGI stands for Common Gateway Interface, a method (or, technically speaking, a protocol) for communicating data between a web server and a computer program (better referred to as a CGI program). CGI is a standard way for web servers to interface with executable programs installed on the server that can process input data and generate output in the form of web pages.
CGI programs are also referred to as CGI scripts, and can be written in any programming language, the most popular ones being Perl, C and C++. The CGI mechanism enables you to build dynamic and interactive web applications.
The concept of CGI can be better understood by looking at the diagram below:
An Internet user working in a web browser sends a request for a web page to a web server. The web server understands from the URL that the page request has to be serviced by a CGI program. It then invokes the related CGI program, which resides on the same computer as the web server. The CGI program runs some logic, fetches some data from the local disk or a local database, processes the data and produces the result as HTML-formatted output. This output is returned to the web server. The web server, in turn, returns the HTML output to the requesting user's browser.
Web developers have made good use of this request-serving environment, both to serve dynamic pages and to build interactive web applications. A typical example of a CGI program serving a dynamic page is one that serves a daily production report for a manufacturing firm. A typical example of an interactive application using CGI is one that accepts user feedback through a feedback form and stores the submitted data in a database residing on the web server machine.
Every machine that hosts websites and web applications typically runs web server software known as an HTTP server. The HTTP server is always running and listens for page requests, by default on the standard port 80. When an internet surfer wants to visit a page of a website, he/she types the URL of the page in the address box of his/her web browser. The URL may typically look like http://www.xyz.com/somepage.html. The user's web browser sends the request to the server on which the website for www.xyz.com is hosted. The request is received by the HTTP server, which parses the components of the URL and knows where the requested page resides on its local disk. It fetches the page from its local disk and sends a copy of it across the internet to the requesting web browser. This entire communication between a web browser and a web server happens using HTTP, a protocol that runs on top of TCP/IP.
Normally, for a static website, the URLs refer to pre-built HTML files, and all that the web server is required to do is locate the file on its disk, read its contents and send them across. However, if a website has dynamic and interactive pages, it may contain URLs to files other than HTML. For instance, there could be a URL like http://www.xyz.com/reports/dailyreport.exe?date=21-02-2016. The HTTP server, when parsing such a URL, finds that this time the request is for an exe file. It is configured to understand that when a request comes for an exe file, instead of reading the contents of the file, it should invoke the executable program (the CGI program). The HTTP server then waits for the CGI program to complete its execution and hand back its output. The CGI program writes its output to its standard output, which is piped to the HTTP server program. The output begins with a short header stating the content type, followed by the page itself formatted as HTML so that the user's browser can read it. The HTTP server then sends the CGI program's output to the requesting web browser.
While CGI programs can reside in any directory or sub-directory of the web server machine, for security reasons it is better practice to put them all under one common directory. As a standard convention, a directory named cgi-bin, residing under the public_html directory, is set up to host all CGI programs. In such a situation, the URL for a page that is served through a CGI program may look like http://www.xyz.com/cgi-bin/dailyreport.exe?date=21-02-2016.
Another popular convention is to use filename extensions. For instance, the CGI program that serves daily reports may be compiled to a file named dailyreport.cgi instead of dailyreport.exe and stored under the public_html/cgi-bin directory. All other CGI programs can likewise be given the extension .cgi. The HTTP server can then be configured to treat all files with the .cgi extension as CGI programs.
You may have noticed in our URL example that we appended a date parameter as date=21-02-2016. We will learn later how parameters can be passed to a CGI program written in C. At this stage you should understand that, fundamentally, CGI is simply programming with input data provided in a special way and output generated according to a strict formatting rule. Everything in between is just programming. So, once you understand the methods of transferring input and output, you can use your skills in C or any other programming language to build complex web applications with ease.
A drawback of CGI is that each time a CGI program is called, a new process is created on the server. After the CGI program has done its task, the process is destroyed. Creating and destroying processes consumes significant computer resources and hence can load the server considerably. The load increases further if the CGI program is a non-compiled one such as a Perl script, in which case the script has to be compiled/interpreted on the fly to serve every request. For busy websites, where several CGI programs are invoked frequently, this can slow down the server significantly.
The overhead involved in interpretation may be reduced by using compiled CGI programs, such as those written in C/C++, rather than Perl or other interpreted languages. The overhead involved in process creation can be reduced by techniques such as FastCGI, which keeps pre-started processes running so that each one can serve many requests, or by running the application code entirely within the web server, using extension mechanisms such as ISAPI or NSAPI.
In subsequent articles we will learn how to use our knowledge of C programming and take advantage of the CGI paradigm to build useful web applications.
Rajeev Kumar is the primary author of How2Lab. He is a B.Tech. from IIT Kanpur with several years of experience in IT education and Software development. He has taught a wide spectrum of people including fresh young talents, students of premier engineering colleges & management institutes, and IT professionals.
Rajeev has founded Computer Solutions & Web Services Worldwide. He has hands-on experience of building a variety of websites and business applications, including SaaS-based ERP and e-commerce systems, and cloud-deployed operations management software for health-care, manufacturing and other industries.