Wednesday, 8 August 2012

Proxy Servers Using Squid

Linux has become synonymous with networking. It is used in both office and home environments as a file, print, e-mail, and application server, and it is increasingly being used as a proxy server.
A proxy server provides Internet access to multiple users at the same time by sharing a single Internet connection. A good proxy server also caches requests, so that repeated data is served from local resources rather than fetched from the web, reducing access time and bandwidth. Squid is one such piece of software: it proxies and caches HTTP, FTP, gopher, and other protocols. It also supports SSL, access controls, and DNS caching, and it maintains a full log of all requests.
Configuring Squid
The working and behavior of Squid is controlled by its configuration file, squid.conf, usually found in the directory /etc/squid. The configuration file is very long, running on for page after page, but the good point is that all options are listed clearly with explanations.
The first thing to edit is http_port, which specifies the socket address on which Squid listens for client requests. By default this is set to 3128, but it can be changed to a user-defined value. Along with the port, one can also give the IP address of the machine on which Squid is running; for example:
http_port 192.168.0.1:8080
With the above declaration, Squid is bound to the IP address 192.168.0.1 and port 8080. Any port can be given, but make sure that no other application is running on the chosen port. Other services' request ports can also be set with similar configuration lines, as shown below.
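For example, the ICP port used for inter-cache communication is set with a line of the same form (3130 is Squid's usual default):

icp_port 3130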

Access Control
Through the access control features, access to the Internet can be controlled in terms of access during particular time intervals, caching, access to particular sites or groups of sites, and so on. Squid access control has two components: ACL elements and access lists. An access list actually allows or denies access to the service.
A few important types of ACL elements are listed below:
· src : Source i.e. client's IP addresses
· dst : Destination i.e. server's IP addresses
· srcdomain : Source i.e. client's domain name
· dstdomain : Destination i.e. server's domain name
· time : Time of day and day of week
· url_regex : URL regular expression pattern matching
· urlpath_regex: URL-path regular expression pattern matching, leaves out the protocol and hostname
· proxy_auth : User authentication through external processes
· maxconn : Limit on the maximum number of connections from a single client IP address
To apply the controls, one first defines a set of ACLs and then applies rules to them.

The format of an ACL statement is
acl acl_element_name type_of_acl_element values_to_acl
Note :
1. acl_element_name can be any user defined name given to an ACL element.
2. No two ACL elements can have the same name.
3. Each ACL consists of a list of values. When checking for a match, the multiple values use OR logic; in other words, an ACL element is matched when any one of its values matches.
4. Not all of the ACL elements can be used with all types of access lists.
5. Values for an ACL can be given on different lines, and Squid combines them into one list, as in the sketch below.
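For instance, the following two lines (the domain names are only illustrative) build a single ACL named badsites that matches any of the three domains:

acl badsites dstdomain .example.com .example.net
acl badsites dstdomain .example.org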
A number of different access lists are available. The ones which we are going to use here are listed below
· http_access: Allows HTTP clients to access the HTTP port. This is the primary access control list.
· no_cache: Controls whether responses to matching requests are cached
An access list rule consists of the keyword allow or deny, which allows or denies the service to a particular ACL element or to a group of them.
Note:
1. The rules are checked in the order in which they are written, and checking terminates as soon as a rule is matched.
2. An access list can consist of multiple rules.
3. If none of the rules is matched, the default action is the opposite of the last rule in the list; it is therefore good practice to make the default action explicit.
4. All elements of a single access entry are AND'ed together, while multiple http_access lines are OR'ed:
http_access ACTION statement1 AND statement2 AND statement3
OR
http_access ACTION statement4
In other words, one http_access line matches only when all of its ACL elements match, and the lines themselves are tried one after another (see the sketch after these notes).
5. Do remember that rules are always read from top to bottom.
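Here is a small sketch of these notes in action (all values are illustrative). The allow line matches only when the client is in the office network AND the time matches worktime; if either element fails, evaluation falls through to the deny line:

acl office src 192.168.0.0/24
acl worktime time MTWHF 09:00-17:00
http_access allow office worktime
http_access deny all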
Setting up of normal proxy server

By default, Squid will not give any access to clients, and the access controls have to be modified for this purpose. One has to list out one's own rules to allow access. Scroll down in squid.conf and enter the following lines just above the http_access deny all line:
acl mynetwork src 192.168.0.0/255.255.255.0
http_access allow mynetwork
mynetwork is the ACL name, and the next line is the rule applied to that particular ACL. 192.168.0.0 refers to the address of the network whose netmask is 255.255.255.0.
Installation and configuration in Linux

Squid is one of the most widely used web proxy servers. This section explains how to install and configure the Squid proxy on a Linux RHEL 5 / CentOS system.
Software = SQUID
Version = squid-2.6.STABLE6-3.el5
Installing SQUID PROXY SERVER:
[root@vm1 ~]# yum install squid*
If you don't have a yum server, then use rpm, for example:
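Assuming you have the package file for the version above (the exact file name and architecture may differ on your system):

[root@vm1 ~]# rpm -ivh squid-2.6.STABLE6-3.el5.i386.rpm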

Starting the service:
First, set a fully qualified domain name; otherwise Squid may fail to start.
#/etc/init.d/squid start
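If Squid refuses to start because it cannot determine the machine's fully qualified hostname, one fix is to set the visible_hostname directive in /etc/squid/squid.conf (the name below is only an example):

visible_hostname proxy.example.com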

By default, Squid binds to port 3128, but it is usually changed to 8080.

ACCESS CONTROL
In Squid we configure things by writing rules, known as ACL rules.
A simple ACL rule:
acl aclname acltype string1
http_access allow|deny aclname
aclname = name of the rule
acltype = the type of string we are using, e.g. src, dst
string = IPs, networks, URLs, etc.
acl mynetwork src 192.168.0.0/255.255.255.0
http_access allow mynetwork

NOTE: Specify the rules before the line
# http_access deny all
This is because the rules are parsed from top to bottom.

To check which port the proxy currently binds to:
[root@vm1 ~]# nmap 192.168.0.21
Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2010-04-16 16:59 IST
Interesting ports on 192.168.0.21:
Not shown: 1672 closed ports
PORT STATE SERVICE
22/tcp open ssh
[output truncated]
3128/tcp open squid-http
Nmap finished: 1 IP address (1 host up) scanned in 0.448 seconds


To block internet usage from a particular IP address:

Write this rule:
#acl block_ip src 192.168.0.66
#http_access deny block_ip

It should be above these lines
#acl mynetwork src 192.168.0.0/255.255.255.0
#http_access allow mynetwork

Otherwise the rule will never take effect: the allow rule for mynetwork above would match first, and the deny would never be reached. Always keep in mind that Squid interprets rules from top to bottom.


To block internet usage from two or more IP addresses using ACL lists:

Write this rule:
#acl block_ips src IP1 IP2
#http_access deny block_ips

Eg:
#acl block_ips src 192.168.0.21 192.168.0.22
#http_access deny block_ips

Or you can define rules like this:
#acl block_ips src 192.168.0.21
#acl block_ips src 192.168.0.22
#http_access deny block_ips
To block a particular URL:
For blocking a single site such as www.yahoo.com, use the acl_type dst.

#acl block_yahoo dst www.yahoo.com
#http_access deny block_yahoo
You can see that the Yahoo Mail site is still accessible, so this blocks only the single site.

To block a whole domain:
E.g. for blocking all systems from accessing orkut.com:

#acl block_orkut dstdomain .orkut.com
#http_access deny block_orkut


To block a list of sites specified in a file:
First we have to create a file and save all the URLs we want to block in it.
In this example the file is saved as /etc/squid/block_list.txt, and it is given read permission for all.

[root@vm1 ~]# cat /etc/squid/block_list.txt
www.hotmail.com
www.ibm.com
www.hp.com

#acl block_list url_regex "/etc/squid/block_list.txt"
#http_access deny block_list

Blocking web access by time:

The syntax is as follows:
#acl aclname time [day-abbrevs] [h1:m1-h2:m2]
h1:m1 must be less than h2:m2
Day-abbrevs:
S - Sunday
M - Monday
T – Tuesday
W - Wednesday
H - Thursday
F - Friday
A - Saturday

We are going to block all systems of mynetwork from accessing the web at lunch time, where lunch time is 02:32-03:00.

#acl mynetwork src 192.168.0.0/255.255.255.0
#acl lunch time MTWHFA 02:32-03:00
#http_access deny mynetwork lunch
mynetwork must be defined before this rule.
Log files of squid:
/var/log/squid/access.log
/var/log/squid/cache.log -> memory and CPU information
/var/log/squid/store.log
/var/log/squid/squid.out -> basic startup information
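To watch requests as they arrive, a plain tail on the access log works:

[root@vm1 ~]# tail -f /var/log/squid/access.log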

/var/log/squid/access.log
You can check whether your requests were a HIT or a MISS in this log file. MISS means the object was not served from the cache; HIT means it was.
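For a rough count of hits versus misses (TCP_HIT and TCP_MISS are the usual Squid result codes):

[root@vm1 ~]# grep -c TCP_HIT /var/log/squid/access.log
[root@vm1 ~]# grep -c TCP_MISS /var/log/squid/access.log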
Syntax / Fields in /var/log/squid/access.log

1st field is the request time, in Unix epoch format with milliseconds appended.
2nd field is the elapsed time [in ms] of page/object delivery.
3rd field is the remote host IP.
4th field is code (Squid action)/status (HTTP status).
5th field is the bytes delivered to the client.
6th field is the method used to retrieve the page.
7th field is the URL.
8th field is the IDENT identification.
9th field is the hierarchy (e.g. DIRECT/IP).
10th field is the MIME type.

/var/log/squid/store.log
store.log records what happens to the objects in Squid's cache.
Syntax / Fields in /var/log/squid/store.log:
1st field is the time, in Unix epoch format.
2nd field is the action (RELEASE, CREATE, SWAPOUT, SWAPIN)
swapout - the object is moved from memory to disk
release - the object is neither on disk nor in memory (removed from the cache)
All cached objects live under /var/spool/squid.
3rd field is location on disk
4th field is HTTP status
5th field is HTTP date
6th field is last modified
7th field is expiration of content
8th field is MIME type
9th field is the size of the content: content_length/actual size of the content.
10th field is Method ( Get,Post,Connect)
11th field is URL
To turn on the Common Log Format (CLF):

#vi /etc/squid/squid.conf

#Default:
# emulate_httpd_log off
emulate_httpd_log on
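Then reload the service so the new log format takes effect:

#service squid reload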

To check the speed, i.e. the time Squid takes to load a page:

[root@vm1 ~]# squidclient -g http://www.google.com
2010-04-17 12:59:11 [1]: 0.142 secs, 6.567727 KB/s (1KB)
2010-04-17 12:59:12 [2]: 0.145 secs, 6.431843 KB/s (1KB)
Interrupted.
2 requests, round-trip (secs) min/avg/max = 0.142/0.143/0.145

If you are checking from a remote machine, then:
#squidclient -h Squid_server_ip -g URL
[root@server ~]# squidclient -h 192.168.0.21 -g http://www.google.com
2010-04-17 13:12:47 [1]: 0.128 secs, 7.286072 KB/s (1KB)
2010-04-17 13:12:48 [2]: 0.115 secs, 8.109715 KB/s (1KB)
Interrupted.
3 requests, round-trip (secs) min/avg/max = 0.115/0.132/0.153

How to set a proxy through the bash shell:

[root@server ~]# export http_proxy=URL/IP_of_Squid_Server
[root@server ~]# export http_proxy=http://192.168.0.21:3128
Now we are on the machine 192.168.0.99 and have set the proxy with the above command. While accessing the net, you can see that the request goes through the proxy:
[root@server ~]# wget http://www.yale.edu/index.html
--13:42:35-- http://www.yale.edu/index.html
Connecting to 192.168.0.21:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html'
[ <=> ] 17,208 33.3K/s in 0.5s
13:42:36 (33.3 KB/s) - `index.html' saved [17208]

Cache Manager Script:
Squid ships with a cache manager script: cachemgr.cgi
We can access this script through a web interface and get lots of cache information.

First we have to install and start httpd on the system. Then we copy the CGI script to the Apache cgi-bin directory.
[root@vm1 ~]# rpm -ql squid | grep cgi
/usr/lib/squid/cachemgr.cgi
/usr/share/doc/squid-2.6.STABLE6/cachemgr.cgi.8
/usr/share/doc/squid-2.6.STABLE6/cachemgr.cgi.8.in
/usr/share/man/man8/cachemgr.cgi.8.gz
[root@vm1 ~]# cp /usr/lib/squid/cachemgr.cgi /var/www/cgi-bin/

Now open the browser and in the address bar give http://IP_proxy/cgi-bin/cachemgr.cgi
You will get a cache management page. Click continue and you will see the cache manager menu.

To change the default squid proxy port:

Now we'll change the default squid port 3128 to 8080
#vi /etc/squid/squid.conf
Find the following variable.
http_port 3128
Now change it to
http_port 8080

Restart the service
#service squid reload

Now it's changed:
[root@vm1 ~]# netstat -tlpn | grep squid
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 5228/(squid)

How to write combined rules/ACLs in Squid:

Syntax is as follows:
#acl acl_name1 acl_type string
#acl acl_name2 acl_type string
#acl acl_name3 acl_type string
#http_access deny acl_name1 acl_name2 acl_name3

Suppose you want to block the web access of a group of users during certain hours.
#acl lunch time MTWHF 09:00-17:00
#acl spys src 192.168.0.66 192.168.0.99
#http_access deny spys lunch

To block URLs containing a particular word:


We will use the acl_type url_regex for blocking URLs by regular expression. The rule is as follows.

#acl block_word_url url_regex sex
#http_access deny block_word_url

Reload the service.
Now you won't get any URLs containing the word sex.

If you give -i, the match becomes case insensitive:
#acl block_word_url url_regex -i sex
To prevent downloading files:

Suppose we are going to block downloads of .exe files.
#acl block_exe url_regex .*\.exe$
#http_access deny block_exe

Reload the service. Now you won't be able to download any .exe files. If you want to block many formats, you can specify them all in a file and give the file name in the string field; a sketch of such a file follows the rule below.

#acl block_exe url_regex "/etc/squid/block_downloads.txt"
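A sketch of what /etc/squid/block_downloads.txt might contain, one regular expression per line (these patterns are only examples):

\.exe$
\.zip$
\.mp3$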

To block access to some TLDs:

Suppose you want to block access to some TLDs (.uk, .pk etc )
#acl block_tld dstdom_regex \.pk$
#http_access deny block_tld 
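Since the multiple values of one ACL are OR'ed together, several TLDs can be covered by a single rule (a sketch using the TLDs mentioned above):

#acl block_tld dstdom_regex \.pk$ \.uk$
#http_access deny block_tld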
To setup squid as a non-caching proxy server:

The following rule tells Squid not to cache responses for any host in any network.
#acl non_caching_hosts src 0.0.0.0/0.0.0.0
#no_cache deny non_caching_hosts

To avoid caching for a particular IP:
#acl non_caching_hosts src 192.168.0.66
#no_cache deny non_caching_hosts

To disable caching of specified sites:

#acl no_cache_sites dstdomain .blogspot.com
#no_cache deny no_cache_sites


The above rule will stop Squid from caching content from any .blogspot.com pages.
You can verify this by the MISS entries in access.log.
How to configure load balancing with Squid and DNS:
DNS has a special feature: the round-robin mechanism.
If we put entries like this in a zone file:

cache.abc.com IN A 192.168.0.21
cache.abc.com IN A 192.168.0.22

for the first request for cache.abc.com, DNS will answer with IP 192.168.0.21; for the second request it will give 192.168.0.22; for the third, 192.168.0.21 again; and so on, alternating between the IPs.

Using this property we can split the Squid load roughly 50/50 between two servers.

Step 1:
Suppose your Squid proxy server is configured on 192.168.0.21. Now take another system, 192.168.0.22, and install Squid exactly as on 192.168.0.21. You can copy the same configuration if you want.
Start the service on both systems.
Step2:
In the DNS server, add two entries as shown below.

cache.abc.com IN A 192.168.0.21
cache.abc.com IN A 192.168.0.22
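To verify the round robin from a client (assuming the zone is loaded and the dig utility is available), repeated queries should return the two addresses in alternating order:

#dig cache.abc.com +short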

Step 3:
In the browser, change the proxy name to cache.abc.com. The first request from the browser will go to 192.168.0.21, the second to 192.168.0.22, and it will keep alternating for further requests. But there is no guarantee that it will operate exactly as expected.

Bandwidth Management using Delay Pools:

Delay pools have three different classes.

    Class 1 allows us to restrict the rate of large downloads. Here we are going to throttle downloads, once they exceed 75 MB, to roughly 200 KB/s for all machines in the network 192.168.0.0/24.

acl worker_bees src 192.168.0.0/24 #defining worker_bees
#delay_pools Number_of_delay_pools
delay_pools 1 #the number of delay pools we are using
#delay_class Pool_number Class_number
delay_class 1 1 #pool number 1 is a class 1 pool
#delay_parameters Pool_number restore_rate/max_size (bytes/second and bytes)
delay_parameters 1 200000/75000000 #refill at 200000 bytes/s once 75000000 bytes have been downloaded
#delay_access Pool_number allow acl_name
delay_access 1 allow worker_bees #allowing worker_bees

Eg:
acl worker_bees src 192.168.0.0/24
delay_pools 1
delay_class 1 1
delay_parameters 1 200000/75000000
delay_access 1 allow worker_bees

Before:
We created a 100 MB file, proxy_test_100mb, on 192.168.0.99 and download it through the proxy.

[root@vm1 ~]# wget http://192.168.0.99/proxy_test_100mb
--15:18:55-- http://192.168.0.99/proxy_test_100mb
Connecting to 192.168.0.21:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 102400000 (98M) [text/plain]
Saving to: `proxy_test_100mb'
100%[=======================================================================>] 102,400,000 12.2M/s in 7.2s
15:19:03 (13.5 MB/s) - `proxy_test_100mb' saved [102400000/102400000]

After:
After adding the acls and restarting squid.

[root@vm1 ~]# wget http://192.168.0.99/proxy_test_100mb
--15:20:42-- http://192.168.0.99/proxy_test_100mb
Connecting to 192.168.0.21:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 102400000 (98M) [text/plain]
Saving to: `proxy_test_100mb'
100%[=======================================================================>] 102,400,000 189K/s in 8m 25s
15:29:07 (198 KB/s) - `proxy_test_100mb' saved [102400000/102400000]

Bandwidth Management with Aggregate Rate:

Using class 1 we can only limit the overall rate of downloads. What if we want an aggregate restriction together with a per-user restriction? That is where class 2 comes in. Suppose the maximum speed of our connection is 2 MB/s [2097152 bytes/s]. We'll allow an aggregate of 0.5 MB/s [524288 bytes/s] for the 192.168.0.0/24 network, and restrict each user to 0.05 MB/s [52429 bytes/s]. That way ten users can browse at about 51 KB/s each, within an aggregate of about 512 KB/s.

Eg:
#acl worker_bees src 192.168.0.0/24
#delay_pools 1
#delay_class 1 2
#delay_parameters 1 524288/524288 52428/52428
#delay_access 1 allow worker_bees

After:

[root@vm1 ~]# wget http://192.168.0.99/proxy_test_10mb
--16:05:31-- http://192.168.0.99/proxy_test_10mb
Connecting to 192.168.0.21:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 10240000 (9.8M) [text/plain]
Saving to: `proxy_test_10mb'
100%[=======================================================================>] 10,240,000 42.9K/s in 3m 49s
16:09:20 (43.6 KB/s) - `proxy_test_10mb' saved [10240000/10240000]

        43.6KB/s < 51 KB/s


Bandwidth Management with Aggregate Rate and Per-Subnet Limits:

This can be achieved using class 3 of the delay pools.

delay_pools pool_num
delay_class pool_num class_num
delay_parameters pool_num agg_max/agg_max For_each_nw/For_each_nw Per_user/Per_user
delay_access pool_num allow all # all networks

agg_max = maximum speed Squid allows for everyone together.
For_each_nw = maximum speed each subnet can get.
Per_user = maximum speed each user can get.

#delay_pools 1
#delay_class 1 3
#delay_parameters 1 524288/524288 262144/262144 100000/100000
#delay_access 1 allow all

[root@vm1 ~]# wget http://192.168.0.99/proxy_test_10mb
--17:00:27-- http://192.168.0.99/proxy_test_10mb
Connecting to 192.168.0.21:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 10240000 (9.8M) [text/plain]
Saving to: `proxy_test_10mb'
100%[=======================================================================>] 10,240,000 89.4K/s in 1m 50s
17:02:17 (91.0 KB/s) - `proxy_test_10mb' saved [10240000/10240000]

        91.0 KB/s < 100 KB/s
How to limit the number of connections per client:
We have to use a new acl_type named maxconn for this. The rule is as follows:
#acl con_limit maxconn 2
#http_access deny con_limit all
This restricts every client from making more than 2 connections at a time.
Starting Squid
To configure Squid to start at boot:
[root@bigboy tmp]# chkconfig squid on
To start, stop, and restart squid after booting use the service command:

[root@bigboy tmp]# service squid start
[root@bigboy tmp]# service squid stop
[root@bigboy tmp]# service squid restart

To determine whether Squid is running, you can issue either of these two commands. The first gives a status message; the second returns the process IDs of the Squid daemons.

[root@bigboy tmp]# service squid status
[root@bigboy tmp]# pgrep squid

Installing and configuring DansGuardian with the Squid proxy in Linux RHEL 5 or CentOS 5
Installing and configuring the Squid internet proxy is covered in the sections above. The DansGuardian package can be downloaded from the following link:
http://dansguardian.org/downloads/2/Stable/dansguardian-2.10.0.1.tar.gz
After configuring the Squid proxy, You can install and configure Dansguardian.

DansGuardian is an add-on for Squid. DG has lots of list files to which we can add and remove entries easily, so there is no need to write complicated ACL rules in Squid.


DG acts as a filter point in front of Squid. We have to configure two things in DG: a filter IP and a filter port. Web requests hit this port first; only then do they go to Squid. The data flow is as follows:

web browser -> iptables rules -> DansGuardian -> Squid

Installation:
The package can be downloaded from dansguardian.org. Before installation, check whether all the prerequisites are met.

Prerequisites

    gcc
    rpm-build
    kernel-devel
    pcre [Perl-compatible regular expressions], pcre-devel

Steps:
Download the package from dansguardian.org
#wget http://dansguardian.org/downloads/2/Stable/dansguardian-2.10.0.3.tar.gz


Extract the package:
#tar zxvf dansguardian-2.10.0.3.tar.gz

Change Directory:
#cd dansguardian-2.10.0.3

Configure the software:
We are going to install the DG in the location /usr/local/dans
#./configure --prefix=/usr/local/dans

Compile:
#make
Install:
#make install

Installation is over. Now we can see four directories created under /usr/local/dans/:
etc - configuration files
sbin - daemons
share - language and display settings
var - log files

Edit the configuration file:
#vi /usr/local/dans/etc/dansguardian/dansguardian.conf

filterip = 192.168.0.21
filterport = 9999
proxyip = 192.168.0.21
proxyport = 8080

Give the proper permissions and ownership:
#chown -R root:squid /usr/local/dans/var/log/dansguardian/
#chmod -R 777 /usr/local/dans/var/log/dansguardian/

Now start the server:
# /usr/local/dans/sbin/dansguardian
Then change the proxy IP & port in the browser to the filter IP and port.

Important files:
#cd /usr/local/dans/etc/dansguardian/lists/
All the files in this directory are access control files.

For example:
If you add youtube.com to bannedsitelist and restart DG as below,
# /usr/local/dans/sbin/dansguardian -Q
you can no longer access youtube.com.

Similarly, if you add the word football to bannedphraselist and restart DG, you can no longer view any page that contains the word football.

Usage and syntax are explained within each file in lists.