
Dr. Roberta Bruzzone

“Spider” software as support for child-pornography-related investigations on the Internet

Introduction: the new criminal scenario and the Internet

The development of computer science and networks has driven the renewal of several kinds of criminality. Consider, for example, international crimes such as trafficking in prostitution, drugs and weapons, which require a leading group in one nation and a set of cells distributed across many other nations to ensure the illegal exchange of material, people, etc. Fast and effective communication among the cells is a key to success, and the Internet seems to be a very good channel for this kind of coordination.
Organized crime has understood this very well and uses the Internet as an effective means of money laundering and of many other illegal activities. In the same way, several religious and/or sub-cultural sects have adopted the Internet as the best means of contacting people, adepts, etc. In this respect there are web sites related to child pornography, violence, suicide, racial hatred, etc. (Strano M., 2000; Rossi A., Innamorati M., 2004). A new kind of terrorism threatens nations: information warfare against the main telecommunication nodes that link all parts of the world. We can speak of cyberterrorists. They use the Internet for publicity and to spread their ideas. However, the most effective use they can make of the Internet is to coordinate real actions such as terrorist attacks (Strano M., Neigre B., Galdieri P., 2002).
In this environment child abuse and child pornography have found new dimensions. This is evident from the increase in these phenomena in recent years (Strano 2000). The Internet allows paedophiles to communicate easily and safely with many other criminals around the world in order to share information, pictures, videos, etc. Child-pornography material is of course the main target for paedophiles on the Internet, but there are also web sites that try to present paedophilia as something legitimate.

New investigation techniques on the Internet

To be effective against these new “digital” criminal strategies we have to implement strong supervision of digital communications (mostly chat, email and web sites). We worked on the Spider software, which scans the Internet looking for crime-related areas in order to develop a kind of web intelligence. The software looks for conversations, strings, etc., applying high-level textual analysis to locate relevant information among thousands of sites, newsgroups, etc. on the web.

Web intelligence on the Internet using the ICAA Spider software
The ICAA research team for techno-intelligence developed the software to search for web sites related to cultural paedophilia, racial hatred and pseudo-religions. The program is made up of several parts:

Web crawler. It uses semantic analysis of natural language to summarize the content of web sites. The hyperlinks found on the web sites are then stored in a database for use in further analyses.
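
The link-harvesting step of such a crawler can be illustrated with a minimal sketch. It is not the ICAA implementation: the names `LinkExtractor` and `extract_links` and the sample page are assumptions made for illustration, and only the Python standard library is used. The semantic analysis of the page text is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links become absolute; mailto: links are kept as-is.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every hyperlink found in one page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content, used only to exercise the function.
sample = '<p><a href="/forum">Forum</a> <a href="mailto:x@example.org">Mail</a></p>'
links = extract_links(sample, "http://example.org/index.php")
print(links)
```

The resulting list is what would then be written to a link store for later analysis.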

Method Database. The web crawler is a kind of expert system and this component is its knowledge base. It is a set of rules (semantic/syntactic) and data (dictionary) that the crawler can use in its analysis process.
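
As a rough illustration of how a dictionary and a rule might work together, the sketch below counts dictionary terms in a page and applies a single threshold rule. The `MethodDatabase` class, the `min_hits` rule and the placeholder terms are assumptions made for this example; the source does not describe the actual rule format.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class MethodDatabase:
    """Hypothetical knowledge base: a dictionary plus one simple rule."""
    dictionary: set[str]   # terms of interest (placeholders here)
    min_hits: int          # rule: flag a page with at least this many hits


def term_frequencies(text: str, methods: MethodDatabase) -> Counter:
    """Count how often each dictionary term occurs in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in methods.dictionary)


def is_relevant(text: str, methods: MethodDatabase) -> bool:
    """Apply the threshold rule to the total number of dictionary hits."""
    return sum(term_frequencies(text, methods).values()) >= methods.min_hits


methods = MethodDatabase(dictionary={"alpha", "beta"}, min_hits=3)
page_text = "Alpha beta alpha gamma. ALPHA!"
print(term_frequencies(page_text, methods).most_common())
print(is_relevant(page_text, methods))
```

A real knowledge base would hold many rules and handle inflected forms; a single threshold is used here only to keep the idea visible.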

Link Database. This database contains the hyperlinks the crawler found on the web sites. The following fields are stored:

  • 1. Date of creation/update of the web page;
  • 2. Type of hyperlink: towards the same server or towards external servers;
  • 3. Programming language used (asp, php, jsp);
  • 4. Name, size and date of the linked file (if any);
  • 5. If the hyperlink is an email address, the software submits it to Google (search engine) to locate it in other areas of the Internet.
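
Some of the fields listed above can be derived from the URL alone. The sketch below shows one possible record layout; `LinkRecord`, `make_record` and the extension table are hypothetical names, not the actual schema. The date and size fields are left out because they require fetching the resource, and the Google lookup for email addresses is not shown.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Assumed mapping from file extension to server-side language.
SERVER_SIDE_EXTENSIONS = {".asp": "asp", ".php": "php", ".jsp": "jsp"}


@dataclass
class LinkRecord:
    url: str
    is_internal: bool          # same server as the page the link was found on
    is_email: bool             # mailto: link
    language: Optional[str]    # asp / php / jsp, if recognisable from the URL
    file_name: Optional[str]   # name of the linked file, if any


def make_record(url: str, source_url: str) -> LinkRecord:
    """Build a record for a link found on the page at source_url."""
    parsed = urlparse(url)
    if parsed.scheme == "mailto":
        return LinkRecord(url, False, True, None, None)
    path = PurePosixPath(parsed.path)
    return LinkRecord(
        url=url,
        is_internal=parsed.hostname == urlparse(source_url).hostname,
        is_email=False,
        language=SERVER_SIDE_EXTENSIONS.get(path.suffix.lower()),
        file_name=path.name or None,
    )


record = make_record("http://example.org/forum/index.php", "http://example.org/")
print(record)
```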

Enterprise Manager. The administration tool. It allows the operator to check the actions of the crawler and to change the content of the databases.

Analysis Front End. User interface that gives access to the software in order to obtain high-level reports or to plan specific analyses/searches.

Google Interface. Interface between the software and the search engine Google.

[Image: the ICAA Web-Intelligence Spider software]

Suppose we need to analyze a web site related to racial hatred. First we enter the URL of the web site into the Link Database; then we establish the rules in the Method Database for classifying the pages, such as:

  • Analyze only this web site or all the linked ones;
  • Investigate the email addresses found, using Google;
  • Search for special words (in the dictionary);
  • Set the analysis depth;
  • Create a new dictionary;
  • Perform a frequency analysis of the terms;
  • Create the connection cluster;
  • Analyze the technical features of the web site;
  • Analyze the circular links.
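
Two of the settings above, the analysis depth and the choice between a single site and all linked sites, can be illustrated with a depth-limited breadth-first traversal. The sketch works on an in-memory link graph rather than the live web. The function name `crawl_order`, its parameters and the example URLs are assumptions, and the start page is counted as level 0.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_order(graph: dict[str, list[str]], start: str,
                max_depth: int, same_site_only: bool) -> dict[str, int]:
    """Return each visited page with its level in the tree rooted at start."""
    start_host = urlparse(start).hostname
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth_of[page] == max_depth:
            continue  # deepest allowed level reached: do not follow its links
        for target in graph.get(page, []):
            if target in depth_of:
                continue  # already visited (this also stops circular links)
            if same_site_only and urlparse(target).hostname != start_host:
                continue
            depth_of[target] = depth_of[page] + 1
            queue.append(target)
    return depth_of


# Hypothetical link graph: page -> pages it links to.
graph = {
    "http://a.example/": ["http://a.example/p1", "http://b.example/"],
    "http://a.example/p1": ["http://a.example/p2"],
    "http://a.example/p2": ["http://a.example/p3"],
    "http://b.example/": ["http://a.example/"],
}
print(crawl_order(graph, "http://a.example/", max_depth=2, same_site_only=True))
```

Each additional level can multiply the number of pages to process, which is why the depth is set explicitly before the analysis starts.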

These settings are submitted to the Enterprise Manager. The web crawler (or spider program) can now start the analysis process. After processing, the Front End presents:

  • 1. the clusters;
  • 2. the most used terms;
  • 3. the technology used;
  • 4. statistics about the resources involved;
  • 5. the list of email addresses.
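
One way to read "cluster" is as a set of sites that can all reach one another by following links, which in graph terms is a strongly connected component. The sketch below computes such groups with a deliberately simple reachability test. This reading, the function names and the example site graph are assumptions; the source does not say which algorithm the software uses.

```python
def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Sites reachable from start by following one or more links."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def clusters(graph: dict[str, list[str]]) -> list[list[str]]:
    """Groups of two or more sites that are mutually reachable."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    reach = {n: reachable(graph, n) for n in nodes}
    result: list[list[str]] = []
    assigned: set[str] = set()
    for n in sorted(nodes):
        if n in assigned:
            continue
        # n's group: every site it can reach that can also reach n back.
        group = {n} | {m for m in reach[n] if n in reach[m]}
        assigned |= group
        if len(group) > 1:
            result.append(sorted(group))
    return result


# Hypothetical site-level link graph: site -> sites it links to.
site_links = {
    "a.example": ["b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example", "d.example"],
    "d.example": [],
}
print(clusters(site_links))
```

Here `d.example` is linked from the group but does not link back, so it is not part of the cluster. The quadratic reachability test keeps the sketch short; a linear-time algorithm such as Tarjan's would suit large graphs better.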

As mentioned, the ICAA W.I. system can analyze web areas in depth, looking for specific (crime-related) subjects and locating interconnections with other web sites.
This approach reduces search time compared with classic web intelligence (human-based searching). Constant monitoring could be implemented easily and automatically on the basis of this software. The system can be used to produce periodic reports that a specialised human analyst could evaluate and submit to the police forces.

References

  • (Telematic Journal of Clinical Criminology)
  • Galdieri P., Giustozzi C., Strano M., Sicurezza e privacy in azienda, Apogeo editore, Milano, 2001.
  • Innamorati M., Rossi A., La Rete dell’Odio, E. Walter Casini, Roma, 2004.
  • Strano M., Computer crime, Ed. Apogeo, Milano, 2000.
  • Strano M., Neigre B., Galdieri P., Cyberterrorismo, Jackson Libri, Milano, 2002.

  • 1 ICAA: International Crime Analysis Association.
  • 2 Analysis depth: the analyzed web sites are represented as a tree. The depth is the number of levels of the tree. The greater the number of levels allowed for analysis, the greater the processing effort.
  • 3 Connection cluster: a set of sites that are linked to one another.