INGEOTEC is a research group formed by researchers from two research centers, CentroGEO and INFOTEC, together with Cátedras CONACYT researchers.

INGEOTEC's research interest is text categorization, viewed as a supervised learning problem, that is, as a classification task. For this problem, we have developed two text modeling techniques that represent the text in a vector space model and use a Support Vector Machine (SVM) as the classifier: B4MSA, a sentiment analysis classifier, and microTC, a general-purpose text classifier. In addition, we have been working on EvoDAG, a novel classifier based on Genetic Programming.
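The general recipe (texts mapped to vectors, then a linear SVM) can be sketched without any dependencies. This is a minimal illustration under stated assumptions, not the code used by B4MSA or microTC: texts become sparse bag-of-words vectors, and the SVM is trained with the Pegasos stochastic subgradient method on the L2-regularized hinge loss (no bias term, optional projection step omitted). The function names `bow`, `train_linear_svm`, and `predict`, and the toy training texts, are hypothetical; in practice a library SVM, for example from scikit-learn, would be used.

```python
import random
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector: sparse mapping from token to term frequency."""
    return Counter(text.lower().split())


def dot(w: dict, x: Counter) -> float:
    """Inner product between a sparse weight vector and a sparse text vector."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(texts, labels, lam=0.1, epochs=20, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss (labels are +1/-1)."""
    rng = random.Random(seed)
    data = [(bow(t), y) for t, y in zip(texts, labels)]
    w: dict = {}
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            step += 1
            eta = 1.0 / (lam * step)  # decreasing learning rate
            margin = y * dot(w, x)  # margin under the current weights
            # Gradient of the regularizer: shrink every weight.
            for t in w:
                w[t] *= 1.0 - eta * lam
            # Hinge-loss subgradient: update only when the margin is violated.
            if margin < 1.0:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


def predict(w: dict, text: str) -> int:
    """Sign of the decision function."""
    return 1 if dot(w, bow(text)) >= 0 else -1
```

For example, training on `["good great happy", "great fun good", "bad awful sad", "sad terrible bad"]` with labels `[1, 1, -1, -1]` yields weights that classify "good happy" as positive and "awful sad" as negative.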

We have participated in a number of competitions on sentiment analysis, author profiling, and text-image matching, such as:

• RedICA Text-Image Matching (RICATIM) Challenge. I3GO+ obtained 1st place in both the development and final phases (see Results).
• TASS'17 (Spanish). INGEOTEC obtained 1st place (11 teams) in Task 1 (General Corpus of TASS) (see Proceedings).
• PAN'17 (Arabic, English, Portuguese and Spanish). INGEOTEC (Tellez et al.) obtained 3rd place (22 participants) in the global ranking (see Results).
• SemEval'17 (English and Arabic). INGEOTEC obtained 6th place (69 participants) in English (see Results) and 4th place (18 participants) in Arabic (see Results).
• SENTIPOLC'16 (Italian). INGEOTEC obtained 5th place (15 participants) in subjectivity classification and 9th place (15 participants) in polarity classification (see Proceedings).
• TASS'16 (Spanish). INGEOTEC obtained 3rd place with both 3 and 5 polarity levels (see Proceedings).
• TASS'15 (Spanish). This was our first competition; we obtained 12th place (17 participants) with 5 polarity levels and 10th place (17 participants) with 3 polarity levels (see Proceedings).

### PhD Students

• M.C. Abel Coronado Iruegas.
• M.C. Pablo López Ramírez.
• M.C. José Ortiz Bejar (Google Scholar).
• M.C. Claudia Nallely Sánchez Gómez.
• M.C. Sergio Martín Nava Muñoz.
• M.C. José Manuel Aguilera López.

## Software

To facilitate and encourage the reproducibility of our research, we release our software under an open-source license. Our developments are implemented in Python and follow standard software engineering practices: continuous integration (using travis-ci.org), unit testing (using nose), and code coverage (using Coveralls).

### A Baseline for Multilingual Sentiment Analysis (B4MSA)

B4MSA is a Python sentiment analysis classifier for Twitter-like short texts. It can be used to create a first approximation to a sentiment classifier for any given language. It is almost language-independent, but it can take advantage of the particularities of a language.
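What "almost language-independent" can mean in practice is illustrated by the hypothetical normalizer below. It is a sketch, not B4MSA's API: the language-independent steps (lowercasing, stripping diacritics, replacing URLs and user mentions with the placeholder tokens `_url` and `_usr`) apply to any language, while the optional `stopwords` set is where a language's particularities could be plugged in. The function names, placeholder tokens, and stopword hook are all assumptions for illustration.

```python
import re
import unicodedata

# Hypothetical placeholder patterns; B4MSA's own options and tokens may differ.
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")


def strip_diacritics(text: str) -> str:
    """Remove combining accent marks, e.g. 'días' -> 'dias', 'niño' -> 'nino'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def normalize(text: str, stopwords=frozenset()) -> list:
    """Language-independent cleanup plus an optional language-specific stopword filter."""
    text = text.lower()
    text = URL_RE.sub("_url", text)  # replace URLs before mentions
    text = USER_RE.sub("_usr", text)  # replace @user mentions
    text = strip_diacritics(text)
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if t not in stopwords]
```

For example, `normalize("Buenos días a todos @INGEOTEC http://t.co/x", stopwords=frozenset({"a"}))` returns `["buenos", "dias", "todos", "_usr", "_url"]`; calling it without `stopwords` keeps the pipeline fully language-agnostic.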

It is written in Python and uses NLTK, scikit-learn, and gensim to create simple but effective sentiment classifiers.

### microTC

microTC follows a minimalistic approach to text classification. It is designed to tackle text classification problems in an agnostic way, being both domain- and language-independent.
Currently, it only produces single-label classifiers, but support for multi-label problems is on the roadmap.
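One common way to remain language-independent is to represent a text by its overlapping character q-grams, which require no language-specific tokenizer, stemmer, or stopword list. The sketch below assumes this representation for illustration; it does not reproduce microTC's exact configuration, and the `~` word-boundary marker and default `q = 3` are illustrative choices.

```python
def char_qgrams(text: str, q: int = 3) -> list:
    """Return the overlapping character q-grams of a text.

    Whitespace runs are collapsed into a '~' marker and the text is padded
    with '~' on both ends so that word boundaries remain visible.
    """
    s = "~" + "~".join(text.lower().split()) + "~"
    return [s[i:i + q] for i in range(len(s) - q + 1)]
```

For example, `char_qgrams("hola mundo")` yields `["~ho", "hol", "ola", "la~", "a~m", "~mu", "mun", "und", "ndo", "do~"]`; the same function works unchanged on English, Italian, or Arabic text.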

microTC is intentionally simple, so only a small number of features were implemented. However, it relies on some sophisticated tools from gensim, numpy, and scikit-learn.
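As a dependency-free illustration of the kind of term weighting such a pipeline delegates to these libraries, the sketch below applies TF-IDF to token lists (for example, words or character q-grams) and L2-normalizes each vector, so that cosine similarity reduces to a dot product. TF-IDF is an assumption here; the weighting actually used by microTC may differ, and `tfidf_vectors` and `cosine` are hypothetical names.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight token lists with TF-IDF and L2-normalize each vector.

    docs: list of token lists. Returns one sparse dict per document.
    Tokens that occur in every document get idf = 0 and are dropped.
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    idf = {t: math.log(n / d) for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * idf[t] for t, c in tf.items() if idf[t] > 0}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors (a dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())
```

With `docs = [["a", "b"], ["a", "c"]]`, the shared token `"a"` is discarded, each document reduces to a single unit-weight token, and the two vectors have cosine similarity 0.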

### Evolving Directed Acyclic Graph (EvoDAG)

Evolving Directed Acyclic Graph (EvoDAG) is a steady-state Genetic Programming system with tournament selection. The main characteristic of EvoDAG is that the genetic operation is performed at the root. EvoDAG was inspired by the geometric semantic crossover proposed by Alberto Moraglio et al. and by the implementation developed by Leonardo Vanneschi et al.
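The ingredients named above (steady-state replacement, tournament selection, and a semantic operation applied at the root) can be sketched in a few lines. This is a simplified illustration, not EvoDAG's implementation: each individual is reduced to its semantics (its vector of outputs on the training inputs), and the root-level operation is a Moraglio-style geometric semantic crossover with a random coefficient `r`. EvoDAG's actual representation and operators differ, and all names here are hypothetical.

```python
import random


def mse(pred, target):
    """Mean squared error between a semantics vector and the target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def tournament(errors, k, rng, worst=False):
    """Index of the best (or, if worst=True, the worst) of k sampled individuals."""
    idx = rng.sample(range(len(errors)), k)
    return max(idx, key=errors.__getitem__) if worst else min(idx, key=errors.__getitem__)


def steady_state_gp(population, target, generations=100, k=2, seed=0):
    """Evolve semantics vectors; each offspring is a new root over two parents."""
    rng = random.Random(seed)
    pop = [list(p) for p in population]  # do not modify the caller's lists
    errors = [mse(p, target) for p in pop]
    for _ in range(generations):
        a = tournament(errors, k, rng)
        b = tournament(errors, k, rng)
        r = rng.random()
        # Geometric semantic crossover at the root:
        # child(x) = r * parent_a(x) + (1 - r) * parent_b(x)
        child = [r * u + (1 - r) * v for u, v in zip(pop[a], pop[b])]
        # Steady state: the child replaces the loser of a negative tournament,
        # so with k >= 2 the best error in the population never increases.
        loser = tournament(errors, k, rng, worst=True)
        pop[loser] = child
        errors[loser] = mse(child, target)
    return pop, errors
```

Because the mean squared error is convex, a child produced this way is never worse than the worse of its two parents, which is one reason geometric semantic operators are attractive.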

EvoDAG is described in the conference paper "EvoDAG: A semantic Genetic Programming Python library" by Mario Graff, Eric S. Tellez, Sabino Miranda-Jiménez, and Hugo Jair Escalante, published in the 2016 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), pp. 1-6. A preprint version can be downloaded from here.

## Products

Most of the work done at INGEOTEC is applied research; consequently, we can build prototypes that illustrate applications of our research. We have developed two demos: the first is a web service (SWAP) that performs polarity analysis, and the second uses SWAP to produce a graph of Mexico's positiveness. We have also developed a software platform, AGEI.

### Autómata Geointeligente en Internet (AGEI)

Knowledge Discovery on Big Data requires the development of Advanced Geographic Information Systems as tools for the acquisition, transmission, storage, analysis, and visualization of large amounts of information. In recent years, INGEOTEC has been working on an online platform for researchers and key decision makers who require real-time information and analysis of public data transmitted over the Internet (social media and other open sources). This project, called Autómata Geointeligente en Internet (AGEI), focuses on the automated extraction of knowledge from Big Data through advanced machine learning and statistical techniques (both traditional and state-of-the-art), for application in novel research and decision-making processes, turning Knowledge Discovery on georeferenced data into Geointelligence.

### Servicio Web de Análisis de Polaridad (SWAP)

We made our sentiment analyzer, built on B4MSA, available as a REST service implemented with the Falcon web framework. The service receives a JSON document with the list of texts to be analyzed and returns the polarity of each text. The service can be tested using curl as follows:

```shell
curl -d '{"language":"es_mx", "version": "1.2", "auth":"demo",  "data":[{"text": "buenos dias Aguascalientes "}]}' dev.ingeotec.mx/sentiment
```
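The same request can be issued from Python with the standard library. This is a sketch, assuming the endpoint from the curl example is reachable over `http://` (the scheme is an assumption) and accepts the same JSON body. No `Content-Type` header is set, so `urllib`, like `curl -d`, sends `application/x-www-form-urlencoded`. The structure of the reply is not documented here, so `query_swap` only decodes it as JSON; the helper names are hypothetical.

```python
import json
import urllib.request

# Endpoint taken from the curl example; the http:// scheme is an assumption.
SWAP_URL = "http://dev.ingeotec.mx/sentiment"


def build_swap_request(texts, language="es_mx", version="1.2", auth="demo"):
    """Build (but do not send) a POST request with the curl example's JSON body."""
    payload = {
        "language": language,
        "version": version,
        "auth": auth,
        "data": [{"text": t} for t in texts],
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(SWAP_URL, data=body, method="POST")


def query_swap(texts, timeout=10):
    """Send the request and decode the JSON reply (its schema is not assumed)."""
    with urllib.request.urlopen(build_swap_request(texts), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect or unit-test offline; `query_swap(["buenos dias Aguascalientes "])` performs the actual network call.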

### Mexico Positiveness

Since December 2015, we have been collecting tweets from the general Twitter stream as well as geotagged tweets. Mexico Positiveness is the result of the following exercise, in which all the collected geotagged tweets were analyzed with SWAP. The tweets were then grouped by day and user, and only those labeled as positive were kept. For each user, we computed the user positiveness, defined as the average of the decision function over all of the user's positive tweets. For each day, we calculated the average of all the user positiveness values. Furthermore, to provide some insight into the positive and negative events of a particular day, we produced a tweet cloud from the positive and negative tweets. The figure can be seen at Mexico Positiveness, and the tweet cloud of a particular day is shown by hovering the mouse over that day.
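The daily aggregation described above can be sketched as follows. The input format, `(day, user, label, score)` tuples where `label` is the polarity assigned by the classifier and `score` its decision-function value, is an assumption, as is the label string `"positive"`. Following the grouping by day and user, this sketch computes user positiveness per (day, user) pair; days without any positive tweet are simply absent from the result.

```python
from collections import defaultdict
from statistics import mean


def daily_positiveness(tweets):
    """Return {day: average user positiveness} from classified tweets.

    tweets: iterable of (day, user, label, score) tuples (hypothetical format).
    """
    by_day_user = defaultdict(list)
    for day, user, label, score in tweets:
        if label == "positive":  # keep only positive tweets
            by_day_user[(day, user)].append(score)
    # User positiveness: mean decision function of the user's positive tweets.
    per_day = defaultdict(list)
    for (day, _user), scores in by_day_user.items():
        per_day[day].append(mean(scores))
    # Daily value: mean of that day's user positiveness values.
    return {day: mean(values) for day, values in per_day.items()}
```

For instance, if on one day user `u1` posts positive tweets scored 0.5 and 1.5 and user `u2` one scored 2.0, the user positiveness values are 1.0 and 2.0, so that day's value is 1.5; negative tweets do not contribute.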

Note that Mexico Positiveness offers a different view from that of the experimental statistic "Estado de ánimo de los tuiteros en México" (Mood of Twitter users in Mexico), produced by INEGI using a previous version of SWAP to label the tweets.

### CentroGEO

117 Circuito Tecnopolo Norte Col. Tecnopolo Pocitos II, C.P. 20313, Aguascalientes, Ags, México.

Tel. +52 (449) 994 51 50 Ext. 5251 and 5230

### INFOTEC

112 Circuito Tecnopolo Norte Col. Tecnopolo Pocitos II, C.P. 20313, Aguascalientes, Ags, México.

Tel. +52 (555) 624 28 00 Ext. 6315, 6353, 6313 and 6384